How to set up a mirror of marxists.org

By Chris Chroom, 2004; revised by Jonas Holmgren, 2008.

NOTE: Due to an increasing number of DDoS attacks, we've been forced to restrict this service to official mirrors only. Please contact us if you have questions about what the requirements are to become an official mirror. See our FAQ on how to make a local copy of a section of the site for personal use —JH.

Please don't hesitate to email us if you are trying to set up a mirror and having problems.

Things you need

These instructions assume that you have shell access to a UNIX/Linux web server that has rsync and Apache installed.

Root access makes it easier to tweak the Apache setup, but it is not required.

Mirror the MIA using rsync

Decide where on your file system you are going to store the MIA archive. For instance:

    /var/www/marxists-org/html/
    

Prepare the file that will contain the password used for authenticating to the rsync module, replacing password with your actual password:

    mkdir -p /etc/rsync
    cat <<- EOF > /etc/rsync/marxists.secret
    password
    EOF
    chmod 600 /etc/rsync/marxists.secret
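
rsync refuses a password file that other users can access, so it can help to check the permissions before the first run. The following is a minimal sketch, assuming GNU stat (as found on most Linux systems). The helper name check_secret_file is hypothetical and not part of rsync, and the check is deliberately stricter than rsync's own, matching the chmod 600 above.

```shell
# Hypothetical helper: succeed only if the file exists and grants no
# permissions to group or other (e.g. mode 600 or 400).
check_secret_file() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    # GNU stat prints the octal mode with -c '%a'; BSD stat uses other flags.
    mode=$(stat -c '%a' "$file") || return 1
    case "$mode" in
        *00) return 0 ;;
        *)   echo "too permissive ($mode): $file" >&2
             return 1 ;;
    esac
}

# Example use (path taken from the text above):
#   check_secret_file /etc/rsync/marxists.secret && echo "permissions look fine"
```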
    

Start the download (synchronization) with:

    rsync -rptzv --delete --no-motd --password-file=/etc/rsync/marxists.secret username@rsync.marxists.org::www /var/www/marxists-org/html/
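
Because --delete removes local files that are not on the server, it may be worth previewing a run with rsync's --dry-run option before pointing the command at a directory that already has content. The sketch below wraps the command from the text in a function. The names mirror_sync, MIRROR_USER, MIRROR_DEST, MIRROR_SECRET and RSYNC are illustrative assumptions, not part of rsync or of the MIA setup.

```shell
# Placeholder values from the text above; substitute your own.
MIRROR_USER=username
MIRROR_DEST=/var/www/marxists-org/html/
MIRROR_SECRET=/etc/rsync/marxists.secret

# Hypothetical wrapper around the sync command. Extra arguments are passed
# through to rsync, so "mirror_sync --dry-run" previews a run.
# RSYNC can be overridden (e.g. RSYNC=echo) to print the arguments instead
# of contacting the server.
mirror_sync() {
    "${RSYNC:-rsync}" -rptzv --delete --no-motd "$@" \
        --password-file="$MIRROR_SECRET" \
        "$MIRROR_USER@rsync.marxists.org::www" "$MIRROR_DEST"
}

# Print the arguments of a dry run for review; nothing is transferred.
RSYNC=echo mirror_sync --dry-run
```

Running mirror_sync --dry-run without the RSYNC override asks rsync to list what it would transfer or delete without changing anything locally.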
    

For rsync versions prior to 3.0.7, use this instead:

    export USER=username
    export RSYNC_PASSWORD=password
    rsync -rptzv --delete --no-motd rsync.marxists.org::www /var/www/marxists-org/html/
    unset USER
    unset RSYNC_PASSWORD
    

This will take a while, as the first run downloads hundreds of gigabytes.
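
Given the size of the first download, it may be worth checking the free space on the target file system beforehand. This is a minimal sketch using POSIX df and awk. The helper name free_gib, the TARGET_DIR variable and the 500 GiB threshold are assumptions for illustration, not official figures.

```shell
# Hypothetical helper: print the free space, in whole GiB, on the file
# system holding the given directory. df -P prints one POSIX-format line
# per file system and -k reports sizes in 1024-byte blocks.
free_gib() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Assumed threshold; adjust to the current size of the archive.
MIN_GIB=500
# Set TARGET_DIR to an existing directory on the file system that will
# hold the mirror; "/" is only a fallback so the sketch runs anywhere.
target=${TARGET_DIR:-/}

if [ "$(free_gib "$target")" -lt "$MIN_GIB" ]; then
    echo "warning: less than ${MIN_GIB} GiB free on the file system of $target" >&2
fi
```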

Set up Apache

If you don't have write permission in the /etc/httpd/conf.d or /etc/apache2/sites-available directory, you can skip this section.

Add the following to either /etc/httpd/conf.d/mia.conf (RHEL, CentOS, etc.) or /etc/apache2/sites-available/mia.conf (Debian, Ubuntu, etc.):

# example of a name-based virtual host configuration
<VirtualHost *:80>
	ServerName	www.marxists.org.uk
	ServerAlias	marxists.org.uk
	ServerAdmin	webmaster@marxists.org.uk

	DocumentRoot	/var/www/marxists-org/html

	<Directory	/var/www/marxists-org/html>
		DirectoryIndex index.htm index.html
		Options -Indexes +FollowSymLinks
		AllowOverride None
		<IfVersion >= 2.3>
			Require all granted
		</IfVersion>
		<IfVersion < 2.3>
			Order allow,deny
			Allow from all
		</IfVersion>
	</Directory>
	<IfVersion >= 2.3>
		LogLevel info rewrite:warn
	</IfVersion>
	<IfVersion < 2.3>
		RewriteLogLevel 1
	</IfVersion>

	ErrorLog ${APACHE_LOG_DIR}/www.marxists.org.uk-error.log
	CustomLog ${APACHE_LOG_DIR}/www.marxists.org.uk-access.log combined
</VirtualHost>
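
The choice between the two locations named above can be sketched as a small shell function. The name mia_conf_path and its optional root-prefix argument are hypothetical; the prefix exists only so the function can be tried against a scratch directory.

```shell
# Hypothetical helper: print where mia.conf belongs on this system.
# $1 is an optional root prefix (empty by default), useful for testing.
mia_conf_path() {
    root=${1:-}
    if [ -d "$root/etc/apache2/sites-available" ]; then
        # Debian, Ubuntu, etc.
        echo "$root/etc/apache2/sites-available/mia.conf"
    elif [ -d "$root/etc/httpd/conf.d" ]; then
        # RHEL, CentOS, etc.
        echo "$root/etc/httpd/conf.d/mia.conf"
    else
        echo "no known Apache configuration directory found" >&2
        return 1
    fi
}

# Example use:
#   conf=$(mia_conf_path) && echo "put the virtual host in $conf"
```

On Debian-style systems a file in sites-available usually also has to be enabled, for example with a2ensite mia, before Apache is reloaded. Running apachectl configtest first checks the syntax.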

Using cron to automate updates

There is no need to run rsync as root, so the crontab can be edited as a regular user. The only requirements are that this user has write permission in the webroot, e.g. /var/www/marxists-org/html/, can read the password file, and can write to the log file.

Run crontab -e to open the crontab in an editor (usually vi), then add a line like this:

# run at quarter past one every morning (a crontab entry must be on a single line)
15 1 * * *      /usr/bin/rsync -rptzv --delete --no-motd --stats --password-file=/etc/rsync/marxists.secret username@rsync.marxists.org::www /var/www/marxists-org/html/ >>/var/log/rsync.marxists.org.log 2>&1
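
Since a sync can run for a long time, a nightly job could start while the previous one is still going. One way to guard against that is a simple lock, sketched here with mkdir, which is atomic. The function name with_lock and the lock path are hypothetical, and this is only one of several possible approaches.

```shell
# Hypothetical wrapper: run a command only if no other run holds the lock.
# mkdir either creates the directory or fails, so it works as a simple lock.
with_lock() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another run is in progress ($lockdir exists); skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}

# Example use inside a script called from cron (paths from the text above):
#   with_lock /tmp/mia-rsync.lock /usr/bin/rsync -rptzv --delete --no-motd \
#       --password-file=/etc/rsync/marxists.secret \
#       username@rsync.marxists.org::www /var/www/marxists-org/html/
```

If a run is killed before it can remove the lock directory, the directory has to be removed by hand before the next run will start.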
    

Internal Notes

The -e option specifies a remote shell to use in place of rsh, such as ssh. Here's an example that uses SSH to fetch a subdirectory:

rsync -rptzv -e ssh account@marxists.org:/www/mia/admin/janitor ~/Documents

Contact the Marxists Internet Archive Admin Committee for further information.