Automatic backup with cron and rsync

Cron is a Unix job scheduler, useful for automating tasks. Rsync is a tool that synchronizes files and directories from one location to another, including between different machines. It is smart enough to copy only the files that were updated, which makes it well suited to automatic backups.

Combining these tools, it’s very simple to perform backups, periodically copying contents from one machine to another.

First of all, cron is set up through crontab. Two commands of this tool are useful here:

crontab -l

which lists all scheduled tasks.

crontab -e

which opens a text editor on your list of scheduled tasks. So, let’s open the editor and add the following line:

00 00 * * * /path/to/script/script-name.sh

The instruction above tells cron to execute script-name.sh at 00:00 (that is, at midnight) every day, so it’s a daily schedule. The five fields are, in order: minute, hour, day of month, month and day of week. The Wikipedia article on cron explains how to set up different periods.
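To make the field order more concrete, here are a few other schedules written in the same style. This is only an illustrative sketch: the times are arbitrary, and the script path is the same placeholder used above.

```crontab
# minute hour day-of-month month day-of-week command

# every day at 02:30
30 02 * * * /path/to/script/script-name.sh

# every hour, on the hour
00 * * * * /path/to/script/script-name.sh

# every Sunday at 03:00 (day-of-week 0 is Sunday)
00 03 * * 0 /path/to/script/script-name.sh

# on the first day of every month at 04:00
00 04 1 * * /path/to/script/script-name.sh
```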

Now, let’s see what goes in the script file.

rsync --stats --compress --rsh=/usr/bin/ssh --recursive <ssh user>@<ssh address>:<path to directory to be copied on remote location> <local destination directory> > log.txt

--stats – show some stats about the transfer, such as how many files were updated

--compress – compresses file data during the transfer. This is particularly useful when the files are transferred over a network. In this case we expect the reduction in transfer time to outweigh the extra time spent compressing and decompressing.

--rsh=/usr/bin/ssh – tells rsync to use ssh as its remote shell (/usr/bin/ssh is the path to the ssh binary; it may be different on your system, in which case you have to change it)

--recursive – tells rsync to copy all files and directories under the specified directory. If you don’t use this option, rsync skips directories, so nothing under the directory you passed as address gets copied.

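<ssh user>@<ssh address>:<path to directory to be copied on remote location> – this is the address of your remote directory. Rsync also needs a destination as its last argument, here a local directory that receives the copy; given only a source, it lists the remote files instead of copying them. Note that the machine where the script runs must have password-free access to the remote machine, because nobody is there to type a password when cron starts the script. To configure this, refer to this article.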
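As a rough idea of what that configuration usually involves, the sketch below sets up key-based login with the standard OpenSSH tools ssh-keygen and ssh-copy-id. The function name setup_key_login and the key path are assumptions made for this example, and the key is created with an empty passphrase, which is convenient for cron but means anyone who can read the key file can log in to the remote machine.

```shell
#!/bin/sh
# setup_key_login REMOTE
# REMOTE has the form "<ssh user>@<ssh address>".
# Hypothetical helper, not part of the original script.
setup_key_login() {
    if [ $# -ne 1 ]; then
        echo "usage: setup_key_login <ssh user>@<ssh address>" >&2
        return 2
    fi
    remote=$1
    key="$HOME/.ssh/id_ed25519"   # assumed key location

    # Create a key pair with an empty passphrase, unless one already exists.
    [ -f "$key" ] || ssh-keygen -t ed25519 -N "" -f "$key" || return 1

    # Install the public key on the remote machine (asks for the password once).
    ssh-copy-id -i "$key.pub" "$remote" || return 1

    # Verify: with BatchMode, ssh fails instead of prompting for a password.
    ssh -o BatchMode=yes -i "$key" "$remote" true
}
```

After sourcing the file, calling setup_key_login with your own user and address should ask for the remote password one last time. If the final check succeeds, the rsync command can run unattended.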

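We then save the output (basically the statistics) to a log file. This file is also useful to check whether the script and the scheduler are working: check its modification date to see if it matches the schedule. Cron usually starts jobs in the user’s home directory, so a relative path such as log.txt normally ends up there; use an absolute path if you want the log somewhere else.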
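If the modification date alone is not enough, the log can also record when each run started and whether it succeeded. The following is a minimal sketch of that idea; run_logged is a hypothetical helper name, and the rsync call is left commented out with the same placeholders as above.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND, appends its output to LOGFILE between two marker lines,
# and returns COMMAND's exit status. Hypothetical helper for illustration.
run_logged() {
    logfile=$1
    shift
    printf '=== started %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$logfile"
    status=0
    "$@" >> "$logfile" 2>&1 || status=$?
    printf '=== finished with exit status %d ===\n' "$status" >> "$logfile"
    return "$status"
}

# Example call (fill in the placeholders before uncommenting):
# run_logged "$HOME/log.txt" rsync --stats --compress --rsh=/usr/bin/ssh --recursive \
#     "<ssh user>@<ssh address>:<remote path>" "<local destination directory>"
```

Unlike the plain redirection with >, this version appends to the log instead of overwriting it and also captures error messages, so a failed run leaves a trace.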
