Saturday, April 23, 2011

Backing up with rdiff-backup

I recently tried out rdiff-backup for my backups and like it a lot. It is a command-line utility written in Python that can operate locally or remotely via SSH. The first time it runs, it copies all of your files to the backup directory. On subsequent runs, it copies only what has changed since the most recent backup, updates the mirror, and stores the changes it made to the mirror. The end result is that you always have a fully up-to-date mirror of your files, but at any point you can restore from a previous backup. The stored history adds little disk space beyond the mirror itself, and the backup process is fast because only the changes are copied.

The syntax is similar to the "cp" command: the command itself, followed by the source directory, then the destination (backup) directory, like so:

     rdiff-backup /home/jizldrangs /usr/backups

When using SSH, use the server name, followed by double colons and the absolute path, like this:

     rdiff-backup /home/jizldrangs fileServerName::/home/jizldrangs/backups

As with many command-line tools, there are a lot of options, most importantly the option to include or exclude certain paths or files. See the examples and man page for details on how to fine-tune your backup.
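As a rough sketch of what that fine-tuning might look like, the script below backs up a home directory while skipping two subdirectories. The excluded paths and the DRY_RUN switch are assumptions made up for illustration; only the --exclude option itself comes from rdiff-backup. By default it just prints the command so you can check it before running anything.

```shell
#!/bin/sh
# Sketch: back up a home directory while skipping a few subdirectories.
# The excluded paths are illustrative assumptions; adjust them to taste.

SRC="${SRC:-/home/jizldrangs}"
DEST="${DEST:-/usr/backups}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

backup_with_excludes() {
    # Build the full command line in the positional parameters.
    set -- rdiff-backup \
        --exclude "$SRC/.cache" \
        --exclude "$SRC/.gvfs" \
        "$SRC" "$DEST"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_with_excludes
```

Setting DRY_RUN=0 in the environment runs the real backup instead of printing it.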

I put this on my netbook and desktop, which are both running Ubuntu Maverick, but my wife's machine had only the sporadic backups I had made to our USB hard drive, and I wanted a more consistent plan. Fortunately, rdiff-backup, being a Python application, also has a Windows version. If you are backing up to a local directory or a mounted network drive, you are good to go.

If you want to back up over SSH, it gets a little sticky, but it can be done. Using the instructions in this post, I downloaded plink.exe (the command-line version of PuTTY) and created a batch file with the following:

"C:\Program Files\rdiff-backup.exe" -v5 --no-hard-links --exclude-symbolic-links --remote-schema "plink.exe -i rsa.ppk %%s rdiff-backup --server" "C:\\Documents and Settings\Mrs. Jizldrangs\My Documents" mrsjizldrangs@myfileserver::/home/mrsjizldrangs/backups-my-docs

This batch file resides in the same directory as plink.exe, which is why the full path isn't specified. Here is a breakdown of the arguments:
  • no-hard-links and exclude-symbolic-links: these are necessary on Windows machines, per the blog post mentioned above
  • remote-schema: The method of contacting a remote server (in our case, an SSH server over plink.exe)
  • The last two arguments are the source directory and destination (i.e. backup) directory
Plink has some arguments as well. Here is the breakdown:
  • i: the name of the SSH key to use for authentication. I created an SSH keypair using PuTTYgen, which generates two files, a public key and a private key. I added the contents of the public key to the authorized_keys file on the server, and the argument specified above is the private key, which is also located in the same directory as plink.exe and the batch file
  • %%s: a placeholder that rdiff-backup replaces with the remote host, i.e. the part of the destination before the double colons (the percent sign is doubled only because this is a batch file)
  • rdiff-backup --server: This is run on the remote machine and all it does is start rdiff-backup in server mode
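To make the %s placeholder in the remote schema concrete, here is a small shell sketch of the substitution: the destination is split at the double colons and the host part is dropped into the schema. The function name expand_schema is made up for this illustration, and it only prints the resulting command; it is not how rdiff-backup is implemented internally.

```shell
#!/bin/sh
# Illustration only: mimic how the %s in --remote-schema gets filled in.
# expand_schema is a hypothetical helper, not part of rdiff-backup.

expand_schema() {
    schema=$1                  # e.g. 'plink.exe -i rsa.ppk %s rdiff-backup --server'
    location=$2                # e.g. 'user@host::/path/to/backups'
    host=${location%%::*}      # everything before the first '::'
    before=${schema%%\%s*}     # schema text before the placeholder
    after=${schema#*%s}        # schema text after the placeholder
    printf '%s%s%s\n' "$before" "$host" "$after"
}

expand_schema 'plink.exe -i rsa.ppk %s rdiff-backup --server' \
    'mrsjizldrangs@myfileserver::/home/mrsjizldrangs/backups-my-docs'
```

Run as written, this prints "plink.exe -i rsa.ppk mrsjizldrangs@myfileserver rdiff-backup --server", which is the command that ends up being launched to reach the server.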
All right! All of the machines are backed up and everything is peaches and cream.

In fact, my newfound confidence gave me a case of Linux Distro Wanderlust, and I got it in my head that I wanted to switch my desktop from Ubuntu Maverick to Linux Mint 10. There was no reason not to, especially since I could just do an automated restore and all my files would come rushing back. So I wiped the disk, installed Mint 10, and began the restore. After a long time I received an error that permission was denied on ~/.gvfs. Fortunately, rdiff-backup lets you exclude as many folders as you want by repeating the --exclude argument. I excluded it and tried again. I got the same error on ~/.local and ~/.subversion, so I ended up excluding those directories as well, with the final command looking like this:

     rdiff-backup -v5 --force -r now --exclude '**/.subversion/**' --exclude '**/.gvfs/**' --exclude '**/.local/**' myFileServer::/home/jizldrangs/vengeance-backup ~

Here's a breakdown of the arguments:
  • v5: Verbosity level 5. Levels run from 0 (silent) through 9, which outputs so much info that it is impossible to read. 5 is a nice happy medium, as it lists the files it is working on.
  • force: this is necessary to add when doing a restore to a directory that already has some version of the files you are trying to restore. In my case, the default Home directory created for me by Linux Mint already had some default folders, so I had to force rdiff-backup to overwrite them with the version from my backup.
  • r: specifies a restore
  • now: tells rdiff-backup when to restore as of (see the man page for alternative options if you want to go to a past backup)
  • exclude: tells rdiff-backup to skip these folders when restoring, even though they exist in the backup
  • the last two arguments specify where to restore from (i.e. the backup directory) and where to restore to. In my case that was the Home directory, but you can change this to restore somewhere else and have access to multiple versions of your files
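For going back to a past backup rather than "now", a sketch along these lines lists the available increments and then restores the state from three days ago into a scratch directory. The target path, the "3D" age and the DRY_RUN switch are assumptions for illustration; the man page lists the accepted time formats.

```shell
#!/bin/sh
# Sketch: see which backups exist, then restore the state from 3 days ago
# into a scratch directory instead of overwriting the live files.
# TARGET and the "3D" age are assumptions made for illustration.

BACKUP="${BACKUP:-myFileServer::/home/jizldrangs/vengeance-backup}"
TARGET="${TARGET:-/tmp/restore-3-days-ago}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List the available increments (the points in time you can restore to).
run rdiff-backup --list-increments "$BACKUP"

# Restore the files as they were 3 days ago; "now" would give the latest.
run rdiff-backup -r 3D "$BACKUP" "$TARGET"
```

Restoring into a separate directory like this avoids the need for --force, since nothing already there gets overwritten.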
After several days of trial and error in figuring out which directories are going to cause problems when copying over, I was able to do a restore of my files, which is a relief because you never know whether your backups are any good until you've done a restore. Happy backing up!
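One way to test a backup without wiping a disk first is to restore into a scratch directory and compare it against the live files. The helper below is a minimal sketch of that comparison; check_restore and the paths in the usage comment are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch: check a test restore against the live files.
# check_restore is a hypothetical helper; the paths in the usage comment
# are assumptions for illustration.

check_restore() {
    live=$1
    restored=$2
    # diff -r walks both trees; -q (GNU and BSD diff) reports only which
    # files differ rather than printing their contents.
    if diff -rq "$live" "$restored"; then
        echo "restore matches live files"
    else
        echo "restore differs from live files"
        return 1
    fi
}

# Typical use (not run here):
#   rdiff-backup -r now myFileServer::/home/jizldrangs/vengeance-backup /tmp/restore-test
#   check_restore "$HOME" /tmp/restore-test
```

Expect some differences in practice: anything changed since the last backup, and any excluded directories, will show up in the diff output.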

2 comments:

  1. Thanks, that helped a lot :)

    You only need one % (%s) in the remote schema on the command line, but two (%%s) in a batch file due to variable expansion.

  2. Here is the way to specify the path to a given ssh key for rdiff-backup (using ssh.exe, not plink.exe)

    You can take ssh.exe (and the DLLs it needs) from the Cygwin project and place it in the %PATH% or in the same directory as rdiff-backup.


    rdiff-backup.exe -v5 --remote-schema "ssh.exe -o StrictHostKeyChecking=no -i path\to\local\sshkeyfile %s rdiff-backup --server" "C:\local\directory" user@remoteserver::/remotedirectory

    The "-o StrictHostKeyChecking=no" argument makes ssh less verbose and keeps it from waiting for a "yes" answer, since it can't add the remote server to its known hosts.

    By the way, thank you for this interesting article Jizldrangs, it helped a lot in my project.
