Backing up Using Rsync
This is a short guide on how to use
rsync to back up important files from your
self managed machine to your DICE file space.
Introduction
The first resources you will need are the
rsync man page and the
rsync documentation. You also need an installation of rsync. Almost all Linux distributions come with an rsync package, often installed as part of the basic installation; refer to your distribution's documentation for more information. Mac OS X also has the rsync command line tool installed as part of the standard installation, accessible from the Terminal application. Contributions on how best to install rsync on Windows are welcome.
It is worth mentioning at this point the most important command line option available when using rsync:
-n
(aka
--dry-run
). When used in conjunction with the
-v
(aka
--verbose
) option this allows you to see what will happen before you try it for real. You should
always test your rsync commands with the options
-v -n
(this can be shortened to
-vn
).
Be careful: if you get your source and destination wrong, rsync can end up deleting files rather than copying them.
Using Rsync
At its simplest, backing up your home directory on your self managed machine can be done with the command:
bash$ rsync -e ssh -av $HOME/ <remote user>@<remote server>:/home/<remote username>/Backup
The
-e ssh
option instructs rsync to use ssh as the transport mechanism and allows you to transfer files to any machine you can SSH into. The
-av
uses the
archive mode to transfer files and requests
verbose output.
The
$HOME/
part is the source location, also referred to as the
root of transfer. All other actions are performed relative to this directory.
<remote user>@<remote server>
refers to the remote username and the remote server you wish to transfer to, for example
joe@myserver.inf.ed.ac.uk
. The
/home/<remote username>/Backup
part refers to the destination directory on the remote server to transfer files to, for example
/home/joe/Backup
.
Note that the
/home/<remote username>/Backup
directory must already exist on the remote side.
For most users this basic recipe will need a little tweaking, since backing up
all of your home directory is likely to be overkill and to cause storage space problems on the destination side.
A more practical example is:
bash$ rsync -e ssh -vrlpt \
$HOME/ \
<remote user>@<remote server>:/home/<remote user>/Backup \
--include "/Desktop/" \
--include "/Desktop/**" \
--include "/Documents/" \
--include "/Documents/**" \
--include "/Library/" \
--include "/Library/Mail/" \
--include "/Library/Mail/**" \
--exclude "*"
In this example the
-av
options have been replaced with
-vrlpt
. This is because the
-a
option implies
-rlptgoD
which includes -g and -o and so will attempt to preserve the group and user ownership attributes. This is not always suitable if the users and groups at the two ends have different meanings, which is quite likely to be the case on a self managed machine. Not using these options ensures that the files on the remote side are owned by your remote user and by the default group for that user on the remote side.
The
-D
option allows rsync to handle device files; however, you are unlikely to have any of these in your home directory, making this option unnecessary.
The important parts are all the
--include
lines and the
--exclude "*"
at the end. The
--include
options come in sets:
--include "/Documents/"
--include "/Documents/**"
The first line directs rsync to include the directory
/Documents/
in the set of files to be copied. The second line directs rsync to include all files and subdirectories (recursively, due to the -r option to rsync) below the
/Documents/
directory. All the include patterns are relative to the
root of transfer which is specified by the source option to rsync, in this case
$HOME/
.
When there are multiple levels of directory structure to the target directory the rules are similar, but with an extra
--include
rule per parent directory:
--include "/Library/"
--include "/Library/Mail/"
--include "/Library/Mail/**"
There must be an
--include
rule for each directory level that makes up the desired source directory. The rules above will copy the
/Library/Mail/
directory structure to the destination and all the files inside the
/Library/Mail/
directory. It will
not copy any other files in the
/Library/
tree unless you add a --include "/Library/**" rule. Using these patterns you can build up a very precise picture of what you want to transfer.
The
--exclude "*"
option is present because the default for rsync transfers is to include everything relative to the root of transfer (at least if the
-r
option is included). When only a subset of the files is to be transferred, the basic strategy is to specify everything you do want using
--include
rules and exclude everything else.
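The include-then-exclude strategy can be tried safely on a throwaway tree before pointing it at a real home directory. The sketch below assumes rsync is on your PATH and uses local scratch directories instead of a remote server; the file names and the Library/Caches and Music directories are invented purely to show what gets left behind.

```shell
# Scratch "home" tree and destination (hypothetical, local only).
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/Documents/reports" "$src/Library/Mail" \
         "$src/Library/Caches" "$src/Music"
echo "report" > "$src/Documents/reports/q1.txt"
echo "mail"   > "$src/Library/Mail/inbox"
echo "cache"  > "$src/Library/Caches/tmp"
echo "song"   > "$src/Music/track"

# Include Documents and Library/Mail (plus each parent directory level),
# then exclude everything that no earlier rule matched.
rsync -vrlpt "$src/" "$dest/" \
  --include "/Documents/" \
  --include "/Documents/**" \
  --include "/Library/" \
  --include "/Library/Mail/" \
  --include "/Library/Mail/**" \
  --exclude "*"

# List the files that arrived, relative to the destination.
(cd "$dest" && find . -type f | sort)
```

Only the files under Documents and Library/Mail should appear in the listing; Library/Caches and Music match no include rule and so fall through to the final exclude.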
The best way to pass complex
--include
and
--exclude
rulesets to rsync is to use the
--include-from=FILE
option. This allows the sets of rules to be listed in a file rather than on the command line. The notation is similar to using the
--include
and
--exclude
although a little shorter. An example
--include-from=FILE
file equivalent to the rules above would be:
+ /Desktop/
+ /Desktop/**
+ /Documents/
+ /Documents/**
+ /Library/
+ /Library/Mail/
+ /Library/Mail/**
- *
This would allow you to use the command:
bash$ rsync -e ssh -vrlpt \
$HOME/ \
<remote user>@<remote server>:/home/<remote user>/Backup \
--include-from=<name of file containing the rules above>
For more information on --include and --exclude patterns see the
EXCLUDE PATTERNS section of the man page.
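A rules file of this kind can also be tried out locally before it is used against a remote server. The following is a sketch only: the scratch directories, the rules file location and the Downloads directory are hypothetical, and rsync is assumed to be on your PATH.

```shell
# Scratch source tree, destination and rules file (hypothetical, local only).
src=$(mktemp -d)
dest=$(mktemp -d)
rules=$(mktemp)
mkdir -p "$src/Desktop" "$src/Library/Mail" "$src/Downloads"
echo "todo"  > "$src/Desktop/todo.txt"
echo "mail"  > "$src/Library/Mail/inbox"
echo "junk"  > "$src/Downloads/big.iso"

# One rule per line: "+ " includes a pattern, "- " excludes it.
cat > "$rules" <<'EOF'
+ /Desktop/
+ /Desktop/**
+ /Library/
+ /Library/Mail/
+ /Library/Mail/**
- *
EOF

# Same transfer options as before, with the rules read from the file.
rsync -vrlpt "$src/" "$dest/" --include-from="$rules"

# List the files that arrived, relative to the destination.
(cd "$dest" && find . -type f | sort)
```

Keeping the rules in a file makes it easy to add `-n` for a dry run whenever the list changes, without retyping the patterns.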
Other Resources
This page has some
Tips and Tricks. For examples of more advanced use of rsync for backup see this
very good article.
--
CarwynEdwards - 21 Sep 2005