Rsync

From SobellWiki

rsync

rsync can be used instead of cp (on a local machine) and to copy files to and from a remote computer. One computer must be a local computer; rsync cannot copy between two remote computers.

Options

--archive (-a)

Archive copies everything from the given directory down, preserving permissions and other attributes, and makes the -r flag unnecessary. --archive is the same as -rlptgoD: -r is covered below; -l tells rsync to copy symlinks as symlinks (even if their target is missing in the destination directory structure); -p preserves permissions on files that are transferred; -t preserves modification times; -g preserves group; -o preserves owner (super-user only); and -D is the same as --devices --specials and preserves device files and special files.

--recursive (-r)

Recursive tells rsync to cover every branch of a directory, including the directory itself. Use this option to send or receive an entire branch of a directory structure.

--verbose (-v)

Verbose tells rsync to output more information about what it is doing. This is useful for debugging and for keeping track of which files have been transferred at any given point.

--relative (-R)

The relative option preserves the full path given on the command line instead of only its last component. Without it, rsync sends only the file named at the end of the source argument and places it directly in the destination directory. For example, rsync -av /a/b/c.txt coffee:~max/backup/ creates ~max/backup/c.txt, while rsync -avR /a/b/c.txt coffee:~max/backup/ creates ~max/backup/a/b/c.txt on the remote computer.

--delete

The delete option causes rsync to delete any files in the destination that are not in the source. This can be very dangerous -- a missing or extra / can delete an entire directory tree. Use it with --dry-run at first.

--dry-run

Dry-run runs the rsync algorithm without writing anything to disk. With the -v option, the output shows what would have happened. It is very useful with the --delete option (see Notes).

--update (-u)

The update option skips files that are newer in the destination directory than the files with the same name in the source directory. For an example, see the section on keeping two computers in sync.

--backup and --backup-dir=DIR

The --backup option, used together with --backup-dir=DIR, puts files that otherwise would have been overwritten into DIR on the receiver. Instead of overwriting an older file on the receiver, rsync first moves the older file to DIR and then writes the newer file to the destination directory.

--link-dest=DIR

The link-dest option is demonstrated below. It allows rsync to hard-link to identical files that already exist in DIR on the remote machine instead of uploading new copies of the files.

--copy-unsafe-links

With the copy-unsafe-links option, any symlinks that point outside of the source directory structure are copied as files, not as symlinks.

Examples

Basic rsync example:

The working directory holds two files, one and two.

max@maxtop:~/rsync$ ls -l test
total 0
-rw-r--r-- 1 max max 0 2009-07-12 12:03 one
-rw-r--r-- 1 max max 0 2009-07-12 12:03 two

Using rsync with basic options, --recursive and --verbose, we run into problems when using rsync's copy functionality.

max@maxtop:~/rsync$ rsync -rv test new_test
sending incremental file list
created directory new_test
test/
test/one
test/two

sent 147 bytes  received 54 bytes  402.00 bytes/sec
total size is 0  speedup is 0.00

First, when we look at the directory new_test, it contains a directory test within it. Furthermore, rsync has set the modification times of the copies to the time of the transfer instead of keeping the original times.

max@maxtop:~/rsync/new_test/test$ ls -l
total 0
-rw-r--r-- 1 max max 0 2009-07-12 12:04 one
-rw-r--r-- 1 max max 0 2009-07-12 12:04 two

Note that in the next example there is a trailing slash after test. The trailing slash tells rsync to ignore the directory itself and copy the files within the directory (much like test/*, except that hidden files are included too). Only the slash after the source, test/, matters; writing the destination as new_test or new_test/ yields the same result. For clarity, throughout the examples we will follow every directory with / and specify the destination directory names explicitly. Also, we use the --archive option to preserve the mod times of the files we are rsyncing.

max@maxtop:~/rsync$ rsync -av test/ new_test/
sending incremental file list
created directory new_test
./
one
two

sent 146 bytes  received 53 bytes  398.00 bytes/sec

Now when we look at the contents of new_test:

max@maxtop:~/rsync$ ls -l new_test
total 0
-rw-r--r-- 1 max max 0 2009-07-12 12:03 one
-rw-r--r-- 1 max max 0 2009-07-12 12:03 two

We see the files have the same timestamp as the files in test, and they are immediately under new_test and not another level down. In the future, we will always use -av for rsync.

--delete

The --delete option will delete everything from destination that is not in source.

max@maxtop:~/rsync$ ls test
one  three  two
max@maxtop:~/rsync$ ls new_test
four  one  two

First try with the dry-run option. This option shows any files that will be deleted, prefaced with "deleting". The file four will be deleted from new_test because it does not appear in test. The file three is new in test and will be copied to new_test.

max@maxtop:~/rsync$ rsync -av --delete --dry-run test/ new_test/
sending incremental file list
./
deleting four
three

sent 81 bytes  received 18 bytes  198.00 bytes/sec
total size is 0  speedup is 0.00 (DRY RUN)

Notice that nothing has been changed in either test or new_test:

max@maxtop:~/rsync$ ls test
one  three  two
max@maxtop:~/rsync$ ls new_test
four  one  two

Now we run the command without the --dry-run option:

max@maxtop:~/Desktop/Dropbox/rsync$ rsync -av --delete test/ new_test/
sending incremental file list
./
deleting four
three

sent 117 bytes  received 34 bytes  302.00 bytes/sec
total size is 0  speedup is 0.00

After running the rsync job without the --dry-run option, both directories show the original contents of test:

max@maxtop:~/Desktop/Dropbox/rsync$ ls test
one  three  two
max@maxtop:~/Desktop/Dropbox/rsync$ ls new_test
one  three  two

Using rsync On a Remote Box

Using rsync with a remote computer over OpenSSH requires only the name of the remote computer in addition to the commands we've already given. For example, to copy a directory to a remote computer, use:

rsync -av test/ coffee:~/test/

This will create a new directory, test, on the remote computer coffee and copy the contents of my local test into test on coffee.

To copy from a remote computer to the local computer, use:

rsync -av coffee:~/test/ new_test/

This command creates a new directory new_test on my local computer in my working directory and copies the contents of test from the directory /home/max on the remote computer coffee into new_test on my local computer.

Keeping 2 Computers in Sync

This method uses a third "cloud" computer. Use it when one or both of the computers will be moving around and will not always be reachable through a firewall. For example, if one or both of the computers are behind a firewall at your workplace or an internet cafe, or are on the move in general, you cannot (or do not want to) set up port forwarding.

To update the remote machine from the local copy, we can use a shell script, upsync, which holds the rsync command:

max@maxtop:~/rsync$ cat upsync
rsync \
--verbose \
--archive \
--compress \
--update \
~/rsync/test/ coffee:~/test/

To update from the remote machine we do the same, using a different shell script called downsync:

max@maxtop:~/rsync$ cat downsync
rsync \
--verbose \
--archive \
--compress \
--update \
coffee:~/test/ ~/rsync/test/

Now that we have two working scripts, we can automate the process using the crontab utility. The crontab utility automatically runs the commands it contains at specified intervals. In this case, we want to run both of these scripts every half hour, on the hour and on the half hour:

max@maxtop:~/rsync$ crontab -l
# m h  dom mon dow   command
0,30 *  *  *  *  ~/rsync/scripts/upsync
0,30 *  *  *  *  ~/rsync/scripts/downsync

Because both use --update, the order in which we run the scripts does not matter. Make sure the scripts are executable.

max@coffee:~$ ls test
four  one  two
max@maxtop:~/rsync$ ls test
one  three  two

After we pass the hour or half hour mark, the files will automatically synchronize.

max@maxtop:~/rsync$ ls test
four  one  three  two
max@coffee:~$ ls test
four  one  three  two

Using rsync for Backups

It is very efficient to create an initial backup, then back up only the changes made to the original file set in each incremental backup, and hard link the unchanged files from the original backup into the incremental backup. This way, the total size of all the backups is only the size of the original backup plus the size of the changed files, yet the user can access a complete snapshot of the files from any given backup.

Using the --backup (-b) and --backup-dir=DIR options, we can perform an incremental backup using rsync. As files are copied to the remote computer, the files that are about to be overwritten are first moved into a different directory. However, this backup will not be a full copy of each day's files. At the end of the week it will have made directories Monday through Sunday, with each directory containing the original copies of the files that were changed that day. Instead of overwriting the old files, rsync moves them to that day's directory. In the shell, date +%A returns the name of the current day of the week (Monday, Tuesday, etc.). By setting the shell variable BUNAME to the day of the week, and then setting the backup-dir to that variable, rsync will change the directory it puts the old files into each day.

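#!/bin/sh
# Name of the current weekday (Monday, Tuesday, ...); used as the backup directory name.
BUNAME=$(date +%A)
# Files that would be overwritten on the receiver are moved into that day's directory first.
rsync \
--verbose \
--archive \
--update \
--backup \
--backup-dir=~max/$BUNAME/ \
~/rsync/test1/ remote-host:~max/test1/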

However, using a different approach, we can take up the same amount of space but have a full set of files for each day. This is possible by using hard-links, and the functionality is built into rsync. A hard link is a pointer to a file. Each directory has at least 2 hard links to it: its pointer from the directory above it, and the . link within the directory itself. In addition, it has another hard link to itself from each of its child directories (the .. link within each of these directories). Files use hard links the same way, and a file will not be deleted from the disk until the number of hard-links to it reaches 0. To summarize, a hard-link simply points to a file and does not make a copy of the file. Using hard-links makes it possible to have a full set of files for each day, when on the disk there is really only one set of files that have not been changed, with pointers in each incremental backup to the same files. The changed files are the only files that take up extra space.
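Hard links are easy to observe without rsync. The sketch below uses invented file names in a scratch directory; it shows two names sharing one inode and the data surviving until the last name is removed.

```shell
# Hypothetical scratch directory for the demo.
work=$(mktemp -d)
echo "payload" > "$work/original"
ln "$work/original" "$work/second_name"   # a second name for the same file, not a copy

# Both names show the same inode number and a link count of 2.
ls -li "$work"

# Removing one name only drops the link count; the data is still reachable.
rm "$work/original"
cat "$work/second_name"   # payload
```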

$ cat xt.max
ssh coffee 'rm -r bu.2; mv bu.1 bu.2; mv bu.0 bu.1'
rsync -av --delete --link-dest=../bu.1 src/ coffee:bu.0/

By using rsync's --link-dest=DIR option, we tell rsync to, instead of transferring a new file, check within DIR for an identical copy of the file to be transferred and make a hard link to it. The first time, we have to transfer the entire directory to back it up, but during the next backup we only have to transfer the files that have been changed. However, it will appear that there exist two completely separate sets of files -- those from the first backup, and those from the second. Note that this uses about the same amount of disk space as the previous script; the difference is that each backup directory presents a complete set of files, because the unchanged files are hard links.

max@coffee:~$ ls -li bu.0
total 0
1253754 -rw-r--r-- 3 max max 0 2009-07-13 20:35 a
1253755 -rw-r--r-- 3 max max 0 2009-07-13 20:35 b
1253756 -rw-r--r-- 3 max max 0 2009-07-13 20:35 c
1253757 -rw-r--r-- 3 max max 0 2009-07-13 20:35 d
max@coffee:~$ ls -li bu.1
total 0
1253754 -rw-r--r-- 3 max max 0 2009-07-13 20:35 a
1253755 -rw-r--r-- 3 max max 0 2009-07-13 20:35 b
1253756 -rw-r--r-- 3 max max 0 2009-07-13 20:35 c
1253757 -rw-r--r-- 3 max max 0 2009-07-13 20:35 d

Notice that while bu.0 and bu.1 each contain an entire set of files (a, b, c, and d), each file has the same inode number in both directories. This is because we used hard links.

max@coffee:~/bu.0$ ls -l
total 0
-rw-r--r-- 3 max max 0 2009-07-13 20:35 a
-rw-r--r-- 3 max max 0 2009-07-13 20:35 b
-rw-r--r-- 3 max max 0 2009-07-13 20:35 c
-rw-r--r-- 3 max max 0 2009-07-13 20:35 d

The link count is displayed using ls -l. Each file has 3 links to it -- one each from bu.0, bu.1, and bu.2.

In order for this to work, we must keep a rotating directory structure. Notice that we back up to ~/bu.0 on coffee. After the backup finishes, this will be the most recent set of files. However, for the --link-dest=DIR option to work, we refer to ../bu.1, which is the next newest set of files; bu.2 is the third newest, and so on. Note that before the rsync command, we rotate the directories on the remote system. This produces errors the first three times the script runs because the older backup directories are missing, but the script continues with the rest of the steps properly. By the fourth time, bu.0, bu.1, and bu.2 all exist and the script should produce no errors.

Notes: The term dry-run comes from sailing, when sailors would practice rigging their jib and mainsheets while on land (Wikipedia). This option is very useful when using the --delete option for the first time or when testing a script. A small mistake in the source or destination path, such as a missing or extra trailing slash or a destination one level too high, can delete an entire directory structure when using the --delete option:

$ rsync -av --delete --dry-run test/ .
sending incremental file list
./
deleting writing/writeup3.txt~
...
deleting writing/writeup2.txt
...
deleting scripts/demo
...
deleting new_test/two
...
deleting links_dir/linked_file
...
deleting writeup2.txt~
one
three
two
test_dir/

sent 119 bytes  received 28 bytes  294.00 bytes/sec
total size is 0  speedup is 0.00 (DRY RUN)

Without the --dry-run option, this rsync command would have deleted my entire directory and replaced it with the files from the source directory because the destination directory was one level higher than it should have been. While it seems obvious not to do this when copying to a local machine, the mistake is much easier to make when the directories get deeper on a remote machine. It is strongly recommended to run rsync with the --dry-run option the first time you run an rsync script with --delete.
