Copying a large amount of data between disks
November 12th, 2007
We upgraded the disks on our backup server to create some much-needed space. When the new drive arrived, I had to reformat it and copy the entire contents of the old backup disk to the new one. I needed a way to transfer 250GB of small files while preserving permissions and hard links. I tried three different methods of copying the data before finally succeeding.
This is on a Xen virtual server and memory is somewhat limited (256MB). Both rsync and restore (the counterpart to dump) took *way* too much memory: rsync would have required many gigabytes, and restore at least a couple. Restore also needed a couple of gigabytes of temporary disk space for storing permissions and mode information. In the end, the good old tar command was the best-performing method, consuming about 150MB of memory at its peak while extracting. The bash command line was:
(cd /data && tar -c -p -f - .) | (cd /data2 && tar -x -p -f -)
Tar works relative to the current directory, so I wrapped each command in a subshell to ensure that it ran in the right directory. I could have used semicolons to separate the “cd” command from the “tar” command in each subshell, but then “tar” would have proceeded whether or not the “cd” was successful (for example, if I had made a typo in the directory name). That would be bad, since a failed “cd” before the extracting tar could leave all of those files written into whatever directory the shell happened to be in, potentially intermingled with the wrong filesystem. Using “&&” to separate the commands ensures that tar only proceeds if the “cd” was successful.
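To see the “&&” behaviour without risking real data, here is a minimal sketch that runs the same pipeline in a scratch directory. Everything in it is an assumption made for illustration: the temporary directory from mktemp, the src and dst names, the two tiny test files, and the deliberately mistyped "dsst" path. It also checks that a hard link survives the copy, using the shell's -ef file test.

```shell
#!/bin/bash
# Scratch area; all paths below are hypothetical stand-ins for /data and /data2.
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"
echo hello > "$work/src/a.txt"
ln "$work/src/a.txt" "$work/src/b.txt"   # second name for the same file (hard link)

# Correct destination: both cd commands succeed, so the copy runs.
(cd "$work/src" && tar -c -p -f - .) | (cd "$work/dst" && tar -x -p -f -)

# -ef is true when both names refer to the same file on disk.
if [ "$work/dst/a.txt" -ef "$work/dst/b.txt" ]; then
    link_result="preserved"
else
    link_result="broken"
fi

# Mistyped destination: cd fails, && short-circuits, and tar -x never starts,
# so nothing is extracted into the current directory.
if (cd "$work/src" && tar -c -p -f - .) | (cd "$work/dsst" 2>/dev/null && tar -x -p -f -); then
    typo_result="ran"
else
    typo_result="skipped"
fi

echo "hard link: $link_result, extract with bad cd: $typo_result"
rm -rf "$work"
```

Replacing the “&&” in the second subshell with a semicolon would let the extracting tar run in whatever directory the script was started from, which is exactly the failure described above.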
