How to clean up a 600GB backup

Here are some thoughts about cleaning up data after a backup, recovery or restoration.

Copying from one USB hard drive to another with rsync took over 17 hours, during which I was away from home 🙂

My bigger problem was that I had to scan the images manually after a partition recovery incident. The directory contained 103541 JPGs, 12 GB of data in total. To move just the wider photos (width over 800px) to a separate directory, I used the following command:

for f in *.jpg; do if [ "$(identify -format '%w' "$f")" -gt 800 ]; then mv "$f" big/; fi; done
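
Recovered files are often damaged, and the one-liner above prints a test error for every image that identify cannot read. A minimal sketch of a more forgiving variant, assuming ImageMagick's identify is installed; the function name move_wide and its arguments are made up for illustration:

```shell
# Hypothetical helper: move images wider than <min-width> pixels into <dest-dir>.
# Usage: move_wide <min-width> <dest-dir> <file>...
move_wide() {
    min=$1
    dest=$2
    shift 2
    mkdir -p "$dest"
    for f in "$@"; do
        [ -f "$f" ] || continue
        # %w prints the width in pixels; [0] restricts it to the first frame.
        width=$(identify -format '%w' "$f[0]" 2>/dev/null) || width=
        # Skip anything that did not yield a plain number (unreadable or damaged file).
        case $width in
            ''|*[!0-9]*) echo "skipped: $f" >&2; continue ;;
        esac
        if [ "$width" -gt "$min" ]; then
            mv -- "$f" "$dest/"
        fi
    done
}
```

Called as move_wide 800 big *.jpg, it should behave like the command above, except that unreadable files are reported on stderr and left in place.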

Listing files by their first character was very handy; the same pattern also works for moving or removing them:

localhost:/tmp/backup$ ls [a]*.jpg
localhost:/tmp/backup$ ls [bB]*.jpg

Some digital cameras prefix photo file names with IMG, DSC, P, …, so I moved those files first to narrow down the search:

localhost:/tmp/backup$ mv IMG* ../jpg/
localhost:/tmp/backup$ mv DSC* ../jpg/
localhost:/tmp/backup$ mv P* ../jpg/

Next, I moved the files whose names contain a year:

localhost:/tmp/backup$ mv *2013* ../jpg/2013/
localhost:/tmp/backup$ mv *2012* ../jpg/2012/
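
With more than a couple of years to sort, the same move can be wrapped in a loop. A minimal sketch, assuming the layout used above (one subdirectory per year below the target); the function name move_by_year is made up for illustration:

```shell
# Hypothetical helper: move files whose name contains a year into <target-dir>/<year>/.
# Usage: move_by_year <target-dir> <year>...
move_by_year() {
    target=$1
    shift
    for year in "$@"; do
        mkdir -p "$target/$year"
        for f in *"$year"*; do
            # Skip the unexpanded pattern (no match) and anything that is not a regular file.
            [ -f "$f" ] || continue
            mv -- "$f" "$target/$year/"
        done
    done
}
```

Run from /tmp/backup, move_by_year ../jpg 2012 2013 would do the same as the two commands above. A file whose name contains several of the listed years ends up under the first one that matches.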

Moving files according to their file types is also handy:

localhost:/tmp/backup$ find . -name '*.sql' -exec mv {} ../sql/ \;
localhost:/tmp/backup$ find . -name '*.zip' -exec mv {} ../zip/ \;
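
The same idea extends to a list of file types. A minimal sketch, assuming each type gets its own directory next to the backup directory as above; the function name move_by_ext is made up for illustration:

```shell
# Hypothetical helper: move every file with a given extension below the
# current directory into <parent-dir>/<ext>/, e.g. *.sql into ../sql/.
# Usage: move_by_ext <parent-dir> <ext>...
move_by_ext() {
    parent=$1
    shift
    for ext in "$@"; do
        mkdir -p "$parent/$ext"
        # "-exec ... \;" runs one mv per file, so names with spaces are safe
        # and the argument list never grows too long.
        find . -type f -name "*.$ext" -exec mv -- {} "$parent/$ext/" \;
    done
}
```

For example, move_by_ext .. sql zip collects all .sql and .zip files. Note that files with the same name from different subdirectories overwrite each other in the target directory.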

If you get the error "/bin/rm: Argument list too long", try:

find . -name '*.php' -print0 | xargs -0 rm
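
The error appears because the shell expands a wildcard into more arguments than the kernel accepts for a single command. As an alternative to xargs, find can batch the file names itself. A minimal sketch; the function name remove_by_name is made up for illustration:

```shell
# Hypothetical helper: delete all regular files below <dir> whose name matches <pattern>.
# Usage: remove_by_name <dir> <pattern>
remove_by_name() {
    # "{} +" makes find pass the file names to rm in batches, starting a new
    # rm whenever the next name would exceed the limit (see `getconf ARG_MAX`).
    find "$1" -type f -name "$2" -exec rm -- {} +
}
```

For example, remove_by_name . '*.php' should have the same effect as the xargs pipeline above. Quote the pattern so that find receives it unexpanded.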

Find empty directories and remove them:

find . -type d -empty -delete
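
Before deleting anything it is worth looking at what would go. A minimal sketch that separates listing from removing, assuming a find that supports -empty and -mindepth (GNU and BSD find both do); the function names are made up for illustration:

```shell
# Hypothetical helper: list the directories below <dir> that are empty right now.
list_empty_dirs() {
    find "$1" -mindepth 1 -depth -type d -empty
}

# Hypothetical helper: remove empty directories below <dir>, deepest first,
# so parents that become empty along the way are removed too.
# -mindepth 1 keeps <dir> itself even if it ends up empty.
remove_empty_dirs() {
    find "$1" -mindepth 1 -depth -type d -empty -exec rmdir -- {} \;
}
```

rmdir refuses to remove a directory that still contains something, which makes it a safer choice here than rm -r.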

Backup with RSYNC and SSH authorized key

Finally, I wrote my rsync backup script, v. 0.1:


# ~/bin/
# Some help from
# Thanks

# man rsync
# -v be verbose
# -h human readable bytes
# -a archive mode; same as -rlptgoD (no -H)
# -H preserve hard links
# -z compress data during transfer
# --progress show file transfer progress
# -e remote shell to use

time rsync -vhaz --progress -e "ssh -i $HOME/.ssh/id_rsa" \
--exclude ".DS_Store" \
--exclude "._.DS_Store" \
--exclude "Thumbs.db" \
--exclude "thumbs.db" \
--exclude "desktop.ini" \
--exclude ".svn" \
--exclude ".git" \
/Volumes/data/Dropbox/ \
# >> backup.log # log output
# &> /dev/null

# needs FTP password
# time rsync -vhaz --progress -e ssh \
# --exclude ".DS_Store" \
# --exclude "._.DS_Store" \
# --exclude "Thumbs.db" \
# --exclude "thumbs.db" \
# --exclude "desktop.ini" \
# --exclude ".svn" \
# --exclude ".git" \
# /Volumes/data/Dropbox/ \
# # >> backup.log # log output
# # &> /dev/null
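
The list of excludes is repeated in both variants of the script. One way to avoid that is to keep the patterns in a file and pass it with --exclude-from, and to preview a run with --dry-run before transferring anything. A minimal sketch; the function names and the exclude file are made up for illustration, and the source and destination are passed in as arguments:

```shell
# Hypothetical helper: write the exclude patterns from the script to <file>, one per line.
write_excludes() {
    cat > "$1" <<'EOF'
.DS_Store
._.DS_Store
Thumbs.db
thumbs.db
desktop.ini
.svn
.git
EOF
}

# Hypothetical wrapper: show what rsync would transfer without copying anything.
# Usage: backup_dry_run <exclude-file> <source> <destination>
backup_dry_run() {
    rsync -vhaz --dry-run --exclude-from="$1" \
        -e "ssh -i $HOME/.ssh/id_rsa" "$2" "$3"
}
```

Once the dry run lists what you expect, the same command without --dry-run performs the actual backup.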