Programming Answer: Backup solution to back up terabytes and lots of static files on a Linux server?

Hi,

Which backup tool or solution would you use to back up terabytes and lots of files on a production Linux server?
Note that the files are all different and almost never modified; usage is mostly adding files, so the data volume is 3TB today and grows steadily at around +15GB/day.

Please do not reply rsync. Basic Unix tools are not enough: rsync does not keep history, and rdiff-backup fails miserably from time to time and screws up the history. Moreover, these are all file-based backups, which cause a lot of IOwait just to browse directories and call stat(). But I guess that, except for R1Soft CDP, there is no way around that.

We tried R1Soft CDP backup, which is a block-level backup. It proved good and efficient for all our other servers, but it systematically fails on the server with 3 terabytes and gazillions of files. For more than 2 months now, the engineers at R1Soft and at the datacenter have been playing hot potato... and we still have no backup except regular rsync.

We never tried big commercial solutions other than R1Soft CDP, which was provided as an optional service by the datacenter hosting our servers.

From serverfault

Try BackupPC. For me it works very well with a couple of terabytes of data and tens of millions of files (some 100 000 - 500 000 of those changing daily). OK, BackupPC does use rsync and is file based, so that might be a show-stopper for you.

Bacula is another popular one, and it sure has the coolest slogan of them all. And it does not even use rsync! :-)

: We use BackupPC on our local intranet, mainly because these are desktop PCs and the deduplication feature of BackupPC is really helpful.

From Janne Pikkarainen
EMC Networker has an option called SnapImage that should increase backup speed for your kind of data.

I have only heard about it and have never tried it, sorry...

From marcoc
I tried many backup solutions, starting with rsync and rdiff-backup, and also plain tar and bash scripts. But Bacula beats them all. It is based on a modular design; I have about 8 PCs in my backup network, and it is growing.

Everyone I recommended Bacula to was more than happy to finally find their home.

From iElectric
I think the only solution for you is block-level backups.
You could write scripts that use LVM snapshots (or even lower-level dm-snapshots) and transfer them to a storage server.
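
As a rough illustration of the LVM route, the sketch below only prints the commands such a script would run, so they can be reviewed before being run by hand as root. The volume group vg0, the logical volume data, the host backuphost, the /backup path and the 20G snapshot size are all made-up values, not anything taken from the question. Note that it ships the whole volume image on every run, so on its own it gives a full backup, not a cheap incremental one.

```shell
#!/bin/bash
# Print (not run) the steps for a snapshot-based block-level copy.
# All names and sizes below are hypothetical placeholders for illustration.

lvm_snapshot_commands() {
        local vg="$1"       # volume group, e.g. vg0 (assumed)
        local lv="$2"       # logical volume holding the data, e.g. data (assumed)
        local host="$3"     # storage server reachable over ssh (assumed)
        local snap="${lv}_snap"

        # 1. Freeze a point-in-time view; 20G is room for writes made during the copy
        echo "lvcreate --snapshot --size 20G --name ${snap} /dev/${vg}/${lv}"
        # 2. Stream the frozen block device to the storage server
        echo "dd if=/dev/${vg}/${snap} bs=4M | gzip -1 | ssh ${host} 'cat > /backup/${lv}.img.gz'"
        # 3. Drop the snapshot so it stops consuming copy-on-write space
        echo "lvremove -f /dev/${vg}/${snap}"
}
```

Calling `lvm_snapshot_commands vg0 data backuphost` lists the three steps. Because the copy reads the block device directly, it never walks the directory tree, which is the point of going block level here.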

You may also take a look at the Zumastor project and its ddsnap utility.

PS. Solaris/FreeBSD servers have ZFS, which can automate this process using incremental snapshots + ZFS send/receive.
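
For the ZFS case, one replication step might look like the sketch below. It again only prints the commands. The dataset tank/data, the snapshot names, the host backuphost and the target backup/data are hypothetical, and it assumes the previous snapshot already exists on both sides.

```shell
#!/bin/bash
# Print (not run) the commands for one incremental ZFS replication step.
# Dataset, snapshot and host names are hypothetical placeholders.

zfs_incremental_commands() {
        local dataset="$1"  # source dataset, e.g. tank/data (assumed)
        local prev="$2"     # snapshot already present on both sides
        local new="$3"      # snapshot to create and ship now
        local host="$4"     # receiving server (assumed)
        local target="$5"   # dataset on the receiving side (assumed)

        # Take the new point-in-time snapshot
        echo "zfs snapshot ${dataset}@${new}"
        # -i sends only the blocks that changed between the two snapshots
        echo "zfs send -i ${dataset}@${prev} ${dataset}@${new} | ssh ${host} zfs receive ${target}"
}
```

For example, `zfs_incremental_commands tank/data monday tuesday backuphost backup/data` prints the snapshot command followed by the send/receive pipeline.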

From SaveTheRbtz
rsnapshot

or, if you want more control, just hack up a short bash script to do the same thing: one cp -al, a few mv and rsync.

I use it on a very busy 30TB server with around 5 million files, and it works wonderfully.
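
A minimal sketch of that "one cp -al, a few mv and rsync" idea could look like this. It is not the poster's actual script: the function name, the daily.N directory naming and the default of 7 retained snapshots are assumptions made for illustration, and it relies on GNU cp for the -l option.

```shell
#!/bin/bash
# Hard-link snapshot rotation: a few mv, one cp -al, one rsync.
# The daily.N naming and the default of 7 snapshots are assumptions.

rotate_and_sync() {
        local src="$1"          # directory to back up
        local dest="$2"         # directory that holds daily.0, daily.1, ...
        local keep="${3:-7}"    # number of snapshots to retain
        local i

        if [ "$keep" -lt 2 ]; then
                echo "rotate_and_sync: keep must be at least 2" 1>&2
                return 1
        fi
        mkdir -p "$dest/daily.0" || return 1

        # Drop the oldest snapshot, then shift the remaining ones up by one
        rm -rf "$dest/daily.$((keep - 1))"
        i=$((keep - 2))
        while [ "$i" -ge 1 ]; do
                if [ -d "$dest/daily.$i" ]; then
                        mv "$dest/daily.$i" "$dest/daily.$((i + 1))"
                fi
                i=$((i - 1))
        done

        # Hard-link copy of the latest snapshot: directories are recreated,
        # files are hard links that share their data blocks
        cp -al "$dest/daily.0" "$dest/daily.1" || return 1

        # Bring daily.0 up to date; rsync writes changed files as new files,
        # so the older copy in daily.1 is left untouched
        rsync -a --delete "$src/" "$dest/daily.0/"
}
```

A nightly cron job could call something like `rotate_and_sync /srv/files /backup/files 7` (both paths are made up). Each snapshot looks like a full copy but only costs disk space for files that changed. Keep in mind that rsync still has to walk the whole tree, so the stat() load mentioned in the question does not go away.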

From Javier

Try using mirrordir. With an appropriate script, it seems to be the ideal solution for you. It only updates the files which have changed (modified, created, or deleted), but it can also preserve old files. I'm not sure how that function works, but it shouldn't be hard to set up. Here's the script I use (edited somewhat for clarity; I hope I didn't cause problems with the edits):

#! /bin/bash

logfile="/home/share/Backup-log.txt"

echo "" | unix2dos >> $logfile
echo `date`"   /bin/mirror_backup started" | unix2dos >> $logfile

echo ""
echo ""
echo "mirror_backup   Automatically archive a list of"
echo "                directories to a storage location"

# Mount mirror drive
# Run the remount once and keep its output, so a failure is logged without retrying
mount_output=$(mount -o remount,rw /mirror 2>&1)
xstatus=$?
if [ $xstatus -ne 0 ]
then
        echo "$mount_output" | unix2dos >> $logfile
        echo `date`"   Mount failed, aborting /bin/mirror_backup..." 1>&2
        echo `date`"   Mount failed, aborting /bin/mirror_backup..." | unix2dos >> $logfile
        mount -o remount,ro /mirror 2>> /dev/null
        exit $xstatus
fi

# Define Source Directories
sourcelist="/home /etc /root"
dest="/mirror"

for dir in $sourcelist
do
        if [ ! -d ${dest}${dir} ]
        then
                mkdir -p ${dest}${dir} 2>&1 | unix2dos >> $logfile
#               chown mirror:mirror ${dest}${dir}
        fi
done


# Mirror directories

for dir in $sourcelist
do
        # Delete old files
        echo ""
        echo "Deleting old files in "${dest}${dir}
        mirrordir --nice 0  --exclude-from /root/exclude-list --only-delete ${dir} ${dest}${dir} 2>> /dev/null

        # Run full mirror
        echo "Mirroring "${dir}" to "${dest}${dir}
        mirrordir --nice 0 --restore-access --access-times --exclude-from /root/exclude-list ${dir} ${dest}${dir} 2>&1 | unix2dos >> $logfile

done

# Perform miscellaneous tasks

report="/home/share/disk-report.txt"
# Build the whole report in one group so it is converted and written once
{
        echo "Report generated on $(date)"
        echo ""
        echo "RAID drive status:"
        cat /proc/mdstat
        echo ""
        echo "Disk usage per slice:"
        df -h
        echo ""
        echo "Disk Usage per User:"
        du -h --max-depth 1 /home
        echo ""
        echo "Disk Usage on Share drive:"
        du -h --max-depth 1 /home/share
        echo ""
        echo "Filesystem Usage Overview:"
        du -h --max-depth 1 /
        echo ""
        echo "Report Complete"
} | unix2dos > "$report"

echo ""
echo "mirror_backup complete."

# Unmount Mirror Drive
mount -o remount,ro /mirror 2>&1 | unix2dos >> $logfile
echo `date`"   /bin/mirror_backup completed successfully" | unix2dos >> $logfile

exit 0

With no changes to commit (second run-through, for example) it takes about 5-7 minutes to scan 1.5 TB of files. Of course, it's a lot slower on the first run-through.

By the way, this script was written by me for my use on my personal server at home. While anyone is absolutely free to use or modify it for themselves, I am making absolutely no guarantees or warranties. It's free, so you get what you pay for. Hope it helps, though!

: I currently use rdiff-backup, which looks like an equivalent of mirrordir. I should try it, as rdiff-backup is not very robust and frequently corrupts its own history.

Jesse : I've been using this setup for about 4 years now, and I've never had problems with it. I just use cron to run it every night, and it does quite well. I've had to recover with these backups several times (drive failures, dumb mistakes on my part, you name it...) and other than moving the respective RAIDs in fstab and moving a few directories, it is a hassle-free recovery. Note that I just press my backup array into service as the primary. When I get the replacement hard drive, I rebuild its array and make it the backup. Not the slickest, but effective anyway.

From Jesse

You don't say what you want to back it up to; tape or disc? Assuming the former, then I endorse the recommendations for bacula. I use it at several different sites, at one of which I have it driving a 60-slot two-drive LTO2 robot, with a total of maybe 50TB of tape storage spread over 120 tapes, and the single largest server having about 4TB of disc. Bacula is very, very good when it's properly configured.

Disc backups I can't comment on usefully, as I'm firmly an old-style tape man myself. Since you specifically mention keeping history, I'd hope you were open to removable-media (ie, tape) backups.

: Ah, unfortunately, I was not precise enough. We do disk backups. We only need history for a very short time: just long enough to notice that something went wrong or disappeared. We would always restore the latest version; the application never needs to restore a specific older backup.

From MadHatter

Friday, January 28, 2011