Friday, January 28, 2011

Backup solution for terabytes and lots of static files on a Linux server?

Hi,

Which backup tool or solution would you use to back up terabytes of data and a very large number of files on a production Linux server?
Note that the files are all different and almost never modified. Usage is mostly adding files: the data volume is 3TB today and grows by around 15GB/day.

Please do not reply "rsync". Basic Unix tools are not enough: rsync does not keep history, and rdiff-backup fails miserably from time to time and corrupts its history. Moreover, these are all file-based backups, which cause a lot of IOwait just from browsing directories and calling stat(). But I guess that, except for R1Soft CDP, there is no way around that.

We tried R1Soft CDP backup, which is a block-level backup. It proved good and efficient on all our other servers, but it systematically fails on the server with 3 terabytes and gazillions of files. For more than 2 months now, the R1Soft and datacenter engineers have been passing the hot potato back and forth, and we still have no backup except a regular rsync.

We have never tried any big commercial solution other than R1Soft CDP, which was provided as an optional service by the datacenter hosting our servers.

  • Try BackupPC. For me it works very well with a couple of terabytes of data and tens of millions of files (some 100 000 - 500 000 of those changing daily). OK, BackupPC does use rsync and is file-based, so that might be a show-stopper for you.

    Bacula is another popular one, and it sure has the coolest slogan of them all. And it does not even use rsync! :-)

    : We use BackupPC on our local intranet, mainly because the machines there are desktop PCs and the deduplication feature of BackupPC is really helpful.
  • EMC Networker has an option called SnapImage that should increase backup speed for your kind of data.

    I have only heard about it and have never tried it myself, sorry...

    From marcoc
  • I have tried many backup solutions. I started with rsync and rdiff-backup, and also plain tar-ing and bash scripts. But Bacula beats them all. It is based on a modular design. I have about 8 PCs in my backup network, and the number is growing.

    Everyone I have recommended Bacula to was more than happy to have finally found a home.

    From iElectric
  • I think the only solution for you is block-level backups.
    You could write scripts that use LVM snapshots (or even lower-level dm-snapshots) and transfer them to a storage server.

    You may also want to take a look at the Zumastor project and its ddsnap utility.

    PS. Solaris/FreeBSD servers have ZFS, which can automate this process using incremental snapshots + ZFS send/receive.

  • rsnapshot

    Or, if you want more control, just hack up a short bash script to do the same thing: one cp -al, a few mv and an rsync.

    I use it on a very busy 30TB server with around 5 million files, and it works wonderfully.

    From Javier
  • Try using mirrordir. With an appropriate script, it seems to be the ideal solution for you. It only updates the files which have changed (modified, created, or deleted), but it also has the capability to preserve old files. I'm not sure how that function works, but it shouldn't be hard to set up. Here's the script I use (edited somewhat for clarity; I hope I didn't cause problems with the edits):

    #! /bin/bash
    
    logfile="/home/share/Backup-log.txt"
    
    echo "" | unix2dos >> $logfile
    echo `date`"   /bin/mirror_backup started" | unix2dos >> $logfile
    
    echo ""
    echo ""
    echo "mirror_backup   Automatically archive a list of"
    echo "                directories to a storage location"
    
    # Remount mirror drive read-write
    mount -o remount,rw /mirror
    xstatus=$?
    if [ $xstatus -ne 0 ]
    then
            mount -o remount,rw /mirror 2>&1 | unix2dos >> $logfile
            echo `date`"   Mount failed, aborting /bin/mirror_backup..." 1>&2
            echo `date`"   Mount failed, aborting /bin/mirror_backup..." | unix2dos >> $logfile
            mount -o remount,ro /mirror 2>> /dev/null
            exit $xstatus
    fi
    
    # Define Source Directories
    sourcelist="/home /etc /root"
    dest="/mirror"
    
    for dir in $sourcelist
    do
            if [ ! -d ${dest}${dir} ]
            then
                    mkdir -p ${dest}${dir} 2>&1 | unix2dos >> $logfile
    #               chown mirror:mirror ${dest}${dir}
            fi
    done
    
    
    # Mirror directories
    
    for dir in $sourcelist
    do
            # Delete old files
            echo ""
            echo "Deleting old files in "${dest}${dir}
            mirrordir --nice 0  --exclude-from /root/exclude-list --only-delete ${dir} ${dest}${dir} 2>> /dev/null
    
            # Run full mirror
            echo "Mirroring "${dir}" to "${dest}${dir}
            mirrordir --nice 0 --restore-access --access-times --exclude-from /root/exclude-list ${dir} ${dest}${dir} 2>&1 | unix2dos >> $logfile
    
    done
    
    # Perform miscellaneous tasks
    
    report="/home/share/disk-report.txt"
    echo "Report generated on "`date` | unix2dos > $report
    echo "" | unix2dos >> $report
    echo "RAID drive status:" | unix2dos >> $report
    cat /proc/mdstat | unix2dos >> $report
    echo "" | unix2dos >> $report
    echo "Disk usage per slice:" | unix2dos >> $report
    df -h | unix2dos >> $report
    echo "" | unix2dos >> $report
    echo "Disk Usage per User:" | unix2dos >> $report
    du -h --max-depth 1 /home | unix2dos >> $report
    echo "" | unix2dos >> $report
    echo "Disk Usage on Share drive:" | unix2dos >> $report
    du -h --max-depth 1 /home/share | unix2dos >> $report
    echo "" | unix2dos >> $report
    echo "Filesystem Usage Overview:" | unix2dos >> $report
    du -h --max-depth 1 / | unix2dos >> $report
    echo "" | unix2dos >> $report
    echo "Report Complete" | unix2dos >> $report
    
    echo ""
    echo "mirror_backup complete."
    
    # Remount mirror drive read-only (send any error output to the log)
    mount -o remount,ro /mirror 2>&1 | unix2dos >> $logfile
    echo `date`"   /bin/mirror_backup completed successfully" | unix2dos >> $logfile
    
    exit 0
    

    With no changes to commit (second run-through, for example) it takes about 5-7 minutes to scan 1.5 TB of files. Of course, it's a lot slower on the first run-through.

    By the way, this script was written by me for my use on my personal server at home. While anyone is absolutely free to use or modify it for themselves, I am making absolutely no guarantees or warranties. It's free, so you get what you pay for. Hope it helps, though!

    : I currently use rdiff-backup, which looks roughly equivalent to mirrordir. I should try mirrordir, as rdiff-backup is not very robust and frequently corrupts its own history.
    Jesse : I've been using this setup for about 4 years now, and I've never had problems with it. I just use cron to run it every night, and it does quite well. I've had to recover with these backups several times (drive failures, dumb mistakes on my part, you name it...) and other than moving the respective RAIDs in fstab and moving a few directories, it is a hassle-free recovery. Note that I just press my backup array into service as the primary. When I get the replacement hard drive, I rebuild its array and make it the backup. Not the slickest, but effective anyway.
    From Jesse
  • You don't say what you want to back it up to; tape or disc? Assuming the former, then I endorse the recommendations for bacula. I use it at several different sites, at one of which I have it driving a 60-slot two-drive LTO2 robot, with a total of maybe 50TB of tape storage spread over 120 tapes, and the single largest server having about 4TB of disc. Bacula is very, very good when it's properly configured.

    Disc backups I can't comment on usefully, as I'm firmly an old-style tape man myself. Since you specifically mention keeping history, I'd hope you were open to removable-media (i.e., tape) backups.

    : Ah, unfortunately, I was not precise enough. We do disk backups. We only need history for a very short time, just long enough to figure out that something went wrong or disappeared. We would always restore the latest version; the application never needs a specific older backup restored.
    From MadHatter
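
Javier's "one cp -al, a few mv and rsync" suggestion can be sketched roughly as below. This is a minimal sketch and not rsnapshot's actual implementation: the function names, the daily.N directory layout, and the paths in the usage line are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch of hard-link snapshot rotation (hypothetical helper names).
set -eu

# rotate_snapshots ROOT KEEP
# Keeps at most KEEP snapshots named ROOT/daily.0 (newest) .. ROOT/daily.(KEEP-1).
rotate_snapshots() {
    local root=$1 keep=$2 i
    if [ "$keep" -lt 2 ]; then
        echo "rotate_snapshots: KEEP must be at least 2" >&2
        return 1
    fi
    # Drop the oldest snapshot to make room.
    rm -rf -- "$root/daily.$((keep - 1))"
    # Shift the remaining snapshots up by one, oldest first.
    for ((i = keep - 2; i >= 1; i--)); do
        if [ -d "$root/daily.$i" ]; then
            mv -- "$root/daily.$i" "$root/daily.$((i + 1))"
        fi
    done
    # Hard-link copy of the newest snapshot: no file data is duplicated.
    if [ -d "$root/daily.0" ]; then
        cp -al -- "$root/daily.0" "$root/daily.1"
    fi
    mkdir -p -- "$root/daily.0"
}

# sync_latest SRC ROOT
# Brings ROOT/daily.0 up to date with SRC. By default rsync writes a changed
# file to a temporary name and renames it, so the hard-linked copy in
# daily.1 keeps the previous version.
sync_latest() {
    local src=$1 root=$2
    rsync -a --delete -- "$src/" "$root/daily.0/"
}
```

A nightly cron job could then run something like `rotate_snapshots /mirror/snapshots 7 && sync_latest /home /mirror/snapshots` (both paths are placeholders). Since unchanged files share one inode across snapshots, a short history is cheap for a workload that mostly adds files. The rsync pass still has to walk and stat() the whole tree, so this does not remove the IOwait concern raised in the question.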

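The LVM snapshot idea from the block-level answer could look roughly like the sketch below. It is only an illustration under stated assumptions: the volume group, logical volume, snapshot size, storage host, and remote path are all hypothetical, and the script defaults to a dry run that prints the commands instead of executing them.

```shell
#!/bin/bash
# Sketch: take an LVM snapshot and stream the raw image to a storage server.
# All names passed in are placeholders; DRY_RUN=1 (the default) only prints.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Copy a block device to a file on a remote host over ssh.
copy_image() {
    local dev=$1 target=$2 remote_path=$3
    if [ "$DRY_RUN" = 1 ]; then
        printf 'dd if=%s bs=4M | ssh %s "cat > %s"\n' "$dev" "$target" "$remote_path"
    else
        dd if="$dev" bs=4M | ssh "$target" "cat > '$remote_path'"
    fi
}

# snapshot_backup VG LV SNAP_SIZE TARGET REMOTE_PATH
snapshot_backup() {
    local vg=$1 lv=$2 size=$3 target=$4 remote_path=$5
    local snap="${lv}_backup_snap" status=0
    # The snapshot is a frozen point-in-time view; SNAP_SIZE must be large
    # enough to absorb writes made to the origin volume during the copy.
    run lvcreate --snapshot --size "$size" --name "$snap" "/dev/$vg/$lv"
    copy_image "/dev/$vg/$snap" "$target" "$remote_path" || status=$?
    # Remove the snapshot even if the copy failed.
    run lvremove -f "/dev/$vg/$snap"
    return "$status"
}
```

For example, `snapshot_backup vg0 data 20G backup@storage /backups/data.img` prints the three commands it would run. Note that this copies the full image every time, which for a 3TB volume is a lot of traffic; the incremental transfer mentioned for ddsnap and ZFS send/receive is not covered by this sketch.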