Hi,
Which backup tool or solution would you use to back up terabytes of data and lots of files on a production Linux server?
Note that the files are all different and almost never modified, and usage is mostly adding files, so the data volume is 3TB today and growing all the time at around +15GB/day.
Please do not reply rsync. Basic Unix tools are not enough: rsync does not keep history, and rdiff-backup fails miserably from time to time and screws up the history. Moreover, these are all file-based backups, which cause a lot of IOwait just to browse directories and query stat(). But I guess, except for R1Soft CDP, there is no way around that.
We tried R1Soft CDP backup, which is a block-level backup, and it proved good and efficient for all our other servers, but it systematically fails on the server with 3 terabytes and gazillions of files. For more than 2 months now, the engineers of R1Soft and of the datacenter have been playing hot potato... and we still have no backup except regular rsync.
We never tried big commercial solutions, except R1Soft CDP, since it was provided as an optional service by the datacenter hosting our servers.
-
Try BackupPC. For me it works very well with a couple of terabytes of data and tens of millions of files (some 100 000 - 500 000 of those changing daily). OK, BackupPC does use rsync and is file based, so that might be a show-stopper for you.
Bacula is another popular one, and it sure has the coolest slogan of them all. And it does not even use rsync! :-)
: We use BackupPC on our local intranet, mainly because these are desktop PCs and the deduplication feature of BackupPC is really helpful.
From Janne Pikkarainen -
EMC Networker has an option called SnapImage that should increase backup speed for your kind of data.
I have only heard about it and have never tried it, sorry...
From marcoc -
I tried many backup solutions, starting with rsync and rdiff-backup, and also plain tar and bash scripts. But bacula beats them all. It is based on a modular design; I have about 8 PCs in my backup network, and it is growing.
Everyone I recommended bacula to was more than happy to finally find their home.
From iElectric -
I think the only solution for you is block-level backups.
You could write scripts that use LVM snapshots (or even lower-level dm-snapshots) and transfer them to a storage server.
You may also take a look at the Zumastor project and its ddsnap utility.
PS. Solaris/FreeBSD servers have ZFS, which can automate this process by using incremental snapshots + ZFS send/receive.
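For what it's worth, the LVM snapshot idea above could look roughly like the sketch below. This is only an illustration under assumptions, not a tested production script: the volume group vg0, the logical volume data, the 20G snapshot size, the host storage.example.com and the image path are all made-up names. Note also that it copies the whole volume image on every run; getting incremental transfers would need something like ddsnap or ZFS send/receive instead. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/bash
# Sketch: block-level copy of an LVM snapshot to a storage server.
# Every name below (vg0, data, storage.example.com, paths) is hypothetical.
set -eu

VG="${VG:-vg0}"
LV="${LV:-data}"
SNAP="${SNAP:-${LV}_backup_snap}"
SNAP_SIZE="${SNAP_SIZE:-20G}"   # must hold the writes made while the copy runs
REMOTE="${REMOTE:-backup@storage.example.com}"
REMOTE_IMG="${REMOTE_IMG:-/backups/${LV}.img}"
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the commands, 0 = execute them

# Print the command line in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        bash -o pipefail -c "$*"
    fi
}

snapshot_backup() {
    # 1. Freeze a point-in-time view of the volume.
    run "lvcreate --snapshot --size $SNAP_SIZE --name $SNAP /dev/$VG/$LV"
    # 2. Stream the frozen block device to the storage server.
    run "dd if=/dev/$VG/$SNAP bs=4M | ssh $REMOTE 'dd of=$REMOTE_IMG bs=4M'"
    # 3. Drop the snapshot so it stops consuming copy-on-write space.
    run "lvremove -f /dev/$VG/$SNAP"
}

snapshot_backup
```

Because of set -e, a failed transfer stops the script before lvremove runs, so a real version would want a trap to clean up the snapshot.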
From SaveTheRbtz -
rsnapshot
or, if you want more control, just hack up a short bash script to do the same thing: one cp -al, a few mv and rsync.
I use it on a very busy 30TB server with around 5 million files, and it works wonderfully.
From Javier -
Try using mirrordir. With an appropriate script, it seems to be the ideal solution for you. It only updates the files which have changed (modified, created, or deleted), but also has the capability to preserve old files. I'm not sure how that function works, but it shouldn't be hard to figure out. Here's the script I use: (Edited somewhat for clarity. Hope I didn't cause problems with the edits)
    #! /bin/bash
    logfile="/home/share/Backup-log.txt"
    echo "" | unix2dos >> $logfile
    echo `date`" /bin/mirror_backup started" | unix2dos >> $logfile
    echo ""
    echo ""
    echo "mirror_backup Automatically archive a list of"
    echo " directories to a storage location"

    # Mount mirror drive
    mount -o remount,rw /mirror
    xstatus=$?
    if [ $xstatus -ne 0 ]
    then
      mount -o remount,rw /mirror 2>&1 | unix2dos >> $logfile
      echo `date`" Mount failed, aborting /bin/mirror_backup..." 1>&2
      echo `date`" Mount failed, aborting /bin/mirror_backup..." | unix2dos >> $logfile
      mount -o remount,ro /mirror 2>> /dev/null
      exit $xstatus
    fi

    # Define Source Directories
    sourcelist="/home /etc /root"
    dest="/mirror"
    for dir in $sourcelist
    do
      if [ ! -d ${dest}${dir} ]
      then
        mkdir -p ${dest}${dir} 2>&1 | unix2dos >> $logfile
        # chown mirror:mirror ${dest}${dir}
      fi
    done

    # Mirror directories
    for dir in $sourcelist
    do
      # Delete old files
      echo ""
      echo "Deleting old files in "${dest}${dir}
      mirrordir --nice 0 --exclude-from /root/exclude-list --only-delete ${dir} ${dest}${dir} 2>> /dev/null
      # Run full mirror
      echo "Mirroring "${dir}" to "${dest}${dir}
      mirrordir --nice 0 --restore-access --access-times --exclude-from /root/exclude-list ${dir} ${dest}${dir} 2>&1 | unix2dos >> $logfile
    done

    # Perform miscellaneous tasks
    report="/home/share/disk-report.txt"
    echo "Report generated on "`date` | unix2dos > $report
    echo "" | unix2dos >> $report
    echo "RAID drive status:" | unix2dos >> $report
    cat /proc/mdstat | unix2dos >> $report
    echo "" | unix2dos >> $report
    echo "Disk usage per slice:" | unix2dos >> $report
    df -h | unix2dos >> $report
    echo "" | unix2dos >> $report
    echo "Disk Usage per User:" | unix2dos >> $report
    du -h --max-depth 1 /home | unix2dos >> $report
    echo "" | unix2dos >> $report
    echo "Disk Usage on Share drive:" | unix2dos >> $report
    du -h --max-depth 1 /home/share | unix2dos >> $report
    echo "" | unix2dos >> $report
    echo "Filesystem Usage Overview:" | unix2dos >> $report
    du -h --max-depth 1 / | unix2dos >> $report
    echo "" | unix2dos >> $report
    echo "Report Complete" | unix2dos >> $report
    echo ""
    echo "mirror_backup complete."

    # Unmount Mirror Drive
    mount -o remount,ro /mirror 2>&1 | unix2dos 2>> $logfile
    echo `date`" /bin/mirror_backup completed successfully" | unix2dos >> $logfile
    exit 0
With no changes to commit (second run-through, for example) it takes about 5-7 minutes to scan 1.5 TB of files. Of course, it's a lot slower on the first run-through.
By the way, this script was written by me for my use on my personal server at home. While anyone is absolutely free to use or modify it for themselves, I am making absolutely no guarantees or warranties. It's free, so you get what you pay for. Hope it helps, though!
: I currently use rdiff-backup, which looks like an equivalent of mirrordir. I should try it, as rdiff-backup is not very robust and frequently corrupts its own history.
Jesse : I've been using this setup for about 4 years now, and I've never had problems with it. I just use cron to run it every night, and it does quite well. I've had to recover with these backups several times (drive failures, dumb mistakes on my part, you name it...) and other than moving the respective RAIDs in fstab and moving a few directories, it is a hassle-free recovery. Note that I just press my backup array into service as the primary. When I get the replacement hard drive, I rebuild its array and make it the backup. Not the slickest, but effective anyway.
From Jesse -
You don't say what you want to back it up to; tape or disc? Assuming the former, then I endorse the recommendations for bacula. I use it at several different sites, at one of which I have it driving a 60-slot two-drive LTO2 robot, with a total of maybe 50TB of tape storage spread over 120 tapes, and the single largest server having about 4TB of disc. Bacula is very, very good when it's properly configured.
Disc backups I can't comment on usefully, as I'm firmly an old-style tape man myself. Since you specifically mention keeping history, I'd hope you were open to removable-media (ie, tape) backups.
: Ah, unfortunately, I was not precise enough. We do disk backups. We just need history for a very short time: just long enough to figure out that something has gone wrong or disappeared. We would always restore the last version; there is no need for the application to restore a specific old backup.
From MadHatter