RSBackup is a package I designed for a special-purpose backup system. The design goals were:
* All communication is initiated by the client machine
* Secure and efficient transport across untrusted networks
* Detailed reporting
* Security of the backup itself
* Multiple versions of the backup
* Ability to have a mirrored backup

The primary difference between RSBackup and other backup systems is that the server has no way to contact the client machine. There are already many very good backup packages where the server "pulls" from the client, but the niche for this package is when that is not feasible, generally from a security point of view. In my case, the company I work for (Daily Data, Inc., http://www.dailydata.net) provides remote backup for clients whose data is stored behind one or more firewalls. I briefly considered VPN and ssh connections from the server to the firewall, but that added a level of complexity that was not necessary.

For the "secure and efficient transport across untrusted networks" goal, I built the system on rsync over ssh. rsync's extremely efficient transport generally results in savings greater than 50% on file transfers. Thus, a gigabyte of files being copied generally results in less than 500 MB of actual network traffic. Compression, intelligent transfer of only the changes in a file, the ability to resume a partial transfer, and various levels of validation of correct transfer are built in.

Much of the data backed up is sensitive. As such, an encrypted connection is a requirement for transfer. The server itself must also be secured, with limited access (physical and network), additional checks and reporting of all access, and encryption of the data itself to reduce the chances of data loss should physical security be breached. Security also comes from keeping multiple backup sessions in an efficient manner on the server, with automated "roll off" of older versions based upon a robust rules system.
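As a rough illustration of the rsync-over-ssh transport described above, the following sketch prints the kind of command a client could use to push its data to the server. It is not taken from RSBackup itself: the user, host, and directory names are hypothetical, and the flags are simply standard rsync options that correspond to the features mentioned (compression in transit, keeping partial transfers so they can be resumed, and an encrypted ssh tunnel).

```shell
#!/bin/sh
# Hypothetical values for illustration only; none of them come from RSBackup.
BACKUP_USER="backup"
BACKUP_HOST="backup.example.com"
SOURCE_DIR="/home/"
DEST_DIR="clients/office1/current/"

# Print (rather than run) a client-initiated rsync push over ssh.
#   -a         archive mode: recurse and preserve permissions, times, symlinks
#   -z         compress file data while it is in transit
#   --partial  keep partially transferred files so a later run can resume
#   -e ssh     carry the transfer over an encrypted ssh connection
build_push_command() {
    printf 'rsync -az --partial -e ssh %s %s@%s:%s\n' \
        "$SOURCE_DIR" "$BACKUP_USER" "$BACKUP_HOST" "$DEST_DIR"
}

build_push_command
```

Because the client opens the ssh connection, the server never needs a route back through the client's firewall, which is the point of the first design goal.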
Finally, the system should have high availability. While much of the high availability is based upon physical setup (RAID, virtualization with automated failover), the ability to rapidly sync the backup server itself with a remote mirror is necessary.

RSBackup is documented at [http://wiki.linuxservertech.com http://wiki.linuxservertech.com]. The easiest way to get a copy for Debian-based systems is to add the following line to your repositories (/etc/apt/sources.list):

 deb http://debian.dailydata.net/debian_repository /

The program consists of Perl and bash scripts, so it is easily viewed and modified.

NOTE: I am a command line type of person, so I have not (yet) built a GUI for configuration, though the sysinfo application (from the same repository) does have the ability to report on backups through a web interface (primitive at the moment). However, at this stage of the game, be prepared to open up your favorite command line editor and get your fingers dirty.

== Overview ==
The order of work is rather simple.
# client fires off rsbackup, generally based on a cron job
# rsbackup performs any initialization code that may be defined
# rsbackup sends a "prepare" command to the server
# rsbackup initiates rsync transaction
# rsbackup sends a "complete" command to the server
# rsbackup performs any cleanup code defined
# error report sent to designated e-mail account
# job report sent to designated e-mail account (can be different from the account in step 7)

Steps 3 through 5 require co-operation from the server. In each step, the server inspects the incoming package to verify that it is coming from an authorized machine, from an authorized IP address, and within an authorized time range, and checks the command to verify that the client is authorized to send it. If any of these checks fails, the server refuses to accept the communication and sends an e-mail alert to the administrator.

Step 2 allows the client to perform actions before the actual backup begins.
This can include performing hot database backups, performing "pull" backups from accessible internal machines, or anything else that might be required.

Step 3 allows the server to perform actions prior to the backup. These may include creating a copy of the existing data, verifying the client has sufficient space to add data, etc.

Step 5 allows the server to perform post-backup actions. Examples might be checking disk space used, removing old versions of the data, or even firing off a "sync" command to a remote mirror.

Step 6 allows the client to perform any additional actions it needs before shutting down the backup. In some cases, unmounting remote drives, cleaning up local versions, or other actions may be required.

All output from steps 2-6 is stored by rsbackup, and this information may be sent to a designated e-mail account.

== Problems ==
* rsbackup is simply a wrapper script around rsync, and suffers from some of its failings.
* rsync cannot tell whether a file or directory has simply been renamed, so it deletes the original and copies the renamed one over as a new file. If a directory containing a large number of files and/or subdirectories is renamed, the result is a very, very large transfer as the new directory tree is created from scratch and the old directory is then deleted. Note that this can also greatly increase disk space usage over the short term as, by default, rsync will complete all updates before it deletes files, so at some point in the process you could have two copies allocating disk space.
* rsync, in its default running, will not copy hard links correctly, allocating space for each entry. Thus, files 'a' and 'b' which are hard links to each other will only allocate the space for one copy on the original server, but will allocate two separate copies on the remote machine. If you have many hard links on your source, consider adding the --hard-links flag to the rsbackup client configuration.
* rsync is designed to be very efficient about updating files by sending only the modifications. The algorithms used are intricate, but suffice it to say that if you make a modification in a file, most likely rsync will only transmit the instructions necessary to update the target with the changes. This can be a great savings on a 75 MB 3D drawing file that has had a few simple changes made. However, compression negates this ability, as compression can (and almost always does) modify the entirety of the file. Thus, saving a database dump and then compressing it will result in the entire file being transferred on the next backup run if anything has changed at all (such as the date in the database dump header). In this case, it may be better to leave files uncompressed for increased efficiency of transfer, at the expense of disk space.
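
To judge whether the hard link problem listed above applies to a given source, one possible check is to count the files that have more than one link. The sketch below is only an illustration and not part of RSBackup: the function name is hypothetical, and it builds its own scratch directory so it can be run anywhere without touching real data.

```shell
#!/bin/sh
# Count regular files under a directory that have more than one hard link.
# A large count suggests rsync's --hard-links option is worth enabling.
# (count_hardlinked_files is a hypothetical helper, not an RSBackup command.)
count_hardlinked_files() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Self-contained demonstration in a scratch directory.
demo_dir=$(mktemp -d)
echo "some data" > "$demo_dir/a"
ln "$demo_dir/a" "$demo_dir/b"      # 'a' and 'b' now share one inode
echo "other data" > "$demo_dir/c"   # 'c' is an ordinary file with one link

hardlinked=$(count_hardlinked_files "$demo_dir")
echo "files with extra hard links: $hardlinked"   # prints: files with extra hard links: 2
rm -rf "$demo_dir"
```

Pointing the same function at a real source directory gives a quick idea of how much extra space the remote copy would allocate if hard links were not preserved.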