replicate is a script designed to manage replication between two sets of
ZFS snapshots. There are plenty of similar scripts out there, and this one is
fairly simplistic, especially compared with some of the more thorough ones
such as repl. If you need more than just replication, look at those.

I wrote replicate to handle a specific problem: I wanted separate scripts
for replication, pruning and snapshot creation so I could have greater
control over what happened on each machine. In one case, we had 'production'
backing up to 'backup', which was in turn backed up to 'airgap', each on a
different schedule and with different retention policies.

replicate was designed to be a simple command that replicates a single
dataset (recursively, if desired). As such, it can be called from a crontab
simply as

replicate source target # positional invocation
or
replicate -s source -t target # using flags

For more complex operations, the included sample script, sync, reads a
configuration file and processes each dataset in turn. See sync.md for more
information. sync is mainly a "howto" to help you build your own script.
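As a rough idea of what such a driver does, here is a hypothetical, minimal Python sketch. It is not the included sync script: the configuration format (one "source target [extra flags]" line per dataset, with # starting a comment) and the dataset names are assumptions made up for illustration; see sync.md for the real format. It only prints the replicate commands unless execute=True is passed.

```python
import shlex
import subprocess

# HYPOTHETICAL config format (not sync's real one):
# "source target [extra replicate flags]", '#' starts a comment line.
SAMPLE_CONFIG = """
# source      target         flags
tank/data     backup/data    -r
tank/home     backup/home    -r -f daily
"""

def build_commands(config_text):
    """Turn each config line into an argv list for replicate."""
    commands = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)
        source, target, extra = fields[0], fields[1], fields[2:]
        commands.append(["replicate", *extra, "-s", source, "-t", target])
    return commands

def run_all(commands, execute=False):
    """Print each command; only run it when execute is True."""
    for argv in commands:
        print(shlex.join(argv))
        if execute:
            # check=False: one failing dataset should not stop the rest
            subprocess.run(argv, check=False)

if __name__ == "__main__":
    run_all(build_commands(SAMPLE_CONFIG))
```

Processing one dataset per replicate call keeps each run small and independent, which matches the one-dataset-per-invocation design described above.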

replicate has two modes. The first is a simple positional mode, similar to cp:
replicate source target

The second uses flags, and there are many of them that do additional things.
When flags and positional arguments are combined, the flags override the
positional arguments, so
replicate -s source1 source2
will replicate from source1, not from source2

Flags
Flags can be passed as single characters (except --bwlimit, which has no short form) and combined, so
replicate -nrv
is the same as
replicate -n -r -v
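To make the flag rules concrete, here is a Python argparse sketch of the command line described in this section. replicate itself is a Perl script and this is not its code: the option names come from this README, while the way positionals are resolved (a flag, when given, wins over the matching positional) is an assumption modeled on the override rule above.

```python
import argparse

def parse(argv):
    """Parse a replicate-style command line (illustrative sketch only)."""
    p = argparse.ArgumentParser(prog="replicate")
    p.add_argument("-s", "--source", help="dataset to replicate from")
    p.add_argument("-t", "--target", help="dataset to replicate to")
    p.add_argument("-f", "--filter", help="regex selecting snapshots")
    p.add_argument("-n", "--dryrun", action="store_true")
    p.add_argument("-r", "--recurse", action="store_true")
    # action="count" makes -v give 1 and -vv give 2
    p.add_argument("-v", "--verbose", action="count", default=0)
    # long form only, so it cannot be bundled with the short flags
    p.add_argument("--bwlimit")
    p.add_argument("pos_source", nargs="?")
    p.add_argument("pos_target", nargs="?")
    args = p.parse_args(argv)
    # Assumed override rule: a flag, when present, wins over the positional.
    args.source = args.source or args.pos_source
    args.target = args.target or args.pos_target
    return args

if __name__ == "__main__":
    a = parse(["-nrv", "tank/data", "backup/data"])
    print(a.dryrun, a.recurse, a.verbose, a.source, a.target)
```

argparse splits "-nrv" into -n, -r and -v because none of those three takes a value, which is the same bundling behavior described above.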

--source or -s  
Defines the source, the dataset to be replicated from.

--target or -t
The target system, which will be synchronized from --source, limited only
by --filter.

--filter or -f
Filter (regex) to limit which source snapshots are processed. This is a Perl
regular expression, with all of the power that gives you. The default is
"something that looks like a datetimestamp", i.e.
(\d{4}.\d{2}.\d{2}.\d{2}.\d{2})
which is 5 fields of 4, 2, 2, 2 and 2 digits, with any single character in
between. This will match 2025/04/26_18:25. Unfortunately, it will also
match 2025104526218625. However, it works for most applications.
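You can try a filter out before handing it to replicate. Python's re module interprets this particular pattern the same way Perl does, so the sketch below compiles the default filter plus a hypothetical stricter variant that only accepts -, /, ., _ or : as separators. It assumes, as with Perl's default =~ matching, that the pattern may match anywhere in a snapshot name.

```python
import re

# Default filter from this README: 4, 2, 2, 2, 2 digits, any one character between.
DEFAULT_FILTER = re.compile(r"(\d{4}.\d{2}.\d{2}.\d{2}.\d{2})")

# Hypothetical stricter filter: only -, /, ., _ or : may separate the fields.
STRICT_FILTER = re.compile(r"(\d{4}[-/._:]\d{2}[-/._:]\d{2}[-/._:]\d{2}[-/._:]\d{2})")

def selected(names, pattern=DEFAULT_FILTER):
    """Return the names the pattern matches anywhere (like Perl's =~ /.../)."""
    return [name for name in names if pattern.search(name)]

if __name__ == "__main__":
    samples = ["2025/04/26_18:25", "2025104526218625", "manual-before-upgrade"]
    print(selected(samples))                 # ['2025/04/26_18:25', '2025104526218625']
    print(selected(samples, STRICT_FILTER))  # ['2025/04/26_18:25']
```

Restricting the separator characters is one way to avoid the all-digits false match mentioned above, at the cost of rejecting naming schemes that use other separators.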

--dryrun or -n
Only displays the command(s) that would be run. In this mode, if pv is
installed, a pv command (pv -petrs ###) is placed between the source and
target, so that the displayed command shows progress and an estimated
completion time if you run it from the CLI.
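As an illustration of where the pv stage goes, this hypothetical helper joins a send command and a receive command into one pipeline and inserts pv -petrs <size> only when pv is found on the PATH. The zfs send and zfs receive strings in the usage lines are placeholders; the exact commands replicate builds are not documented here.

```python
import shutil

def display_pipeline(send_cmd, receive_cmd, size_bytes, pv_available=None):
    """Join send and receive into one pipeline, adding pv when it is installed."""
    if pv_available is None:
        # Same idea as the dry-run rule: only use pv if it is on the PATH.
        pv_available = shutil.which("pv") is not None
    stages = [send_cmd]
    if pv_available:
        # -p progress bar, -e ETA, -t elapsed time, -r rate, -s expected size
        stages.append(f"pv -petrs {size_bytes}")
    stages.append(receive_cmd)
    return " | ".join(stages)

if __name__ == "__main__":
    # Placeholder commands, not necessarily what replicate generates.
    print(display_pipeline("zfs send -i tank/data@a tank/data@b",
                           "zfs receive tank/backup", 123456))
```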

--recurse or -r
Process the dataset and all child datasets. I thought about making this the
default, but decided not to, although I expect this flag to be used more
consistently than any of the others. Without it, replicate will only
replicate the single dataset, ignoring all children.

--verbose or -v
Increases the verbosity of the output. Without this flag, no output is
produced (except with --dryrun). Adding this flag once produces brief,
easily parsed output giving the number of bytes in the source dataset's
snapshots (including any children), which approximates how much data must be
transferred. A second field shows the number of seconds the replication
took.

-vv: using verbose twice returns all output from the local part of the
replicate command (by adding -v to either the send or the receive).
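If you collect the -v output (for example from cron), a small parser can turn it into an average transfer rate. The exact output format is not specified above, so this sketch assumes the two fields are separated by whitespace, with the byte count first. Because the byte count is only an approximation of what was transferred, so is the rate.

```python
def parse_summary(line):
    """Split a -v summary line into (bytes, seconds).

    ASSUMPTION: whitespace-separated fields, byte count first.
    """
    size_field, seconds_field = line.split()[:2]
    return int(size_field), float(seconds_field)

def average_rate(line):
    """Approximate bytes/s for one run, or None if no time was recorded."""
    size, seconds = parse_summary(line)
    return size / seconds if seconds > 0 else None

if __name__ == "__main__":
    print(average_rate("1048576 2"))  # 524288.0
```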

--bwlimit
Limit the speed of the connection to # bytes/s. You can use the modifiers K,
M, G or T after the number. The multipliers are 1024 (si).

This is only available if the Unix command pv is installed and accessible to
the script; otherwise it is ignored. The --si flag is also passed to pv.
Note that the limit is in bytes per second, not bits, so if you want to cap
the transfer at 10Mb/s (for example, half of a 20Mb/s line), you have to
divide by 8, giving 1.25M. Only integers are accepted, so use either 1M
(about 8Mb/s) or 2M (about 16Mb/s), or switch to a smaller unit such as K.
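The bytes-versus-bits arithmetic for --bwlimit can be sketched in Python. This assumes the powers-of-1024 multipliers stated above and whole numbers only, as described; it does not call pv, which may apply its own interpretation of the suffixes.

```python
import re

# Multipliers as documented for --bwlimit: K, M, G and T are powers of 1024.
MULTIPLIERS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def bwlimit_bytes(value):
    """Convert a --bwlimit value such as '2M' into bytes per second."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", value.strip())
    if m is None:
        raise ValueError(f"not a whole-number --bwlimit value: {value!r}")
    number, suffix = m.groups()
    return int(number) * MULTIPLIERS[suffix]

def bwlimit_for_bits(bits_per_second, unit="K"):
    """Largest whole-number --bwlimit value not exceeding a rate in bits/s."""
    bytes_per_second = bits_per_second // 8          # bits -> bytes
    count = bytes_per_second // MULTIPLIERS[unit]    # round down to stay under
    if count == 0:
        raise ValueError("rate too low for this unit; try a smaller one")
    return f"{count}{unit}"

if __name__ == "__main__":
    for unit in ("M", "K"):
        value = bwlimit_for_bits(10_000_000, unit)
        print(value, bwlimit_bytes(value) * 8, "bits/s")
```

For a 10Mb/s cap this gives 1M (8,388,608 bits/s) or 1220K (9,994,240 bits/s), which shows why dropping to a smaller unit gets much closer to the intended rate.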