Coda File System

From: Peter J. Braam <braam_at_cs.cmu.edu>
Date: Sat, 4 Apr 1998 23:41:21 -0500 (EST)

Hi Eli,

We are getting to the heart of the matter now. Is your backup program
running? 

On Sat, 4 Apr 1998, the prophet wrote:

> 
> My current understanding of the situation is as follows.  Please let me
> know if this is correct.
> 
> Currently, the backup program only makes incremental dumps with respect to
> the previous full dump.  So, suppose a particular volume is set to do full
> dumps on Mondays and incremental dumps on other days.  Then, at the end of
> the week, you would have a single full dump (Monday) and a bunch of
> level-1 incremental dumps, each reflecting all changes since Monday.
> 
> What we would like is for the backup program to make incremental dumps
> each day which reflect the changes since the last successful dump (full *or*
> incremental).

To be more precise, we would want the backup program to make level N
dumps of a volume.  A level N dump is the increment against the
latest dump of a level strictly lower than N.

> Then, if we want to get a full dump representing the state
> of, say, Wednesday, we would merge Tuesday's incremental dump onto
> Monday's full dump and Wednesday's incremental dump onto the result.  To
> do this, we would still only need to use a single "ancient" file listing
> the vnodes of the last full dump.

The ancient file is not used or needed during merging; we need it when
making the dump.
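To illustrate why merging needs no ancient file, here is a minimal sketch of merging a chain of dumps. It assumes a hypothetical in-memory model (the `Dump` class, its `vnodes` map and `deleted` set are invented for illustration and are not Coda's dump format): each incremental carries the vnodes that changed since its reference dump plus the ids of vnodes removed since then, so the dumps themselves carry everything the merge needs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dump:
    """Hypothetical in-memory model of a volume dump (not Coda's real format)."""
    level: int
    vnodes: dict[int, str]                          # vnode id -> contents (changed vnodes only, for incrementals)
    deleted: set[int] = field(default_factory=set)  # assumed: vnode ids removed since the reference dump


def merge(full: Dump, *incrementals: Dump) -> Dump:
    """Apply incrementals, oldest first, on top of a level 0 dump.

    The incrementals must form a chain: each one taken against the dump
    merged just before it.
    """
    if full.level != 0:
        raise ValueError("merging must start from a level 0 (full) dump")
    state = dict(full.vnodes)
    for inc in incrementals:
        state.update(inc.vnodes)   # changed or new vnodes replace older copies
        for vid in inc.deleted:
            state.pop(vid, None)   # drop vnodes removed since the reference dump
    return Dump(level=0, vnodes=state)


mon = Dump(0, {1: "a", 2: "b"})
tue = Dump(1, {2: "b2", 3: "c"})
wed = Dump(2, {3: "c2"}, deleted={1})
print(merge(mon, tue, wed).vnodes)  # {2: 'b2', 3: 'c2'}
```

Note that nothing in `merge` consults a list of vnodes from an earlier dump; that comparison only happens when the incremental is produced.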

I believe we need more than one ancient file, namely one for every level.
Say dumps of levels N1 N2 N3 N4 N5 have taken place, in that order.  Now a
dump of level N6 comes along.  It will look for the latest dump in the
sequence whose level is smaller than N6.  Depending on the value of N6,
this could be any of the earlier dumps, e.g. (the last number in each line
is the level of the new dump):

1 2 3 5  requires ancients for 3
1 2 3 3  requires ancients for 2
1 2 4 2  requires ancients for 1
etc.
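The lookup above can be sketched with one record per level, in the spirit of dump(8)'s /etc/dumpdates. This is an illustrative sketch only: `DumpDates`, `record`, `reference_level` and `reference_for` are hypothetical names, and a real implementation would keep an ancient file alongside each per-level entry rather than just a sequence number.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DumpDates:
    """Hypothetical per-volume table of the latest dump at each level.
    One ancient file would be kept per entry."""
    latest: dict[int, int] = field(default_factory=dict)  # level -> dump sequence number
    seq: int = 0

    def record(self, level: int) -> None:
        """Note that a dump of `level` has just completed."""
        self.seq += 1
        self.latest[level] = self.seq

    def reference_level(self, level: int) -> int | None:
        """Level of the most recent dump whose level is strictly below `level`,
        i.e. whose ancient file a new dump of `level` is taken against.
        Returns None for a level 0 dump, which has no reference."""
        lower = [(seq, lvl) for lvl, seq in self.latest.items() if lvl < level]
        return max(lower)[1] if lower else None


def reference_for(history: list[int], new_level: int) -> int | None:
    """Replay a history of dump levels, then look up the reference for a new dump."""
    dates = DumpDates()
    for lvl in history:
        dates.record(lvl)
    return dates.reference_level(new_level)


# The three examples above, each preceded by the level 0 dump made at creation.
print(reference_for([0, 1, 2, 3], 5))  # 3
print(reference_for([0, 1, 2, 3], 3))  # 2
print(reference_for([0, 1, 2, 4], 2))  # 1
```

Keeping only the latest dump per level is enough: the most recent dump of any level below N is necessarily the latest dump at one of those levels, so older entries never need to be searched.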

(An interesting point is that __if__ the Coda vnodes stored the latest
time they were modified, then we would not need ancient files at all; we
could just compare the mtime of the vnode with the time of the previous
clones.  However, I have never looked at all the details, and things like
reintegration and resolution can actually store vnodes that "look old",
e.g. they get the mtimes of the modifications made to the vnode during
disconnected operation, not the time that they were stored on the server.
I think it would be very sensible for Coda to have a "server" vnode mtime,
which is the time of actual installation of the vnode in RVM, and I am
planning to build that animal at some point.)
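A small sketch of the mtime-based alternative, showing why the client mtime alone is not enough. The `Vnode` class and its `server_mtime` field are hypothetical (the server mtime is the proposed field, not something Coda vnodes are assumed to have), and `changed_since` is an invented helper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Vnode:
    vnode_id: int
    mtime: float         # modification time as carried over from the client
    server_mtime: float  # hypothetical: time the vnode was installed in RVM


def changed_since(vnodes: list[Vnode], reference_time: float,
                  *, use_server_time: bool = True) -> list[int]:
    """Ids of vnodes to include in an incremental dump taken against a
    reference dump whose clone was made at `reference_time`."""
    if use_server_time:
        return [v.vnode_id for v in vnodes if v.server_mtime > reference_time]
    return [v.vnode_id for v in vnodes if v.mtime > reference_time]


vnodes = [
    Vnode(1, mtime=50.0, server_mtime=50.0),    # untouched since before the clone
    Vnode(2, mtime=120.0, server_mtime=120.0),  # ordinary connected update
    Vnode(3, mtime=90.0, server_mtime=150.0),   # reintegrated: old client mtime, stored later
]
print(changed_since(vnodes, 100.0))                         # [2, 3]
print(changed_since(vnodes, 100.0, use_server_time=False))  # [2] (vnode 3 is missed)
```

The reintegrated vnode "looks old" by its client mtime and would silently be left out of the incremental; comparing against the server installation time picks it up.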

>
> Here are my current thoughts based on this understanding: this approach is
> relatively inflexible. 

What flexibility would you want to add? I am not following yet.

> Do we want to let the user specify in the dumplist
> the level of incremental dump that should be done each day that an
> incremental is specified? 

Yes, and a script can help with this.  For example, many companies would
run a full dump on the 1st of the month, level 1 dumps on Friday nights,
and a Tower of Hanoi sequence during the week (see man dump).  For a full
dump they would stay overnight.
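A script along those lines might look like the following sketch. `dump_level` and `WEEKDAY_LEVELS` are hypothetical names, and the Monday to Thursday levels are assumed, illustrative Hanoi-style values, not copied from dump(8) or from any real site's schedule.

```python
from __future__ import annotations

from datetime import date

# Assumed Monday..Thursday levels in a Tower of Hanoi style (illustrative only).
WEEKDAY_LEVELS = {0: 3, 1: 2, 2: 5, 3: 4}  # keys follow date.weekday(): Monday == 0


def dump_level(day: date) -> int | None:
    """Level of the dump to run on `day`, or None when no dump is scheduled."""
    if day.day == 1:
        return 0  # full dump on the 1st of the month (takes precedence)
    if day.weekday() == 4:
        return 1  # level 1 dump on Friday night
    # Hanoi-style level Monday..Thursday; weekends are not in the table, so None.
    return WEEKDAY_LEVELS.get(day.weekday())


print([dump_level(date(1998, 4, d)) for d in range(1, 11)])
# [0, 4, 1, None, None, 3, 2, 5, 4, 1]
```

Because the Monday to Thursday levels alternate up and down, each weekday dump is taken against either the preceding Friday dump or an earlier weekday dump, which keeps the nightly incrementals small while bounding how many dumps a restore has to merge.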

BTW I think it would be neat to work out some optimizations of tape
usage for these things.  Mostly, full dumps are kept in perpetuity (I
have no idea why), while incrementals are recycled, etc.  The problem with
the current Coda situation is that we have too many tapes that contain a
full dump.  What is the optimal distribution of nights/levels given the
constraints that:

1) your tape machine can hold x% of your file space
2) you never want to merge more than L dumps (i.e. L is the highest
level)

This is probably where the Tower of Hanoi comes back in.

> (A level n incremental dump would be with
> respect to the most recent level n-1 dump.) 

This is not necessarily true.  It would be with respect to the latest dump
of level smaller than n; this could be any level from 0 through n-1.  The
point is that a dump of level 0 is guaranteed to exist, but no other one is.
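Following this rule backwards from the newest dump gives the chain of dumps a restore has to merge. Here is a minimal sketch (`restore_chain` is a hypothetical helper, and the histories are made-up examples) that takes dump levels in chronological order and returns the indices to merge, oldest first:

```python
from __future__ import annotations


def restore_chain(levels: list[int]) -> list[int]:
    """Indices of the dumps to merge, oldest first, to reconstruct the state
    at the latest dump. `levels` lists dump levels in chronological order and
    must start with the level 0 dump made at volume creation."""
    if not levels or levels[0] != 0:
        raise ValueError("history must start with a level 0 dump")
    chain = [len(levels) - 1]
    while levels[chain[-1]] != 0:
        current = levels[chain[-1]]
        # The reference of a level n dump is the latest earlier dump with a
        # level below n; the initial level 0 dump guarantees one exists.
        chain.append(max(i for i in range(chain[-1]) if levels[i] < current))
    return chain[::-1]


history = [0, 4, 1, 3, 2, 5, 4]  # hypothetical sequence of dump levels
print([history[i] for i in restore_chain(history)])  # [0, 1, 2, 4]
print(restore_chain([0, 1, 1, 1]))                    # [0, 3]
```

Since the levels along the chain strictly increase, a restore never merges more than L + 1 dumps (the full dump plus at most L incrementals) when L is the highest level in use.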

> 
> If we do this, (which it seems we should), then do we need to introduce
> the notion of the "level" of a full dump,

The level of a full dump is always 0, and a volume is always dumped at
level 0 upon creation.

> which corresponds to the highest
> level of incremental dump merged in its creation?  Then, a level n
> incremental dump could only be merged onto a level n-1 full dump.  Or is
> there another way to manage this?

I hope that what I wrote is clarifying things for you.  It might be
helpful to read the dump manpages very carefully; they have a lot of
useful information.

> 
> -Eli Daniel
> 
>       0
>         0
> \  /\ 0
>  \/ o\ 0 
>  /\  /
> /  \/
> 

Re: multilevel backups