Coda File System

Re: coda server crash

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Sun, 18 May 2003 13:46:45 -0400

On Sun, May 18, 2003 at 03:53:35PM +0200, Lionix wrote:
> At this level of technical RVM problem I don't think I would be of any
> help...
> Anyway, I'm going to write a response until Jan is back! :-)

I never left :) I actually mailed back and forth with Steve a couple of
times, but he had already started to rebuild his servers from backups.
Haven't heard any further updates, so I hope it worked.

> During my tests on coda-fs (not finished at all), at one point I lost my
> SCM: it was complaining about RVM: "No RVM type selected"...  I

That happens when the server is unable to read the RVM parameters
(specifically rvm_data_length) from /etc/coda/server.conf.
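
A quick way to check for this before starting the server is to look for
the parameter in the config file. The following is only a rough sketch:
it assumes server.conf consists of simple key=value lines with '#'
comments, and the helper names (parse_conf, missing_rvm_keys) are made
up for illustration. Only rvm_data_length is checked, since that is the
one parameter named above.

```python
# Sketch only: assumes server.conf holds simple key=value lines with '#' comments.
REQUIRED_RVM_KEYS = ("rvm_data_length",)  # the only parameter named in this thread


def parse_conf(text):
    """Return a dict of key -> value from key=value lines, ignoring comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_rvm_keys(text, required=REQUIRED_RVM_KEYS):
    """List required RVM keys that are absent or empty in the config text."""
    settings = parse_conf(text)
    return [key for key in required if not settings.get(key)]
```

To use it, read /etc/coda/server.conf into a string and pass it to
missing_rvm_keys(); an empty list means the parameter is present.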

> - As i did not configure the root-volume replicated....

My terminology for replicated vs. non-replicated is probably a bit
different.

- Non-replicated volumes in Coda only exist in the VLDB. This type of
  volume is pretty much directly inherited from AFS2 and as such doesn't
  have a lot of the Coda-specific features (most importantly version
  vectors) and can't be used when we are disconnected. I've made it
  pretty difficult to create these: the 'createvol' script isn't
  available anymore, so only some volumes created with volutil can
  actually be non-replicated (backup, clone, and volutil create).

- Replicated volumes have an entry in the VRDB (volume replication
  database) which 'maps' to one or more volumes in the VLDB. These
  volumes have version vectors and can be used while disconnected. Any
  replicated volume is created with the createvol_rep script, or the
  hard way with a lot of volutil commands. Because a 'replicated' volume
  can map to a single volume replica, I tend to use the weird term
  'singly-replicated' for these.
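
The distinction can be pictured with a toy model. This is not how Coda
stores the VLDB or VRDB; the class, method names, and volume names
below are invented for illustration, and the only rules encoded are the
ones described in the two points above.

```python
# Toy model, not Coda's on-disk format: all names here are made up.
from dataclasses import dataclass, field


@dataclass
class VolumeDatabases:
    vldb: dict = field(default_factory=dict)  # volume name -> hosting server
    vrdb: dict = field(default_factory=dict)  # replicated name -> replica names

    def add_volume(self, name, server):
        self.vldb[name] = server

    def add_replicated(self, name, replicas):
        # A VRDB entry must map to one or more volumes that exist in the VLDB.
        missing = [r for r in replicas if r not in self.vldb]
        if not replicas or missing:
            raise ValueError(f"replicas must exist in the VLDB: {missing}")
        self.vrdb[name] = list(replicas)

    def kind(self, name):
        if name in self.vrdb:
            count = len(self.vrdb[name])
            return "singly-replicated" if count == 1 else f"replicated x{count}"
        for parent, replicas in self.vrdb.items():
            if name in replicas:
                return f"replica of {parent}"
        return "non-replicated" if name in self.vldb else "unknown"

    def usable_disconnected(self, name):
        # Only volumes with a VRDB entry have version vectors.
        return name in self.vrdb
```

For example, registering one volume "root.0" and a VRDB entry "root"
that maps to just that volume gives kind("root") ==
"singly-replicated", while a backup volume that is only in the VLDB
comes out as "non-replicated".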

> As you are trying to get the root volume back too, if you don't have
> replication or a backup, you're on a bad run.... Let's hope Jan or
> someone else has a magical high technical solution!

Not really, except that I believe the NULL-pointer dereference after
re-enabling resolution is fixed in the CVS version. I still don't know
how his server could have gotten into trouble with the resolution log
while it was disabled.

> - I read in some docs that write-replicating the root volume was
> dangerous because a conflict on this volume would freeze the whole
> cluster... (I have to try this too, to see...)!

When the root directory of the root volume becomes a conflict, /coda is
turned into a symlink. However, this object is not really under our
control. So depending on the kernel we either have a directory that is
inaccessible, or it actually becomes a symlink. In both cases we cannot
reach the magic '/coda/.CONTROL' file that is used as the special object
by cfs and repair to send commands to venus (like 'cfs beginrepair').

As a result it becomes impossible to repair the conflict. The only way
out is to create a temporary volume (i.e. tmproot) and start venus with
'-rootvolume tmproot', mount the problematic rootvolume and repair the
conflict. Again, the CVS version fixes this, because the top-level
(readonly) directory contains locally generated 'cell names' on which
the various rootvolumes are mounted.
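
The order of the workaround can be written down as a small checklist.
This is a sketch, not a tested procedure: nothing is executed, the
mount point and the broken volume's name are placeholders, and the
argument layout shown for 'cfs mkmount' and 'cfs beginrepair' is an
assumption that should be checked against the cfs help on your
installation. Only '-rootvolume tmproot' is taken directly from the
text above.

```python
# Sketch of the recovery order described above; nothing here is executed.
def rootvolume_repair_plan(temp_volume="tmproot",
                           broken_volume="BROKEN_ROOT_VOLUME",  # placeholder name
                           mount_point="/coda/oldroot"):        # placeholder path
    """Return (description, argv-or-None) steps; None means 'do by hand'."""
    return [
        # Creating the volume is left manual: the arguments depend on the setup.
        ("create a temporary volume", None),
        ("restart venus on the temporary root",
         ["venus", "-rootvolume", temp_volume]),
        # Assumed argument order: directory first, then volume name.
        ("mount the problematic rootvolume",
         ["cfs", "mkmount", mount_point, broken_volume]),
        # Assumed to take the path of the object in conflict.
        ("start repairing the conflict",
         ["cfs", "beginrepair", mount_point]),
    ]


def format_plan(steps):
    """Render the steps as numbered lines for printing."""
    lines = []
    for number, (description, argv) in enumerate(steps, start=1):
        command = " ".join(argv) if argv else "(manual step)"
        lines.append(f"{number}. {description}: {command}")
    return "\n".join(lines)
```

Printing format_plan(rootvolume_repair_plan()) gives the four steps in
order, which is handy to keep next to the terminal while venus is
running on the temporary root.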

> I suppose that this is out of date because I read a mail from Jan that
> tells us that the root volume is 3-replicated at cms...

It has always been triply replicated. Every subdirectory is an
independent volume and we hardly ever create anything in /coda.

> So I choose the "root-volume security policy":
> - Fully replicate the root volume and do as few operations as possible
> on it: no user operations!!!
> only coda configuration stuff !!!!
> ( you can't put administrator's work in an Admin volume  too !!!!!  )

That's how we do it: /coda only contains a handful of mountpoints (usr,
project, backups, tmp).

> Hope these considerations will help for the future....
> Hope somebody can help more...
> Hope I did not write too many stupid things....

Nope, it sounds about right.

Jan
Received on 2003-05-18 13:49:01