Coda File System

Re: dirty shutdown - what now?

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 20 Oct 2003 23:06:22 -0400
On Tue, Oct 21, 2003 at 08:44:33AM +0800, Mathias Koerber wrote:
> yesterday on one of my clients I had a venus crash and on restart I get:
> 
> Date: Tue 10/21/2003
> 
...
> 08:28:59 Getting Root Volume information...
> 08:28:59 Reintegrate SHARE pending tokens for uid = 0
> 08:28:59 fatal error -- volent::~volent: CML not empty
> 08:28:59 RecovTerminate: dirty shutdown (1 uncommitted transactions)

Well, for some reason the client tried to kill off a volume structure
that still had pending changes. So this is basically an assertion that
triggers because the client was about to do something very stupid.
Stopping here means the user can at least recover the latest snapshot
of the locally modified data from /usr/coda/spool/.

The dirty shutdown message is because the assertion triggered while we
were modifying recoverable memory, and the operation in progress got
aborted (considering that it was about to do something really bad this
is actually a good thing).

> what to do in such a case?
> 
> a venus -init will lose all changes made on that client

A recent copy of modified files should be in a tarball in
/usr/coda/spool (/var/lib/coda/spool?).
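
As a rough sketch of how one might look for that tarball before running
'venus -init', the Python below walks both candidate spool directories
and lists every file that parses as a tar archive, newest first. The
function names are made up for illustration, and nothing here assumes
a particular file name or subdirectory layout inside the spool
directory; it simply tests each file.

```python
import os
import tarfile

# Candidate spool directories named in the message; which one applies
# depends on how the client was installed.
SPOOL_DIRS = ["/usr/coda/spool", "/var/lib/coda/spool"]


def find_spool_tarballs(spool_dirs=SPOOL_DIRS):
    """Return paths of tar archives under the given directories, newest first."""
    found = []
    for top in spool_dirs:
        if not os.path.isdir(top):
            continue  # this candidate location does not exist on this client
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if tarfile.is_tarfile(path):
                        found.append(path)
                except OSError:
                    continue  # unreadable file, skip it
    found.sort(key=os.path.getmtime, reverse=True)
    return found


def list_members(path):
    """Return the member names stored in one tar archive."""
    with tarfile.open(path) as tar:
        return tar.getnames()


if __name__ == "__main__":
    for tarball in find_spool_tarballs():
        print(tarball)
        for member in list_members(tarball):
            print("    " + member)
```

Reading the spool directory usually needs root. Copy anything you find
somewhere outside the Coda cache before reinitializing the client.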

> I tried repair, but it always claims the object I name is not
> in conflict and refuses to start.
> 
> I find the repair documentation very sketchy at best.

Repair is sketchy at best, although it is getting better.

> 1. after doing a 'cfs beginrepair' do I have to say beginrepair again in the
> repair tool?

You have to do 'cfs endrepair' first, because the repair tool doesn't
see the conflict when it is already expanded.

> 2. what to do if repair claims neither the directory nor the individual 
> files in them are in conflict? (but I know they are. In my case a
> renamed file did not get renamed on the server)

Depends on the situation. Essentially repair recognizes conflicts only
because they turn into a special type of symlink. If you have done 'cfs
beginrepair' the conflict-symlink isn't visible. If some application has
a reference to the object in conflict, or to something in the tree below
the conflict, the kernel can't turn the directory-in-conflict into a
conflict-symlink, and we can't repair, etc. Then there are fundamental
differences between server-server (resolution conflicts) and
local-global (reintegration conflicts) in the Coda client code, which
make them do slightly different things which work or fail in various
situations. Almost every Coda release fixes another 2 or 3 problems in
repair and it's still not totally reliable.
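
To check whether an object is currently showing up as a conflict
symlink at all, something like the following Python sketch could help.
It assumes that a conflict symlink looks like a dangling symlink from
the point of view of ordinary tools. That is an assumption, not
something stated above, so treat the results as candidates only. The
function name is made up for illustration.

```python
import os


def find_dangling_symlinks(top):
    """Return sorted (path, target) pairs for symlinks under top whose target is missing."""
    hits = []
    # os.walk does not follow symlinks by default, so links are only inspected.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() looks at the link itself, exists() follows it.
            # ASSUMPTION: a conflict shows up as a link that fails exists().
            if os.path.islink(path) and not os.path.exists(path):
                hits.append((path, os.readlink(path)))
    return sorted(hits)


if __name__ == "__main__":
    import sys
    for path, target in find_dangling_symlinks(sys.argv[1]):
        print(path, "->", target)
```

If the object you know is in conflict is not listed, either it is
still expanded from an earlier 'cfs beginrepair', or something is
holding a reference to it, which are the two cases described above.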

Two of the major weak areas that I know still exist are conflicts on
objects that have active references, and reintegration conflicts that
resulted from an already existing server-server conflict.

Jan
Received on 2003-10-20 23:07:54