Coda File System

Re: spontaneous local/global conflict, and how things got worse

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 26 Aug 2003 11:04:27 -0400
On Tue, Aug 26, 2003 at 06:40:36AM -0400, Matthias Drochner wrote:
> blymn_at_baesystems.com.au said:
> > One thing, do you both have your connection set to strong? i.e. do a
> > "cfs strong" before you try the check out.
> 
> I've tried it and got different problems. cvs complained
> cvs update: cannot rewrite CVS/Entries.Backup: Interrupted system call
> a number of times, and after a while there was no visible progress
> anymore.

As far as I remember, that is a quirk of the BSD kernel modules. For
some reason they automatically abort an operation if venus doesn't send
a reply within some time limit (60 seconds?). The Linux code works
differently: it ignores all signals except SIGINT and SIGKILL during
the first 30 seconds, and after that any signal can abort the operation.
The one exception is the close upcall: if that upcall is aborted before
venus sees it, the reference count is off and we end up with objects
locked down in the cache that can never be refetched or removed.
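
To make the Linux policy concrete, here is a rough Python model of the
decision described above. It is only a sketch, not the kernel module
code: the function name, its parameters and the 30-second constant are
assumptions for illustration, and it simplifies the close case by never
allowing a close upcall to be aborted.

```python
import signal

# Assumed value, taken from the description above; the real constant may differ.
GRACE_PERIOD_SECONDS = 30

# Signals that may interrupt a waiting upcall even during the grace period.
ALWAYS_ALLOWED = {signal.SIGINT, signal.SIGKILL}


def may_abort_upcall(signum, elapsed_seconds, is_close_upcall=False):
    """Return True if a pending signal may abort a waiting upcall.

    Hypothetical helper that models the policy in prose; it is not the
    actual Linux kernel code.
    """
    # Simplification: a close upcall is never aborted, so venus always
    # sees it and the reference count stays correct.
    if is_close_upcall:
        return False
    # During the grace period only SIGINT and SIGKILL get through.
    if elapsed_seconds < GRACE_PERIOD_SECONDS:
        return signum in ALWAYS_ALLOWED
    # After the grace period any signal can abort the operation.
    return True
```

For example, may_abort_upcall(signal.SIGTERM, 5) is False, while the
same signal 40 seconds into the wait is allowed to abort.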

> The venus console messages were like
> 
> 11:14:32 Volume coda.root busy, waiting

Hmm, this is common during resolution: when the server cannot get a
write lock on the volume, the client retries the operation a couple of
times. There is a deadlock detection thread, but I don't think it breaks
any resolution-related locks. A server restart should clear any dangling
locks.
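
The retry behaviour behind that message looks roughly like the sketch
below. This is not venus's actual code: VolumeBusy, retry_while_busy,
the retry count and the delay are all made-up names and values, used
only to show the "busy, wait, try again" pattern.

```python
import time


class VolumeBusy(Exception):
    """Hypothetical error standing in for the server's 'volume busy' reply."""


def retry_while_busy(operation, volume, retries=3, delay=1.0,
                     sleep=time.sleep, log=print):
    """Run operation(), retrying a few times while the volume is busy.

    retries and delay are illustrative assumptions, not Coda's real values.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except VolumeBusy:
            # Out of retries: give up and let the caller see the error.
            if attempt == retries:
                raise
            log(f"Volume {volume} busy, waiting")
            sleep(delay)
```

If the server never releases the lock, every retry fails the same way,
which matches the lack of visible progress on the client side.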

Jan
Received on 2003-08-26 11:05:35