Coda File System

Re: spontaneous local/global conflict, and how things got worse

From: Jan Harkes <>
Date: Wed, 27 Aug 2003 09:21:47 -0400
On Tue, Aug 26, 2003 at 06:20:48PM +0200, Matthias Drochner wrote:
> said:
> > the address you are sending from is not subscribed, so I've been
> > forwarding your emails to the list
> Thanks -- didn't know that, sorry for the trouble. Just subscribed.

No problem. I just noticed that my mail client didn't automatically put
you on the CC list, so you could easily miss replies (and the forwarding
added a significant delay to your responses).

> So I've decided to use 6.0.2 (as written in the "compiler bug" mail).
> It works _much_ better. "cvs update" worked well.
> There are still some glitches, but it looks manageable.

Working on the glitches. What is now in CVS fixes another batch of small
repair and resolution problems, some of which were introduced by the
whole 'realms' thing, so those are 6.0-specific bugs. In some ways the
6.0 clients and servers are better than before, but we do seem to have
regressions in some areas, where things that worked perfectly before
are now a bit worse.

Part of this might be related to the lookaside checksums that the server
has to calculate, which are slowing down the getattr responses.
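To illustrate why that cost can show up in getattr: if the lookaside checksum is a digest over the whole file (this sketch assumes SHA-1 via Python's hashlib), computing it means reading every byte, so the work grows with file size. The DigestCache class and its (size, mtime) key are hypothetical, one generic way to avoid recomputing digests for unchanged files, not a description of what the Coda server actually does.

```python
import hashlib
import os


def content_digest(path, chunk_size=64 * 1024):
    """Return (hex SHA-1 of the file contents, number of bytes read).

    Assumption: the checksum covers the whole file, so every call
    reads every byte of it.
    """
    h = hashlib.sha1()
    nread = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
            nread += len(chunk)
    return h.hexdigest(), nread


class DigestCache:
    """Hypothetical cache: remember digests keyed on (path, size, mtime)
    so repeated getattr-style requests for an unchanged file do not
    re-read its contents."""

    def __init__(self):
        self._cache = {}
        self.bytes_read = 0  # total bytes read to compute digests

    def get(self, path):
        st = os.stat(path)
        key = (path, st.st_size, st.st_mtime_ns)
        if key not in self._cache:
            digest, nread = content_digest(path)
            self.bytes_read += nread
            self._cache[key] = digest
        return self._cache[key]
```

Calling get() twice on the same unchanged file reads its contents only once; without a cache, every request pays the full read.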

> > > cvs update: cannot rewrite CVS/Entries.Backup: Interrupted system call
> > That's as far as I remember a weirdness in the BSD kernel modules. For
> > some reason it automatically aborts an operation if venus doesn't send a
> > reply within some time limit (60 seconds?).
> As I understand the kernel code, it waits 128 times 2 seconds, where
> each wait can be interrupted by a signal. So without signals occurring,
> the timeout is almost 5 minutes. On the other hand, with many signals
> occurring, the time gets shorter.
> Once I really understand what this is supposed to do, I might be able
> to do something about it.
> Btw, I can easily trigger this by running "bonnie" in the /coda tree
> (on 6.0.2).

Hmm, is bonnie using SIGALRM by any chance?
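If bonnie does use a periodic SIGALRM, the loop described above would hit its limit much sooner. A minimal model, assuming each interrupted sleep still counts as one of the 128 tries (MAX_TRIES, SLEEP_SECS and time_until_abort are illustrative names, not kernel identifiers): with no signals the limit is 128 × 2 s = 256 s, a little over four minutes.

```python
# Hypothetical model of the upcall wait loop: up to 128 sleeps of
# 2 seconds each, where a signal ends the current sleep early and the
# interrupted sleep still uses up one try.
MAX_TRIES = 128
SLEEP_SECS = 2.0


def time_until_abort(signal_interval=None):
    """Seconds until the caller gives up if venus never replies.

    signal_interval: seconds between signals delivered to the waiting
    process (e.g. a periodic SIGALRM), or None for no signals.
    """
    total = 0.0
    next_signal = signal_interval
    for _ in range(MAX_TRIES):
        if next_signal is not None and total + SLEEP_SECS > next_signal:
            # A signal arrives before this 2-second sleep expires.
            total = next_signal
            next_signal += signal_interval
        else:
            total += SLEEP_SECS
    return total
```

time_until_abort() gives 256.0, while time_until_abort(0.5), a signal every half second, gives 64.0.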

> Another problem I've just found is related to file lookups.
> (tried to compile a kernel in the CVS tree on /coda)
> $ ls -l ../../../../compat/linux/arch/i386/../../common/linux_pipe.c
> ls: ../../../../compat/linux/arch/i386/../../common/linux_pipe.c: No such file or directory
> $ ls -l ../../../../compat/linux/common/linux_pipe.c
> -rw-r--r--  1 drochner  65534  3168 Jan 20  2003 ../../../../compat/linux/common/linux_pipe.c
> It doesn't happen with trivial test cases with 2 or 3 directory levels.
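Lexically the two paths name the same file: in arch/i386/../.., each .. cancels one preceding component, leaving compat/linux/common. A quick check with Python's posixpath.normpath, which only rewrites the string and never touches the file system (so it assumes neither arch nor i386 is a symlink or a volume mount point):

```python
import posixpath

failing = "../../../../compat/linux/arch/i386/../../common/linux_pipe.c"
working = "../../../../compat/linux/common/linux_pipe.c"

# Purely lexical: leading ".." components are kept, and every later ".."
# removes the component before it.
assert posixpath.normpath(failing) == working
```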

I can't test that on my Linux machines because they have a directory
lookup cache which short-circuits all the ../ lookups. But if there is a
problem, it should affect all NetBSD/FreeBSD/Solaris and Windows clients.
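A toy model of that difference (Node, Resolver and dotdot_from_cache are hypothetical names, not Coda or kernel code). On Linux the VFS answers .. from the parent link it already holds in its dentry cache, so the file system's own lookup routine never sees a .. request; the BSD-style path hands .. to the file system like any other name, which is where a bug in Coda's .. handling would surface.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        # The root's ".." is the root itself.
        self.parent = parent if parent is not None else self
        self.children = {}

    def add(self, name):
        child = Node(name, self)
        self.children[name] = child
        return child


class Resolver:
    def __init__(self, dotdot_from_cache):
        # True: Linux-like, ".." follows the cached parent link.
        # False: BSD-like, ".." is passed to the file system's lookup.
        self.dotdot_from_cache = dotdot_from_cache
        self.fs_lookups = []  # names the file system was asked to look up

    def fs_lookup(self, directory, name):
        self.fs_lookups.append(name)
        if name == "..":
            return directory.parent
        return directory.children[name]  # KeyError stands in for ENOENT

    def resolve(self, start, path):
        node = start
        for comp in path.split("/"):
            if comp in ("", "."):
                continue
            if comp == ".." and self.dotdot_from_cache:
                node = node.parent
            else:
                node = self.fs_lookup(node, comp)
        return node
```

Resolving ../../common/linux_pipe.c from i386 reaches the same node in both modes, but only the BSD-like resolver records two ".." calls into the file system.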

Are you by any chance crossing volume boundaries in the problematic
cases, while the trivial tests stay within the same volume? It looks like
that isn't the case, but I'm asking just to be sure, because volume
mountpoints are handled a bit differently from normal directory traversals.
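The usual Unix rule, sketched as a hypothetical model (Dir, mounted_on, parent_of and resolve are invented for illustration): inside a volume, .. follows the directory's own parent, but at the root of a volume it has to step back out to the directory that holds the mount point. If a failing path crossed a mount point somewhere under compat, that second branch would be the one being exercised.

```python
class Dir:
    def __init__(self, parent=None):
        self.parent = parent    # containing dir in the same volume; None at a volume root
        self.entries = {}       # name -> Dir
        self.mounted_on = None  # for a volume root: the dir holding its mount point

    def mkdir(self, name):
        d = Dir(self)
        self.entries[name] = d
        return d

    def mount(self, name, volume_root):
        # Make volume_root reachable as self/name and remember where it hangs.
        volume_root.mounted_on = self
        self.entries[name] = volume_root


def parent_of(d):
    if d.parent is not None:
        return d.parent        # ordinary directory inside a volume
    if d.mounted_on is not None:
        return d.mounted_on    # volume root: step back across the mount point
    return d                   # top of the tree: ".." of "/" is "/"


def resolve(start, path):
    node = start
    for comp in path.split("/"):
        if comp in ("", "."):
            continue
        node = parent_of(node) if comp == ".." else node.entries[comp]
    return node
```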

I'm wondering if it could be related to not having linux/arch cached.
What happens if you first do,
    ls .../compat/linux 
    ls .../compat/linux/arch

before doing,
    ls .../compat/linux/arch/i386/../../common/linux_pipe.c
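The same experiment as a small script, in case it is easier to repeat. It is a sketch: lookup_after_warming is a hypothetical helper, and base stands in for the elided .../ prefix, so substitute the real directory. The joined path is deliberately not normalized, so the kernel still performs each .. lookup itself.

```python
import os


def lookup_after_warming(base, warm_dirs, target):
    """List each directory in warm_dirs (relative to base) in order,
    then stat target; return a list of (step, succeeded) pairs."""
    results = []
    for d in warm_dirs:
        try:
            os.listdir(os.path.join(base, d))
            results.append((d, True))
        except OSError:
            results.append((d, False))
    try:
        # No normpath here: the ".." components must reach the kernel.
        os.stat(os.path.join(base, target))
        results.append((target, True))
    except OSError:
        results.append((target, False))
    return results
```

For the failing case from the report, warm_dirs would be ["compat/linux", "compat/linux/arch"] and target "compat/linux/arch/i386/../../common/linux_pipe.c"; comparing against a run with an empty warm_dirs list shows whether the cached state matters.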

Received on 2003-08-27 09:23:50