Coda File System

Re: Linux 2.6.32 seems to exaggerate the race bug(s) with Coda

From: <>
Date: Wed, 13 Oct 2010 09:31:11 +0200

Hello Jan,

On Fri, Oct 08, 2010 at 01:34:51AM -0400, Jan Harkes wrote:
> What I did see was the following commit [1] which I believe may fix the
> problem either way, as it removes/replaces the test where ENOENT is
> returned when revalidation fails.
> It looks like that patch went into 2.6.36-rc2.

For about a week I could not observe the problem with 2.6.35,
despite heavily stressing the scripts which are known to trigger the bug
(this is actually far better than the behaviour with 2.6.32).
But now I have seen at least one occasion when the scripts failed,
which suggests that the bug is still there in 2.6.35.

Looking forward to testing 2.6.36! The bug is a real PITA when you happen
to hit it.

> I have not reliably reproduced your problem and for some reason am
> unable to reproduce on any machine with >1GB of main memory. It may be

Hmm. This really looks like an (in-memory) caching issue.

> unusable for you, but it is a quite hard to trigger race condition that
> doesn't affect most people. It seems to require storing binaries and
> shared libraries in Coda which are accessed through a recursive and deep
> symlink forest.

I see. Actually I tried to arrange simpler test cases before, with fewer
binaries/libraries on Coda, and sometimes I could trigger the bug, but
it was really hard to reproduce reliably. Now I am just using
my everyday production scripts, which rely on forests of files and symlinks.
This does not help the analysis but at least gives some measure
of stability/brokenness with different kernels.
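
As a rough illustration of the kind of stress loop discussed above (not the
actual production scripts, which are not shown in this thread), the following
minimal sketch builds a chain of symlinks and repeatedly stats the outermost
link, counting lookups that fail with ENOENT although the target exists.
The function names, the chain depth and the temporary directory are all
assumptions made for illustration; to exercise Coda, the root would have to be
a directory under the Coda mount instead.

```python
import errno
import os
import tempfile


def build_symlink_chain(root, depth):
    """Create a regular file plus `depth` symlinks, each pointing at the
    previous entry. Returns the path of the outermost symlink.
    (Hypothetical helper, not taken from the thread.)"""
    target = os.path.join(root, "target")
    with open(target, "w") as f:
        f.write("x")
    prev = target
    for i in range(depth):
        link = os.path.join(root, f"link{i}")
        os.symlink(prev, link)
        prev = link
    return prev


def count_spurious_enoent(path, iterations):
    """stat() `path` repeatedly and count ENOENT failures.
    The file is never removed, so any ENOENT here is spurious."""
    failures = 0
    for _ in range(iterations):
        try:
            os.stat(path)  # follows the whole symlink chain
        except OSError as e:
            if e.errno == errno.ENOENT:
                failures += 1
            else:
                raise
    return failures


if __name__ == "__main__":
    # Assumption: a local temporary directory stands in for a Coda volume.
    with tempfile.TemporaryDirectory() as root:
        top = build_symlink_chain(root, depth=8)
        print("spurious ENOENT count:", count_spurious_enoent(top, 10000))
```

On a file system without the race the count should stay at zero. A single
process looping like this is probably much gentler than the real workload,
so a zero result says little about whether the bug is gone.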

> Other file system developers have occasionally hit on the same problem
> [2][3], Nick's patch seems to be the first one that has actually been
> accepted.
> [1];a=commit;h=2e2e88ea8c3bd9e1bd6e42faf047a4ac3fbb3b2f
> [2]
> [3]

Thanks Jan!

Received on 2010-10-13 03:48:23