Coda File System

Re: Problems with replication on two servers.

Date: Thu, 23 Apr 2009 13:12:09 +0200

Hi Marc,

On Thu, Apr 23, 2009 at 11:42:22AM +0200, Marc SCHLINGER wrote:
> It becomes complicated when on the scm, I block all traffic using iptables.
> I see the client starting sending messages to the replica(via tcpdump). 
> But when I unblock the traffic on the scm I always get the same error.
> On the scm:
> 18:25:10 GetVolObj: Volume (1000002) already write locked
> 18:25:10 RS_LockAndFetch: Error 11 during GetVolObj for 1000002.1.1
> 18:25:46 LockQueue Manager: found entry for volume 0x1000002

There are certainly some locking issues lurking there.
I have been hit by "Volume (XXXXXXX) already write locked" as well.

This problem almost certainly stems from one of the original design
assumptions of Coda: the servers are assumed to be well connected to each
other, in contrast to the clients, which may have unreliable connections.
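
If you want to see how often the servers run into this while you test, a
small script can count the conflicts per volume. The following is only a
minimal sketch, not part of Coda: the function name and the regular
expression are my own assumptions, based solely on the log line format
quoted above.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt quoted above:
#   "18:25:10 GetVolObj: Volume (1000002) already write locked"
LOCKED_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}) GetVolObj: Volume \(([0-9A-Fa-f]+)\) already write locked"
)


def count_write_lock_conflicts(lines):
    """Count 'already write locked' messages per volume id (as logged)."""
    counts = Counter()
    for line in lines:
        match = LOCKED_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# The three log lines from the report; only the first is a lock conflict.
sample = [
    "18:25:10 GetVolObj: Volume (1000002) already write locked",
    "18:25:10 RS_LockAndFetch: Error 11 during GetVolObj for 1000002.1.1",
    "18:25:46 LockQueue Manager: found entry for volume 0x1000002",
]

print(dict(count_write_lock_conflicts(sample)))  # prints {'1000002': 1}
```

To use it on a real server, pass it the lines of the server log, for
example `count_write_lock_conflicts(open(path))`, where `path` is wherever
your server writes its log.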

> On the client I got a dangling symlink for volume test.
> My question is: Isn't coda fail tolerant? Or do I miss something in my 
> installation/configuration ?

No, I don't think you do.

Coda is quite fault tolerant; it copes pretty well with
- clients losing connection to the net
- a server going down once in a while

It does not cope well with servers intermittently losing contact
with each other.

I guess this would be relatively hard to fix, given the original
assumption named above. AFAIK there are no current plans to do so.

It is nice that you are testing Coda so thoroughly; this might well
help to uncover some hidden bugs and possibly even convince the developers
of the need for server-side fault tolerance.

There are certainly many potential users who would appreciate
support for weakly connected servers, but this may present
some fundamental problems on top of the implementation ones.

On the other hand, Coda is very useful as it is, and there are also issues
of more immediate interest to fix.
The developers' resources are limited, so your best bet would be to join
the development effort. Unfortunately the "entry threshold" is quite high,
because the code is complex and still reflects years of
research-oriented programming.

Received on 2009-04-23 07:31:23