Coda File System

Re: new coda issue: touch a file and coda dies

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 7 Jul 2005 16:03:57 -0400
On Thu, Jul 07, 2005 at 01:47:09PM -0600, Patrick Walsh wrote:
> 	And I've hit upon what I think must be the problem:
> 
> # cfs whereis /coda/director/snapin/pool_scm
>   dir224  dir225  dir225
> 
> 	A quick look at VRList on the server shows:
> 
> /snapin 7f000003 3 1000004 2000004 200000a 0 0 0 0 0 0

Yes, that would be a problem. When the client sends a MultiRPC call to
the servers, the call does not carry the replica id but the replicated
volume id (i.e. 7f000003). Each server then internally maps this to its
local replica id by iterating over the list of replicas until it finds a
volume that has the matching server identifier (the first 8 bits of the
volume id).
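
A rough sketch of that lookup in Python, not the actual server code. The
VRList layout (name, replicated volume id, replica count, replica ids in
hex) is assumed from the single line quoted above, and all function names
are made up for illustration.

```python
def server_id(volume_id: int) -> int:
    """Top 8 bits of a 32-bit volume id identify the hosting server."""
    return (volume_id >> 24) & 0xFF


def parse_vrlist_line(line: str):
    """Split one VRList line into (name, replicated id, replica ids).

    Assumed layout: name, replicated volume id, replica count, then the
    replica ids in hex. Any trailing fields are ignored.
    """
    fields = line.split()
    count = int(fields[2])
    replicas = [int(f, 16) for f in fields[3:3 + count]]
    return fields[0], int(fields[1], 16), replicas


def local_replica(replicas, my_server_id):
    """Return the first replica whose server byte matches, else None."""
    for rid in replicas:
        if server_id(rid) == my_server_id:
            return rid
    return None


def shadowed_replicas(replicas):
    """Replicas that a first-match lookup can never reach."""
    seen, shadowed = set(), []
    for rid in replicas:
        sid = server_id(rid)
        if sid in seen:
            shadowed.append(rid)
        seen.add(sid)
    return shadowed


name, volid, replicas = parse_vrlist_line(
    "/snapin 7f000003 3 1000004 2000004 200000a 0 0 0 0 0 0")
print(hex(local_replica(replicas, 2)))                 # 0x2000004
print([hex(r) for r in shadowed_replicas(replicas)])   # ['0x200000a']
```

With the quoted VRList entry, server 2 always resolves 7f000003 to
02000004, and 0200000a is never selected.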

So it finds replica id 02000004 and performs the operation there. In this
case that server will actually receive 2 of the 3 MultiRPC calls, but it
performs both operations on the first replica it finds.

So a create should fail with EEXIST, and you would see similar strange
errors for other operations.
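
To make the failure concrete, here is a toy simulation in Python. It is
only a sketch: the Replica class and deliver_create() are hypothetical
and not Coda code, and it assumes the client sends one call per replica
in the list.

```python
import errno


def server_id(volume_id):
    """Top 8 bits of the volume id identify the server."""
    return (volume_id >> 24) & 0xFF


class Replica:
    """Toy replica: just a set of names that exist in one directory."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.names = set()

    def create(self, name):
        if name in self.names:
            raise FileExistsError(errno.EEXIST, "File exists", name)
        self.names.add(name)


def deliver_create(replicas, target_server, name):
    """One leg of the call: the server uses its first matching replica."""
    for rep in replicas:
        if server_id(rep.replica_id) == target_server:
            try:
                rep.create(name)
                return 0
            except FileExistsError as exc:
                return exc.errno
    return errno.ENOENT


replicas = [Replica(0x01000004), Replica(0x02000004), Replica(0x0200000A)]

# One call per replica; two of them land on the server with id 2.
results = [deliver_create(replicas, server_id(r.replica_id), "pool_scm")
           for r in replicas]

print(["OK" if r == 0 else errno.errorcode[r] for r in results])
# ['OK', 'OK', 'EEXIST']
print(sorted(replicas[2].names))
# [] (the second replica on that server never sees the create)
```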

> 	So it appears we are triply replicating a volume to two servers.  I
> have no idea how this happened -- we've automated the setup of coda and
> that code hasn't changed for some time.  So I'll look into this and try

Maybe 2 servers have the same server-id, or you initially created a
doubly replicated volume and then added a third replica on one of the
servers that already held one. Or it was as simple as running
"createvol_rep volume server_a server_b server_b".

> to figure out what's going on.  Sorry to waste your time with a bad
> setup.  I just can't figure out how it got setup wrong.

I can't figure out how it even got this far in the copy. I guess the
client might have disconnected from the 2 replicas on the same server,
committed on the remaining server and used resolution to propagate the
updates.

Jan
Received on 2005-07-07 16:04:51