Coda File System

Re: replication question

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 19 May 2003 10:19:52 -0400
On Sun, May 18, 2003 at 11:01:51PM -0700, Steve Simitzis wrote:
> i recently added a second server to my coda cell, and now i'm seeing
> a bunch of these in the SCM's SrvLog:
> 
> 22:55:05 Incomplete host set in COP2.

A few of these won't hurt, but in general this is not good, as it
indicates that a client only successfully committed the operation on
one server.

> i'm not sure if this is related, but i changed my VSGDB file from this:
> 
> 145 /vice/db: sg4> more VSGDB
> E0000100 db
> 
> to this:
> 
> 145 /vice/db: sg4> more VSGDB
> E0000100 db sg4

Every existing singly replicated volume is now assumed to be doubly
replicated. It is often better to just add another entry.

    E0000100 db
    E0000101 db sg4
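
To make the difference concrete, here is a rough Python sketch. It
assumes each VSGDB line is simply a VSG address followed by the names
of the member servers, which is all the excerpts above show; the
parse_vsgdb helper is hypothetical and not part of Coda.

```python
def parse_vsgdb(text):
    """Map each VSG address to its list of member servers.

    Assumes one entry per line: an address followed by server names.
    """
    groups = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        groups[fields[0]] = fields[1:]
    return groups


# Editing the existing entry changes the membership of E0000100 itself,
# so every volume that already refers to it appears doubly replicated.
edited_in_place = parse_vsgdb("E0000100 db sg4\n")

# Adding a second entry leaves E0000100 as it was and gives new
# doubly replicated volumes their own address.
added_entry = parse_vsgdb("E0000100 db\nE0000101 db sg4\n")

print(edited_in_place)
print(added_entry)
```

With the second layout, E0000100 still maps to db alone, so the
existing volumes keep the membership they were created with.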

> 22:58:58 LockQueue Manager: found entry for volume 0x1000003
> 22:59:03 GetVolObj: Volume (1000003) already write locked
> 22:59:03 GrabFsObj, GetVolObj error Resource temporarily unavailable

That is interesting, maybe there is some problem with the client when we
have a replicated volume with only a single replica in a VSG with 2
servers. Perhaps it sends the same request to the original server twice,
which could explain something like this.

> should i be concerned?

Somewhat :)

Steve also wrote:
> 23:33:04 ComputeCompOps: fid(0x7f000002.1.1)
> 23:33:04 COP1Update: VSG not found!

That is really bad. Are the files in the /vice/db directories on both
servers identical?
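
One way to check is to copy /vice/db from each server into two local
directories and compare checksums. The following is only a sketch
under that assumption; the function names and the directory names in
the usage note are made up for illustration, and some files may
legitimately differ per server.

```python
import hashlib
from pathlib import Path


def digest_dir(path):
    """Return {file name: SHA-256 hex digest} for regular files in path."""
    digests = {}
    for entry in sorted(Path(path).iterdir()):
        if entry.is_file():
            digests[entry.name] = hashlib.sha256(entry.read_bytes()).hexdigest()
    return digests


def differing_files(dir_a, dir_b):
    """Names of files that are missing on one side or differ in content."""
    a, b = digest_dir(dir_a), digest_dir(dir_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

For example, differing_files("db-copy-scm", "db-copy-sg4") would list
every file name worth looking at more closely.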

This could be a replicated volume created before you added the new
server, so there really is no replica on the second server even
though your client believes there is. When you modify anything in the
/vice/db/servers or VSGDB files you have to restart all servers,
otherwise they don't pick up the changes. And in this case you also need
to reinitialize the clients because they have cached seriously bad
state. The clients actually believe those volumes to be replicated
across both servers while they are not.

Jan

ps. Sigh, again the servers in CVS don't use the VSGDB during normal
operation, which removes a lot of these problem cases; the VSGDB file is
only used during volume creation.

It is starting to look like whatever is currently in 'CVS' magically
cures all ills. Well, it did manage to get a lot of bugfixes and new
features, which is also the reason why it is taking a bit longer to get
it to compile and work correctly on all platforms. But it still does
have plenty of unresolved problems we all know and hate.
Received on 2003-05-19 10:22:01