Coda File System

Re: server failure simulation

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 8 Jun 2007 17:56:15 -0400

On Thu, Jun 07, 2007 at 12:37:57PM +0200, Jakob Praher wrote:
> i successfully copied data to the coda mount point and saw it getting 
> replicated on both vicepas.
> 
> now before using it for something productive i thought about testing a 
> failure scenario. so i stopped the replication server. indeed i could 
> use the mount file.
> 
> so i tried to copy data onto the mounted coda partition. IMHO i thought 
> that it gets copied to the online server and later replicated to the 
> server when it is online again. but nothing happened. the client stuck 
> with the cp command and the vicepa of the online server did not change 
> at all (using df -h).

A client doesn't immediately give up on a server that has disappeared.
It should take about 60 seconds to time out, but because of a missing
division in librpc2 it actually takes closer to 90 seconds at the moment.

If you run 'codacon' in a separate terminal window you can see events
like server disconnections and reconnections.

> so i canceled the command and tried to put the second server online 
> again. that worked. but after refreshing the client services I got the 
> following log messages:

Refreshing the client? This log looks more like you shut down and
restarted the client.

> 12:28:23 starting VDB scan
> 12:28:23 Fatal Signal (11); pid 3301 becoming a zombie...
> 12:28:23 You may use gdb to attach to 3301
> 

This is a segfault very early on; we haven't even started looking at
the cached files. I don't understand how this is related; a segfault in
this place is definitely unexpected. How did you restart the client:
init script, vutil -shutdown, or a kill -9?

What distribution and C compiler are you using? This looks like it
happens the first time we iterate a C++ list structure after a restart,
which could indicate that the offsetof macro may be faulty, but then I
would have expected that to have also affected venus while it was
running. Is your machine 32-bit or 64-bit?

> As a second thing: in my scenario i do not really need acls, since i am 
> only interested in using coda for shared data and i am having firewalls 
>  such that only privileged hosts can access the file servers. can i 
> somehow configure the volumes such that everybody can access them without 
> the need to do clog? are there some good pointers i can rtfm for this case.

There is at the moment no way to avoid clog if you want write access
because the servers only accept updates over authenticated connections.

Jan
Received on 2007-06-08 17:57:21