Coda File System

Re: crash in rvmlib_free (not necessarily) during repair

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Wed, 10 Jul 2013 07:36:31 -0400
On Wed, Jul 10, 2013 at 11:53:31AM +0200, Piotr Isajew wrote:
> I can mount the volume:
> $ cfs mkm t test
> $ cfs lv t
>   Status of volume 7f000003 (2130706435) named "test"
>   Volume type is ReadWrite
>   Connection State is Reachable
>   Reintegration age: 0 sec, time 15.000 sec
>   Minimum quota is 0, maximum quota is unlimited
>   Current blocks used are 6
>   The partition has 8568880 blocks available out of 11755920
> 
> but accessing it gives the following error on the secondary:
> 
> 11:37:26 AuthLWP-2 received new connection 1894697031 from 192.168.9.103:58911
> 11:37:26 RS_LockAndFetch: Couldnt translate Vid for 7f000003.1.1
> 11:37:26 RS_LockAndFetch: Couldnt translate Vid for 7f000003.1.1
> 11:37:26 GetVolObj: VGetVolume(7f000003) error 603
> 11:37:26 GrabFsObj, GetVolObj error Volume not online
> 11:37:26 RS_LockAndFetch: Couldnt translate Vid for 7f000003.1.1
> 11:37:26 RS_LockAndFetch: Couldnt translate Vid for 7f000003.1.1
> 11:37:26 Failed to translate VSG but PrimaryHost != 0

One of your servers probably has an out-of-date VRDB file, which is used
to map from the replicated volume identifiers to the actual volumes.

This file is replicated between servers by the updatesrv/updateclnt
daemons. Once updateclnt has resynchronized the files in /vice/db, the
server is told to reload the volume information through a volutil RPC2
call.
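
As a rough way to check for a stale copy, something like the sketch
below could be used. It assumes the file lives at /vice/db/VRDB on each
server and that you have already copied the secondary's version to the
machine you are on; the helper name vrdb_in_sync, the host name and the
/tmp path are all made up for illustration.

```shell
# Hypothetical helper: succeeds (exit 0) when two copies of a database
# file are byte-for-byte identical, fails (exit 1) when they differ,
# and returns a value above 1 if either file cannot be read.
vrdb_in_sync() {
    cmp -s "$1" "$2"
}

# Example usage (paths and host name are assumptions, adjust as needed):
#   scp secondary:/vice/db/VRDB /tmp/VRDB.secondary
#   if vrdb_in_sync /vice/db/VRDB /tmp/VRDB.secondary; then
#       echo "VRDB copies match"
#   else
#       echo "VRDB copies differ, or one could not be read"
#   fi
```

If the copies differ, the updatesrv/updateclnt pair is the first thing
to look at.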

Updated user/group data is picked up automatically as soon as the new
file is available, but other things, such as the 'servers' file, are
only reloaded when the servers are restarted.

That last one can cause some headaches. If you add a server, or a server
changes its IP address, the change is not picked up until all servers
have been restarted. In fact, in some cases IP addresses are stored in
RVM and used to identify which operations have only been applied
locally, which volume replica is supposed to be stored on which server,
etc.

Also, resolution is a somewhat heavyweight 6-phase process in which
resolution logs and directories are shipped back and forth and compared.
Coda's replication was not designed for a scenario with servers spread
across multiple buildings or countries, but to handle service outages
due to failing hardware or maintenance.

Jan
Received on 2013-07-10 07:37:57