Coda File System

Re: Adding a replicating server

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 21 Apr 2005 14:29:43 -0400
On Wed, Apr 20, 2005 at 05:35:59PM -0600, Patrick Walsh wrote:
> # volutil info /.0
> Recoverable volume log
> version: 1 malloced
> ...

    # volutil info vmm:root
    V_BindToServer: binding to host verdi
    Recoverable volume log 
    version: 1 malloced
    adm_limit 4096 size 32 used 2
    rec_max_seqno 1400 current_seq_no 1400
    index contents

I'm wondering if the resolution log is enabled on your 1000001 replica.
Normally we turn it off for singly replicated volumes. However, this will
be a problem once you hit a conflict, now that there are multiple
replicas.

Essentially, this log contains all operations that were applied to a
server but not yet confirmed as completed everywhere by a phase-2 commit
message (COP2). Normally a client sends an operation to all servers and
collects the responses; on the next operation it piggybacks the COP2
associated with the previous operation. If there is no next operation
within a certain time window, it flushes any pending COP2s with a
separate RPC call.
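A rough model of this two-phase exchange follows. It is a sketch under simplifying assumptions and is not Coda's implementation. `Server`, `Client`, `reslog`, `update`, `tick` and the `flush_after` window are hypothetical names. Operations are plain strings, and every server is assumed to accept every update.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical replica: reslog holds operations still awaiting their COP2."""
    name: str
    reslog: dict = field(default_factory=dict)  # op id -> operation

    def cop2(self, op_ids):
        # Phase 2: these operations completed everywhere, so drop their log entries.
        for op_id in op_ids:
            self.reslog.pop(op_id, None)

    def apply(self, op_id, op, piggybacked=()):
        # Process any COP2s piggybacked on this request, then log the new operation.
        self.cop2(piggybacked)
        self.reslog[op_id] = op


class Client:
    """Hypothetical client that piggybacks COP2s on its next update."""

    def __init__(self, servers, flush_after=5.0):
        self.servers = servers
        self.flush_after = flush_after  # assumed time window, in seconds
        self.pending = []               # op ids whose COP2 has not been sent yet
        self.last_op = 0.0
        self.next_id = 0

    def update(self, op, now):
        op_id, self.next_id = self.next_id, self.next_id + 1
        # Phase 1: send the operation to all servers along with pending COP2s.
        for s in self.servers:
            s.apply(op_id, op, self.pending)
        # Those COP2s are now delivered; this operation needs its own COP2.
        self.pending = [op_id]
        self.last_op = now

    def tick(self, now):
        # No follow-up operation within the window: flush COP2s with a separate call.
        if self.pending and now - self.last_op >= self.flush_after:
            for s in self.servers:
                s.cop2(self.pending)
            self.pending = []


a, b = Server("a"), Server("b")
client = Client([a, b])
client.update("mkdir x", now=0.0)
client.update("create x/y", now=1.0)  # carries the COP2 for "mkdir x"
print(a.reslog)                       # {1: 'create x/y'}
client.tick(now=7.0)                  # window expired, separate COP2 flush
print(a.reslog)                       # {}
```

Suppose the client disconnects instead of ever calling `tick`. Then the `create x/y` entry simply stays in every server's `reslog`.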

However, if the client disconnects before the COP2 is sent, the logged
operation will not get cancelled. Another place where this happens is
weak reintegration, where a client sends updates to only a single
replica and then triggers resolution to propagate the changes to the
other replicas.

However, since there is only a single replica, a client never notices
that there might be unconfirmed operations; we normally detect them by
looking at version vector differences between replicas. As a result we
never force the replica to 'resolve', and the log keeps slowly growing
until it fills up and the server dies (search for AllocViaWrapAround in
the Coda mailing list archives).

Since 5.3.17 we turn the resolution log off when we create a replicated
volume with only a single replica. This way we never add new entries to
the log and avoid the crash.
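The failure mode and the workaround can be sketched roughly as follows. Everything here is a hypothetical simplification. `ResLog`, its `capacity`, the `enabled` flag, `needs_resolution` and `unconfirmed_ops` are illustrative names. Version vectors are plain lists of counters, and resolution is modeled as simply syncing the vectors and truncating the log. Running out of space raises an exception as a stand-in for the server dying.

```python
class ResLog:
    """Hypothetical bounded resolution log."""

    def __init__(self, capacity, enabled=True):
        self.capacity = capacity
        self.enabled = enabled  # stand-in for turning the resolution log off
        self.entries = []

    def append(self, op):
        if not self.enabled:
            return  # logging off: nothing is added, so the log cannot fill up
        if len(self.entries) >= self.capacity:
            # Stand-in for the server running out of log space and dying.
            raise MemoryError("resolution log full")
        self.entries.append(op)


def needs_resolution(version_vectors):
    # Unconfirmed operations show up as differences between replicas' version
    # vectors; a single replica has nothing to compare against.
    return len({tuple(vv) for vv in version_vectors}) > 1


def unconfirmed_ops(log, n_replicas, n_ops):
    """Apply n_ops updates whose COP2 never arrives; return entries left in the log."""
    vvs = [[0] * n_replicas for _ in range(n_replicas)]
    for i in range(n_ops):
        vvs[0][0] += 1  # the update reaches replica 0 only, and its COP2 is lost
        log.append(f"op{i}")
        if needs_resolution(vvs):
            # Simplified resolution: bring replicas in sync and truncate the log.
            vvs = [list(vvs[0]) for _ in range(n_replicas)]
            log.entries.clear()
    return len(log.entries)


print(unconfirmed_ops(ResLog(capacity=100), n_replicas=2, n_ops=1000))  # 0
print(unconfirmed_ops(ResLog(capacity=100, enabled=False), n_replicas=1, n_ops=1000))  # 0
try:
    unconfirmed_ops(ResLog(capacity=100), n_replicas=1, n_ops=1000)
except MemoryError as err:
    print(err)  # resolution log full
```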

I seem to recall that the last time someone tried to re-enable the
resolution log, the server didn't really like it... I just tried it: I
created a single replica, filled it with data, turned resolution back
on, created a second replica, and extended the volume in the VRDB.

That all seemed to work.

However my client doesn't seem to pick up the volume change just yet. I
already tried
    'cfs checkvolumes'
    'cfs disconnect; cfs cs; cfs reconnect; cfs cs'

OK, looking at the source, this will clearly never work, since it only
picks up changes when the volume name <> volume id mapping changes.
However, the create code path does look smart enough to simply update an
existing volume, so I'll see if I can get it to follow that path.
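The following simplified model shows why `cfs checkvolumes` misses the new replica and why discarding the cached volume information gets around it. `VolumeInfo`, `VolumeCache`, `check_volumes`, `flush` and the volume id are hypothetical, and so is the dictionary that stands in for the server-side volume database. The only behavior taken from the message is that the client refreshes an entry only when the name-to-id mapping changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeInfo:
    volid: int
    replicas: tuple  # servers holding a replica of the volume


class VolumeCache:
    """Hypothetical client-side cache of volume name -> volume information."""

    def __init__(self):
        self.by_name = {}

    def lookup(self, name, vldb):
        # Fetch from the (stand-in) server database only on a cache miss.
        if name not in self.by_name:
            self.by_name[name] = vldb[name]
        return self.by_name[name]

    def check_volumes(self, vldb):
        # Refresh an entry only if the name now maps to a different volume id.
        # Adding a replica keeps both name and id, so the entry stays stale.
        for name, cached in list(self.by_name.items()):
            if vldb[name].volid != cached.volid:
                self.by_name[name] = vldb[name]

    def flush(self, name):
        # Drop the cached entry so the next lookup fetches fresh information.
        self.by_name.pop(name, None)


vldb = {"testvolume": VolumeInfo(1, ("vivaldi",))}
cache = VolumeCache()
cache.lookup("testvolume", vldb)

vldb["testvolume"] = VolumeInfo(1, ("vivaldi", "mahler"))  # second replica added
cache.check_volumes(vldb)
print(cache.lookup("testvolume", vldb).replicas)  # ('vivaldi',)
cache.flush("testvolume")
print(cache.lookup("testvolume", vldb).replicas)  # ('vivaldi', 'mahler')
```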

The only solution right now is either to flush every object in the
resized volume, which purges the stale volume information from the
client cache, or to reinitialize your clients. So for me, with my single
volume testvolume, the following worked:

    $ cd .. # making sure nothing has a reference to any object in the volume.
    $ cfs fl testvolume
    $ cfs whereis testvolume
      VIVALDI.CODA.CS.CMU.EDU  MAHLER.CODA.CS.CMU.EDU

And now I see Resolve messages showing up in codacon when I run 'ls'.

Jan
Received on 2005-04-21 14:31:11