Coda File System

Re: odd repair problem

From: Greg Troxel <gdt_at_ir.bbn.com>
Date: Wed, 15 Aug 2001 10:26:34 -0400
I'm really sure there is only one replica because I have only 1 server
and have never set up a replicated volume.

  Ok, there are several known problems here. One is that there is a
  server-server conflict. The resolution code doesn't know how to handle
  cross-directory renames and always marks all involved objects
  'in-conflict'.

That seems most unfortunate.  Any idea on how hard this is to fix?
This means if I do such a rename while disconnected, I'm going to end
up in a wedged state every time, I think.  Is this what you mean?

  Second problem is that due to the low bandwidth connection, your client
  did a 'weak-reintegration'. This involves sending the updates to one
  server, and then triggering a resolve to update the others. Normally, a
  simple optimization cuts out the resolve RPC call but maybe they are
  occasionally triggered for unknown reasons. This should be harmless
  except for the fact that the rename resolution code dumbly marks all
  related objects as a conflict, although there are no other replicas to
  conflict with.

Perhaps an optimization could be added to do a weak-reintegration
only if the volume actually has multiple replicas.  Or do you mean
that this is the intent but it doesn't work?
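
To illustrate the suggested check, here is a minimal Python sketch. It is
not Coda code: `plan_weak_reintegration` and `replica_count` are
hypothetical names, and the step strings only paraphrase the description
quoted above (send the updates to one server, then trigger a resolve).

```python
def plan_weak_reintegration(replica_count):
    """Return the steps a weak reintegration would take (illustrative only).

    replica_count is a hypothetical input: the number of server replicas
    of the volume being reintegrated.
    """
    if replica_count < 1:
        raise ValueError("a volume needs at least one replica")

    steps = ["send updates to one server"]
    # With a single replica there is nothing to propagate to, so the
    # resolve step (and its conflict marking) can be skipped entirely.
    if replica_count > 1:
        steps.append("trigger resolve to update the other replicas")
    return steps


if __name__ == "__main__":
    print(plan_weak_reintegration(1))
    print(plan_weak_reintegration(3))
```

The point of the sketch is only that the resolve step is conditional on
there being more than one replica; how the real client decides this is
not shown here.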

  Third problem, the server-server conflict blocks reintegration of
  subsequent operations and triggers a local-global conflict. This
  local-global conflict is not repairable until the server-server conflict
  is resolved from another client.

Does this persist even if there is no s-s conflict?  The directory
looks fine from other venii, with no hints of a conflict.

It seems to me that the earlier CML entries (before the renames)
should be reintegratable, but I admit to not really understanding how
this works.
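
A small Python sketch of what "the entries before the renames" means,
assuming the CML can be viewed as an ordered list of (operation, path)
pairs. That representation, the operation names, and
`split_at_first_rename` are all made up for illustration; whether the
client can actually reintegrate such a prefix on its own is exactly the
open question.

```python
def split_at_first_rename(cml):
    """Split a log into (entries before the first rename, the rest).

    cml is a hypothetical list of (operation, path) tuples in log order.
    """
    for i, (op, _path) in enumerate(cml):
        if op == "rename":
            return cml[:i], cml[i:]
    # No rename in the log: everything is in the "before" part.
    return cml, []


if __name__ == "__main__":
    log = [
        ("store", "a/file1"),
        ("mkdir", "b"),
        ("rename", "a/file1"),
        ("store", "b/file1"),
    ]
    before, rest = split_at_first_rename(log)
    print("before the rename:", before)
    print("rename and later:", rest)
```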

  Ehh, that is funny, this patch doesn't change anything to RPC2, which is
  where the timeouts occur. The only thing this patch does is clamp the

It forces the bw estimate to something that I know is sustainable,
which prevents timeouts.  If the bw estimate is allowed to follow the
algorithm, it grows to be larger than the actual modem speed and then
too much data is sent and then I get a timeout.  I just stuck this in
as a quick hack one day when I was having trouble reintegrating.
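
The effect of the clamp can be sketched in a few lines of Python. This is
not the patch itself: `clamp_bandwidth`, `simulate`, the growth factor,
and the 28800 ceiling are assumptions chosen only to show an estimate
that would otherwise grow past the link speed being held at a fixed cap.

```python
def clamp_bandwidth(estimate_bps, ceiling_bps):
    """Cap a bandwidth estimate at a rate known to be sustainable."""
    return min(estimate_bps, ceiling_bps)


def simulate(ceiling_bps, rounds=5, start_bps=10_000, growth=1.5):
    """Grow an estimate each round, optionally clamping it.

    The multiplicative growth is a stand-in for whatever the real
    estimator does; ceiling_bps=None disables the clamp.
    """
    est = start_bps
    history = []
    for _ in range(rounds):
        est = est * growth
        if ceiling_bps is not None:
            est = clamp_bandwidth(est, ceiling_bps)
        history.append(int(est))
    return history


if __name__ == "__main__":
    print("unclamped:", simulate(None))
    print("clamped:  ", simulate(28_800))
```

Without the clamp the estimate keeps climbing past what the link can
carry; with it, the estimate never exceeds the chosen ceiling.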


Is there any way to undo the local conflict mark?

I am inclined to just purgeml at this point.  Any reason not to?  It
doesn't sound like I have found a new bug.
Received on 2001-08-15 10:26:36