Coda File System

Re: Coda and GNU/Linux systems

From: Peter Braam <braam_at_carissimi.coda.cs.cmu.edu>
Date: Wed, 10 Dec 1997 18:10:24 -0500

The bug in question is a simple evolutionary design error in the RVM
transaction handling as applied to Coda.  It is basically unrelated to
resolution and reintegration, except that the transactions in those
two cases are large and so have a higher chance of failing than the
transactions for small FS operations.  The evolutionary aspect is that
we don't need the offending buffer cache at all, so I am ripping it
out.  No big deal.

The re-integration path will indeed become the canonical path.  Lily
Mummert and I designed a write-back caching mechanism for Coda which
effectively "kicks" the client into "write disconnected" state.  It
uses logging to record the operations, which are then re-integrated on
the servers.  Indeed, things like this should be part of the standard
path of the code, and they will be.
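
As a rough illustration of the logging idea above (a minimal sketch, not
Coda's actual code): the client applies every update to its local cache,
appends a record to a log, and later replays the log on the server in
order.  The names Server, Client, store, remove and reintegrate are
hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Stand-in for a file server: just a path -> contents map (hypothetical)."""
    files: dict = field(default_factory=dict)

    def apply(self, op, path, data=None):
        if op == "store":
            self.files[path] = data
        elif op == "remove":
            self.files.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")


@dataclass
class Client:
    """Client that logs every update locally and reintegrates later (hypothetical)."""
    server: Server
    cache: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def store(self, path, data):
        # Update the local cache and record the operation; no server contact.
        self.cache[path] = data
        self.log.append(("store", path, data))

    def remove(self, path):
        self.cache.pop(path, None)
        self.log.append(("remove", path, None))

    def reintegrate(self):
        """Replay logged operations on the server in order; return the count."""
        replayed = 0
        while self.log:
            op, path, data = self.log[0]
            self.server.apply(op, path, data)  # if this raises, the record stays logged
            self.log.pop(0)
            replayed += 1
        return replayed
```

Because every update takes the log-then-replay route in this sketch, the
replay code runs in normal operation and not only after a failure.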

- Peter -

Brian Bartholomew writes:
 > > There are a number of bugs in Coda that can lead to corrupt data in
 > > Coda.  They are unlikely to crop up in strongly connected
 > > non-replicated server setups, but when using re-integration,
 > > resolution and repair there is a bug in the transaction handling
 > > that can write stale buffer cache data to the disk.
 > 
 > I've never much trusted fault-tolerant implementations designed as an
 > "everything working" code path and a "during failure" code path,
 > because the "during failure" code doesn't get tested enough.  The
 > canonical example is a fault-tolerant NFS server implemented as two
 > workstations connected by a private network and some specialized
 > identity-takeover code.  If the vendor can't keep a workstation up
 > long enough to be a reliable NFS server, why do I trust them to get
 > the failover code right?  Failover is a harder problem.
 > 
 > Is there some way that the re-integration code in Coda could be
 > exercised in normal use?  My favorite example of this is the reliable
 > multicast protocol Isis, where the complex parts of the failover
 > checks are exercised for normal reception.  When something really does
 > fail, not much changes in the execution profile.
 > 
 > Also, could a 'crashme' suite be written for Coda?  This could
 > generate ill-formed Coda traffic, and insert randomly changing delays
 > into the code to exercise timing relationships and race conditions.
 > 
 > 
 > Another member of the League for Programming Freedom (LPF) www.lpf.org
 > -------------------------------------------------------------------------------
 > Brian Bartholomew - bb_at_wv.com - www.wv.com - Working Version, Cambridge, MA
Received on 1997-12-10 18:27:03