Coda File System

Re: Clients never see updates over slow connection

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 13 Aug 1999 17:36:58 -0400
On Fri, Aug 13, 1999 at 04:01:52PM -0500, Bill Gribble wrote:
> > On Fri, Aug 13, 1999 at 02:37:16PM -0500, Bill Gribble wrote:
> > > There's a process on the slow client creating log files (in a Coda
> > > volume served by a third machine on the fast network) that are to be
> > > analyzed by a process on the fast client.
> > > 
> > > Everybody's clocks are synced by xntp. 
> > > 
> > > I ran my codes and found that the log files were never appearing on
> > > the fast client (or, more precisely, appeared but were empty), even
> > > though the slow client issued a 'cfs strong'. 
> >
> > Hi Bill,
> >
> > Yup, that is a sometimes unexpected `feature' of the way Coda guarantees
> > consistency of the filedata. As long as the (syslog?) daemon keeps the
> > files open for writing, Coda assumes that the file is still in a state
> > of inconsistency (being updated). The file will be sent as soon as the
> > last writer closes it, except if you are weakly connected; then it will
> > take about 5 minutes longer. You can rotate the logs every once in a
> > while to `flush' them to the servers.
> 
> Hmm... I understand how that might happen, but why is it that even now
> (after I have read your reply, almost 2 hours after the last writer
> process quit and "fuser" shows nobody having the file open for read or
> write) the same thing is showing?  i.e. full files on one client,
> empty files on others, and specific 'cfs strong' directives to all
> clients so everybody is strongly connected.
> 
> > Also you will not be able to easily `outsmart' the logic in venus by
> > closing and opening the file between each write call. That stops working
> > as soon as venus starts logging, then it uses the open-for-writing state
> > of the file as an indication to optimize the pending stores out of the
> > log.
> 
> I don't understand.  If the file has been closed and that's not enough
> to get it to sync, what other operation IS enough?

Consider _that_ a bug. If a `dirty' file is not held open by any
user-space process, then it either should be sent to the server
immediately (connected mode), or be trickle-reintegrated within some
bounded time.

The exceptions are a conflict, or a reintegration that is waiting for
the user to authenticate. 'cfs wr' should force the volume back into
connected mode and trigger a full reintegration. If that doesn't work,
re-authenticate with clog, which disconnects and reconnects the client.
Or do a couple of `cfs cs' calls, which cause venus to probe the network
and adjust its bandwidth estimate.
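
The steps above can be strung together in a small script. This is only a sketch: the command names (`cfs wr`, `clog`, `cfs cs`) are the ones mentioned in this message, while the helper name `run_recovery`, the injectable `runner` argument and the choice to run every step unconditionally are assumptions made for illustration. In practice you would stop as soon as the files show up on the other client.

```python
import subprocess

# Commands exactly as named in the message, in the order suggested there.
RECOVERY_STEPS = [
    ["cfs", "wr"],  # force the volume back into connected mode
    ["clog"],       # re-authenticate (will prompt on the terminal)
    ["cfs", "cs"],  # make venus probe the network
    ["cfs", "cs"],  # "a couple of" probes, so repeat once
]


def run_recovery(steps=RECOVERY_STEPS, runner=subprocess.call):
    """Run each step in order and return a list of (argv, exit status).

    `runner` is a hypothetical hook: anything that takes an argv list
    and returns an exit status, so the sequence can be dry-run.
    """
    results = []
    for argv in steps:
        results.append((argv, runner(argv)))
    return results
```

For a dry run that only prints the commands, pass `runner=lambda argv: print(" ".join(argv)) or 0`.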

> In any case, no one has opened these files for writing in hours and
> they are inconsistent across clients.  That seems broken to
> me. Somehow venus has forgotten about these particular files.  What
> can I look for in the venus logs to show me what's going on?

/usr/coda/etc/console should show messages if there are inconsistent
objects or if the reintegration has failed. codacon should print
bandwidth estimates every once in a while. There may be a problem when
they are very low: I suspect an overflow when the bandwidth estimate
falls under 100 B/s, which keeps the estimate from ever recovering
without restarting venus. /usr/coda/venus.cache/venus.log should have
lots of information about `GetReintegratable', the function that
evaluates whether reintegration log entries are ready for reintegration.
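
A quick way to pull those entries out of the log is a plain substring search. The sketch below assumes nothing about the venus.log line format: it only looks for the string `GetReintegratable`. The helper names `matching_lines` and `scan_log` are made up for this example; the log path is the one given above.

```python
def matching_lines(lines, needle="GetReintegratable"):
    """Yield (line_number, text) for every line that contains `needle`."""
    for number, line in enumerate(lines, start=1):
        if needle in line:
            yield number, line.rstrip("\n")


def scan_log(path="/usr/coda/venus.cache/venus.log"):
    """Print the matching lines of the venus log with their line numbers."""
    # errors="replace" keeps the scan going if the log holds odd bytes.
    with open(path, errors="replace") as log:
        for number, text in matching_lines(log):
            print(f"{number}: {text}")
```

Calling `scan_log()` on the client whose files are stuck should show whether venus is still considering the pending log entries at all.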

> If it matters, the current sequence of events with approximate timings is:
> 
> Time/range   Client   Action
> 0s           slow     fopen("filename.log")
> 0+s-20s      slow     fprintf data
> 20+s         slow     fflush, fclose
> 20s-30s      fast     sleep, hoping for Coda to catch up
> 30+s-35s     fast     analyze data
> 35+s         slow     rename("filename.log", "filename.log.{n}")
> 
> (loop through this cycle indefinitely, incrementing {n}.)
> 
> Currently, on the first iteration of this loop, the data is available
> for analysis on the fast client.  On second and subsequent iterations,
> (i.e. after a rename() call) new data never shows up on the fast
> client.
> 
> Could it be the rename() call that's causing problems?  

Possibly. I know I have fixed some problem where cross-directory rename
operations weren't resolved; maybe there are more problems in that area.
Also, when weakly connected, all reintegrations are sent to only one
server, and then resolution is used to update the others. If you are
writing to a replicated volume, starting venus with the -noRoundRobin
flag might help; this avoids switching the primary server when multiple
servers are available. Then the weak reintegrations will not get a
conflict when the resolution has to be retried.
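
To narrow down whether rename() is the trigger, the slow client's half of the cycle from the timing table can be reproduced in isolation. This is a minimal sketch, not the original program: the function name `write_and_rotate`, its parameters and the optional pause are assumptions for illustration; only the file names follow the table.

```python
import os
import time


def write_and_rotate(directory, iteration, lines, settle_seconds=0.0):
    """One cycle: write filename.log, close it, rename it to filename.log.<n>.

    Returns the path of the rotated file.
    """
    live = os.path.join(directory, "filename.log")
    rotated = f"{live}.{iteration}"
    with open(live, "w") as log:
        for line in lines:
            log.write(line + "\n")
        log.flush()
    # The file is closed here, so no writer holds it open any more.
    time.sleep(settle_seconds)  # window in which the fast client would read
    os.rename(live, rotated)
    return rotated
```

Running it in a loop inside the Coda volume with increasing `iteration`, once with the rename and once with the `os.rename` line commented out, should show whether the second and later files only go missing after a rename.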

Jan
Received on 1999-08-13 17:50:47