Coda File System

From: <shivers_at_cc.gatech.edu> Date: Sun, 25 Jul 2004 14:14:54 -0400

   From: Ivan Popov <pin_at_medic.chalmers.se>
   nice to see a detailed and structured report.

Nice to see a detailed and structured reply! Thanks for your post, Ivan.

   (yet I do not seem to find how big your client caches were)

Pretty big, varying from 100Mb to 10Gb.

   >     4b0fb02ca5944804cc403b6ff1f3797a  ./affection/audio_01.inf
   >     md5sum: ./affection/track05.cdda.flac: Connection timed out
   >     find: ./affection/audio_08.inf: Connection timed out

   I think it has been discussed some time.
   There are situations when it takes a long time for the server to answer.
   Combined with some packet loss it may lead the client to the conclusion
   that the server is unreachable.

   It is a fundamental functionality in Coda - the client decides itself

"Functionality" is not the word I would choose. I wouldn't even use the 
word, "feature."

   when it goes disconnected. We do not want the client to stall too much.

Why not? We're talking about *reads* here. Were it a write, then, yes, we
could cache it and do it later. But it's a read. The choice is either
stall or abort. Coda aborts.
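
To make the stall-or-abort choice concrete: a caller that would rather stall
can only approximate it by retrying the read itself. The following is a
minimal sketch of that, not anything Coda provides; "retry_read" and its
arguments are hypothetical names chosen for illustration.

```shell
# retry_read: hypothetical helper, not part of Coda.
# Usage: retry_read ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry_read() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        # Run the read command; return as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed: $*" >&2
        n=$((n + 1))
        # Sleep between attempts, but not after the last one.
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, "retry_read 5 10 md5sum ./affection/track05.cdda.flac" would try
the checksum up to five times, ten seconds apart. This only papers over the
abort at the caller; it does not change what the client does.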

   Maybe it is possible to improve the protocol and the algorithm, but
   it is not at all "that bad", and is hard to make changes to.

   There is room for improvement and Jan is well aware of the problem,
   but I think it is rather low on the priority list.

I would assert that it *is* that bad. It's broken. Let me summarise the
failure. 
  - Both the client and the server have a high-quality net connection. 
    No phone lines. No cable modems. Real honest-to-goodness Internet. 
  - In this mode, coda will just arbitrarily blow away file reads. 
  - So you have no assurance your ops will win. 

Note that I was not "pushing" the filesys hard. I was just using it. My
client computer was accessing no more than one file from the coda fs at
a time. Granted, these were 3-10Mb files -- so what? I didn't have 1,000
processes, each of which had many open files, with concurrent operations
hitting the same files, which were also being accessed on other servers.
*That* would be pushing the filesystem.

The picture that I am developing here of coda is that if you tune it just
right, and your network connection has certain good properties, and your
patterns of access stay within some (unspecified) envelope, you have good odds
of winning most of the time. But you better be prepared to deal with failures
whenever you operate on the filesys; they do happen.

   In http://www.coda.cs.cmu.edu/maillists/codalist/codalist-2004/6115.html
   Jan wrote

   | I've said this many times before, there is no such thing as guaranteed
   | connected operation in Coda. If anything goes wrong during a write/store
   | operation the client will silently switch to write-disconnected
   | operation (logging state). If the server is slow to respond we switch to
   | a logging state. And reversely, when the client can't be reached by the
   | server, the server triggers the disconnect and we are likely to switch to
   | a logging state.
   |
   | The only thing that cfs strong does is prevent the client from listening
   | to the often incorrect 'bandwidth estimates' from the RPC2 communication
   | layer, so that transitions only happen in error cases and not based on
   | incorrect estimates. In fact, if you were already write-disconnected
   | before calling cfs strong, the client will never discover that the
   | network actually has good bandwidth and will never transition to the
   | connected state.

   > So the real-world operation of coda here is that if you start writing a
   > lot of data, you disconnect, and then your writes just fail. So you can't
   > ever count on some operation actually working; it could very easily fail
   > mid-stream.

   It depends on the operation and the circumstances.
   If you start the operation during good connectivity and then your
   mobile phone connection goes down, then both reading (obviously) and writing
   (say when you do not have enough space in the client cache or in the cml)
   can fail.

   Of course we do not want the connection to be treated as unavailable
   while the net and the server are still there. It will become better as time
   goes on, but for the moment you have to take precautions for bulk copies.

I note that it has been > 10 years. And it appears to be, in some sense,
a deep part of coda's design philosophy.

   > that access my coda files sometimes win and sometimes seem to drive the
   > system into disconnected state, and then I must go through a
   >     cfs wr
   >     cfs cs
   >     cfs lv .
   > dance to reconnect. This happens when I am on a client with a completely
   > stable connection to the ethernet. We are not talking phone lines here.
   > This essentially renders coda unusable.

   I am familiar with the problem, still I find Coda usable.
   One workaround I had to use when my servers or network were slow,
   was to run a loop of "cfs cs" which helps against disconnections.
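
For concreteness, the workaround quoted above amounts to something like the
following sketch. Only "cfs cs" comes from the message; the function name
"keep_probing", the interval, the repeat count and the overridable command
name are assumptions added so the loop can be dry-run without a Coda client.

```shell
# keep_probing: hypothetical helper wrapping the quoted workaround.
# Usage: keep_probing INTERVAL_SECONDS COUNT [PROBE_COMMAND]
# Runs "PROBE_COMMAND cs" COUNT times, INTERVAL_SECONDS apart.
keep_probing() {
    interval=$1
    count=$2
    probe=${3:-cfs}   # defaults to the real cfs; override for a dry run
    i=0
    while [ "$i" -lt "$count" ]; do
        # Ask the client to probe its servers; keep looping on failure.
        "$probe" cs || echo "probe $i failed" >&2
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Something like "keep_probing 30 1000 &" would then run alongside a bulk copy.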

That kind of voodoo is a symptom that something is really wrong. Let me
restate it: if I can reliably hang a network filesystem on a *connected
client* simply by doing
    find . -type f -exec md5sum {} \;
then the filesystem is broken. It's not my fault. The filesystem is broken.
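
A variant of that walk which counts the failed reads instead of just
scrolling past them can be sketched as below. "checksum_tree" is a
hypothetical helper name, and the sketch assumes GNU md5sum and file names
without embedded newlines.

```shell
# checksum_tree: hypothetical helper. Checksums every regular file under
# DIR and reports how many reads failed instead of losing track of them.
# Usage: checksum_tree DIR
checksum_tree() {
    dir=$1
    failures=0
    # Write the file list to a temp file so the loop runs in this shell;
    # piping find into "while" would lose the counter in a subshell.
    list=$(mktemp) || return 2
    find "$dir" -type f > "$list"
    while IFS= read -r f; do
        if ! md5sum "$f"; then
            failures=$((failures + 1))
            echo "FAILED: $f" >&2
        fi
    done < "$list"
    rm -f "$list"
    echo "$failures read failure(s)" >&2
    # Exit status is 0 only if every file could be read.
    [ "$failures" -eq 0 ]
}
```

Running "checksum_tree ." on a local disk should report zero failures; the
point above is that the same walk on a connected client should do so too.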

   > 2. Do other people lose in this way? / Are other people winning?

   It is a known problem ("unnecessary disconnections" while a retry or extra wait
   would help). A lot of complaints may raise the priority to fix...
   There is probably a certain way to get the fixes done, just fund the work :)
   I'd rather accept these inconveniences for more important fixes and
   improvements.

   > 3. Is coda not ready for really big repositories (800Gb filesys, 1Gb rvm
   >    metadata)?

   I am running with 768Mb rvm but as my files are small - "typical Unix" :)
   it maps to just about max 30G data.
   It should not be any problem to fill more space with bigger files.

   > 4. Any advice at all?

   Coda offers unique possibilities - for some price. The usage pattern
   has to be "Coda friendly" - and probably will have to, even after ultimate
   fixes and improvements.

I am -- very regretfully -- concluding that coda is not something I can use
& am abandoning my attempts to use it. Bummer.

I've left out some other bad behaviours I've encountered. I managed
this week to generate this df output on a coda client:

    [root_at_northpoint coda]$ df
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /dev/hda2             10321168   3625088   6171796  38% /
    /dev/hda1                99043     38128     55801  41% /boot
    /dev/hda5             15481496   3280176  11414908  23% /var
    /dev/hda6             10321136   2553020   7243832  27% /home
    none                     30916         0     30916   0% /dev/shm
    coda                    100000 -18446744069414692531 4294952835 101% /coda
    coda                    100000 -18446744069414692531 4294952835 101% /coda
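
One arithmetic observation about the Available column, offered as a guess
rather than a diagnosis: 4294952835 is 2^32 - 14461, which is what a count of
-14461 looks like when a signed 32-bit value is printed as unsigned. Where a
negative count would come from is not established here. The helper name
"to_signed32" below is hypothetical, and the sketch assumes a shell with
64-bit arithmetic.

```shell
# to_signed32: hypothetical helper that reinterprets an unsigned 32-bit
# value as a signed 32-bit value (two's complement).
to_signed32() {
    v=$1
    if [ "$v" -ge 2147483648 ]; then
        # Values at or above 2^31 wrap around to negative numbers.
        echo $((v - 4294967296))
    else
        echo "$v"
    fi
}

to_signed32 4294952835   # prints -14461
```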

I've also observed "cfs lv ." hang until I killed a venus on a *different*
client, which hinted to me that there was some kind of locking going on
that I didn't understand, but, in any event, just increased my feelings of
unease.
    -Olin

Coda lossage