Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Sat, 26 Aug 2006 01:06:50 -0400

On Fri, Aug 25, 2006 at 09:18:12PM +0200, Michel Brabants wrote:
> I believe that this question has been asked in the past, but I would like to
> ask it again :). I hope you don't mind too much. Is Coda ready to be used
> in a production environment?

I guess it depends a lot on the environment. If Coda fails, I lose a lot
of work (email, most of the webpages on www.coda, etc). However...

The way I use Coda avoids a lot of problems. Sometimes I think it is not
so much that Coda got better, but that my use of it adapted well enough
to avoid common problems. Some examples:

My email is filed in maildir format mailboxes with no index files.
Maildir was designed to enable lockless, reliable delivery on NFS. It is
quite hard to get a conflict, but even so, I've modified my mailchecker
to read the tmp directory before looking at new. This way it avoids
unnecessary cross-directory rename conflicts. I also archive mail in
time to make sure the mailbox never exceeds the 256K directory size
limit. Several email clients create index files, which do not have the
nice lockless update properties of the maildir directories. Running
such a client on more than one machine at the same time will give
conflicts.
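
As a rough illustration of why maildir delivery needs no locks, here is
a minimal Python sketch of the usual scheme: write the message
completely under a unique name in tmp, then rename it into new. The
function names and the exact file name layout are assumptions made for
this example. It is not the code of any particular mail delivery agent
or of the mailchecker mentioned above.

```python
import os
import socket
import time

_counter = 0  # per-process sequence number, part of the unique name


def make_maildir(root):
    """Create the three standard maildir subdirectories."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)


def unique_name():
    """Build a name that should not collide across hosts or processes.

    The layout (time, pid, counter, hostname) is an assumption for this
    sketch; real delivery agents use similar but not identical schemes.
    """
    global _counter
    _counter += 1
    return "%d.P%dQ%d.%s" % (
        time.time(), os.getpid(), _counter, socket.gethostname())


def deliver(root, message_bytes):
    """Write the message fully into tmp/, then rename it into new/."""
    name = unique_name()
    tmp_path = os.path.join(root, "tmp", name)
    new_path = os.path.join(root, "new", name)
    # Mode "xb" refuses to overwrite an existing file of the same name.
    with open(tmp_path, "xb") as f:
        f.write(message_bytes)
        f.flush()
        os.fsync(f.fileno())
    # Readers only ever see complete messages in new/.
    os.rename(tmp_path, new_path)
    return new_path
```

Since every message gets its own write-once file, two writers never
update the same file. The only shared step is the rename from tmp to
new, which is the cross-directory operation referred to above.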

For version control, I use git. Because it stores compressed 'blobs'
named after the SHA-1 of their content, everything ends up with a
unique name and is write-once, so there is a very low chance of
conflicts (and those will be trivial to resolve, as 2 'different'
objects with the same name have a very high probability of having the
same content). Because of these properties I can make commits from a
disconnected client without having to worry much about conflicts with
commits from other developers. A repository like Subversion or CVS
would be unreliable when faced with disconnected updates. I'm not even
sure how one would deal with update conflicts in such a case.
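
To make the naming scheme concrete, here is a small Python sketch of
how git derives a blob's name from its content: the SHA-1 is taken over
a short header ("blob", the length, a NUL byte) followed by the data,
and the loose object on disk is the zlib-compressed form of those same
bytes. The helper names are made up for this example, and the sketch
covers only blobs stored as loose objects.

```python
import hashlib
import zlib


def git_blob_id(content):
    """Return the object name git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id):
    """Path of a loose object relative to the .git directory."""
    return "objects/%s/%s" % (object_id[:2], object_id[2:])


def loose_object_bytes(content):
    """Bytes stored on disk for a loose blob: zlib(header + content)."""
    header = b"blob %d\x00" % len(content)
    return zlib.compress(header + content)


if __name__ == "__main__":
    oid = git_blob_id(b"hello\n")
    print(oid, loose_object_path(oid))
```

Two clients that create the same content while disconnected compute the
same name, so a name collision during reintegration almost certainly
refers to identical files.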

For our webpages, I had quite a few conflicts that would happen when
hypermail was re-indexing the mailing list archives. This was aggravated
by the fact that it wouldn't thread right when incrementally updating,
so it rewrites everything about once every 15 minutes. As a workaround,
hypermail builds the archives in a directory on the local disk, and I
use rsync to copy only the changed files to the right place in /coda.
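
The idea of rebuilding locally and then touching only what changed can
be sketched in Python as follows. This is a simplified stand-in for
rsync, not a description of how rsync works: it compares file contents
byte for byte, never deletes anything at the destination, and the
function name is made up for this example.

```python
import filecmp
import os
import shutil


def sync_changed(src_root, dst_root):
    """Copy files from src_root to dst_root only if missing or different.

    Returns the sorted list of relative paths that were actually copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            # shallow=False forces a content comparison, so a file that
            # was rewritten with identical bytes is left untouched.
            if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False):
                continue
            shutil.copy2(src, dst)
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Because unchanged files are skipped, a full rebuild on the local disk
results in only a handful of writes to the shared tree, which keeps the
number of operations that can conflict small.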

> In our case this means that the product should work reliably. So, there
> shouldn't be any data loss because of product mistakes. I would like to ask
> what your thoughts on this are. How reliable is Coda? When is it not? Is
> there a timeframe in which you think that Coda will be reliable?

I would say that, in my experience, there has not been a lot of data
loss. Sometimes when there are reintegration conflicts it is 'easier' to
give up trying to fix the problems and simply purge all local changes
that haven't yet been reintegrated. This probably is the most frequent
form of data loss. My machines tend to run 24/7 and I suspend my laptop
instead of rebooting it, but there is probably some potential for
unintended data loss when you shut down or reboot your system,
especially when there was a reintegration conflict. Typically when the
client fails to restart due to a bug, people can recover their local
changes from a checkpoint tarball in /usr/coda/spool/<uid>/.

There also is a case where some counter between the client and server
gets out of sync, and the server claims it already saw whatever
operations the client is trying to reintegrate. I've observed this on
different clients about 2-3 times over the past 5 years. Not sure what
triggers the problem, but it is pretty bad. The server simply doesn't
apply the operations, and the client believes they were successfully
applied and drops them. The only solution is to either reinitialize the
client (losing the rest of the pending updates), or to restart the
servers, which clears the counters on the server end.

> Of course, it should also perform well. Do you think that Coda performs
> well? Can it transfer at high speeds? If there are any, could you give
> some examples?

Coda is slower on directory operations and open and close calls because
these all have to be bounced up to the userspace cache manager. But it
is as fast as the underlying local disk filesystem for read and write
operations. We also take a hit on network transfers. Last time I
measured, I saw writes on the order of 3MB/s to a single replica, 6MB/s
to a doubly replicated volume, and 7.5MB/s or so to a triply replicated
volume. I'm not 100% sure anymore, and this was a couple of years ago
on a 100Base-T network, before we added things like encryption.

Jan

Re: reliability,performance ?