Coda File System

Re: My experiences with Coda and why I went back to NFS

From: Petr Tuma <petr.tuma_at_mff.cuni.cz>
Date: Sun, 28 Jan 2001 23:27:58 +0100
Hello,

seeing some points I'd like to comment on (my problem with upgrade from 
.11 to .12 miraculously went away after I rebooted all servers several 
times to get fresh logs for Jan, now it's just tons of repairs, thanx 
still :).

> There are several big hurdles we have taken and still have to take to
> get Coda nicely integrated on Windows platforms.

Maybe focusing on NTs/2Ks would make more sense? I mean, I don't see
much point in bending over backwards to accommodate an outdated
virtual machine concept.

> The only effect that clock skew has is when applications that use RVM
> are restarted and the time has warped back. The client-server
> interaction has definitely no time dependent parts, we wouldn't even
> dare consider calling this a _Distributed_ File System if it did. 

I have a different experience here. I had definite problems when one of
the replicated servers had its date set one day back compared to all the
others. Mostly it was losing authentication tokens some time in the
middle of working with the volume (empirically, it was when the one
server with the wrong time figured the token should expire), which
caused "false" conflicts to appear, and made the system basically
unusable. After syncing all servers with NTP the problems went away.
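
To illustrate the kind of disagreement described above, here is a toy
model in Python. It is not Coda's actual token check; it only assumes
that a token carries a begin/end validity window and that each server
compares that window against its own local clock. The server names,
the 25-hour lifetime and the function name are all made up for the
example.

```python
from datetime import datetime, timedelta


def token_accepted(begin, end, server_now):
    """Return True if a server whose clock reads server_now sees the token as valid."""
    return begin <= server_now <= end


# The "real" time, and a hypothetical token issued one hour ago
# by a server with a correct clock.
true_now = datetime(2001, 1, 28, 12, 0, 0)
begin = true_now - timedelta(hours=1)
end = begin + timedelta(hours=25)

# Hypothetical clock offsets: srv3 has its date set one day back.
server_offsets = {
    "srv1": timedelta(0),
    "srv2": timedelta(0),
    "srv3": timedelta(days=-1),
}

# Each server judges the same token against its own idea of "now".
verdicts = {
    name: token_accepted(begin, end, true_now + offset)
    for name, offset in server_offsets.items()
}

for name, ok in verdicts.items():
    print(name, "accepts token" if ok else "rejects token")
```

In this model the two correct servers accept the token while the skewed
one rejects it, so an update in the middle of a session would reach
only part of the replica group. Which way the disagreement goes in a
real setup depends on the direction of the skew and on how the tokens
are stamped.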

Also, seeing how Jan really reads this, I thought to add a few more 
observations:

I definitely agree the documentation should be updated. At least the
"quick install" guide. What I personally missed in my, hmm, month or so
with Coda was:

  - Outdated installation instructions. In the end, it turned out to be 
just "rpm -U coda*.rpm", but I kept worrying whether it is really that
easy when all the docs say things should be started manually (not
through rc.d scripts) etc.

  - Also, I tended to get myself lost in the various IDs used. After 
several adds and removes of volumes, I kinda figured it out, but a few 
lines in the docs (e.g. that one is supposed to come up with IDs of 
groups for VSGDB) would help.

  - The documentation seems to give the system a more unstable feel than 
it probably deserves. One example I recall is the private mmap support, 
which I did not dare try due to warnings in the config file comments, 
but switched to after I saw a message here that suggested it should work 
OK, and it was a great improvement (one of my servers is low on memory 
and this helped a lot).

I also encountered the problem with local vs. fully qualified hostname.
What seems to work is keeping the name in /vice/hostname short and the
names in the server list and everywhere else fully qualified. This is
something one has to do manually after install.
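
As a small sketch of that convention (short name in /vice/hostname,
fully qualified names in the server list), the check below could be
run against the values one has configured. The helper names and the
example.org hostnames are hypothetical, and the check encodes only the
rule of thumb above, not anything Coda itself enforces.

```python
def short_hostname(fqdn):
    """Strip the domain part: 'coda1.example.org' -> 'coda1'."""
    return fqdn.split(".", 1)[0]


def check_names(vice_hostname, server_list):
    """Return a list of deviations from the short/long naming convention."""
    problems = []
    # The name stored in /vice/hostname is expected to be short.
    if "." in vice_hostname:
        problems.append(
            "/vice/hostname should be short: use %r instead of %r"
            % (short_hostname(vice_hostname), vice_hostname)
        )
    # Names in the server list are expected to be fully qualified.
    for name in server_list:
        if "." not in name:
            problems.append("server list entry %r is not fully qualified" % name)
    return problems


if __name__ == "__main__":
    # Hypothetical values; in practice they would be read from the config files.
    for problem in check_names("coda1.example.org", ["coda1.example.org", "coda2"]):
        print(problem)
```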

There are bugs in the scripts. On my servers, purgevol_rep (or whatever
the script for deleting volumes is called) fails, apparently looking
for some list of volumes that does not exist in the current version.

These, however, were all things one could kinda sorta work around. What
I have the biggest problem with is the repair tool. I run three servers
in an environment where very often, one or two are inaccessible (but
there are never concurrent updates from multiple clients). When
conflicts occur, the repair tool often fails to fix them, and I usually
end up doing "beginrepair", copying all the copies somewhere else,
doing "removeinc" and copying the data back (sometimes, "removeinc"
fails on one server but not on another, etc.).

---

BUT ! It still seems to work and is usable ! (at least on my platform, I 
really didn't try Windows)

The questions I'm running into now are more of the sort, is Coda really 
the system I was looking for in my particular situation ? I have three 
machines I work on, geographically distant and not always connected very 
well. I used to have a script that used rsync to keep the home 
directories synchronized, but this was not really reliable (rsync has 
problems with links, changes that involve modifying directory structure, 
and bidirectional updates, among others).

I switched to Coda, and made all three computers servers (in the hope
of having the updates always propagated to at least one server when the
network is down). This seems to work nicely, except for the following:

  - I have to use write disconnected mode, otherwise the delays in 
propagating the updates to all the servers are too big. In that mode, 
however, conflicts appear much more often (and even in situations where 
I don't think they should).

  - Some of the files I have are logs, several megs in size, that only
get appended to. In write disconnected mode, it seems that the updates
are propagated in the form of entire files, not just change logs, which
means huge amounts of data get passed over the net for even small
changes. (Just a hunch from looking at the network traffic.)

  - I use up twice as much disk space, because I have both the client 
and the server running on a machine where I'd normally have just one 
copy of the data.
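
To put rough numbers on the log file point above, here is a
back-of-the-envelope sketch in Python. It assumes, as the hunch
suggests, that every append causes the whole file to be shipped, and
compares that with shipping only the appended bytes. The sizes and
counts are invented, and if several appends get batched into one
transfer the real traffic would be lower.

```python
def whole_file_bytes(initial_size, append_size, appends):
    """Bytes sent if the entire file is shipped after each append."""
    return sum(initial_size + i * append_size for i in range(1, appends + 1))


def delta_bytes(append_size, appends):
    """Bytes sent if only the newly appended data is shipped."""
    return append_size * appends


if __name__ == "__main__":
    # Hypothetical workload: a 5 MB log that receives 100 appends of 200 bytes.
    initial, chunk, count = 5_000_000, 200, 100
    whole = whole_file_bytes(initial, chunk, count)
    delta = delta_bytes(chunk, count)
    print("whole-file transfers:", whole, "bytes")
    print("append-only transfers:", delta, "bytes")
    print("ratio: about %.0f to 1" % (whole / delta))
```

Under these assumptions the whole-file scheme moves about 500 MB to
deliver 20 kB of new log lines, which would match the impression of
heavy traffic for small changes.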

Now, did I pick Coda right, or am I trying to use it in a situation 
where something else (what ?) would be better ?

Petr Tuma
Received on 2001-01-28 17:28:33