Coda File System

Coda Nightmare "or" How coda doesn't like me...

From: redirecting decoy <>
Date: Mon, 1 Nov 2004 14:14:50 -0800 (PST)
Hello everyone,

So tell me, how is it that I can't make this nice
little filesystem function correctly?  I "guess" the
replication works, but now it seems that I am having
trouble with my clients.  First of all, whenever I
stop venus, I have to run "venus -init" in order to
restart it; otherwise it turns into a zombie. Once I do
that, I can stop venus and restart it without the
"-init". What is causing this?  Second, it seems that
my clients do NOT like to stay connected. I can manage
to get all of my clients actively connected for a few
minutes.  I can test this by modifying a file on the
server, then making sure the changes made it to every
other client. This works in the beginning. But after a
while (the time seems random), the clients start
disconnecting.  Also, once any type of work is done
inside the filesystem, the client I am using
wants to disconnect.   In order for my changes to
propagate, I have to do a combination of the
following commands, each of which has its own

Commands I try that eventually get me reconnected (in
no particular order):

*NOTE: m1.public and m2.public act as servers and as

1) cfs cs myrealm
   *Note: I try this command, and sometimes it works,
other times it doesn't.
My realm file looks like this: "myrealm    m1.public
m2.public".  When I use the above command, it often
tells me that the servers are still disconnected,
which is wrong, because if I do "cfs cs m(1,2).public"
it works just fine.  Why would this happen? When the
above command DOES work, my clients continue to
function for a little while longer.

2) cfs reconnect
  *Note: I have never been able to get this command to

3) cfs fr /coda/myrealm/storage
  *Note: Sometimes I have to do this in order for
changes to be made to the server.  Isn't there a
better, automatic way?

4) echo -n "pwd" | clog user
  *Note: I run this command on all my machines, and
sometimes it reconnects the volume, sometimes it
doesn't. Functionality is sporadic.

Trying the above commands in different orders
sometimes gets me reconnected. However, once I try
to continue moderate use of the filesystem, everything
dies again.  I cannot manage to make it stable.
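
The propagation test described above (modify a file, then
confirm the change shows up on every other client) could be
scripted roughly as follows. This is only a minimal sketch: the
helper names write_marker and wait_for_marker and the file name
marker.txt are hypothetical, and it assumes the volume is
mounted at /coda/myrealm/storage on each machine. It uses plain
file I/O only and does not call any Coda command.

```python
import time
import uuid


def write_marker(path):
    """Write a fresh unique token to `path` and return the token."""
    token = uuid.uuid4().hex
    with open(path, "w") as f:
        f.write(token + "\n")
    return token


def wait_for_marker(path, token, timeout=30.0, interval=1.0):
    """Poll `path` until it contains `token`.

    Returns the number of seconds waited, or None if the token
    did not appear within `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        try:
            with open(path) as f:
                if f.read().strip() == token:
                    return time.monotonic() - start
        except OSError:
            # File missing or unreadable (for example while the
            # client is disconnected); keep polling until timeout.
            pass
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical marker file inside the shared working directory.
    marker = "/coda/myrealm/storage/marker.txt"
    print("token written:", write_marker(marker))
```

One machine would call write_marker() and pass the printed token
to the others, which call wait_for_marker() on the same path. A
None result means that client never saw the update within the
timeout, which gives a rough way to record when the disconnects
start.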

Perhaps if I explain one application that I am using,
everyone will get a better idea of what I am trying to
make coda do. Perhaps my work is incompatible with
coda. I am unsure. 

Basically, one of the applications I am attempting to
run is an MPI version of povray.  MpiPovray works best
with shared storage.  If I have 8 machines total, then
I am running povray on all of them, and each machine
must be able to write to a particular "working
directory" at the same time. I am attempting to use
coda as this "working directory".  At most, the program
instance running on each node will write a log file
to the working dir. I am unsure if this particular
application sends data back to the master node for
final output to a file, or if each worker node writes
to the file from its respective machine.   So, my
problem is that the filesystem can't seem to
keep up with the operations of the program, which
seems wrong to me because it worked fine before I
tried replication.   I cannot seem to find any useful
information in my log files. When coda works, it works
great, but then it just stops working. No obvious
error entries.  If anyone is interested, I can clear my
log files and repeat everything I do that gets me to
the dead filesystem point. I can make the log files
available to whoever would like them.

BTW: in my server.conf file on both servers I have
"mapprivate=1" enabled. In venus.conf, would
"dontuservm=1" make any difference to my situation? I
am unsure of its proper use.

If anyone can offer any suggestions or help, it would
be greatly appreciated.

Thanks in advance, I'm going to go hide under a rock.


Received on 2004-11-01 17:17:30