Coda File System

RE: ok lets see now...

From: Shafeeq Sinnamohideen <shafeeq_at_cs.cmu.edu>
Date: Wed, 16 May 2001 10:53:31 -0400 (EDT)

On Wed, 16 May 2001, Steve Wray wrote:

> > It doesn't matter what kind of FS is used on the server. Only the client
> > is picky because we need to be able to access the container files from
> > within the kernel to avoid bouncing all read and write calls up to
> > userspace.
> 
> I only have 2 linux boxes, both are running RH7.1 with XFS
> on LVM. So, one is a client and it has XFS.
> 
> I'm not sure how to interpret your comment about the client...?

The client's venus cache must live on an ext2, reiserfs, or ramfs
partition for it to work. This is because when the Coda kernel module gets
a request, it must be able to forward it, from within the kernel, to the
file system that holds the container file so that file system can do the
operation.

> > > Also, I'm noticing that when I try to populate the
> > > /coda filesystem it seems really slow; even on the
> > > machine that's actually hosting that volume.
> [snip]
> > > It's like network filesystem performance, only on
> > > the server.
> > > 
> > > What might be wrong?
> > 
> > RVM is probably most of the cost. Adding and removing directory
> > entries (i.e. creating and deleting files) involve a lot of RVM
> > operations. RVM is dealt with synchronously, i.e. all modifications are
> > explicitly flushed and committed to disk before we return from an
> > operation. Also, all RVM transactions are serialized, killing any form
> > of gain that might come from having multiple concurrent threads.
> 
> So this is an unavoidable performance problem with Coda in general?

Yes. Server replication depends on having strong guarantees about the
state of a server when it is restarted after a crash. This requires
file metadata operations to be transactional, which in turn requires a
synchronous write to a write-ahead log.
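
To make the cost concrete, here is a minimal sketch of a write-ahead log
in Python. It is not RVM's interface: WriteAheadLog, commit, and replay
are hypothetical names, and the length-prefixed record format is an
assumption made for the example. The point is only that commit does not
return until fsync has pushed the record to disk, so every metadata
operation pays for at least one disk flush.

```python
import os
import tempfile


class WriteAheadLog:
    """Toy append-only log (hypothetical; not RVM's actual API)."""

    def __init__(self, path):
        # O_APPEND makes every write land at the end of the log file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)

    def commit(self, record: bytes) -> None:
        # Assumed format: 4-byte big-endian length, then the payload.
        os.write(self.fd, len(record).to_bytes(4, "big") + record)
        # The synchronous step: block until the record is on stable storage.
        os.fsync(self.fd)

    def close(self) -> None:
        os.close(self.fd)


def replay(path):
    """Return all complete records; a torn final record is ignored."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            size = int.from_bytes(header, "big")
            body = f.read(size)
            if len(body) < size:
                break  # crash happened mid-write; drop the partial record
            records.append(body)
    return records


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "toy.log")
    wal = WriteAheadLog(log_path)
    wal.commit(b"create foo")
    wal.commit(b"remove bar")
    wal.close()
    print(replay(log_path))
```

After a crash, replaying the log recovers every operation that had
returned to its caller, which is the guarantee replication relies on.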

The overall design of Coda assumes that writes are much less frequent than
reads, which matches the experience with AFS. Thus Coda is less suited to
workloads that write heavily.

Write-disconnected and write-back modes exist for the situations where a
single client does a lot of writes before another client reads. In these
cases, the writing client will buffer writes locally and ship only the
final result to the server. The cost is looser consistency in
write-disconnected mode and a longer window before the data reaches the
server.
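
The buffering idea can be sketched roughly as follows. This is an
illustration only, not how venus is implemented: WriteBackClient, write,
and reintegrate are hypothetical names, and a plain dict stands in for
the server.

```python
class WriteBackClient:
    """Toy client that buffers writes locally (hypothetical sketch)."""

    def __init__(self, server):
        self.server = server   # dict standing in for the server's state
        self.pending = {}      # path -> latest locally written contents

    def write(self, path, data):
        # A later write to the same path replaces the earlier one, so
        # only the final version is ever shipped.
        self.pending[path] = data

    def reintegrate(self):
        """Ship buffered results to the server; return how many were sent."""
        shipped = len(self.pending)
        self.server.update(self.pending)
        self.pending.clear()
        return shipped


if __name__ == "__main__":
    server = {}
    client = WriteBackClient(server)
    for version in (b"v1", b"v2", b"v3"):
        client.write("/coda/example", version)
    print(server)                 # the server has seen nothing yet
    print(client.reintegrate())   # one object shipped, not three
    print(server)
```

Until reintegrate runs, other clients reading from the server still see
the old state, which is the looser consistency mentioned above.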

Of course, the server doesn't do anything special for a client running
on the same machine; only the bulk data transfers go faster across the
"network".

> Does this mean that Linux is particularly bad for Coda?
> Is this fixable with any tweaking? Different filesystems?
> 
On the BSDs, one can place the RVM log file on a raw disk partition, so
accesses will not go through a file system.

Generally, placing the log file on a separate physical disk will help,
since only the log needs to be appended to synchronously, while the data
file and /vicepa can be written lazily.

Shafeeq
Received on 2001-05-16 10:53:50