Coda File System

Re: partitions and partition sizes

From: Jan Harkes <>
Date: Mon, 4 Apr 2005 17:29:41 -0400
On Mon, Apr 04, 2005 at 09:55:14AM -0600, Patrick Walsh wrote:
> > > 	Our total storage need in coda will be around 40gb.
> > 
> > Then you want to run rvmsizer to check - probably you will be fine with
> > one server process, then use the maximal rvm size available, 1G.
> 	rvmsizer suggests a rvm size of 70mb (and that's with a little cushion
> added by me).  Would you recommend bumping it up to 500mb or 1G anyway?
> Note that these machines have 1GB of RAM and if the RVM must reside in
> memory, then it seems that it ought to be smaller than 1GB (provided the
> number of files and directories is sufficiently small).  Is that the
> correct thinking?  Or is the RVM metadata information no longer
> completely mapped into memory?

It is still completely mapped into memory, but it is typically possible to
fit at least 1GB into the available 4GB address space. Going larger than
that becomes more difficult and requires tricks like statically linking
binaries so that shared libraries aren't loaded in the places where we
want RVM, tweaking the base address where RVM is placed so that it doesn't
bump into the stack, etc.

Btw, my servers: one group holds about 42GB of file data and uses 226MB
of RVM, the other holds 36GB of data and uses 158MB of RVM.
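Those two figures suggest RVM overhead of roughly half a percent of the stored file data, which is useful for capacity planning. A small sketch of that arithmetic (the helper is purely illustrative, not part of Coda):

```python
# Rough RVM-overhead estimate from the two server figures quoted above.
# rvm_ratio() is a hypothetical helper, not anything from the Coda tree.

def rvm_ratio(rvm_mb: float, data_gb: float) -> float:
    """Return RVM size as a fraction of the file data it describes."""
    return rvm_mb / (data_gb * 1024)

# (RVM in MB, file data in GB) for the two servers mentioned above:
servers = [(226, 42), (158, 36)]
for rvm_mb, data_gb in servers:
    print(f"{rvm_mb} MB RVM for {data_gb} GB data: "
          f"{rvm_ratio(rvm_mb, data_gb):.2%}")
```

By that ratio, the 40GB installation asked about above would be expected to need on the order of 200MB of RVM, so a 70MB rvmsizer estimate with some cushion is plausible for a smaller file population.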

The volumes on these servers are a mix, and contain pretty much
everything from the Coda webpages/ftp and public CVS to user home
directories.
> > > * Second: I think I remember reading something about avoiding ext3.  Is
> > > that for the actual files?  Or just for rvm metadata and logs?  
> > 
> > It is no problem nowadays.
> 	Since I'm creating a partition just for file data for coda, is there a
> best-performing fs type to use?  ext2 or ext3?  Or does it make
> absolutely no difference?

For servers it shouldn't matter all that much, but the ext3 journalling
might make recovery a bit more reliable. During startup the server
checks, for every file in RVM, whether the corresponding file exists in
/vicepa, and typically triggers an assertion if it doesn't; this is
mostly to prevent one corrupt server in a replicated group from
spreading the corruption to the other servers. If the data is still
correct on another server, it is often safer to just destroy the corrupt
volume on the crashed server, then recreate the underlying replica and
resolve its contents back from the other server(s).
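The startup check described above can be sketched as a walk over the RVM metadata, verifying each recorded container file against the /vicepa directory. This is a minimal illustration, assuming a flat list of container file names; the names and layout are hypothetical, not Coda's real on-disk structure:

```python
# Sketch of the recovery-time consistency check described above: every
# file recorded in RVM metadata must exist under /vicepa. A real server
# would assert (and refuse to start) on any missing container file.

import os

def check_vicepa(rvm_files: list[str], vicepa: str) -> list[str]:
    """Return container files recorded in RVM but missing on disk."""
    return [f for f in rvm_files
            if not os.path.exists(os.path.join(vicepa, f))]
```

Any non-empty result here would correspond to the assertion failure mentioned above, at which point recreating the replica and resolving from its peers is the safer path.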

> 	Thanks for the help.  I'm on my third test rollout of coda and I'm
> getting a better handle on what I'm doing.  At some point I'll be back
> here with questions about backing up.  I'm a bit unclear as to why
> standard dump type utilities can't be run on a /coda filesystem.  Also,

Our file identifiers are 128-bit, but the Linux VFS (and probably other
Unixes) only exports 32-bit inode numbers, so we hash the 128-bit
identifiers to mostly unique 32-bit ones. Userspace tools like dump and
tar use the inode number to identify hard-linked files, so any collision
will be interpreted as a hard link: a file that happens to have the same
32-bit inode number as a previously backed-up file ends up getting
skipped. One solution would be to modify the backup tools to check
whether 'nlink == 1', in which case the second file clearly cannot be a
hard-linked copy of the first one.
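The collision problem and the suggested nlink fix can be sketched as follows. The hash function here is purely illustrative (not the one Coda actually uses), and `should_skip` stands in for the hard-link bookkeeping inside a backup tool:

```python
# Sketch of the inode-collision problem described above, plus the
# suggested fix: a file with nlink == 1 can never be a hard-linked
# copy, even if its 32-bit inode number collides with an earlier file.

import hashlib

def inode32(fid128: bytes) -> int:
    """Hash a 128-bit file identifier down to a 32-bit inode number.
    Illustrative only; Coda's real hash may differ."""
    return int.from_bytes(hashlib.sha1(fid128).digest()[:4], "big")

def should_skip(inode: int, nlink: int, seen: set[int]) -> bool:
    """Hard-link heuristic with the 'nlink == 1' fix: only treat a
    repeated inode number as a hard link if the file has extra links."""
    return nlink > 1 and inode in seen
```

Without the `nlink > 1` guard, a colliding single-link file would be silently dropped from the archive, which is exactly the backup hazard described above.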

> I thought that having multiple replicating servers provided automatic
> backup.  But I'm not ready to tackle these questions fully just yet...

But updates might have reached only one of the replicas. We only detect
version skew between replicas when a client checks the file or directory
attributes, which contain the version vector. If a client committed
updates on only a single server, because of network problems or because
the other servers were down, then those differences will only be
resolved when, at some later point in time, a client that is connected
to all replicas happens to look at the divergent objects.
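The skew detection above boils down to comparing version vectors element-wise. A generic sketch of that comparison (not Coda's exact code; it just shows how one replica's vector can dominate, be dominated, or conflict):

```python
# Generic version-vector comparison, as used to detect skew between
# replicas: each replica keeps one update counter per server, and the
# vectors are compared element-wise.

def compare(a: list[int], b: list[int]) -> str:
    """Return 'equal', 'a-newer', 'b-newer', or 'conflict'."""
    a_ge = all(x >= y for x, y in zip(a, b))
    b_ge = all(y >= x for x, y in zip(a, b))
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a-newer"
    if b_ge:
        return "b-newer"
    return "conflict"
```

An "a-newer" or "b-newer" result is the resolvable case (one replica simply missed updates); "conflict" is the case where updates were committed independently on different replicas and resolution may need user intervention.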

Received on 2005-04-04 17:30:28