Coda File System

From: Derek Simkowiak <dereks_at_itsite.com>
Date: Tue, 24 Sep 2002 12:20:42 -0700 (PDT)

     Hello,
	I want to use CODA for a load-balanced production webserver.  I
plan on sharing my experience with the list once it's deployed.  But in
the meantime, I need some help.

	I want to use CODA on a Linux Virtual Server-based, load-balanced
cluster (using LVS-NAT).  This cluster will be using WebDAV to do
read/write filesharing to multiple (~20 or so) clients, and general HTTP
(read-only) website serving for a medium-sized website.

	I only need CODA for the purposes of replicating the hard drives
safely across the nodes of the cluster -- that is, the only "users" of
CODA will be the nodes of the cluster itself.  My end users will not have
any CODA client installed.  End users will go to Windows XP Explorer
"Tools->Map Network Drive" and then type in the https:// URL for the
load-balanced cluster.  They will drag and drop files to this WebDAV
share, which will result in Apache/mod_dav writing a file to the CODA
filesystem, which will result in that file appearing on all of the nodes
in the cluster, which will result in any future clients seeing that file
regardless of which load-balanced node they actually get served by.

	CODA was chosen over NFS because of its instant replication
service, and over Linux's "nbd.o" and "enbd.o" network block device
modules (with RAID) because replication is atomic to files, not blocks,
and because CODA can support more than one backup copy (i.e., several
nodes can be in sync at once).

	Reading over the docs, some questions came up.  The first one
is... what should the "big picture" of this setup look like?  Would each
node in the cluster need to run both the CODA server _and_ client
software?  I assume so, otherwise, how could the CODA server software
replicate file changes to the rest of the servers, if those changes are
not coming in via a CODA client?  But then... would each node be a client
to itself?  Or would it be a client to the master CODA server (i.e., just
one of the nodes chosen at random), requiring a config change/failover if
the CODA master server crashes?  I really need an overview of how an
experienced CODA admin would configure this.

	Next question: I want to serve about 80 Gigs of space via WebDAV
(on the CODA filesystem).  Has CODA ever been used to serve up more than 2
Gigs at a time?  The documentation says nothing about large volumes.

	Next, the HOWTO says to set aside 4% of your total volume space
for RVM metadata storage.  For my setup, that would be 3.2 Gigs of RVM
space for CODA.  But I want to use a file, not a raw partition (for
several reasons).  Is using a huge 3.2 gig file for RVM metadata a
plausible option?  Or should I just give up now and use the raw partition
from the start?  (The docs say performance suffers when using a file, but
do not give any indication of how much slower using a file is or if
there are any limitations on the file size.)

	Next is about the Virtual Memory.  My servers each have 1 gig of
RAM.  But according to the HOWTO, I would need 3.2 Gigs (4% of my 80 Gig
share) of virtual memory _on top of_ any other applications I would need.
I don't think Linux allows for a 3.2 Gig swap partition.  Will this be a
showstopper for me?  Does this mean I can't serve up 80 Gigs of space?
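
	For reference, the arithmetic behind the last two questions can be
sketched as below.  This is only a back-of-the-envelope calculation: it
assumes the HOWTO's 4% figure applies linearly to the whole 80 Gig share,
and the function name rvm_size_gb is made up for illustration (it is not
part of any CODA tool).

```python
def rvm_size_gb(volume_gb, rvm_fraction=0.04):
    """Estimate RVM metadata space in gigabytes.

    Assumes the HOWTO's rule of thumb (4% of total volume space);
    the real requirement may differ.
    """
    return volume_gb * rvm_fraction


volume_gb = 80   # space to be served via WebDAV
ram_gb = 1       # physical RAM per server

rvm_gb = rvm_size_gb(volume_gb)
# Portion of the RVM estimate that cannot fit in physical RAM,
# even before counting any other applications.
beyond_ram_gb = max(0.0, rvm_gb - ram_gb)

print(f"RVM estimate: {rvm_gb:.1f} GB")
print(f"Beyond physical RAM: {beyond_ram_gb:.1f} GB")
```

	Under those assumptions, at least the part of the estimate that
does not fit in RAM would have to come from swap, which is why the swap
size limit matters here.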

	The next question comes from the HOWTO.  The HOWTO says "Do not go
above 300 Meg on the [client-side] cache."  What is that 300 Meg limit?
Is that simply an old number written down years ago?  Has any testing on
the client-side cache size ever been done?  (Need more info, please.)


	Basically, I want to make sure CODA is at least theoretically able
to handle my needs, before I spend a bunch of time learning CODA, doing
stress testing, etc.  We're happy to share whatever we learn with the list
-- I'm sure the greater CODA community will benefit from the things we
will learn from this project -- but in the meantime, any help is greatly
appreciated.


Thank You,
Derek Simkowiak
dereks at itsite dot com

Some questions...