Coda File System

Coda maturity report, kind of

From: Ivan Popov <pin_at_math.chalmers.se>
Date: Sun, 27 Apr 2003 11:32:39 +0200 (MET DST)
Hello,

I think it can be useful to report usage stories to get a better picture
of what Coda can cope with.

I've reached a milestone in rearranging my Coda setup, so here it is
(a similar setup has been in use for a couple of years):

 - Linux 2.4.20 kernel, with the cvs Coda patch applied,
   additionally patched to revert some NFS-related kernel changes that
   otherwise cause erroneous "stale NFS handle" errors on Coda under heavy
   load (not Coda's fault)
 - recent Coda and libraries from cvs, realm/cell aware
 - Kerberos authentication only
 - volume names consistent with mountpoints, e.g. volume "/dirs/dir"
   is mounted at /coda/<my-cell>/dirs/dir (sketched below)
 - two servers, all volumes doubly replicated,
   768 MB of RVM data, close to the maximum on Linux,
   18 GB for files (about 50% used)
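
   As an illustration of the naming convention (not my literal commands;
   volume, server and partition names are placeholders, and the exact
   createvol_rep argument form depends on your Coda version), creating and
   mounting such a doubly replicated volume looks roughly like:

      # on the SCM: create the volume, replicated on both servers
      createvol_rep /dirs/dir server1/vicepa server2/vicepa
      # on an authenticated client: attach it at the matching path
      cfs mkmount /coda/<my-cell>/dirs/dir /dirs/dir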

 - user home directories on Coda
 - all software except the Coda binaries and config lives on Coda,
   accessible only to authenticated users
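
   The "authenticated users only" part is plain Coda ACLs; a sketch with a
   hypothetical path and group name (the group would be defined in the
   protection database with pdbtool):

      # no anonymous access to the software tree, read/lookup for the group
      cfs setacl /coda/<my-cell>/dirs/sw System:AnyUser none
      cfs setacl /coda/<my-cell>/dirs/sw sw:users rl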

 - pam_krb5 + pam_kcoda on clients
 - clients (that is, their accounts "root" and "daemon") have Kerberos
   and Coda identities like root/<hostname> and daemon/<hostname>
   (with the corresponding keytabs stored locally),
   so that they can run software from Coda;
   consequently the clients keep essentially only startup scripts on the
   local disk, the rest (daemons such as ntp or xdm) is pulled from Coda
   (see the sketch below)
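
   A sketch of that client-side plumbing (the PAM service file, module
   options, principal and keytab names are illustrative and vary between
   setups):

      # /etc/pam.d/xdm - Kerberos login, then Coda tokens from the tickets
      auth      required   pam_krb5.so
      session   optional   pam_kcoda.so

      # in kadmin: per-host identities with random keys, exported to keytabs
      addprinc -randkey daemon/<hostname>
      ktadd -k /etc/daemon.keytab daemon/<hostname>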

 - half a dozen users, each able to use any of half a dozen clients

 - users have volumes for
   - a traditional homedir    (~/ == ~/../priv/), the private area
   - projects shared with several other users (~/../proj/)
   - an area to publish data  (~/../pub/), default ACL: readable for anyone
   - an area for temporary data (~/../tmp/)
   all of the above except tmp are complemented by corresponding .backup
   volumes (and mountpoints), re-synced each night (sketched below)
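
   A sketch of the pub-area default ACL and of how a backup volume shows
   up on a client (paths and volume names are placeholders; the .backup
   clones themselves are refreshed on the servers each night):

      # publishing area: readable and listable by anyone
      cfs setacl ~/../pub System:AnyUser rl
      # the read-only backup clone gets its own mountpoint
      cfs mkmount ~/../pub.backup <pub-volume>.backup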

 - no disconnected operation is used (there is no need yet)

 - hoarding not used, for simplicity

The net result:

 - it works
 - servers do die now and then, but seldom enough not to irritate; nobody
   notices anyway :)
 - clients crash "reliably" soon after the venus cache fills up for the
   first time (an annoying bug, probably hidden until recently by other,
   now-fixed ones; the workaround is a huge cache, see the sketch after
   this list)
 - when a venus reinit is necessary, even in the worst case the lost data
   is usually not worth more than a few minutes' work, i.e. the users stay
   happy
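
   The huge-cache workaround and the reinit, sketched (the size is only an
   example, and I am assuming the cacheblocks option found in recent
   venus.conf templates; the unit is kilobyte blocks):

      # /etc/coda/venus.conf
      cacheblocks=2000000    # ~2 GB local cache, postpones the cache-full crash

      # when things do go wrong, wipe and rebuild the client cache state
      venus -init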

Performance,
on 10 Mbit Ethernet and with processor speeds around 400 MHz for both
clients and servers:

 - [open-for-]read big files - about 1 Mbyte/sec, limited by the net
 - [close-after-]write big files - about 1 Mbyte/sec, limited by the net
   (0.5 Mbyte/sec in strongly connected mode as the data is copied to two
   servers - multicast is not used nowadays)
 - create lots of small files, delete lots of files - a few per second,
   in strongly connected mode. Not measured in write-disconnected mode.
 - read/write cached files - very fast, up to 20 Mbyte/sec on my best
   disk/IDE controller :-)
 - lookup (open) cached files - not measured

Scalability:

 - the net is the bottleneck, noticeable when a client fills its cache for
   the first time; otherwise there is no significant file traffic and the
   servers are essentially idle most of the time.

   In other words, the total cache-update bandwidth available to the
   clients is that of the servers' Ethernet connection, not a limit of the
   software or the server hardware.

   1 Mbyte/sec is "very slow" when you start mozilla for the first time
   (many megabytes to load into the cache), but there are no delays and no
   traffic later. You notice a delay again only when mozilla is upgraded
   and the new binary/libraries have to be cached.

It would be exciting to hear your stories, too.

Best regards,
--
Ivan
Received on 2003-04-27 05:35:55