Coda File System

Re: filesystem sizes

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Wed, 2 May 2007 16:43:41 -0400
On Wed, May 02, 2007 at 01:01:55PM -0400, shivers_at_ccs.neu.edu wrote:
>     From: Jan Harkes <jaharkes_at_cs.cmu.edu> 
>     Clearly the average file size is considerably larger and we are far more
>     likely to see reasonable numbers for the number of cached files. If we
>     have a 1TB cache we may see something in the order of 200K digital
>     photos, 3000 whole-CD flacs, 1000 TV recordings, or a couple of hundred
>     VM images.
> 
> Actually, I *just now* checked my 20Gb homedir:
>     % du -sk . ; find . -type f -print | wc -l
>     20088072        .
>     379371
> 
> 20088072/379371 = 53
> 
> So my average file size is 53kb (a little less, actually, if du includes
> the blocks used by directories).

There is a small program (/usr/bin/rvmsizer) that is included in the
Coda server package, which is useful to estimate the amount of
recoverable memory a server needs to store a copy of a local tree. The
RVM numbers it gives do not really correspond to what is needed on the
client, but it does also report some known cases that a Coda client
cannot handle, such as too many files per directory.

    $ rvmsizer ~
    35875 directories, 603847 files, 48489 directory pages
    total file size        38499769202 bytes (36716.24MB)
    average file size      63757 bytes
    total directory size   163344384 bytes (155.78MB)
    average directory size 4553 bytes
    estimated RVM used by directory data, 99305472 bytes (94.71MB)
    estimated RVM usage based on object counts, 213477388 bytes (203.59MB)
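
As a rough illustration of the counting behind the first few lines of that output, here is a minimal Python sketch that walks a local tree and reports directory count, file count, total file size and average file size. It is not rvmsizer's implementation and makes no attempt to estimate RVM usage or directory pages; the function name tree_stats is hypothetical.

```python
import os
import stat


def tree_stats(root):
    """Return (directories, files, total_bytes, average_bytes) for a tree.

    Only regular files are counted; symlinks and special files are skipped.
    This is an illustrative sketch, not what rvmsizer itself does.
    """
    n_dirs = 0
    n_files = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        n_dirs += 1  # the directory currently being visited, root included
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                n_files += 1
                total_bytes += st.st_size
    average = total_bytes // n_files if n_files else 0
    return n_dirs, n_files, total_bytes, average
```

Calling tree_stats(os.path.expanduser("~")) gives numbers that can be compared with the "directories", "files", "total file size" and "average file size" lines above, keeping in mind that the real tool may count things differently.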

So my average file size is also clearly not in the 'several MB range'.
But still, it is a factor of 2-3 larger than the current value we use
in venus.
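
The "factor of 2-3" can be checked with a little arithmetic. The sketch below assumes the current venus value is the 24KB BLOCKS_PER_FILE mentioned in this message, and that KB means 1024 bytes; the variable and function names are only for illustration.

```python
KB = 1024
BLOCKS_PER_FILE_KB = 24  # assumed current venus value, per this thread


def ratio_to_blocks_per_file(avg_file_bytes, blocks_per_file_kb=BLOCKS_PER_FILE_KB):
    """How many times larger a measured average file size is than BLOCKS_PER_FILE."""
    return avg_file_bytes / (blocks_per_file_kb * KB)


# Average from the rvmsizer output above, already in bytes.
rvmsizer_ratio = ratio_to_blocks_per_file(63757)

# Average from the quoted du/find numbers: du -sk reports KB, so convert to bytes.
du_find_avg_bytes = 20088072 * KB / 379371
du_find_ratio = ratio_to_blocks_per_file(du_find_avg_bytes)

print(f"rvmsizer average: {rvmsizer_ratio:.2f}x BLOCKS_PER_FILE")
print(f"du/find average:  {du_find_ratio:.2f}x BLOCKS_PER_FILE")
```

Both measured averages land between 2 and 3 times the 24KB value.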

Initially BLOCKS_PER_FILE was 8KB. I guess that value was picked as an
appropriate average file size when development started, in 1987-1988.

In 1998 we bumped BLOCKS_PER_FILE up to 24KB. I think this was after
checking the average file size on various desktops, but we also looked at
the average size of files stored in /coda at the time. The measured
average may have been a little lower at the time.
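
To see why this value matters for large caches, here is a small sketch of how an assumed average file size translates into an expected number of cached files. It assumes the file count is simply the cache size divided by BLOCKS_PER_FILE, which is a simplification for illustration and may not match the exact computation in venus. The function name is hypothetical, and 1TB is taken as 2^40 bytes.

```python
KB = 1024
ONE_TB = 1 << 40  # assumption: binary terabyte


def estimated_cache_files(cache_bytes, blocks_per_file_kb):
    """Expected number of cached files if each averages blocks_per_file_kb KB."""
    return cache_bytes // (blocks_per_file_kb * KB)


# Compare the original 8KB value with the 24KB value used since 1998.
for kb in (8, 24):
    print(f"{kb:2d}KB per file -> {estimated_cache_files(ONE_TB, kb)} files in a 1TB cache")
```

Under these assumptions a 1TB cache corresponds to roughly 134 million files at 8KB per file and roughly 45 million at 24KB, far more than the hundreds of thousands of large objects discussed in the quoted text.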

Now it is almost 10 years later, and I don't find it surprising that the
average has gone up, especially considering that disk is cheap and disk
space has been growing exponentially. I am surprised that the average
file size only seems to have tripled over a period of roughly 10 years,
at least as far as my personal files are concerned. On the other hand I
hardly ever throw things away, so the average for new files must be
higher.

> I had no idea it was so small, since I have down in that tree a couple of CD
> images and even a complete vmware virtual filesystem for a virtual WinXP image
> sitting in a pair of files, plus some music and probably a few video clips.

Right, I probably have digital photos, maybe some music, lots of
sources, tarballs, maybe a VM image or two. I think it also includes
things like my web browser cache.

> That's for a *single* user. For a ten-thousand user "campus", add 4 zeroes.

Yeah, but a 10,000 user campus would hopefully use more than just a
single server for all of its users. I think the AFS2 goal was on the
order of a thousand clients per server; I am not sure how close Coda
gets to that goal.

Jan
Received on 2007-05-02 16:44:43