Coda File System

Re: [dhowells@redhat.com: [PATCH] CacheFS - general filesystem cache]

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Wed, 1 Sep 2004 10:37:48 -0400
On Wed, Sep 01, 2004 at 06:45:30AM -0400, David Howells wrote:
> > > I am not sure about persistency across reboots. Also it assumes that the
> > > cache is completely managed by an in-kernel filesystem. So we would
> > > need a lot of hooks and changes before venus can put anything in there.
> 
> Not necessarily. You should be able to do it relatively easily from within the
> kernel. It needs you to declare your indexes (fill_super/put_super) and files
> (iget/clear_inode), and to make calls from readpage() and writepage() and
> releasepage(). Coda would then own its own pages, which would be backed by
> CacheFS; CacheFS reads/writes directly from/to the netfs's pages.

I don't really understand what you are trying to say here; I really
should read the cachefs patch/documentation before trying to discuss it.

> I presume Coda loads a whole file at a time into its cache, messes
> around with it and writes the whole thing back? I could support that;
> and, in fact, I probably need to for my AFS client.

Technically we have a very simple glue layer in the kernel. As far as
file operations are concerned, we only see two events, which are bounced
up to a userspace cache manager (venus); a rough sketch of what this
looks like from the kernel side follows the list below.

file open
    here our cache manager checks whether the file is locally cached
    and, if not, fetches the complete file from the servers. Once we have a
    complete copy we return an open file handle back to the kernel. From
    this point on all read/write/mmap operations are nothing more than
    trivial wrappers whose main function is to forward the operation to
    the underlying file object.

file release
    this tells the userspace manager that a file reference was released.
    If the file was opened O_WRONLY/O_RDWR and there are no more writers
    left, we mark the object as dirty and write it back to the server at
    the next opportunity.
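
To make the shape of this a bit more concrete, here is a rough sketch of
what that glue layer looks like from the kernel side. The names here
(coda_sketch_*, venus_open_upcall, venus_release_upcall) are made up for
illustration; this is not the actual Coda source:

#include <linux/fs.h>
#include <linux/file.h>
#include <linux/slab.h>
#include <linux/err.h>

/* illustrative only: per-open state pointing at the container file */
struct coda_sketch_info {
    struct file *container;    /* open handle handed back by venus */
    int flags;                 /* O_RDONLY / O_WRONLY / O_RDWR at open */
};

/* open: upcall to venus, which fetches the whole file if necessary and
 * returns an open handle on the local container file */
static int coda_sketch_open(struct inode *inode, struct file *file)
{
    struct coda_sketch_info *cfi = kmalloc(sizeof(*cfi), GFP_KERNEL);

    if (!cfi)
        return -ENOMEM;
    cfi->container = venus_open_upcall(inode, file->f_flags);
    if (IS_ERR(cfi->container)) {
        int err = PTR_ERR(cfi->container);
        kfree(cfi);
        return err;
    }
    cfi->flags = file->f_flags;
    file->private_data = cfi;
    return 0;
}

/* read (write and mmap are analogous) just forwards to the container */
static ssize_t coda_sketch_read(struct file *file, char __user *buf,
                                size_t len, loff_t *ppos)
{
    struct coda_sketch_info *cfi = file->private_data;

    return vfs_read(cfi->container, buf, len, ppos);
}

/* release: tell venus the reference went away so it can write back */
static int coda_sketch_release(struct inode *inode, struct file *file)
{
    struct coda_sketch_info *cfi = file->private_data;

    venus_release_upcall(inode, cfi->flags);
    fput(cfi->container);
    kfree(cfi);
    return 0;
}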

I still want to add a file sync operation, so that we can write back a
snapshot (copy) of the file whenever an application calls fsync. It is
also important to catch the moment when a file is closed, as opposed to
the last release; by the time we are notified that a file has been
released, we can no longer return errors on close.
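
Something like the following (again only a sketch, venus_fsync_upcall is
a hypothetical name) would let errors from the write-back reach the
application while the file is still open:

static int coda_sketch_fsync(struct file *file, struct dentry *dentry,
                             int datasync)
{
    /* unlike the release upcall, this runs while the file is still
     * open, so an error from the servers can be returned here */
    return venus_fsync_upcall(dentry->d_inode);
}

A flush operation, which the VFS calls on every close(2), could be hooked
in much the same way to catch the close itself.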

But as far as the Coda kernel module is concerned, it doesn't deal with
readpage/writepage/etc.; all of that is left up to the filesystem that
stores the cached files, and that filesystem could possibly be cachefs.
However... when a file is fetched from the servers, the data in the
container file is written by a userspace process, so the (persistent)
tmpfs variant might work here. Of course our code doesn't expect a cache
file (or parts of that file) to disappear just because we don't
explicitly pin it all the time, but pinning it would defeat the
usefulness of cachefs in the first place.

I guess if cachefs looks like a 'lossy' mountable filesystem, it might
even work without too many changes. We could just mount it in place of
the venus cache directory and would only need an ioctl to pin any files
that are opened for writing until we're sure that the changes have been
fed back to the servers. There would be no communication between cachefs
and the Coda kernel module; everything still goes through venus.

 The Coda kernel module detects an open and sends the request to venus;
 venus opens (and optionally pins) a file in cachefs and fills it with
 the data. Then it passes the still-open file handle back to the Coda
 kernel module, which keeps it around until the last reference
 disappears. Then the Coda kernel module sends the file release upcall
 and, if the file was opened for writing, venus reads directly from
 cachefs and writes the modified data back to the servers, after which
 it unpins the file.
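
From the venus side that could look roughly like the following. The pin
ioctl names and fetch_file_from_servers() are made up, since the cachefs
pinning interface isn't settled, and the real venus hands the descriptor
to the kernel through its upcall reply rather than a plain return value:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define CACHEFS_IOC_PIN   _IO('c', 1)   /* hypothetical */
#define CACHEFS_IOC_UNPIN _IO('c', 2)   /* hypothetical */

extern void fetch_file_from_servers(int fd);   /* hypothetical helper */

/* answer an open upcall: open the container file in the mounted cachefs,
 * pin it if it may be written, fill it, and return the open descriptor */
static int handle_open_upcall(const char *container_path, int coda_flags)
{
    int fd = open(container_path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -1;

    /* keep cachefs from reclaiming files that are open for writing
     * until the changes have been written back to the servers */
    if (coda_flags & (O_WRONLY | O_RDWR))
        ioctl(fd, CACHEFS_IOC_PIN);

    fetch_file_from_servers(fd);   /* fill the container with data */

    return fd;   /* this still-open descriptor goes to the kernel */
}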

The only thing left then is to replace the flag we use to indicate
whether we need to fetch the data with a stat(2) or access(2) test on
the container file in cachefs.
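
A minimal sketch of that test, assuming the container files live directly
under the cachefs mount point:

#include <sys/stat.h>

/* does cachefs still have the data for this container file? */
static int container_is_cached(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return 0;              /* container gone, data must be fetched */
    return st.st_size > 0;     /* nonzero size: data is still cached */
}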

> Tell me what you'd like to be able to store in CacheFS, and I'll see what I
> can do to accommodate you.

Ideally it would be a mountable file system that is persistent across
reboots. It doesn't necessarily need to support a directory tree
structure; a single large top-level directory would do fine. It should
have the option to pin files even when they are not actively referenced,
and it should not delete little bits from the middle of a file. I guess
it would also need a way to query an object to see whether it is pinned
or not, so that we can check our own metadata against the cache.

As you can see, except for the reclamation, normal filesystems do just
fine. Because we always fetch a whole file, we already know how much to
throw out before we even start to fetch a new file. Our weakness is that
we can't do the same when a file is opened for writing: there is no way
to tell how large a file will be when it is opened, and we won't see the
final size until it is closed.

By having the kernel/cachefs do the reclamation, files would get
discarded whenever space is needed instead of after the fact. Also the
same cache space could be shared among different filesystems. So I can
definitely see some advantages.

Jan
Received on 2004-09-01 10:40:18