Coda File System

Re: Coda vs. Google ...

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 30 Aug 2005 13:14:19 -0400
On Mon, Aug 15, 2005 at 01:31:43PM -0400, Kris Maglione wrote:
> There's nothing stopping Coda (in theory. I haven't seen the code
> relating to this) from implementing both partial and full file caching.
> Whether it be a knob between two modes of caching, a switch to require
> the fetching of all blocks (with unneeded ones at a lower priority, put
> off until essential data is retrieved), or just a program like hoard
> deciding what files need to be cached fully, and doing so. I'm not
> saying that this should or will be implemented, but it is possible, in
> theory. For Coda and AFS.

Actually there are many reasons not to have block-level caching in Coda.

- VM deadlocks
    Because we have a userspace cache manager, we can get into a
    situation where we are told to write out dirty data, but servicing
    that request makes us ask the kernel for one or more memory pages,
    either because we allocate memory or simply because we are paging
    in some of our own application/library code.  The kernel might then
    decide to reclaim pages whose contents require writing back yet
    more dirty state through the userspace daemon, which is already
    blocked, and we deadlock.  The way out would be to push venus into
    the kernel, which is what AFS did, but AFS doesn't have to deal
    with many of the same complexities, such as replication and
    reintegration.

- Code complexity
    It is already a hard enough problem to do optimistic replication
    and reintegration with whole files. The last thing I need right now
    is the additional complexity of reasoning about situations where we
    only have parts of a locally modified file, which might already
    have been partially reintegrated but then overwritten on the server
    by another client, and how to commit, revert or merge such local
    changes into the global replica(s). The same goes for efficiently
    maintaining the required data structures. The current RVM
    limitations are on the number of file objects and do not depend on
    file size: you can cache 100 zero-length files with the same
    overhead, as far as the client is concerned, as 100 files that are
    1GB in size.
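
    A hypothetical sketch of the bookkeeping argument; these structures
    are made up for illustration and are not Coda's actual RVM layout:

    #include <stdint.h>
    #include <stdio.h>

    /* whole-file caching: one fixed-size record per cached file, the
     * same for an empty file and a 1GB one */
    struct file_record {
        uint32_t fid[3];        /* file identifier */
        uint64_t length;
        uint64_t version;
        char     container[32]; /* local container file name */
    };

    /* block-level caching would also need per-block bookkeeping, so
     * the recoverable metadata grows with file size */
    struct block_record {
        uint32_t fid[3];
        uint64_t block_index;
        uint8_t  present;
        uint8_t  dirty;
    };

    int main(void)
    {
        unsigned long long blocks_1gb = (1024ULL * 1024 * 1024) / 4096;
        printf("whole-file: %zu bytes of metadata per file, any size\n",
               sizeof(struct file_record));
        printf("per-block:  %llu bytes of metadata for one 1GB file\n",
               blocks_1gb * (unsigned long long)sizeof(struct block_record));
        return 0;
    }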

- Network performance
    It is more efficient to fetch a large file at once than to request
    individual blocks. Available network bandwidth keeps increasing,
    but latency is bounded by the laws of physics, so the 60ms
    roundtrip from coast to coast will remain. Requesting 1000
    individual 4KB blocks sequentially will therefore always cost at
    least 60 seconds, while fetching the same 4MB as a single file only
    gets cheaper over time.
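
    A back-of-the-envelope sketch of that arithmetic; the 60ms
    roundtrip is the figure above, the link speeds are just assumed
    round numbers:

    #include <stdio.h>

    int main(void)
    {
        const double rtt        = 0.060;      /* 60ms coast-to-coast roundtrip */
        const double block_size = 4.0 * 1024; /* 4KB blocks */
        const double nblocks    = 1000;
        const double file_size  = nblocks * block_size;
        const double mbits[]    = { 10, 100, 1000 }; /* link speeds, Mbit/s */

        for (int i = 0; i < 3; i++) {
            double bw = mbits[i] * 1e6 / 8;    /* bytes per second */
            /* one sequential request per block: pay the roundtrip every time */
            double per_block  = nblocks * (rtt + block_size / bw);
            /* a single whole-file fetch: pay the roundtrip once */
            double whole_file = rtt + file_size / bw;
            printf("%4.0f Mbit/s: block-by-block %5.1fs, whole file %.2fs\n",
                   mbits[i], per_block, whole_file);
        }
        return 0;
    }

    The latency term dominates the block-by-block case no matter how
    fast the link gets, which is the point being made here.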

- Local performance
    Handling upcalls is quite expensive: there are at least 2 context
    switches, and possibly some swapping/paging, involved in getting
    the request up to the cache manager and the response back to the
    application. Doing this on individual read and write operations
    would make the system a lot less responsive.
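
    Another rough cost model; the 50us per-upcall cost and the
    operation count are assumptions, only the ratio matters:

    #include <stdio.h>

    int main(void)
    {
        const double upcall_cost = 50e-6;  /* assumed cost of one upcall */
        const long   io_calls    = 100000; /* read()/write() calls by an app */

        /* whole-file caching: upcalls only at open and close, the
         * actual reads and writes hit the local container file */
        double whole_file = 2 * upcall_cost;

        /* block-level caching, worst case: every read/write has to go
         * up to the cache manager and back */
        double per_block = (double)io_calls * upcall_cost;

        printf("whole-file caching: %.4fs of upcall overhead\n", whole_file);
        printf("per-block caching:  %.1fs of upcall overhead\n", per_block);
        return 0;
    }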

- Consistency model
    It is really easy to explain Coda's consistency model with respect
    to other clients. You fetch a copy of the file when it is opened,
    and it is written back to the servers when it is closed (if it was
    modified). Now try to do the same if the client uses block-level
    caching. The picture quickly becomes very blurry, and Transarc AFS
    actually had (has?) a serious bug in this area that leads to
    unexpected data loss when people assume it still provides AFS
    semantics.

    Also, once a system provides block-level access, people start to
    expect the file system to provide something close to UNIX
    semantics, which is really not a very usable model for any
    distributed filesystem.
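
    A minimal sketch of that open/fetch, close/write-back model; the
    helper names are invented for illustration and are not venus'
    actual interfaces:

    #include <stdbool.h>
    #include <stdio.h>

    struct cached_file {
        char path[256];       /* name in the shared namespace */
        char container[256];  /* local container file holding the whole copy */
        bool dirty;
    };

    /* hypothetical transport helpers, assumed to move whole files */
    static void fetch_from_server(struct cached_file *f) {
        printf("fetch all of %s into %s\n", f->path, f->container);
    }
    static void store_to_server(struct cached_file *f) {
        printf("store all of %s back from %s\n", f->path, f->container);
    }

    /* open: fetch one complete, consistent copy of the file */
    static void coda_open(struct cached_file *f) {
        fetch_from_server(f);
        f->dirty = false;
    }

    /* write: purely local, served from the container file; no server
     * traffic and no partially cached state to reason about */
    static void coda_write(struct cached_file *f) { f->dirty = true; }

    /* close: write the file back only if it was modified */
    static void coda_close(struct cached_file *f) {
        if (f->dirty)
            store_to_server(f);
    }

    int main(void) {
        struct cached_file f = { "/coda/project/notes.txt", "/tmp/v0001", false };
        coda_open(&f);
        coda_write(&f);
        coda_close(&f);   /* other clients see the new contents after this */
        return 0;
    }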

Jan
Received on 2005-08-30 13:16:13