Coda File System

Re: design, go beyond AFS?

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 20 Aug 2002 00:17:45 -0400
This is very interesting and well thought out.

On Mon, Aug 19, 2002 at 03:11:44PM +0200, Ivan Popov wrote:
> I have not seen anything that would really need the separation between
> mount point name space and volume name space.
> 
> Rather opposite is true, the arbitrary mapping between volumes and mount
> points creates confusion and even semantical inconsistency (in manuals
> for afs, dfs, coda you read "do not create multiple mount points for the
> same volume", that means we have not fully implemented the semantics and
> it works essentially by chance - there is nothing that can prevent
> creation of multiple mount points, in the general case).

Multiple mount points for one volume are technically problematic for
only two reasons.

 - Loops/infinite pathnames, /coda/a/b/c/a/b/c/a/...

   It should not be too hard to detect this by walking the path back up
   to the root whenever a volume is mounted. When a loop is detected,
   substitute a symlink that points back up the tree, i.e. turn the
   previous situation into /coda/a/b/c/a -> ../..

 - The kernel is unable to represent the same directory in multiple
   places, because '..' has to be different in each place. Linux 2.4.19
   has a fix for this: we dynamically generate the '.' and '..' entries
   in readdir based on the information in the dentry cache.

So the big technical hurdles are pretty much 'solved'. I had to solve
them because, even with the small cluster we have here, I sometimes
forget where a volume was mounted, and when resolving data back to a
newly reinitialized server I can simply mount its volumes anywhere and
not worry about where to find them. That is exactly why your ideas are
so interesting.
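
The second fix works because '.' and '..' come from the path the client used to reach a directory, not from the directory's stored contents. The toy model below only illustrates that idea; it is not the Linux 2.4.19 or Coda kernel code, and `Directory`, `Dentry` and `readdir` are hypothetical stand-ins for a directory object, the dentry cache and the readdir implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Directory:
    """A directory (e.g. a volume root) that may be reachable from several places."""
    ino: int
    entries: Dict[str, int] = field(default_factory=dict)


@dataclass
class Dentry:
    """One cached name for a directory; the same Directory can sit under
    several Dentry objects with different parents."""
    name: str
    directory: Directory
    parent: Optional["Dentry"] = None


def readdir(dentry: Dentry) -> List[Tuple[str, int]]:
    """List a directory, generating '.' and '..' from the dentry it was reached by."""
    parent = dentry.parent if dentry.parent is not None else dentry  # root: '..' is itself
    listing = [(".", dentry.directory.ino), ("..", parent.directory.ino)]
    # Stored '.' / '..' entries are ignored: they could only be right for one
    # of the places where this directory is mounted.
    listing += sorted((n, i) for n, i in dentry.directory.entries.items() if n not in (".", ".."))
    return listing
```

Reaching the same volume root through two different parents gives the same '.' but a different '..' in each listing, which is the property the kernel needs.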

The only thing I don't see right now is how a client would obtain the
list of volume names. Currently we hit the 'mount link' in the tree and
mount the volume it refers to. In this case we get the volume location
from the place where the mount link was created. But with your scheme
these links wouldn't exist, so would we need to get the list of all
volume names when we connect to a server for the first time?

> No functionality would be lost and a lot of problems avoided if
> hypothetical volume names would have to coincide with the mount point
> names. I mean - always. (given of course that there is a "rename"
> operation on volumes).

There actually is a rename for volumes, but I wouldn't recommend using
it right now because it doesn't correctly deal with replicated volumes:
it gives the underlying replicas the same name as the replicated volume,
which confuses mounting.

> To conclude, I'd welcome volume names that allowed any characters
> including '/', with length up to MAXPATHLEN (I mean "big enough", like
> 4 Kbyte, that would not cost a lot of space, typical volumes are much
> larger than 4K :-) (The volume name would not have to include the
> standard "/coda/" prefix, of course)

I see the point of longer volume names, although that would break the
existing client-server RPCs. I think the '/' character can already be used.

> Then mountpoints would not have to contain any information except "I'm a
> mountpoint".

Ahh, I think I see your point. An "I'm a mountpoint" marker triggers the
client to try to mount the volume whose name is the current path (`pwd`).
Ingenious; the root volume would then simply be called "". It would mix
in well with the stuff I'm working on right now, where the root volume
of a realm is accessed by mounting "@realm", while regular volumes are
named "foo.bar@realm". For now I only allow cross-realm mounts from a
special dynamic directory, which contains links like these:
    /coda/testserver.coda.cs.cmu.edu -> @testserver.coda.cs.cmu.edu
    /coda/coda.cs.cmu.edu -> @coda.cs.cmu.edu
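
A sketch of how such names might be parsed, assuming that "@realm" selects a realm's root volume and "name@realm" a regular volume, and that the realm is everything after the last '@'. `VolumeSpec`, `parse_volume_spec` and `dynamic_root_link` are hypothetical names, not the actual client code, which may treat malformed names differently.

```python
from typing import NamedTuple, Optional, Tuple


class VolumeSpec(NamedTuple):
    volume: Optional[str]  # None means "the root volume of the realm"
    realm: str


def parse_volume_spec(spec: str) -> VolumeSpec:
    """Split "name@realm"; "@realm" selects the realm's root volume."""
    name, sep, realm = spec.rpartition("@")
    if not sep or not realm:
        raise ValueError(f"not a realm-qualified volume name: {spec!r}")
    return VolumeSpec(volume=name or None, realm=realm)


def dynamic_root_link(realm: str) -> Tuple[str, str]:
    """Entry in the dynamic directory: /coda/<realm> -> @<realm>."""
    return (f"/coda/{realm}", f"@{realm}")
```

For example, `parse_volume_spec("@coda.cs.cmu.edu")` gives `VolumeSpec(volume=None, realm="coda.cs.cmu.edu")`, and `dynamic_root_link("testserver.coda.cs.cmu.edu")` reproduces the first link listed above.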

But I could even interpret an empty volume name as "use the current
pathname relative to the root of this realm", in which case we would get
your solution automatically.
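
Under that interpretation, resolving a mount point might look like the sketch below. It assumes the realm root is the /coda/<realm> directory from the dynamic directory, and that a non-empty stored name keeps today's behaviour; `volume_for_mountpoint` and the volume name "foo.bar" are hypothetical.

```python
import posixpath


def volume_for_mountpoint(mountpoint: str, realm_root: str, stored_name: str = "") -> str:
    """Return the volume name a client would try to mount at `mountpoint`.

    A non-empty `stored_name` keeps the old behaviour (the mount link names
    the volume). An empty name means "use the path relative to the realm
    root", so the realm root itself maps to the empty volume name "".
    """
    if stored_name:
        return stored_name
    mountpoint = posixpath.normpath(mountpoint)
    realm_root = posixpath.normpath(realm_root)
    if mountpoint == realm_root:
        return ""
    prefix = realm_root.rstrip("/") + "/"
    if not mountpoint.startswith(prefix):
        raise ValueError(f"{mountpoint!r} is not inside realm root {realm_root!r}")
    return mountpoint[len(prefix):]
```

Note that the derived names contain '/', which fits the remark that '/' can already be used in volume names.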

I do have some questions. How do you propose to handle backup volumes,
the underlying volume replicas (e.g. during repair), and cloned volumes?
I think we still need to keep the old methods of mounting a volume
around, at least for these.

Jan
Received on 2002-08-20 00:22:40