Coda File System

Re: venus-kernel interface

From: Jan Harkes <>
Date: Thu, 30 Sep 2004 11:18:29 -0400
On Thu, Sep 30, 2004 at 10:56:28AM +0200, Ivan Popov wrote:
> If Venus used _only_ totally standard and traditional syscalls
> like open(), read(), write(), then we certainly could run the cache manager
> and the tools via any ABI.
> Just speculating: would it make things hard to implement
> if we used a smaller subset of syscalls?
> I could imagine that instead of ioctl(fd, OP, inoutdata)
> we could do write(fd,"OPoutdata") followed by read(fd,"indata").
> Well, twice as many context switches... but arbitrary length data...

Interesting thought. We already use the special file '/coda/.CONTROL'
as the target for our ioctls, because we can't use regular ioctls on
device nodes, symlinks, or directories. Our pioctl code is mainly a
wrapper that calls ioctl('/coda/.CONTROL', ...).

I have never tried it, but since on Linux venus passes an open fd down
to the kernel, that fd doesn't have to be a container file; it could be
an open socket or pipe. So the pioctl wrapper could work as follows.

    pioctl opens a magic file /coda/.CONTROLPIPE
    venus returns an open socket or pipe.
    pioctl writes request
    pioctl reads reply
    pioctl closes magic file
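The five steps above can be sketched from the pioctl side in C. This is a minimal sketch under several assumptions: a `socketpair()` stands in for the socket venus would return when /coda/.CONTROLPIPE is opened, a forked child plays the part of venus, and the 32-bit length-prefixed framing as well as the names `pioctl_over_socket` and `fake_venus_serve_one` are hypothetical, not an existing Coda protocol or API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; EOF before that counts as an error. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical framing: 32-bit length (host order), then the payload.
 * recv_msg returns the payload length, or -1 on error/oversize. */
static int send_msg(int fd, const void *buf, uint32_t len)
{
    if (write_all(fd, &len, sizeof len) < 0)
        return -1;
    return write_all(fd, buf, len);
}

static long recv_msg(int fd, void *buf, uint32_t cap)
{
    uint32_t len;
    if (read_all(fd, &len, sizeof len) < 0 || len > cap)
        return -1;
    if (read_all(fd, buf, len) < 0)
        return -1;
    return (long)len;
}

/* Stand-in for venus (hypothetical): answer one request with "OK:" + request. */
static void fake_venus_serve_one(int fd)
{
    char req[256], reply[3 + sizeof req];
    long n = recv_msg(fd, req, sizeof req);
    if (n < 0)
        return;
    memcpy(reply, "OK:", 3);
    memcpy(reply + 3, req, (size_t)n);
    send_msg(fd, reply, (uint32_t)n + 3);
}

/* pioctl side: open endpoint, write request, read reply, close.
 * Returns the reply length (reply is NUL-terminated) or -1. */
static long pioctl_over_socket(const char *request, char *reply, uint32_t cap)
{
    int sv[2];
    if (cap == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: fake venus */
        close(sv[0]);
        fake_venus_serve_one(sv[1]);
        _exit(0);
    }
    close(sv[1]);
    long n = -1;
    if (send_msg(sv[0], request, (uint32_t)strlen(request)) == 0)
        n = recv_msg(sv[0], reply, cap - 1);
    close(sv[0]);                       /* "pioctl closes magic file" */
    waitpid(pid, NULL, 0);
    if (n >= 0)
        reply[n] = '\0';
    return n;
}
```

Length-prefixing each message keeps request and reply boundaries intact on a stream socket, which is also what gives the arbitrary-length data mentioned in the quoted proposal.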

This requires no kernel changes on Linux. If we want the pioctl-using
application to be able to use select instead of a blocking read, some
additional changes will probably be needed.
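On the application side, waiting for the reply without blocking in `read()` could look like the following `poll()` loop (the same idea works with `select()`). This is a minimal sketch that polls the endpoint directly; `wait_for_reply` and `demo_wait` are hypothetical names, and a pipe stands in for the socket venus would return. It shows only the user-space half: making poll/select on the Coda file handle reach the underlying socket is the part that would probably need kernel changes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for the reply endpoint to become readable.
 * Returns 1 if a reply (or EOF) is ready, 0 on timeout, -1 on error.
 * Sketch only: an EINTR restarts the wait with the full timeout. */
static int wait_for_reply(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}

/* Demo: a pipe stands in for the endpoint; report readiness before
 * and after a one-byte "reply" is written. */
static int demo_wait(int *before, int *after)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    *before = wait_for_reply(p[0], 0);
    int ok = write(p[1], "r", 1) == 1;
    *after = ok ? wait_for_reply(p[0], 1000) : -1;
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```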

One interesting thing is that with the existing implementation this
even works when several pioctl-using clients are active. They will not
share the same endpoint, because we keep the 'container' file handle
associated with the Coda file handle and redirect read and write calls
at the file level instead of the inode level. Only mmap tries to work
at the inode level, and we never allow mmap to succeed if another
container file handle is already mapped to the inode. Since sockets
can't be mmapped, this shouldn't even be a problem.

Of course, handling this in venus will require some work. It has to
recognize the open of the magic file (the .CONTROL file is handled in
the kernel itself), but as the top-level volume is already 90% magic,
it shouldn't need too many changes. Then we need a listener thread for
the socket endpoint, probably quite similar to the existing MarinerPort
stuff, and we need a protocol; maybe we just write the existing binary
viceioctl data.
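One possible shape for such a binary request, assuming it carries roughly what a pioctl call already takes (a path, a command number, a follow-symlink flag, an input buffer, and the size of the output buffer): the `struct pioctl_hdr` layout and the `pioctl_pack`/`pioctl_unpack` helpers below are hypothetical, a sketch rather than Coda's actual viceioctl encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header for one pioctl request. Venus and the caller
 * run on the same host, so fields are kept in host byte order. */
struct pioctl_hdr {
    uint32_t cmd;       /* pioctl command number */
    uint32_t follow;    /* 1 = follow a trailing symlink in path */
    uint32_t path_len;  /* bytes of path after the header (no NUL) */
    uint32_t in_size;   /* bytes of input data after the path */
    uint32_t out_size;  /* largest reply the caller will accept */
};

/* Caller side: pack header, path and input into buf.
 * Returns the total length, or 0 if the request does not fit. */
static size_t pioctl_pack(uint8_t *buf, size_t cap, uint32_t cmd, int follow,
                          const char *path, const void *in, uint32_t in_size,
                          uint32_t out_size)
{
    struct pioctl_hdr h;
    size_t path_len = strlen(path);
    if (path_len > UINT32_MAX || sizeof h + path_len + in_size > cap)
        return 0;
    h.cmd = cmd;
    h.follow = follow ? 1 : 0;
    h.path_len = (uint32_t)path_len;
    h.in_size = in_size;
    h.out_size = out_size;
    memcpy(buf, &h, sizeof h);          /* memcpy avoids alignment issues */
    memcpy(buf + sizeof h, path, path_len);
    if (in_size > 0)
        memcpy(buf + sizeof h + path_len, in, in_size);
    return sizeof h + path_len + in_size;
}

/* Venus side: validate and split a packed request. *path and *in point
 * into buf; *path is not NUL-terminated. Returns 0, or -1 if malformed. */
static int pioctl_unpack(const uint8_t *buf, size_t len, struct pioctl_hdr *h,
                         const char **path, const uint8_t **in)
{
    if (len < sizeof *h)
        return -1;
    memcpy(h, buf, sizeof *h);
    if ((uint64_t)h->path_len + h->in_size != len - sizeof *h)
        return -1;
    *path = (const char *)(buf + sizeof *h);
    *in = buf + sizeof *h + h->path_len;
    return 0;
}
```

The length check in `pioctl_unpack` lets the listener thread reject a truncated or padded request before touching the payload.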

Finally, all this is of course of no use if we can't implement the same
thing on *BSD. I don't know enough about those kernels to tell whether
their VFS is similar enough in structure to allow for this abuse.

Alternatively, we could try to use container files and have pioctl open
/coda/.CONTROLFILE-XXX, but then how can we tell when the reply is
ready? This seems to be an unworkable solution.

A final solution I can think of would be to allow (non-venus) processes
to open /dev/cfs0, forward every write to venus as a CODA_PIOCTL upcall,
and return the reply on the next read. This of course would need some
kernel hacking and some way to distinguish venus from normal processes.
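The /dev/cfs0 alternative can be modeled in user space to make the intended semantics concrete: each write from a non-venus opener becomes one upcall, and the next read returns that upcall's reply. This is a hypothetical model, not kernel code; `struct cfs_client`, `cfs_write`, `cfs_read`, and `toy_venus` are invented names, and the function pointer merely stands in for sending a CODA_PIOCTL upcall and waiting for venus to answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define CFS_MAX_REPLY 256

/* Per-open state for a non-venus opener of /dev/cfs0 (hypothetical model). */
struct cfs_client {
    char reply[CFS_MAX_REPLY];
    size_t reply_len;
    int reply_pending;
};

/* Stand-in for "send a CODA_PIOCTL upcall and wait for venus's answer". */
typedef size_t (*upcall_fn)(const char *req, size_t req_len,
                            char *reply, size_t cap);

/* Each write becomes exactly one upcall; its reply is held for the next read. */
static ssize_t cfs_write(struct cfs_client *c, const char *buf, size_t len,
                         upcall_fn venus)
{
    if (c->reply_pending)
        return -1;              /* previous reply not collected yet */
    c->reply_len = venus(buf, len, c->reply, sizeof c->reply);
    c->reply_pending = 1;
    return (ssize_t)len;
}

/* The next read returns (and consumes) the pending reply. */
static ssize_t cfs_read(struct cfs_client *c, char *buf, size_t cap)
{
    if (!c->reply_pending)
        return -1;              /* a real driver might block or return EAGAIN */
    size_t n = c->reply_len < cap ? c->reply_len : cap;
    memcpy(buf, c->reply, n);
    c->reply_pending = 0;
    return (ssize_t)n;
}

/* Toy venus: reply with "re:" followed by the request, truncated to cap. */
static size_t toy_venus(const char *req, size_t req_len, char *reply, size_t cap)
{
    const char prefix[] = "re:";
    size_t n = 0;
    for (size_t i = 0; i < sizeof prefix - 1 && n < cap; i++)
        reply[n++] = prefix[i];
    for (size_t i = 0; i < req_len && n < cap; i++)
        reply[n++] = req[i];
    return n;
}
```

Holding at most one pending reply per open file mirrors the "reply on the next read" rule; a second write before the read is simply refused in this model.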

Received on 2004-09-30 11:19:35