Coda File System

Re: venus choking on large directories?

From: Jan Harkes <>
Date: Thu, 17 May 2001 10:09:58 -0400
On Wed, May 16, 2001 at 07:46:55PM -0400, Greg Troxel wrote:
> I had problems before, but I now think it was the same one.

No, I'm very sure it was a real off-by-one bug before, and you sent in
the correct patch, which did get applied. Otherwise your client would
have crashed in dirbody.c.

>   It would be nice if we packed the directories more efficiently, and
>   allowed them to grow beyond 256KB. But a lot of the cruft in this area
>   is legacy to avoid breaking backwards compatibility with the way servers
>   have been storing their directory information in RVM.
> Well, that's unfortunate.  I wonder how hard it would be to make this
> fail more gracefully, e.g. with ENOSPC instead of crashing.  It might
> be worth having a server format change to remove the arbitrary limit
> [ducking!]  (even though total RVM etc. is bounded, 4000 files in a
> directory doesn't seem all that weird...).

I'm trying to cut a 'fail graceful' path through the spaghetti, but I
don't know whether I can make it nicely all the way. Perhaps we need a
function 'DIR_SpaceForName()' to check and perhaps reserve allocation
before some point of no return.
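
A minimal sketch of what such a check might look like. Only the name
DIR_SpaceForName() comes from the message; the signature, the
struct dir_space bookkeeping and the ENTRY_OVERHEAD value are assumptions
made for illustration, not the actual Coda code. The idea is to fail with
ENOSPC (as Greg suggested) and reserve the space before anything has been
modified:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE      2048  /* 2KB directory pages, as described in this message */
#define MAX_PAGES      128   /* 256KB limit / 2KB per page */
#define ENTRY_OVERHEAD 16    /* ASSUMED per-entry header size, illustrative only */

/* Hypothetical space accounting for one directory. */
struct dir_space {
    size_t bytes_used;      /* consumed by existing entries */
    size_t bytes_reserved;  /* promised to operations still in flight */
};

/*
 * Hypothetical check: returns 0 and reserves room if an entry for `name`
 * would still fit, or ENOSPC if it would not. Intended to be called
 * before the point of no return.
 */
int DIR_SpaceForName(struct dir_space *d, const char *name)
{
    size_t capacity = (size_t)PAGE_SIZE * MAX_PAGES;
    size_t need = ENTRY_OVERHEAD + strlen(name) + 1;

    assert(d->bytes_used + d->bytes_reserved <= capacity);

    if (need > capacity - d->bytes_used - d->bytes_reserved)
        return ENOSPC;

    d->bytes_reserved += need;
    return 0;
}
```

A caller would release or commit the reservation once the operation has
either failed or completed; that part is left out here.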

256KB already takes up a lot of RVM, and I don't think any of it is
reclaimed when directory entries are deleted until the directory itself
is removed. Should we keep storing directories in RVM, or move the
storage to on-disk container files just like the data for regular files?

There are currently 3 directory formats in use, and they pop up all over
the place.

The server stores directories in RVM, as a fixed size array of possibly
allocated 2KB pages. This is probably done both for legacy reasons and
to avoid realloc and memory fragmentation, since the RVM allocator isn't
very good at merging fragmented memory. There is also a small hash table
to improve lookups. (1st format)
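
Roughly, the 1st format can be pictured as below. This is a sketch, not
the real declaration: the struct and function names, the bucket count and
the hash function are assumptions. Only the 2KB page size and the 256KB
limit (so 128 page slots) come from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DIR_PAGE_SIZE    2048
#define DIR_MAX_PAGES    128   /* 128 * 2KB = 256KB */
#define DIR_HASH_BUCKETS 128   /* ASSUMED size of the small hash table */

/* Hypothetical in-memory view of a directory in the 1st format. */
struct dir_pages {
    unsigned char *page[DIR_MAX_PAGES];     /* NULL means "not allocated" */
    unsigned short hash[DIR_HASH_BUCKETS];  /* first entry per bucket, 0 = empty */
};

/* Return page n, allocating it on first use. NULL past the fixed limit
 * (or when out of memory), which is where growth currently stops. */
unsigned char *dir_get_page(struct dir_pages *d, size_t n)
{
    if (n >= DIR_MAX_PAGES)
        return NULL;
    if (d->page[n] == NULL)
        d->page[n] = calloc(1, DIR_PAGE_SIZE);
    return d->page[n];
}

/* Illustrative name hash selecting a bucket; not necessarily Coda's. */
unsigned dir_hash(const char *name)
{
    unsigned h = 0;
    while (*name)
        h = h * 173 + (unsigned char)*name++;
    return h % DIR_HASH_BUCKETS;
}
```

Because the page array has a fixed number of slots, no realloc is ever
needed, but the directory also cannot grow past the last slot.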

When modifications are made to a directory on the server the whole
structure is copied to regular memory. Once all operations have
succeeded it is copied back into RVM. (still 1st format)

When a client requests a directory, a contiguous chunk of memory is
allocated, the contents of all pages are copied into this and sent over
to the client. (2nd format)

The client then converts the received directory data back to an array of
pages in RVM for storage and local manipulation. (1st format again)
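
The round trip between the page array and the on-the-wire chunk might be
sketched like this. The function names are hypothetical, plain malloc
stands in for RVM allocation, and the sketch assumes the first npages
slots are all allocated:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DIR_PAGE_SIZE 2048
#define DIR_MAX_PAGES 128

/* Server side: copy npages pages into one contiguous buffer (2nd format).
 * Returns a malloc'd buffer the caller must free, or NULL on error. */
unsigned char *dir_flatten(unsigned char *const page[], size_t npages,
                           size_t *len)
{
    if (npages == 0 || npages > DIR_MAX_PAGES)
        return NULL;
    unsigned char *buf = malloc(npages * DIR_PAGE_SIZE);
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < npages; i++) {
        assert(page[i] != NULL);  /* sketch assumes no holes */
        memcpy(buf + i * DIR_PAGE_SIZE, page[i], DIR_PAGE_SIZE);
    }
    *len = npages * DIR_PAGE_SIZE;
    return buf;
}

/* Client side: split the received buffer back into separate pages
 * (1st format again). Returns the page count, or -1 on error. */
int dir_unflatten(const unsigned char *buf, size_t len,
                  unsigned char *page[DIR_MAX_PAGES])
{
    if (len == 0 || len % DIR_PAGE_SIZE != 0 ||
        len / DIR_PAGE_SIZE > DIR_MAX_PAGES)
        return -1;
    size_t n = len / DIR_PAGE_SIZE;
    for (size_t i = 0; i < n; i++) {
        page[i] = malloc(DIR_PAGE_SIZE);
        if (page[i] == NULL) {
            while (i > 0)          /* undo partial work on failure */
                free(page[--i]);
            return -1;
        }
        memcpy(page[i], buf + i * DIR_PAGE_SIZE, DIR_PAGE_SIZE);
    }
    return (int)n;
}
```

Each direction is a full copy of the directory, which is part of the
copying overhead described in this message.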

When userspace opens the directory for reading, the client writes the
directory data to a container file in a BSD-style directory format
(3rd format).
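
For illustration, a BSD-style directory is a sequence of variable-length
records. The sketch below appends one such record to a buffer. The header
layout follows the classic 4.4BSD dirent fields (d_fileno, d_reclen,
d_type, d_namlen); whether the container file matches it field for field
is an assumption, and bsd_dirent_append() is a made-up helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of a record, followed by the NUL-terminated name. */
struct bsd_dirent_hdr {
    uint32_t d_fileno;  /* file number; 0 marks an unused record */
    uint16_t d_reclen;  /* total record length, multiple of 4 */
    uint8_t  d_type;    /* file type, left as 0 (unknown) here */
    uint8_t  d_namlen;  /* length of the name without the NUL */
};

/* Append one record at offset `off` in a buffer of `cap` bytes.
 * Returns the record length written, or 0 if it does not fit. */
size_t bsd_dirent_append(unsigned char *buf, size_t off, size_t cap,
                         uint32_t fileno, const char *name)
{
    size_t namlen = strlen(name);
    if (namlen == 0 || namlen > 255)
        return 0;

    /* header + name + NUL, rounded up to a 4-byte boundary */
    size_t reclen = (sizeof(struct bsd_dirent_hdr) + namlen + 1 + 3)
                    & ~(size_t)3;
    if (off > cap || reclen > cap - off)
        return 0;

    struct bsd_dirent_hdr h = { fileno, (uint16_t)reclen, 0,
                                (uint8_t)namlen };
    memset(buf + off, 0, reclen);              /* zero the padding and NUL */
    memcpy(buf + off, &h, sizeof h);
    memcpy(buf + off + sizeof h, name, namlen);
    return reclen;
}
```

Since records are packed back to back, the result can be written to the
container file as-is.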

So the 1st format is used to store and manipulate the directory.
The 2nd format is the on-the-wire version. And the 3rd format is used to
pass the directory data to the kernel. And there is a lot of copying
going on.

We could use the BSD-style format all the way, in which case we would
only need to realloc RVM memory (or grow the container file). It
probably complicates the direntry create/delete code and we would lose
the hash-based directory lookup. However, the client can trivially drop
the received directory data into the container file and won't have to
do anything special when the kernel opens it for reading.
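
To make the trade-off concrete: without a hash table, a lookup in a
BSD-style buffer is a linear walk over the records. A sketch, using the
same assumed 4.4BSD-like record header as above-mentioned BSD formats
(the struct and bsd_dir_lookup() are illustrative, not existing Coda
code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed record header; the name follows it directly. */
struct bsd_dirent_hdr {
    uint32_t d_fileno;
    uint16_t d_reclen;
    uint8_t  d_type;
    uint8_t  d_namlen;
};

/* Walk the records one by one. Returns the file number for `name`,
 * or 0 if the name is absent or the buffer is malformed. */
uint32_t bsd_dir_lookup(const unsigned char *buf, size_t len,
                        const char *name)
{
    assert(buf != NULL && name != NULL);
    size_t namlen = strlen(name);
    size_t off = 0;

    while (len - off >= sizeof(struct bsd_dirent_hdr)) {
        struct bsd_dirent_hdr h;
        memcpy(&h, buf + off, sizeof h);

        /* reject records that are too short or run past the buffer */
        if (h.d_reclen < sizeof h + h.d_namlen || h.d_reclen > len - off)
            return 0;

        if (h.d_fileno != 0 && h.d_namlen == namlen &&
            memcmp(buf + off + sizeof h, name, namlen) == 0)
            return h.d_fileno;

        off += h.d_reclen;  /* step to the next record */
    }
    return 0;
}
```

The cost is proportional to the number of entries, so a directory with
thousands of files pays for every lookup.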

If we want to retain the hash-based (or use a tree-based) lookup,
extending the BSD-style format would work, but then the client has to
either munge the directory before passing it to the kernel, or all the
kernel code needs to be taught the new directory structure.

In the end we'd still need some of the code to read/convert the array of
pages format, otherwise we can't restore volume dumps (backups).

Received on 2001-05-17 10:10:10