The Coda Dir Package: Introduction

Unix directories are files containing lists of filenames and file identifiers. In Coda, each directory entry consists of a name and a DirFid, which is a pair of a vnode number and a uniquifier. The volume to which a directory belongs can be found through its vnode.

Directories are transported over the network, so all data is stored in network byte order. Two variants of the fid structure therefore exist: DirFid for use on the host, and DirNFid, which holds the identifiers in network order.
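
The conversion between the two fid variants can be sketched as follows. This is a minimal illustration, not the package's own code: the struct and function names (fid_host, fid_net, fid_hton, fid_ntoh) are hypothetical, and 32-bit fields are assumed so that the standard htonl and ntohl calls apply.

```c
#include <stdint.h>
#include <arpa/inet.h>  /* htonl, ntohl */

/* Hypothetical fixed-width stand-ins for DirFid and DirNFid. */
struct fid_host { uint32_t vnode; uint32_t unique; };  /* host order */
struct fid_net  { uint32_t vnode; uint32_t unique; };  /* network order */

/* Convert a host-order fid to its network-order form. */
static struct fid_net fid_hton(struct fid_host h)
{
    struct fid_net n = { htonl(h.vnode), htonl(h.unique) };
    return n;
}

/* Convert a network-order fid back to host order. */
static struct fid_host fid_ntoh(struct fid_net n)
{
    struct fid_host h = { ntohl(n.vnode), ntohl(n.unique) };
    return h;
}
```

Keeping two distinct struct types lets the compiler catch code that mixes up host-order and network-order identifiers.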

The size of a directory entry is always a multiple of 32 bytes. The first 32-byte blob of an entry contains the fid, the blob number of the next entry on the hash chain, and a tag indicating that it is the first blob. Subsequent blobs contain the remainder of a potentially long name.
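
As a rough sketch of the arithmetic, the number of 32-byte units an entry occupies can be computed from the name length. The function name name_blobs and both constants are illustrative. The sketch assumes that the first unit holds 16 bytes of the name (matching the name[16] field of the DirEntry structure shown later), that each further unit holds 32 bytes, and that the terminating NUL is stored with the name.

```c
#include <stddef.h>
#include <string.h>

#define ENTRY_UNIT      32  /* entry sizes are multiples of 32 bytes */
#define FIRST_NAME_LEN  16  /* assumed name bytes available in the first unit */

/* Number of 32-byte units needed to store an entry with this name,
 * counting the terminating NUL as part of the name. */
static size_t name_blobs(const char *name)
{
    size_t need = strlen(name) + 1;

    if (need <= FIRST_NAME_LEN)
        return 1;
    /* Round the overflow up to whole continuation units. */
    return 1 + (need - FIRST_NAME_LEN + ENTRY_UNIT - 1) / ENTRY_UNIT;
}
```

Under these assumptions a short name such as "." needs one unit, while a name of 16 to 47 characters needs two.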

1.2 Directory Format

The Coda directory format is inherited from AFS 2 and is somewhat elaborate. Directories have sizes equal to an integral multiple of the directory page size (2048 bytes). They are always contiguously allocated (this was changed in 1998). Each page is subdivided into 64 blobs of 32 bytes each. Each page in the directory has a PageHeader, and the directory as a whole has a DirHeader.
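
The page and blob arithmetic can be written down directly. The sketch below assumes that blobs are numbered consecutively from 0 across the pages of a contiguous directory; the macro and function names are illustrative only.

```c
#include <stddef.h>

#define DIR_PAGE_BYTES  2048                              /* directory page size */
#define DIR_PAGE_BLOBS  64                                /* blobs per page */
#define DIR_BLOB_BYTES  (DIR_PAGE_BYTES / DIR_PAGE_BLOBS) /* 32 bytes */

/* Page that holds a given blob. */
static size_t blob_page(unsigned blob)
{
    return blob / DIR_PAGE_BLOBS;
}

/* Byte offset of a blob from the start of the contiguous directory. */
static size_t blob_offset(unsigned blob)
{
    return (size_t)blob * DIR_BLOB_BYTES;
}

/* Total size in bytes of a directory with the given number of pages. */
static size_t dir_bytes(size_t npages)
{
    return npages * DIR_PAGE_BYTES;
}
```

For example, blob 13 lies in page 0 at byte offset 416, and blob 64 is the first blob of page 1.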

The first blob of each page contains the PageHeader. In the first page, the next 12 blobs contain the directory header, and the remaining blobs are for directory entries. The PageHeader contains a bitmap to manage the allocation of blobs, and a free count indicating how many blobs are still free. The DirHeader contains an allocation map (allomap), which also records, for each possible page in the directory, how many blobs are available. It also contains a hash table. Hash values are computed for names and indicate the first blob used by entries with that hash value.
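
Allocation against the page bitmap might look like the sketch below. It is not taken from the Coda sources: struct page_map is a simplified stand-in for the relevant PageHeader fields, and the sketch assumes that a set bit means the blob is in use and that bit i of the page lives in bit (i % 8) of byte (i / 8).

```c
#define EPP 64  /* entries (blobs) per page */

/* Simplified stand-in for the allocation fields of a page header. */
struct page_map {
    unsigned char freecount;            /* number of free blobs in the page */
    unsigned char freebitmap[EPP / 8];  /* assumed: bit set = blob in use */
};

static int blob_used(const struct page_map *pm, int i)
{
    return (pm->freebitmap[i / 8] >> (i % 8)) & 1;
}

/* Find n contiguous free blobs, mark them used and update the free count.
 * Returns the index of the first blob, or -1 if no such run exists. */
static int alloc_blobs(struct page_map *pm, int n)
{
    int run = 0;

    if (n <= 0 || n > pm->freecount)
        return -1;
    for (int i = 0; i < EPP; i++) {
        run = blob_used(pm, i) ? 0 : run + 1;
        if (run == n) {
            int first = i - n + 1;
            for (int j = first; j <= i; j++)
                pm->freebitmap[j / 8] |= (unsigned char)(1u << (j % 8));
            pm->freecount = (unsigned char)(pm->freecount - n);
            return first;
        }
    }
    return -1;
}
```

Entries with long names need several adjacent blobs, which is why the search looks for a contiguous run rather than any free bit.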

1.3 Persistent Storage for Directories

Ultimately, clients and servers hold directory data in RVM.

Directories on the client

Venus will store directory data as a contiguous blob of memory and expand such a region when a directory needs to grow. A Venus fso object contains a field of type VenusData, which is a union of file, directory and symlink data.

union VenusData {
    int havedata;       /* generic test for null pointer (pretty gross, eh) */
    CacheFile *file;    /* VnodeType == File */
    VenusDirData *dir;  /* VnodeType == Directory */
    char *symlink;      /* VnodeType == SymbolicLink */
};

For a directory Venus has:

struct VenusDirData {
        struct DirHandle dh; /* contains pointer to Coda format directory */
        unsigned udcfvalid : 1; /*T*/   /* Unix format directory in UFS. */
        CacheFile *udcf;   /*T*/
        int padding;       /*T*/
};

Venus will at appropriate times write a Unix style directory into a container file, so that the kernel can perform its readdir operations directly on the container.

Server directory storage

Server directory vnodes point to directory inodes. Such inodes contain the pages of the directories. Since directories and directory inodes are copy-on-write objects with respect to cloning volumes, several vnodes (in different volumes) can point to a single directory inode.

The VnodeDiskObjectStruct in RVM contains a pointer Inode to a directory inode structure:

struct DirInode {
        void *di_pages[DIR_MAXPAGES];
        int  di_refcount;             /* for copy on write */
};

The server stores the directory in pages in RVM, but never does directory operations on the RVM pages. A directory inode contains the page map di_pages for the directory. The array contains pointers to RVM storage for the individual pages of the directory. When concatenated into a single buffer they are the objects Venus fetches from the server.

The server makes a copy of the directory data in VM, operates on that copy and copies the entire directory back to RVM when finished.
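
The copy out of the page map and back can be sketched as follows. The value of DIR_MAXPAGES, the helper names, and the assumption that used slots of di_pages are leading and non-NULL are all illustrative. Real server code would presumably perform the copy back inside an RVM transaction, which is omitted here.

```c
#include <stdlib.h>
#include <string.h>

#define DIR_PAGESIZE 2048
#define DIR_MAXPAGES 128   /* assumed value, for illustration only */

struct DirInode {
    void *di_pages[DIR_MAXPAGES];
    int   di_refcount;
};

/* Count the pages in use, assuming used slots are leading and non-NULL. */
static size_t di_npages(const struct DirInode *di)
{
    size_t n = 0;
    while (n < DIR_MAXPAGES && di->di_pages[n] != NULL)
        n++;
    return n;
}

/* Concatenate all pages into one freshly allocated buffer.
 * The caller frees the result; *len receives the size in bytes. */
static char *di_to_buffer(const struct DirInode *di, size_t *len)
{
    size_t n = di_npages(di);
    char *buf;

    *len = 0;
    if (n == 0 || (buf = malloc(n * DIR_PAGESIZE)) == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        memcpy(buf + i * DIR_PAGESIZE, di->di_pages[i], DIR_PAGESIZE);
    *len = n * DIR_PAGESIZE;
    return buf;
}

/* Copy a contiguous buffer back into the existing pages. */
static void buffer_to_di(struct DirInode *di, const char *buf, size_t len)
{
    size_t n = len / DIR_PAGESIZE;

    for (size_t i = 0; i < n && i < DIR_MAXPAGES; i++)
        if (di->di_pages[i] != NULL)
            memcpy(di->di_pages[i], buf + i * DIR_PAGESIZE, DIR_PAGESIZE);
}
```

The sketch does not handle a directory that grew while in VM; that case would need new pages to be allocated before copying back.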

Since directory contents on the server are copy on write, a directory handle is needed to count the references to the directory while it is in use. These handles are held in a hash table in the server.

Together, these structures arrange for RVM storage of directories.

While the server is running, vnodes are hashed in VM and the Vnode structure contains a pointer dh to a directory handle. The directory handles themselves sit in a hash table:

struct dllist_head dcache[DCSIZE];
struct dllist_head dfreelist; 

struct DCHashEntry {
        struct dllist_head    dc_listhead;
        int                   dc_count;  /*how many vnodes are referencing us */
        struct DirHandle      dc_dh;
        PDirInode             dc_pdi;
};
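
The reference counting on cached handles can be illustrated with a much simplified cache. This is a sketch under stated assumptions, not the server's implementation: it uses singly linked chains instead of dllist_head, frees entries instead of moving them to a free list, and the names dc_entry, dc_get, dc_put and the table size are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define DC_TABLE_SIZE 256  /* assumed table size */

struct dc_entry {
    struct dc_entry *next;   /* next entry in the same hash bucket */
    int              count;  /* how many vnodes reference this handle */
    void            *pdi;    /* directory inode this entry belongs to */
};

static struct dc_entry *dc_table[DC_TABLE_SIZE];

static unsigned dc_hash(const void *pdi)
{
    return (unsigned)(((uintptr_t)pdi >> 4) % DC_TABLE_SIZE);
}

/* Return the entry for pdi, creating it on first use; takes one reference. */
static struct dc_entry *dc_get(void *pdi)
{
    unsigned h = dc_hash(pdi);
    struct dc_entry *e;

    for (e = dc_table[h]; e != NULL; e = e->next)
        if (e->pdi == pdi) {
            e->count++;
            return e;
        }
    e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    e->pdi = pdi;
    e->count = 1;
    e->next = dc_table[h];
    dc_table[h] = e;
    return e;
}

/* Drop one reference; unlink and free the entry when none remain.
 * Returns the number of references still held. */
static int dc_put(struct dc_entry *e)
{
    struct dc_entry **pp;

    if (--e->count > 0)
        return e->count;
    pp = &dc_table[dc_hash(e->pdi)];
    while (*pp != NULL && *pp != e)
        pp = &(*pp)->next;
    if (*pp != NULL)
        *pp = e->next;
    free(e);
    return 0;
}
```

Two vnodes that share one directory inode get the same entry from dc_get, and the entry disappears only after both have called dc_put.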

1.4 C Structures Describing the Directory Format

The following structures describe the directory format:

struct PageHeader {
    long tag;
    char freecount;   /* duplicated info: also in allomap */
    char freebitmap[EPP/8];   /* one bit per blob in the page */
    char padding[PH_PADSIZE];
};

struct DirHeader {
    struct PageHeader dirh_ph;
    char dirh_allomap[MAXPAGES];  /* one byte per 2K page */
    short dirh_hashTable[NHASH];
};

/* A file identifier in host order */
struct DirFid {
    long dnf_vnode;   /* file vnode */
    long dnf_unique;  /* file uniquifier */
};

/* File identifier in network order */
struct DirNFid {
    long dnf_vnode;   /* file vnode */
    long dnf_unique;  /* file uniquifier */
};

struct DirEntry {
    char flag;
    char length;  /* currently unused */
    short next;
    struct DirNFid fid;
    char name[16];
};

struct DirXEntry {
    char name[32];
};

1.5 Example of the Directory Layout:

The output of the dirtest program below shows a simple directory layout:

DIR: 0x501121c, LENGTH: 2048

HASH TABLE:
(1 19) (46 13) (68 14) (104 20)

ALLOMAP:
(0 41)

PAGEHEADERS;
page 0, tag 1234, freecount 41, st 23, bitmap:
1111111101111111111111100000000100000000000000000000000000000000

CHAINS:
Chain: 1
thisblob: 19 next: 18, flag 1 fid: (8.8) 27
thisblob: 18 next: 17, flag 1 fid: (8.8) 50
thisblob: 17 next: 0, flag 1 fid: (8.8) 225
Chain: 46
thisblob: 13 next: 0, flag 1 fid: (9.9) .
Chain: 68
thisblob: 14 next: 0, flag 1 fid: (9.9) ..
Chain: 104
thisblob: 20 next: 0, flag 1 fid: (4.5) this is a veryveryveryverylongname

Careful investigation will show that the entry with the very long name occupies more than a single blob.
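
Following a hash chain such as chain 1 above can be sketched as a simple loop over the next fields. The mini_entry structure is a simplified in-memory stand-in for a directory entry, and the function name chain_lookup is hypothetical. A next value of 0 is taken to end the chain, as in the output above.

```c
#include <string.h>

#define NBLOBS 64  /* blobs in a single-page directory */

/* Simplified stand-in for a directory entry, indexed by blob number. */
struct mini_entry {
    short       next;  /* next blob on the hash chain, 0 ends the chain */
    const char *name;  /* entry name, NULL if the blob holds no entry */
};

/* Walk the chain starting at blob head and return the blob number of
 * the entry with the given name, or -1 if it is not on the chain. */
static int chain_lookup(const struct mini_entry *blobs, short head,
                        const char *name)
{
    short b = head;

    /* The step limit guards against a corrupted, cyclic chain. */
    for (int steps = 0; b > 0 && b < NBLOBS && steps < NBLOBS; steps++) {
        if (blobs[b].name != NULL && strcmp(blobs[b].name, name) == 0)
            return b;
        b = blobs[b].next;
    }
    return -1;
}
```

With the chain 19, 18, 17 from the output, a lookup of "225" visits blobs 19 and 18 before finding the entry in blob 17.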