Coda File System

Why is there a /vice/db/servers file?

Nothing inside the codasrv process works with ip-addresses directly; it uses the 'serverid' instead. The servers file is simply the mapping from serverid to servername. To complicate matters, DNS lookups are blocking and would stall the whole process (all LWP threads), so we actually compare everything by ip-address, and the servernames in the 'servers' file are only looked up during codasrv startup.

The in-memory representation of the servers file holds just the serverid and the _first_ ip-address returned by gethostbyname. This is why Coda servers have problems on multihomed hosts that use different ip-addresses on different interfaces: we simply don't handle more than one address.
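To illustrate the effect, here is a minimal Python sketch of such a startup table. It is not the codasrv code. It assumes each line of the servers file is a "servername serverid" pair, and the names `load_servers` and `default_resolver` are made up for this example. A failed lookup is recorded as 0.0.0.0, matching the failure case described further down.

```python
import socket


def default_resolver(name):
    """Blocking lookup that returns every IPv4 address known for name."""
    _, _, addresses = socket.gethostbyname_ex(name)
    return addresses


def load_servers(text, resolver=default_resolver):
    """Build {serverid: ip} once, keeping only the first address per server.

    Assumes one "servername serverid" pair per line (an assumption of this
    sketch, not a statement about the real parser).
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, serverid = fields[0], int(fields[1])
        try:
            addresses = resolver(name)
        except OSError:
            addresses = []  # lookup failed
        # Any further addresses of a multihomed host are silently dropped.
        table[serverid] = addresses[0] if addresses else "0.0.0.0"
    return table
```

With a resolver that returns two addresses for a multihomed host, only the first one ends up in the table, so clients that can only reach the second interface get an address they cannot use.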

Similarly, the VLDB and VRDB files (volume location and replication information) store serverids and not hostnames or ip-addresses. When handling the ViceGetVolumeInfo call, the server pulls the requested information out of the VLDB or VRDB file and replaces each serverid with the ip-address that was resolved for that server during startup. Here, too, we are unable to return a list of addresses for a server, even if we could store one. The correct thing to return would have been the servername, because the client could then resolve the name itself and get the right ip-address for reaching the server from the client's perspective, not the server's.
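The difference between the two approaches can be sketched as follows. This is an illustration only: the dictionaries stand in for the VLDB and the startup table, and both function names are hypothetical, not part of the real RPC interface.

```python
def volume_info_by_address(volume, vldb, server_addrs):
    """What the text describes: serverids are swapped for the single
    ip-address each server was resolved to at startup, as seen by the server."""
    return [server_addrs[serverid] for serverid in vldb[volume]]


def volume_info_by_name(volume, vldb, server_names):
    """The alternative suggested above: hand back servernames and let the
    client resolve them from its own point of view."""
    return [server_names[serverid] for serverid in vldb[volume]]
```

In the first variant a client behind a different interface or network still receives the server-side view of the address. In the second variant the client's own resolver decides which address to use.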

As the VRDB and VLDB files are created by the SCM, this all depends greatly on whether the SCM managed to resolve every server in the servers file to an ip-address. Any failed resolve results in a 0.0.0.0 address (I believe Shafeeq added a test to 5.3.17 to block a server from starting up when that happens).

Because all of this is resolved only once, at startup, all servers need to be restarted after any change to the /vice/db/servers or /vice/db/VSGDB files.

When two servers share the same serverid in the "servers" file, both 'replicas' of a volume end up pointing to the same machine after the volume is created (although the replicas themselves are created on different machines). Similarly, when the same server is listed twice with different ids, some parts of the code might find the first serverid while others find the second, leading to considerable confusion about volume location (I've seen both cases happen).
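Both mistakes are easy to check for before restarting the servers. The following is a small sketch of such a check, not an existing Coda tool. It again assumes "servername serverid" lines, and `find_conflicts` is a name made up for this example.

```python
from collections import defaultdict


def find_conflicts(text):
    """Return (duplicate_ids, duplicate_names) for a servers file.

    duplicate_ids maps a serverid to every servername that claims it;
    duplicate_names maps a servername to every serverid it is listed under.
    """
    names_by_id = defaultdict(list)
    ids_by_name = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, serverid = fields[0], int(fields[1])
        names_by_id[serverid].append(name)
        ids_by_name[name].append(serverid)
    duplicate_ids = {i: n for i, n in names_by_id.items() if len(n) > 1}
    duplicate_names = {n: i for n, i in ids_by_name.items() if len(i) > 1}
    return duplicate_ids, duplicate_names
```

Two empty dictionaries mean that every serverid maps to exactly one servername and the other way around.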

I'm not entirely sure what happens when a server is not found in the servers file, but the resulting serverid will either be that of the last server in the servers file, or be pretty much undefined.