Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu> Date: Sat, 6 Nov 2010 12:38:10 -0400

On Thu, Nov 04, 2010 at 10:04:35PM +0100, u-codalist-f7q1_at_aetey.se wrote:
> I would like to revisit the old discussion about the dependency of Coda
> on the servers' static IPv4 addresses. It is mentioned among others in
...
> - let us make the presence of DNS SRV records mandatory (no big deal
> nowadays) and postulate that _all_ of the realm servers be present there,

 - SRV records are not really supported everywhere. For one, as far as I
   know, our cs.cmu domain still doesn't support them and my home router
   has an older dnsmasq that definitely doesn't, etc.
 - If you were really serious you probably should also require DNSSEC.
 - And for the really, really serious use, publish not just Coda/realm
   servers but also use DNS for the Coda volume name to server mapping,
   i.e. publish the existing VRDB/VLDB data through DNS.

But I don't see how this helps for the static IPv4 case at all; we
aren't really having trouble finding servers.

> - a client even currently fetches server addresses for a realm at the
> first access to the realm, this information is "mostly static", it can
> be also unambiguously ordered - let the client cache this information,
> including port numbers, per realm, until shutdown, refresh it on the
> first access after the next startup, failing this - go disconnected;

That is mostly what the client already does. No server data is
persistent; when we start up we only have references to the IPv4
address in a few places (mostly volume data structures). Besides, I
have a client on my laptop that hardly ever actually shuts down. It
gets suspended/resumed along with the OS, and I only reboot if I need
to run a new kernel. DNS caching lifetimes are very different from
Coda client lifetimes.

> - when the client resolves a volume location and expects to receive a
> server's IPv4 ip number, the volume location service would provide not
> an address but an index into the realm's servers' names/addresses array,
> the index will comfortably fit into the available 4 bytes;

This is not necessary. A lot of thought and work has already gone into
this, and you are proposing an alternative solution for the parts that
are already solved, but it does not address the hard parts.

Let me try to explain what we have and what (in my mind) still has to be
done.

done:
 - We have a 'new' RPC2 call, 'ViceGetVolumeLocation', which was added
   around the end of 2006 and has been present since coda-6.9.1. This
   call, when given a volume replica id will return the name (and
   optionally port) of the server as an ascii string. The returned
   string is actually read from /vice/db/servers, i.e. the same as what
   the server itself resolves when it tries to discover the IPv4
   addresses of all realm servers.
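
A minimal sketch of how a client might split the returned string into
a name and an optional port. The exact string format is not spelled out
here, so the "host" / "host:port" layout, the ServerLocation type and
the parse_location helper are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class ServerLocation(NamedTuple):
    """Hypothetical parsed form of a ViceGetVolumeLocation reply."""
    host: str
    port: Optional[int]  # None means "no port given, use the default"


def parse_location(text: str) -> ServerLocation:
    """Split 'host' or 'host:port' (assumed format) into its parts."""
    text = text.strip()
    host, sep, port = text.rpartition(":")
    # Only treat the suffix as a port when exactly one colon is present,
    # so a bare IPv6 literal such as '2001:db8::1' is left intact.
    if sep and text.count(":") == 1 and port.isdigit():
        return ServerLocation(host, int(port))
    return ServerLocation(text, None)
```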

needed:
 + After obtaining the list of volume replicas that exist for a given
   replicated volume, the client should use ViceGetVolumeLocation to
   obtain the server names that host each replica.
 + Instead of allocating a non-persistent data structure for server
   information, the client should persistently store the server's name
   in a server-specific struct. The existing places where we store an
   IPv4 address should be changed to store a pointer to this structure.
 + The client can use DNS to resolve the server name and store the
   returned addresses (addrinfo) and iterate over them when trying to
   contact a server.
 + To avoid blocking the client, resolution should be performed by
   either a pthreaded helper process (harder to implement in a reliable
   cross platform fashion), or by using an asynchronous DNS library
   (last time I checked I didn't find an appropriately licensed version
   to be integrated into Coda or preferably RPC2).
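
The steps above can be sketched roughly as follows. This is written in
Python purely for illustration and is not how the Coda client is
implemented; the Server class and its method names are hypothetical.
It keeps the server's name as the stable identity, runs the blocking
getaddrinfo lookup in a helper thread, and hands out the cached
addresses for the caller to iterate over.

```python
import socket
import threading
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Server:
    """Hypothetical per-server record keyed by name, not by IPv4 address."""
    name: str
    port: int
    addrs: List[Tuple] = field(default_factory=list)  # cached getaddrinfo results
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def resolve(self) -> None:
        """Blocking lookup; meant to run in a helper thread."""
        try:
            # RPC2 runs over UDP, so ask for datagram addresses only.
            found = socket.getaddrinfo(self.name, self.port,
                                       type=socket.SOCK_DGRAM)
        except socket.gaierror:
            return  # keep the previously cached addresses on failure
        with self.lock:
            self.addrs = found

    def resolve_in_background(self) -> threading.Thread:
        """Start the lookup without blocking the caller."""
        worker = threading.Thread(target=self.resolve, daemon=True)
        worker.start()
        return worker

    def candidates(self) -> list:
        """Snapshot of (family, sockaddr) pairs to try in order."""
        with self.lock:
            return [(family, sockaddr)
                    for family, _type, _proto, _canon, sockaddr in self.addrs]
```

A caller would start resolve_in_background() when the server is first
referenced and later walk candidates() until one address answers. A
multihomed server simply shows up as several candidates.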

I think there will still be a few surprises, but apart from the
GetVolumeInfo there are really not that many RPC operations that pass
around IPv4 addresses and RPC2 is already able to work with IPv6. This
approach also correctly deals with multihomed servers as well as
differences when resolving a DNS name on an internal vs. external
network. Finally, in the extreme case, someone should be able to put a
single ASCII-formatted IPv4/IPv6 address for a server in the
/vice/db/servers file if they have a static address but cannot get a
resolvable hostname for each server.
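
As an illustration of that last case, the same lookup path can accept
either a hostname or an address literal. The helper names below are
hypothetical. The sketch checks whether the entry parses as an
IPv4/IPv6 address and, if so, passes AI_NUMERICHOST so that no DNS
query is made at all.

```python
import ipaddress
import socket


def is_address_literal(entry: str) -> bool:
    """True if the entry is an IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(entry)
        return True
    except ValueError:
        return False


def resolve_entry(entry: str, port: int) -> list:
    """Resolve a servers-file entry; literals never touch DNS."""
    flags = socket.AI_NUMERICHOST if is_address_literal(entry) else 0
    return socket.getaddrinfo(entry, port, type=socket.SOCK_DGRAM,
                              flags=flags)
```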

> It is quite possible that I am missing something of importance
> but in my eyes this would work, wouldn't need a heavy rewrite

The real problems are in the reverse direction. Servers have no reliable
way to know which incoming client connections represent the same client,
so the server doesn't know whether a new callback connection is
necessary. Every time a client uses a different address to connect to
the server, the server has to treat it as a new client. This can lead to
excessive cache revalidation and long timeouts when old callback
connections have to be cleaned up (similar to what we're seeing with NAT
gateways).

Using the same address doesn't necessarily mean it is the same context
either. A server may have multiple names/realms that could lead to it,
and as it doesn't know the context, callbacks may not be sent to every
cached copy of a file on a client.

An approach that should solve both of these cases was to log individual
file callbacks on the server, to provide a fast alternative for clients
to handle short disconnections, and to use either client polling or, at
most, only volume-level callbacks. With fewer callbacks we can send
them more aggressively, i.e. send a volume-level callback for each
incoming RPC2 connection that we know has cached data for the volume,
but we then don't have to send any further ones until the connection is
used for revalidation.
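
A rough sketch of the logging idea, under the assumption that each
volume keeps a bounded log of changed file identifiers tagged with a
sequence number. None of these names exist in Coda; the class and its
methods are hypothetical. A client returning from a short disconnection
asks for everything after the last sequence number it saw and only
falls back to full revalidation when the log no longer reaches back
that far.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class VolumeCallbackLog:
    """Hypothetical bounded server-side log of changed files for one volume."""

    def __init__(self, capacity: int = 1024) -> None:
        # Oldest entries fall off automatically once capacity is reached.
        self.entries: Deque[Tuple[int, str]] = deque(maxlen=capacity)
        self.next_seq = 1

    def record(self, fid: str) -> int:
        """Log that `fid` changed; returns the sequence number assigned."""
        seq = self.next_seq
        self.entries.append((seq, fid))
        self.next_seq += 1
        return seq

    def changes_since(self, last_seen: int) -> Optional[List[str]]:
        """Files changed after `last_seen`.

        Returns None when the log has been truncated past that point,
        meaning the client has to revalidate the whole volume.
        """
        oldest = self.entries[0][0] if self.entries else self.next_seq
        if last_seen + 1 < oldest:
            return None
        return [fid for seq, fid in self.entries if seq > last_seen]
```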

> - a server changing its ip address will need all its clients to _restart_
>   instead of _reinit_ (nothing short of reinit helps me today...);

Technically a restart should do it, because IPv4 addresses should only
be persistently stored in volume information which should be revalidated
when we reconnect. However there are either more places that store the
IPv4 address (realm servers datastructure?) or we have an optimization
somewhere where cached volume location information is not revalidated.

> Of course, refreshing the result of DNS resolution of the server name
> list without taking the client down would make it even more attractive
> and make a possible server move fully transparent for the clients.

That is very hard to do. Even playing with very short DNS timeouts,
there will be some period where some clients still use the old address
and others already use the new one. Maybe if a server could have both
addresses for some time, that would make such a move truly transparent.

Jan

Re: revisiting change of server ip numbers / ipv6