Coda File System

Re: request for comments - proper addressing of servers in a realm

From: Jan Harkes
Date: Sun, 27 Jul 2014 09:33:21 -0400

On Sun, Jul 27, 2014 at 02:09:13PM +0200, wrote:
> Now I mean to let DNS handle an extra and natural task, to provide and
> maintain the client's knowledge of the servers' endpoints for the file
> service (until now DNS was only used to discover the volume location
> service).
> rfc2782 implies that a client is seeking to contact _one_ of the
> service instances to be located, while a Coda client is meant to contact
> _multiple_ servers in a realm at the same time.
> This means that our usage of the SRV records does not and can not
> follow the rfc2782 literally because the assumptions in the rfc are
> not applicable.

Actually they are, because we only need one volume location server and
it will tell us who the other servers are.
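
To illustrate the "we only need one" point, here is a minimal sketch of
choosing a single target from a set of SRV records that has already been
fetched. It is a simplified take on the RFC 2782 selection rule (lowest
priority first, then a weighted random pick), not Coda's actual client
code. The SrvRecord type, the pick_one helper and the hostnames are
made up for the example.

```python
import random
from typing import NamedTuple, Optional, Sequence


class SrvRecord(NamedTuple):
    """One SRV record as defined by RFC 2782 (hypothetical container type)."""
    priority: int
    weight: int
    port: int
    target: str


def pick_one(records: Sequence[SrvRecord],
             rng: Optional[random.Random] = None) -> SrvRecord:
    """Pick a single server: lowest priority wins, ties broken by weight."""
    if not records:
        raise ValueError("no SRV records to choose from")
    if rng is None:
        rng = random.Random()

    # Only the records sharing the lowest priority value are candidates.
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]

    # If every candidate has weight 0, fall back to a uniform choice.
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        return rng.choice(candidates)

    # Simplified weighted selection; a zero-weight record is never picked
    # here while a weighted one exists, which is stricter than RFC 2782.
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    records = [
        SrvRecord(10, 5, 2432, "vl1.example.org"),
        SrvRecord(10, 5, 2432, "vl2.example.org"),
        SrvRecord(20, 0, 2432, "fallback.example.org"),
    ]
    print(pick_one(records).target)
```

The client would contact only the chosen target and learn about the
remaining servers from its replies.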

I already made some steps towards handling IPv6 that are integrated in
the codebase. The stickiest problem was the client/server RPC2 messages,
specifically the 'ViceGetVolumeInfo' ones. I added a new RPC
(ViceGetVolumeLocation) a long time ago that returns a string, which is
expected to be a fully qualified hostname that the client can resolve to
the proper IP addresses.
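
As a rough sketch of the client side, resolving such a hostname to every
address it has (IPv4 and IPv6 alike) could look like the following. The
resolve_all helper is hypothetical, and the use of UDP port 2432 (the
registered codasrv port) is an assumption for the example.

```python
import socket
from typing import List, Tuple


def resolve_all(host: str, port: int) -> List[Tuple[int, str, int]]:
    """Return each distinct (address family, address, port) for host."""
    infos = socket.getaddrinfo(
        host, port,
        family=socket.AF_UNSPEC,   # accept both IPv4 and IPv6 results
        type=socket.SOCK_DGRAM,    # assumption: the service is UDP based
    )
    results: List[Tuple[int, str, int]] = []
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        # sockaddr is (addr, port) for IPv4 and
        # (addr, port, flowinfo, scope_id) for IPv6.
        entry = (family, sockaddr[0], sockaddr[1])
        if entry not in results:
            results.append(entry)
    return results


if __name__ == "__main__":
    for family, address, port in resolve_all("localhost", 2432):
        print(socket.AddressFamily(family).name, address, port)
```

A server with more than one address simply yields more than one entry,
and the client can try them in order.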

One server can validly have more than one address, and mapping from
server name to the actual address should be done through DNS. I don't
think introducing an alternate namespace based on what is really sort of
an internal artifact is a good idea.

AFAIK, the server identifier in the volume identifier is really only
used on the servers to avoid a slightly more expensive hashtable (or
volume location server) lookup to figure out which server should handle
that volume, to simplify volume identification for administrators (I
know every volume on server X starts with 'c'), and to avoid conflicting
volume numbers on different servers. But it also introduces problems:
we can only ever have 253 servers in a Coda realm, there are magic
server identifiers (0, 127 and 255) that have to be avoided, etc.
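
The 253 figure follows from simple arithmetic, sketched below under the
assumption that the server identifier is the top 8 bits of a 32-bit
volume identifier. That layout, the helper names and the sample volume
id are assumptions for the example, not taken from the Coda sources.

```python
from typing import List

# Magic server identifiers that must not be assigned to a real server.
RESERVED_SERVER_IDS = {0, 127, 255}


def server_id(volume_id: int) -> int:
    """Extract the server identifier (assumed: top byte of a 32-bit id)."""
    return (volume_id >> 24) & 0xFF


def usable_server_ids() -> List[int]:
    """All 8-bit identifiers minus the reserved ones."""
    return [i for i in range(256) if i not in RESERVED_SERVER_IDS]


if __name__ == "__main__":
    # A made-up volume id whose hex form starts with 'c9'.
    print(hex(server_id(0xC9000001)))
    # 256 possible byte values minus 3 reserved ones.
    print(len(usable_server_ids()))
```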

> This implies a necessity to put all of a realm's servers into SRV
> but this hardly looks like a problem (until now it was actually sufficient
> to put a subset there, but is there anybody who did not put all servers
> there? if so, then why?).

Our domain doesn't even support SRV records, which is why we use
/etc/coda/realms. And no, we do not have all servers listed there
either, and having to push out a new Coda client to everyone just
because we retire/replace hardware is not an option.

And even if CMU's DNS supported SRV records, pushing an update is a
non-trivial step because it is managed by the facilities networking
group. Coordinating a planned server replacement would involve
reducing the record expiration time and then, in a short time window,
replacing both the server and the SRV record. An unplanned replacement
would probably be worse.

> DNS record size limitation might be a potential problem for realms with
> exceptionally many servers. This is not a practical problem today but
> may become one. Nevertheless I am pretty sure that this can be addressed
> when the need arises.

Placing everything in a single DNS SRV record is asking to be abused for
DDoS amplification attacks too, although there probably are not enough
Coda realms to put a dent in anyone's network.

> An inconvenience related to DNS is its synchronous nature. I intend in
> the beginning to accept that the client will stall during DNS lookups
> (which is expected to happen rarely), this can be later replaced by an
> asynchronous layer.

That would probably make the callback break woes worse.

> - possibility to freely move Coda servers both from one ip-number
>   to another (allows among others servers with dynamic ip addresses)
>   and from one f.q.d.n to another (also desirable in practice)

This has little to do with how the addresses are resolved and more to do
with the client not tracking when it should revalidate an address.

> - possibility to put multiple Coda-servers (also for multiple
>   realms) behind a single NAT, using different port numbers
>   to distinguish between them

Forgot to mention, the returned string from ViceGetVolumeLocation is
expected to be 'server:port'.
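
A minimal sketch of splitting such a string on the client is shown
below. The parse_location helper is hypothetical, and falling back to
port 2432 when no port is given is an assumption for the example, not
documented behaviour.

```python
from typing import Tuple


def parse_location(location: str, default_port: int = 2432) -> Tuple[str, int]:
    """Split a 'server:port' string into (hostname, port)."""
    host, sep, port_text = location.rpartition(":")
    if not sep:
        # No colon at all: treat the whole string as a hostname
        # (assumed fallback, see the note above).
        return location, default_port
    if not host or not port_text.isdigit():
        raise ValueError("expected 'server:port', got %r" % location)
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError("port out of range in %r" % location)
    return host, port


if __name__ == "__main__":
    print(parse_location("server1.example.org:2433"))
    print(parse_location("server1.example.org"))
```

With a per-server port in the reply, several servers behind one NAT
address can be told apart by port number alone.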

Received on 2014-07-27 09:33:28