Coda File System

Re: take 2 (request for comments - proper addressing of servers in a realm)

From: <>
Date: Tue, 29 Jul 2014 11:32:38 +0200
On Mon, Jul 28, 2014 at 10:04:04AM -0400, Jan Harkes wrote:
> > Currently step 1 is implemented via DNS but it does not seem to
> > provide for any data expiration/renewal (?) besides by manually
> > running cfs check<something> on a client (?)
> We iterate through the servers and redo the SRV lookup when we cannot
> connect to any of them.

Ok, looks good enough.
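Jan's description of the client side can be sketched roughly like this. It is a minimal illustration, not Coda's actual code. `lookup_srv` and `try_connect` are hypothetical callables that stand in for the real resolver and the RPC2 connection setup. The ordering follows the RFC 2782 rule (lowest priority first, weighted random choice within one priority), slightly simplified for zero-weight records.

```python
import random
from typing import Callable, List, Optional, Tuple

# (priority, weight, port, target), the fields of an SRV record
SrvRecord = Tuple[int, int, int, str]


def order_srv(records: List[SrvRecord], rng: random.Random) -> List[SrvRecord]:
    """Lowest priority first; within one priority, a weighted random order.

    Simplification of RFC 2782: zero-weight records are only tried after
    all weighted records of the same priority.
    """
    ordered: List[SrvRecord] = []
    for prio in sorted({r[0] for r in records}):
        group = [r for r in records if r[0] == prio]
        while group:
            weights = [r[1] for r in group]
            if sum(weights) == 0:
                idx = rng.randrange(len(group))  # only zero weights left: uniform pick
            else:
                idx = rng.choices(range(len(group)), weights=weights)[0]
            ordered.append(group.pop(idx))
    return ordered


def connect_with_retry(
    realm: str,
    lookup_srv: Callable[[str], List[SrvRecord]],
    try_connect: Callable[[str, int], Optional[object]],
    rounds: int = 2,
    rng: Optional[random.Random] = None,
) -> Optional[object]:
    """Try every server; redo the SRV lookup only after all of them failed."""
    rng = rng or random.Random()
    for _ in range(rounds):  # bounded in this sketch; a real client may keep retrying
        for _prio, _weight, port, target in order_srv(lookup_srv(realm), rng):
            conn = try_connect(target, port)
            if conn is not None:
                return conn
    return None
```

The point is the outer loop: a fresh lookup happens only once every listed server has failed, so a stale record set gets replaced as a side effect of normal operation.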

> > I suggest moving the step 2.3 to the client, to address the limitation
> > of the current design. The mapping in this step is not security-critical
> > and hence can be delegated to DNS.
> > 
> > Jan's intention was to split 2.3 into
> > 
> >        2.3.1 server-numeric-id => (f.q.d.n,numeric-port)
> >              inside the VLS
> > 
> >        2.3.2 f.q.d.n => numeric-ip
> >              on the client, via DNS
> That is deceptive because currently all of step 2 is a single
> GetVolumeInfo RPC2 call. So you cannot really split out just 2.3, and
> then make my approach look more complicated by splitting your step
> 2.3 into 2 additional steps. Both approaches are really pretty similar
> in what is looked up.

I am sorry if that looked deceptive. Yes, of course they are similar;
the main difference is where the line between server and client is drawn.
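To make the difference concrete, here is a minimal sketch of the split variant, with 2.3.1 answered by the VLS and 2.3.2 done on the client. In reality the VLS part would travel inside the single GetVolumeInfo reply Jan mentions. The table contents and the injected `resolve` callable are hypothetical; a real client would use its system resolver (e.g. Python's `socket.getaddrinfo`).

```python
from typing import Callable, Dict, Iterable, List, Tuple

# 2.3.1, answered by the VLS: server-numeric-id => (f.q.d.n, numeric-port)
VlsServerTable = Dict[int, Tuple[str, int]]


def vls_lookup(table: VlsServerTable, server_id: int) -> Tuple[str, int]:
    """Server side (VLS): map a server id to its DNS name and port."""
    return table[server_id]


def client_resolve(
    fqdn: str, port: int, resolve: Callable[[str], List[str]]
) -> List[Tuple[str, int]]:
    """Client side (2.3.2): f.q.d.n => numeric ip(s); one name may yield several."""
    return [(ip, port) for ip in resolve(fqdn)]


def replica_addresses(
    table: VlsServerTable,
    server_ids: Iterable[int],
    resolve: Callable[[str], List[str]],
) -> List[Tuple[str, int]]:
    """All (ip, port) pairs a client could try for the replicas of one volume."""
    addrs: List[Tuple[str, int]] = []
    for sid in server_ids:
        fqdn, port = vls_lookup(table, sid)
        addrs.extend(client_resolve(fqdn, port, resolve))
    return addrs
```

Because the client resolves a name instead of receiving a fixed address, one server id can expand into several addresses (for example an IPv4 and an IPv6 one).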

> The volume location service has to track which replicas belong to a
> volume anyway, and the server has to know where replicas are located.

Do you mean the file server, for contacting the other file servers for
resolution? Hmm, yes, indeed.

I guess this is where the db/servers file may be useful (to avoid security
worries; otherwise the server could resolve names via DNS, as a client does).

> > - is it actually correct to let server-id be public information?
> Not really an issue


> it is more: is the namespace you want to use to
> talk about servers based on numbers that have somewhat odd properties,
> like being 8-bit and avoiding 0, 127 and 255, or on regular old DNS name
> resolution, which can handle ipv4/ipv6, multi-homed hosts, internal vs.
> public addresses etc.

Hm? I do not see (?) why this odd namespace would be any trouble to put
into the SRV priority field (if we abuse the latter), or otherwise to
reflect in _codasrv_N names. I do not see any problem with, say,
_codasrv_127 being illegal. The realm admins, and nobody else, are supposed
to be in full control of this namespace. And since each server entry still
points to an ordinary DNS host name, this covers the different IP versions,
multi-homed hosts and so on as well.
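What the realm admins would need to enforce for this namespace is small. Here is a minimal sketch, assuming the 8-bit constraint quoted above (0, 127 and 255 excluded). The `_codasrv_N._udp.<realm>` owner-name layout and the reuse of the 16-bit SRV priority field are hypothetical renderings of the two options, not an existing Coda convention.

```python
RESERVED_SERVER_IDS = frozenset({0, 127, 255})


def is_valid_server_id(server_id: int) -> bool:
    """8-bit server id, excluding the reserved values 0, 127 and 255."""
    return 0 <= server_id <= 255 and server_id not in RESERVED_SERVER_IDS


def server_srv_name(server_id: int, realm: str) -> str:
    """Option 1 (hypothetical layout): one SRV owner name per server id."""
    if not is_valid_server_id(server_id):
        raise ValueError(f"invalid server id {server_id}")
    return f"_codasrv_{server_id}._udp.{realm}"


def server_id_from_priority(priority: int) -> int:
    """Option 2 (abusing the field): the id is carried in the SRV priority."""
    if not 0 <= priority <= 0xFFFF:
        raise ValueError("SRV priority is a 16-bit field")
    if not is_valid_server_id(priority):
        raise ValueError(f"priority {priority} is not a valid server id")
    return priority
```

Either way, the SRV target remains a normal host name, so A and AAAA records (or several addresses) can be published for it as usual.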

> > Is there a better approach?
> Devil's advocate here: get rid of the volume location service completely
> and put everything into DNS with volumename.<realm> DNS records.

Let me try to take this seriously.

This _really_ would make the whole structure much simpler.

I have now basically written a whole essay about how useful and
attractive a modified version of such an approach would be (to prevent data
leaking we would probably have to skip the volume names layer, but could
refer to e.g. randomly generated volume ids).

Nevertheless it does not seem applicable because of one detail:

Can we accept leaking the volume ids?
An unfriendly party might e.g. intercept a certain client's DNS queries
and combine them with other knowledge to deduce which volumes carry which
information, and/or which information a certain client is accessing. This
can be quite sensitive. Today what an eavesdropper/ISP sees is which
servers a client is talking to (the same situation as for the web).
Leaking the volumes makes this worse.
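The leak is easiest to see by writing down what such a lookup would put on the wire. In this sketch the `<8-hex-digit-volume-id>.<realm>` name layout is a hypothetical variant of Jan's volumename.<realm> records that uses random volume ids. Whatever the layout, the query name itself identifies the volume to anyone who can observe the client's DNS traffic.

```python
import secrets
import string
from typing import Iterable, Set


def new_random_volume_id() -> int:
    """Random 32-bit volume id (hypothetical: random ids hide volume names)."""
    return secrets.randbits(32)


def volume_query_name(volume_id: int, realm: str) -> str:
    """Name a client would query under a DNS-only location scheme (hypothetical layout)."""
    return f"{volume_id:08x}.{realm}"


def observed_volumes(query_log: Iterable[str], realm: str) -> Set[int]:
    """What an eavesdropper learns from a client's queries: the volume ids it touched."""
    suffix = "." + realm
    seen: Set[int] = set()
    for qname in query_log:
        if not qname.endswith(suffix):
            continue
        label = qname[: -len(suffix)]
        # only labels that look like an 8-digit hex volume id
        if len(label) == 8 and all(c in string.hexdigits for c in label):
            seen.add(int(label, 16))
    return seen
```

Random ids only hide the *meaning* of a volume; the repeated appearance of the same id in a client's queries is still visible.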

On the other hand, during the analysis I noticed several things which
actually make life easier (contrary to the possible first impression),
even though they are not directly DNS-related.

One is to drop the volume names layer altogether
(ducking and covering my head).

A comment/name field in the volume metadata would probably suffice for
the realm admins. Having to supply a volume id to cfs mkm is almost a
no-brainer for a realm administrator who understands what she is doing.
Creation of mountpoints by outsiders is discouraged anyway: mountpoints
refer to internal realm data, which _could_ change without affecting
access through the "POSIX-like" interface.

Doesn't this make sense?

Dropping a layer of indirection from the main codepath feels like a
certain win. Nothing prevents bookkeeping of the corresponding naming
separately, without treating such information as authoritative for the
purposes of the file service (it can be present for humans' perusal
e.g. as an "arbitrary length comment", easily structurable/scriptable
independently, according to the tastes of the realm administrator).
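Such separate, non-authoritative bookkeeping could be as simple as the following sketch. `VolumeCommentIndex` is a hypothetical name, and where the data lives (a flat file, the scm, something else) is left to the realm administrator. The file service never consults it; it only helps humans and scripts find the volume id to use.

```python
from typing import Dict, List


class VolumeCommentIndex:
    """Admin-side notes: volume id -> free-form comment (not authoritative)."""

    def __init__(self) -> None:
        self._comments: Dict[int, str] = {}

    def set_comment(self, volume_id: int, comment: str) -> None:
        """Attach or replace the free-form comment for a volume."""
        self._comments[volume_id] = comment

    def comment(self, volume_id: int) -> str:
        """Comment for a volume, or an empty string if none was recorded."""
        return self._comments.get(volume_id, "")

    def find(self, text: str) -> List[int]:
        """Sorted volume ids whose comment contains `text` (case-insensitive)."""
        needle = text.lower()
        return sorted(v for v, c in self._comments.items() if needle in c.lower())
```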

(As an "ultimate solution", cfs could be made capable of talking
to the scm to easily map between volume comments and volume ids, given
e.g. System:Administrators credentials. Btw, having the scm in DNS is
desirable anyway: au should [be able to] use such a record to locate the
scm. What about _codascm._udp.<realm>?)
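RFC 2782 SRV owner names have the form `_Service._Proto.Name`, so the suggested record fits the existing pattern. The following is a minimal sketch that renders such a record as a zone-file line. The target host, TTL and port are placeholders, not values taken from Coda.

```python
def srv_owner_name(service: str, proto: str, realm: str) -> str:
    """RFC 2782 owner name _Service._Proto.Name, fully qualified (trailing dot)."""
    return f"_{service}._{proto}.{realm.rstrip('.')}."


def srv_zone_line(
    service: str,
    proto: str,
    realm: str,
    target: str,
    port: int,
    priority: int = 0,
    weight: int = 0,
    ttl: int = 3600,
) -> str:
    """One master-file line: <owner> <ttl> IN SRV <priority> <weight> <port> <target>."""
    return (
        f"{srv_owner_name(service, proto, realm)} {ttl} IN SRV "
        f"{priority} {weight} {port} {target.rstrip('.')}."
    )
```

For example, `srv_zone_line("codascm", "udp", "example.org", "scm.example.org", port)` yields `_codascm._udp.example.org. 3600 IN SRV 0 0 <port> scm.example.org.`, where the port is whatever the scm-side service (hypothetically) listens on.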

In every place where the volume name is used it could be substituted
by a string representation of the numerical volume id.
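Here is a minimal sketch of such a string form, assuming 32-bit volume ids printed as 8 lowercase hex digits. The function names and the exact format are hypothetical.

```python
def volume_id_to_str(volume_id: int) -> str:
    """Fixed-width hex string for a 32-bit volume id (hypothetical format)."""
    if not 0 <= volume_id <= 0xFFFFFFFF:
        raise ValueError("volume id must fit in 32 bits")
    return f"{volume_id:08x}"


def volume_id_from_str(text: str) -> int:
    """Inverse of volume_id_to_str; int() also accepts a 0x prefix in base 16."""
    value = int(text, 16)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("volume id must fit in 32 bits")
    return value
```

A fixed-width form keeps the strings unambiguous and sortable. A mountpoint could then be created with `cfs mkm <dir> 7f000001` (hypothetical: today cfs mkm takes a volume name).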

Similarly, I guess it might be most convenient to have a single volume
id for all replicas of a volume (again, this means skipping an extra
indirection level which does not carry any relevant semantics).

Collisions can be prevented: even today, volumes are in practice
created from the scm host (unless you _really_ know what you are doing), so
this extra constraint (the scm must participate in the creation of all
volumes) is hardly new, and would actually be useful to enforce.
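If the scm is the only place where volumes are created, avoiding collisions reduces to a single allocator. Here is a minimal sketch, assuming random 32-bit ids (as suggested above) shared by all replicas of a volume. `ScmVolumeIdAllocator` is a hypothetical name, and keeping 0 out of the id space is an illustrative choice.

```python
import secrets
from typing import Dict, Iterable, Optional, Set


class ScmVolumeIdAllocator:
    """Hypothetical scm-side allocator: the single place volume ids are created.

    One id is shared by all replicas of a volume, so uniqueness only has to
    hold across the realm, which the scm can check locally.
    """

    MAX_ID = 0xFFFFFFFF

    def __init__(self, reserved: Optional[Iterable[int]] = None) -> None:
        self._used: Set[int] = set(reserved or ())
        self.replicas: Dict[int, Set[int]] = {}  # volume id -> server ids

    def create_volume(self, server_ids: Iterable[int]) -> int:
        """Pick an unused random id in 1..MAX_ID and record the replica servers."""
        while True:  # the id space is huge, so retries are rare
            vid = secrets.randbelow(self.MAX_ID) + 1
            if vid not in self._used:
                break
        self._used.add(vid)
        self.replicas[vid] = set(server_ids)
        return vid
```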

Making sense?

I would love to hear arguments against the above, to avoid spending time
on "seemingly attractive dead-ends". Comments, please?

Received on 2014-07-29 05:33:11