Coda File System

From: <u-codalist-f7q1_at_aetey.se>
Date: Sat, 6 Nov 2010 23:23:48 +0100

Hello Jan,

thanks for your reply!

On Sat, Nov 06, 2010 at 12:38:10PM -0400, Jan Harkes wrote:
> > - let us make the presence of DNS SRV records mandatory (no big deal
> > nowadays) and postulate that _all_ of the realm servers be present there,
> 
>  - SRV records are not really supported everywhere. For one, as far as I
>    know, our cs.cmu domain still doesn't support them and my home router
>    has an older dnsmasq that definitely doesn't, etc.

:(
[I used to bypass my previous Coyote-based router at home for DNS queries
for this very reason. Replaced it with a modern one, ca $30, it takes
much less space and electricity, no problems with SRV...]

An absence of DNS SRV records means only that such realms would not be
able to do some nifty things, but no worse than they have today...

>  - If you were really serious you probably should also require [...]

I withdraw the suggestion about abusing DNS; an RPC solves the problem
better, as Greg also pointed out.

> But I don't see how this helps for the static IPv4 case at all, we
> aren't really having trouble finding servers.

We have a shortage of static addresses. Even among the static ones, we do
not in practice own any that is guaranteed to stay the same indefinitely.
Changing them today requires manual intervention (reinit) on _each_
client. We do not and cannot administer the client hosts.
For comparison, can a web shop expect to administer its customers'
computers? :)

> DNS
> caching lifetimes are very different from Coda client lifetimes.

The data I am concerned about is much more persistent than DNS tends
to be. I am trying to solve the "my server got a different address from
the ISP, for the second time since last year" kind of situation.

> > the index will comfortably fit into the available 4 bytes;
> 
> This is not necessary, a lot of thought and work has already gone into
> this and you have an alternate solution for parts that are already
> solved but it does not address the hard parts.

I am trying to avoid changing the code except in a few places
and to reuse the existing data structures (the places where the server
IP addresses are transmitted and stored) by replacing the addresses
with indices into an array which is maintained separately.
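
A minimal sketch of the index idea, not taken from the Coda sources: the
names ServerTable, pack_slot and unpack_slot are hypothetical, and the
only assumption carried over from the thread is that the slot which holds
an IPv4 address today is 4 bytes wide.

```python
import struct


class ServerTable:
    """Hypothetical per-realm table: small integer index <-> server name."""

    def __init__(self):
        self._names = []   # index -> server name
        self._index = {}   # server name -> index

    def intern(self, name):
        """Return the stable index for a name, adding the name if unseen."""
        if name not in self._index:
            self._index[name] = len(self._names)
            self._names.append(name)
        return self._index[name]

    def name_of(self, index):
        """Look up the server name stored under an index."""
        return self._names[index]


def pack_slot(index):
    """Encode an index into the 4 bytes that hold an IPv4 address today."""
    return struct.pack("!I", index)


def unpack_slot(slot):
    """Decode a 4-byte slot back into the index."""
    return struct.unpack("!I", slot)[0]
```

The existing structures keep their size; only the meaning of the 4 bytes
changes, and the table itself can be refreshed without touching them.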

> done:
>  - We have a 'new' RPC2 call, 'ViceGetVolumeLocation', which was added
>    around the end of 2006 and has been present since coda-6.9.1. This
>    call, when given a volume replica id will return the name (and
>    optionally port) of the server as an ascii string. The returned

Nice. I must have forgotten it, you surely mentioned this before.

>    string is actually read from /vice/db/servers, i.e. the same as what

I wonder where it takes the port number from. It seems that we would need
to have the port numbers in the "servers" file, unless we assume that the
standard ports are always used. I would _love_ to use non-standard
ports when necessary (multiple servers / NATs).
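
To make the question concrete, here is a sketch of how an optional port
could be carried in a "servers"-style line. Everything about the syntax
is an assumption for illustration: a "host[:port] id" layout, brackets
around IPv6 literals that carry a port, "#" comments, and 2432 as the
default (I believe that is the registered codasrv port, but check it).

```python
DEFAULT_PORT = 2432  # assumption: the standard codasrv port


def parse_server_line(line):
    """Parse a hypothetical 'host[:port] id' line.

    Returns (host, port, server_id), or None for blank/comment lines.
    """
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    host_field, server_id = line.split()
    if host_field.startswith("["):
        # Bracketed IPv6 literal, optionally followed by ':port'.
        host, _, rest = host_field[1:].partition("]")
        port = int(rest[1:]) if rest.startswith(":") else DEFAULT_PORT
    elif host_field.count(":") == 1:
        # 'name:port' or 'ipv4:port'.
        host, port_text = host_field.split(":")
        port = int(port_text)
    else:
        # Bare name, bare IPv4, or unbracketed IPv6 literal without a port.
        host, port = host_field, DEFAULT_PORT
    return host, port, int(server_id)
```

Lines without a port keep today's behaviour, so existing files would
remain valid under this hypothetical syntax.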

> needed:
>  + After obtaining the list of volume replicas that exist for a given
>    replicated volume, the client should use ViceGetVolumeLocation to
>    obtain the server names that host each replica.
>  + Instead of allocating a non-persistent datastructure for server
>    information the client should persistently store the server's name in
>    a server specific struct, the existing places where we store an IPv4
>    address should be changed to store a pointer to this structure.

Do we really need to store all server information persistently?
In my eyes the only persistent data about a server is its server id
(today 8 bits). Of course this implies that we have a persistent
realm object, which I assume we do, and that a volume "belongs" to a realm.

I wonder if we could make fewer changes to the code than you anticipate?

>  + The client can use DNS to resolve the server name and store the
>    returned addresses (addrinfo) and iterate over them when trying to
>    contact a server.
>  + To avoid blocking the client, resolution should be performed by
>    either a pthreaded helper process (harder to implement in a reliable
>    cross platform fashion), or by using an asynchronous DNS library
>    (last time I checked I didn't find an appropriately licensed version
>    to be integrated into Coda or preferably RPC2).

How does the client avoid stalls today when a new realm is being
contacted? I guess I would happily accept a limited number of additional
synchronous DNS resolve operations.

> Finally in the extreme case someone should be able to put a
> single ascii formatted ipv4/ipv6 address for a server in the
> /vice/db/servers file if he has a static address but cannot get a
> resolvable hostname for each server.

Yeah, we can think of the numeric addresses as being a special form of
hostname which always resolves to the corresponding number :)
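
The standard resolver interface already behaves this way, as the sketch
below shows with getaddrinfo: a numeric literal goes through the same
call as a hostname and comes back as itself, without a DNS query. The
function names and the default port 2432 are assumptions made for the
example, not Coda code.

```python
import socket


def resolve_server(name, port=2432):
    """Resolve a hostname or numeric literal to (family, sockaddr) pairs.

    The caller can iterate over the result when trying to reach a server.
    """
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_DGRAM)
    return [(family, sockaddr) for family, _, _, _, sockaddr in infos]


def is_numeric(name):
    """True if name is an IPv4/IPv6 literal (AI_NUMERICHOST forbids DNS)."""
    try:
        socket.getaddrinfo(name, None, flags=socket.AI_NUMERICHOST)
        return True
    except socket.gaierror:
        return False
```

A realm without resolvable hostnames would thus need no special code
path, only numeric strings in place of the names.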

> The real problems are in the reverse direction. Servers have no reliable
> way to know which incoming client connections represent the same client.
> So the server doesn't know if a new callback connection is necessary so
> every time a client uses a different address to connect to the server,
> the server has to treat it as a new client. This can lead to excessive
> cache revalidation and long timeouts when old callback connections have
> to be cleaned up (similar to what we're seeing with NAT gateways).

I did not try to solve this problem; when I mention NAT in this thread
I mean multiple servers behind a NAT.

> Using the same address doesn't necessarily mean it is the same context
> either, a server may have multiple names/realms that could lead to it
> and as it doesn't know the context callbacks may not be sent to every
> cached copy of a file on a client.

I see.

> An approach which should solve both of these cases was to log
> individual file callbacks on the server to provide a fast alternative
> for clients to handle short disconnections and use either client
> polling, or at most send only volume level callbacks, and with fewer
> callbacks we can send them more aggressively, i.e. send a volume level
> callback for each incoming RPC2 connection that we know has cached data
> for the volume, but then we don't have to send any further until the
> connection is used for revalidation.

I have for a long time been sceptical of callbacks, so yes, a change to
polling or something else would be very, very welcome - but hopefully
this kind of problem can be solved separately from the "fast IPv4
numbers" one?

> > Of course, refreshing the result of DNS resolution of the server name
> > list without taking the client down would make it even more attractive
> > and make a possible server move fully transparent for the clients.
> 
> That is very hard to do, even playing with very short DNS timeouts,
> there will be some period where some clients would still use one address
> and others already use the new address. Maybe if a server could have both
> addresses for some time, that would make such a move truly transparent.

Well, I exaggerated about "fully transparent". Clients timing out,
going disconnected and then reconnecting some reasonable time later
would feel "quite graceful", and this is what I meant.

Thanks again Jan,
hope my reasoning is not totally off (I am not looking at the code).

If there is a possibility to avoid the current presence of IPv4 addresses
on the wire and among the persistent data without radical changes in the
code, it is definitely worth doing! Opening up explicit port choices
for the servers would allow important practical setups as well...

Regards,
Rune

Re: revisiting change of server ip numbers / ipv6