Coda File System

Re: venus: can't find root volume name

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 31 Oct 2000 08:48:32 -0500
On Tue, Oct 31, 2000 at 09:44:24AM +0900, Stephen J. Turnbull wrote:
> >>>>> "Jan" == Jan Harkes <jaharkes_at_cs.cmu.edu> writes:
> 
>     Jan> Find servers on startup problems are almost invariably a
>     Jan> result of missing entries in /etc/services, broken name
>     Jan> resolution, or occasionally operator error (oops, forgot to
>     Jan> plug in the network cable).
> 
> You forgot "upgrading to Coda 5.3.9 or higher".
> 
> I admit that I upgrade my Debian unstable Linux distro regularly, and
> Debian unstable unfortunately displays monotonically increasing
> entropy.

How `IPv6'-ready are your hosts? I know that Debian has set a goal to
push for full IPv6 support in the next release. When you add IPv6
addresses for your hosts, gethostbyname returns the v6 ones. Coda
doesn't handle those correctly at the moment and simply uses the first
32 bits as if it were an IPv4 address.
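
To illustrate the failure mode, here is a minimal sketch of the kind of check that would catch it. It assumes the address comes back in a struct hostent; hostent_ipv4 is a hypothetical helper name, not something in the Coda sources.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>

/* Hypothetical helper, not from the Coda sources: copy the first address
 * out of a hostent only if it really is an IPv4 address.
 * Returns 0 on success, -1 if the entry is empty or not IPv4. */
static int hostent_ipv4(const struct hostent *h, struct in_addr *out)
{
    assert(out != NULL);

    if (h == NULL || h->h_addr_list == NULL || h->h_addr_list[0] == NULL)
        return -1;

    /* An IPv6 result has h_addrtype == AF_INET6 and h_length == 16.
     * Copying only its first 4 bytes would produce a bogus IPv4 address. */
    if (h->h_addrtype != AF_INET || h->h_length != (int)sizeof(struct in_addr))
        return -1;

    memcpy(out, h->h_addr_list[0], sizeof(*out));
    return 0;
}
```

A caller would pass in the result of gethostbyname() and treat -1 as a failed lookup rather than using the address.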

Does anybody know the preferred way to get the v4 addresses, when
available, instead of the v6 ones?
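
One possible approach, sketched here under the assumption that the platform provides the POSIX getaddrinfo() interface, is to ask the resolver for IPv4 results only. resolve_ipv4 is a hypothetical helper name, and this is not necessarily what Coda should or does use.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>

/* Hypothetical helper, not part of Coda: resolve `host` to its first IPv4
 * address. Returns 0 on success, or a non-zero getaddrinfo() error code. */
static int resolve_ipv4(const char *host, struct in_addr *out)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    assert(host != NULL && out != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;      /* ask for IPv4 results only */
    hints.ai_socktype = SOCK_DGRAM; /* avoid one duplicate per socket type */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* With AF_INET in the hints, every result is a struct sockaddr_in. */
    *out = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
    freeaddrinfo(res);
    return 0;
}
```

The address stored in *out is in network byte order. A non-zero return value can be turned into a message with gai_strerror().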

> That said, Coda is the only application whose host finding behavior
> has broken recently, it broke across a Coda upgrade on a running
> system (no reboot), not a Debian update, and the problem seems common
> among many platforms.

Well, that can happen if glibc was upgraded earlier, or the IPv6
addresses were added on the server, but the VLDB was only rebuilt after
the Coda upgrade.

> I hope you won't ignore Greg Troxel's hypotheses.

I don't ignore problems; in fact, I've been scrutinizing the code. There
is only one place where the client uses gethostbyname, and it is only
used during startup to resolve the names of the rootservers. If none of
those lookups succeeds, an assertion is triggered.

The rootservers are then contacted for the name of the rootvolume, which
is the first RPC2 call a client makes. If it fails, the client displays
"Can't get rootvolume name" and keeps trying. This can be a result of
gethostbyname returning the wrong addresses, network connectivity
problems, or a server that is not running or not completely configured.

Once the rootvolume name is obtained, the rootservers are contacted
again for the volume location information, which comes directly out of
the VLDB. Sadly, the VLDB contains a series of IP addresses, one per
server, and the name resolution was done _on the server_. That is why the
"127.0.0.1 localhost myhostname.domain" entry in /etc/hosts creates so
much havoc, and why multihomed hosts (e.g. those connected to both a
private and a public network) are problematic.

> Even if it is my name service, I can't figure out what might be going
> wrong, as everything else works.

It is hard to tell as I don't see the problems occurring here. All I can
do right now is speculate and add more assertions to the code to catch
possible corner cases.

One big change in 5.3.9 was that we went from representing ip-addresses
in venus as host-order unsigned longs, to network-order struct in_addr.
But if anything is still wrong in that area, I should have seen the same
problems.

Jan
Received on 2000-10-31 08:54:23