Coda File System

Re: codasrv gets stuck

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 5 Dec 2003 17:06:10 -0500
On Tue, Dec 02, 2003 at 08:25:05AM -0800, Steve Simitzis wrote:
> the problem is that codasrv will freeze, apparently unbind all its
> connections, and refuse to do much of anything. the only way to get it
> running again is to kill -9 codasrv, and restart everything.

I've seen similar freezes on our test server and attributed those to
clients that are connecting from behind a masquerading firewall without
lowering the server-probe timeout.

The problem is that the netfilter/iptables UDP connection tracking
forgets about forwarded ports within 3 minutes, but the normal server
probe happens only about once every 5 minutes. So each probe sets up a
bunch of new connections from a new port when it revalidates the local
cache.
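
As a rough illustration of that timing mismatch, here is a minimal sketch
that counts how many distinct source ports a server would see from one
masqueraded client. It is a toy model, not netfilter or Coda code: the
function name and its parameters are made up for this example, and it
assumes the only traffic is one probe per interval.

```python
def count_source_ports(probe_interval_s, nat_timeout_s, duration_s):
    """Count distinct NAT source ports a server sees from one client.

    Toy model (assumption): the NAT forgets a UDP mapping once no packet
    has passed for nat_timeout_s seconds, and then assigns a new port.
    """
    ports_seen = 0
    last_packet_at = None
    t = 0
    while t <= duration_s:
        # The mapping survives only if the previous packet was recent enough.
        if last_packet_at is None or t - last_packet_at >= nat_timeout_s:
            ports_seen += 1  # mapping expired: the NAT picks a fresh port
        last_packet_at = t
        t += probe_interval_s
    return ports_seen


one_day = 24 * 60 * 60
slow_probes = count_source_ports(300, 180, one_day)  # 5 min probes, 3 min NAT timeout
fast_probes = count_source_ports(120, 180, one_day)  # 2 min probes, 3 min NAT timeout
print(slow_probes, fast_probes)
```

In this model every 5-minute probe arrives from a new port, while probing
faster than the NAT timeout keeps a single mapping alive, which is why
lowering the server-probe timeout on the client helps.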

The server isn't very smart yet, and tracks a client based on its
IP address. So over time it builds up more and more RPC2 connection
endpoints, but because some of these connections have always been used
recently, it never expires them. After a couple of days (or weeks) it
spends so much time looking for a matching connection endpoint for each
incoming packet that the server seems to freeze. This disconnects any
clients with pending operations, and they reconnect, only making the
problem worse.
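
The buildup described above can be sketched with a toy connection table.
This is not how RPC2 stores its endpoints; the class, the linear scan,
and the example address are assumptions chosen only to show how lookup
cost grows when stale endpoints are never expired.

```python
class EndpointTable:
    """Toy connection table: linear scan, entries are never expired."""

    def __init__(self):
        self.endpoints = []  # list of (ip, port) tuples

    def lookup(self, ip, port):
        """Return (found, comparisons) for a linear scan of the table."""
        comparisons = 0
        for entry in self.endpoints:
            comparisons += 1
            if entry == (ip, port):
                return True, comparisons
        return False, comparisons

    def receive(self, ip, port):
        """Handle one packet; add a new endpoint if none matches."""
        found, comparisons = self.lookup(ip, port)
        if not found:
            self.endpoints.append((ip, port))
        return comparisons


table = EndpointTable()
total = 0
for probe in range(289):  # one day of 5-minute probes, each from a new port
    total += table.receive("192.0.2.1", 40000 + probe)
print(len(table.endpoints), total)
```

Each new port misses every existing entry before being added, so the
total work grows roughly with the square of the number of stale
endpoints for a single client in this model.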

This is my current 'theory' about what is causing this. A server
restart clearly fixes it for a while because that way we get rid of all
those 'dead' endpoints. Another solution is to pull the network wire for
about 10 minutes :)

I'm not yet sure where to 'attack' this problem. For one, the server
should become a little smarter about tracking clients and which
connections belong to them/are still active. But maybe rpc2 has an
exponential growth problem in the lookup path where it is matching
incoming packets.
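
One possible direction for the "smarter tracking" idea, sketched under
heavy assumptions: keep a last-used time per endpoint rather than per
client address, and drop endpoints that have been idle too long. The
class name, the dictionary keyed by (ip, port), and the 15-minute idle
timeout are all hypothetical and do not reflect the actual codasrv or
rpc2 data structures.

```python
class ExpiringEndpointTable:
    """Toy table that expires each endpoint on its own idle time."""

    def __init__(self, idle_timeout_s):
        self.idle_timeout_s = idle_timeout_s
        self.last_used = {}  # (ip, port) -> time of the last packet

    def receive(self, ip, port, now):
        """Record a packet; dictionary lookup avoids a linear scan."""
        self.last_used[(ip, port)] = now

    def expire(self, now):
        """Drop endpoints idle for longer than the timeout; return the count."""
        stale = [key for key, seen in self.last_used.items()
                 if now - seen > self.idle_timeout_s]
        for key in stale:
            del self.last_used[key]
        return len(stale)


table = ExpiringEndpointTable(idle_timeout_s=900)  # hypothetical 15 minutes
for probe in range(289):  # one day of 5-minute probes, each from a new port
    now = probe * 300
    table.receive("192.0.2.1", 40000 + probe, now)
    table.expire(now)
print(len(table.last_used))
```

With per-endpoint expiry the table stays small in this model even though
the client keeps showing up on new ports, because activity on one port
no longer keeps the abandoned ones alive.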

Jan
Received on 2003-12-05 17:07:47