Coda File System

Re: write failure issues

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 14 May 2004 14:02:11 -0400
On Fri, May 14, 2004 at 12:18:09PM -0400, shivers_at_cc.gatech.edu wrote:
> Things go wrong when I just say on ClientW
>     cp <12-big-files> /coda/Server/shivers/.
> The system writes about three files, then things get screwy & disconnected.
> What happens, I *think*, is that the write goes in two stages: it's written
> into the venus cache quickly, then dribbles out over my cable modem slowly.

Ahh, cable modem, asymmetric network... I don't have DSL or cable
myself, and Coda used to work reliably only on networks that had
identical upload and download speeds.

What happens is that during fetches we see an amazingly fast network,
but we time out and get disconnected as soon as we try to write even a
little bit of data, because the acks take far too long. RPC2 'thinks'
we have a 3MB/s symmetric network, so when it sends several KB and
doesn't see the ack within a couple of milliseconds, it believes the
packet got lost and retransmits. This only makes the congestion on the
uplink even worse. Once we hit about 5 retransmissions and still haven't
seen the ACK message, the client gives up and disconnects from the
server.
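
To make the failure mode concrete, here is a minimal sketch of a sender
with a fixed retransmit timer. It is not RPC2's actual code: the function
name, the fixed timer, and the example delays are assumptions chosen only
to illustrate the behaviour described above.

```python
def send_with_retries(ack_delay_s, timeout_s, max_retries=5):
    """Model one packet sent under a fixed retransmit timer (hypothetical).

    ack_delay_s: how long the ack for the first copy really takes.
    timeout_s:   how long the sender waits before assuming loss.
    Returns (delivered, copies_sent).
    """
    copies_sent = 0
    elapsed = 0.0
    while copies_sent <= max_retries:
        copies_sent += 1          # original send, then retransmissions
        elapsed += timeout_s      # wait one timer period for the ack
        if elapsed >= ack_delay_s:
            return True, copies_sent
    return False, copies_sent     # retries exhausted: give up, disconnect


# Timer tuned for a fast link, ack stuck behind a slow uplink.
print(send_with_retries(ack_delay_s=0.5, timeout_s=0.005))
# Same timer, but the ack comes back quickly.
print(send_with_retries(ack_delay_s=0.003, timeout_s=0.005))
```

In the first call every retransmission is wasted uplink traffic, since
the original packet was never lost in the first place.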

A couple of the CMU grad students who are using Coda here got DSL
(384Kb down/128Kb up?) and complained enough for me to try to (blindly)
fix it. RPC2 now _tries_ to estimate uplink and downlink speeds
independently. It mostly solved the problems for them, but since I
really don't have anything to test it with, I'm pretty sure it isn't a
perfect solution.

Btw, TCP probably has similar issues if you use a persistent connection,
first fetch a lot of data for a while, and then try to push data back.
It is just that typical use is either one-directional, or is started by
the client behind the DSL sending a request on the slow uplink and then
getting a response on the downlink. Ramping up in speed is no problem;
it is the sudden degradation that bites. TCP is just a bit more
tenacious and keeps backing off instead of giving up entirely after 5
retries.
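
The difference between a fixed timer and a backing-off timer can be
sketched as follows. TCP doubles its retransmission timeout after each
expiry; the 5ms starting value and the retry count here are assumptions
for illustration, not measured RPC2 or TCP settings.

```python
def retry_schedule(initial_timeout_s, retries, backoff=1.0):
    """Return the wait before each give-up check (hypothetical helper).

    backoff=1.0 models a fixed timer, backoff=2.0 a doubling timer.
    """
    return [initial_timeout_s * backoff ** i for i in range(retries + 1)]


fixed = retry_schedule(0.005, 5)          # same wait every time
doubling = retry_schedule(0.005, 5, 2.0)  # wait doubles after each expiry

# Total time the sender is willing to wait for a slow ack.
print("fixed timer:   ", sum(fixed))
print("doubling timer:", sum(doubling))
```

With the same number of sends, the doubling timer tolerates roughly ten
times more delay before giving up, and it puts fewer copies on the
congested uplink per second while it waits.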

> When this bottleneck causes enough reintegration data to build up, blammo.
> The lossage is as I described in my last message: cfs lv shows the system
> in some kind of disconnected state, and cfs wr won't make it reconnect.
> 
> So the message seems to be that if I don't press the system hard, it works.
> Under pressure, it falls over. For me, that's progress. Now I want to
> understand the current hosage. Can anyone help?

Well, one thing is that your connection really is 'weak' in Coda's
terms. The uplink speed is probably on the order of 64 or 128Kb/s, so
the client prefers to work write-disconnected. You can tell it not to
adapt to the estimated network bandwidth by using 'cfs strong'. This
should prevent the (connected -> write-disconnected) transition.
However, you can still become write-disconnected through the
(connected -> disconnected -> write-disconnected) transition. In other
words, if RPC2 misses a beat and times out, you end up logging the
change and won't automatically return to the connected state when we
notice that the server hasn't really gone.

One reason your client isn't reintegrating may be that the pending
changes haven't 'aged' long enough. Statistically, any file that hasn't
been removed within 5 or 10 minutes of creation is likely to be around
for several months. So a lot of bandwidth is saved by delaying
reintegration long enough that short-lived (temporary) files can be
optimized away locally.
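
As a rough illustration of the aging idea (not Venus's actual log
format or optimizer), the sketch below keeps a list of pending records,
cancels a create against a later remove of the same file, and only marks
records as ready once they are older than an assumed aging window. The
record layout, the 300-second window, and the file names are all
hypothetical.

```python
def split_log(records, now_s, aging_s=300):
    """Split pending records into (ready, waiting) after local cancellation.

    records: list of (op, path, time_s) with op in {"create", "remove"}.
    """
    pending = []
    for op, path, t in records:
        if op == "remove":
            # A remove cancels a create of the same path still in the log.
            idx = next((i for i, (o, p, _) in enumerate(pending)
                        if o == "create" and p == path), None)
            if idx is not None:
                del pending[idx]
                continue  # neither record ever needs to reach the server
        pending.append((op, path, t))
    ready = [r for r in pending if now_s - r[2] >= aging_s]
    waiting = [r for r in pending if now_s - r[2] < aging_s]
    return ready, waiting


log = [("create", "tmp.o", 0), ("create", "keep.c", 10), ("remove", "tmp.o", 20)]
print(split_log(log, now_s=100))  # keep.c is still aging
print(split_log(log, now_s=400))  # keep.c is old enough to send
```

The temporary file never costs any uplink bandwidth, which is the saving
the aging delay is meant to buy.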

The other reason could be that the estimated bandwidth is so incredibly
low that the client thinks it can't reintegrate even a single record
without blocking the user for a significant amount of time. I believe
the formula was something like: size of reintegration / bandwidth has to
be less than 15 seconds. The low bandwidth estimate would be caused by
RPC2's own insistence on retransmitting 'lost' packets. If every packet
is sent 4 or 5 times, the copies all eat up the available link
bandwidth. 128Kb/s would end up looking more like 32Kb/s (4KB/s), which
really is a trickle.
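
The arithmetic above can be checked with a small sketch. The 15-second
limit is the value recalled in this message, not a verified constant,
and the function names and the 100KB record size are made up for the
example.

```python
REINTEGRATION_LIMIT_S = 15  # limit as recalled above; treat as an assumption


def effective_bandwidth_bps(link_bps, sends_per_packet):
    """Useful bandwidth left when every packet is sent several times."""
    return link_bps / sends_per_packet


def can_reintegrate(size_bytes, bandwidth_bps, limit_s=REINTEGRATION_LIMIT_S):
    """True if sending size_bytes is estimated to take less than limit_s."""
    return size_bytes * 8 / bandwidth_bps < limit_s


uplink = 128_000                                # 128Kb/s uplink
degraded = effective_bandwidth_bps(uplink, 4)   # each packet sent 4 times
print(degraded, "bits/s =", degraded / 8, "bytes/s")

record = 100_000                                # hypothetical 100KB record
print(can_reintegrate(record, uplink))          # 6.25s estimate
print(can_reintegrate(record, degraded))        # 25s estimate
```

At the degraded estimate only about 60KB fits inside the 15-second
budget, so even a modest file can be enough to stall reintegration.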

Jan
Received on 2004-05-14 14:05:13