Coda File System

Re: Behaviour of coda with large files

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 6 Mar 2006 12:32:13 -0500
On Mon, Mar 06, 2006 at 12:00:51PM -0500, Martin Ginkel wrote:
> Jan Harkes wrote:
> >However backfetches are using the same connection, and your backfetch is
> >taking very long. So the server is unable to send the ping back to the
> >client. This shouldn't be a problem because the server should be
> >responding with RPC2_BUSY which will make the client wait an extra 15
> >seconds or so. I guess at some point the client did give up, returned
> >ETIMEDOUT and disconnected.
> 
> Hmm. How is this RPC2_BUSY supposed to work.

After the client sends a request, it starts to wait for a reply. If we
haven't seen the reply after the estimated round trip time period, we
assume that either the request or the reply packet was lost and the
request is retransmitted (wait time is exponentially increased). This is
done up to 5 times.
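
A rough sketch of that retransmission schedule, not the actual RPC2
code. The function name, the doubling factor and the 0.5 second round
trip estimate are assumptions for illustration; only the limit of 5
retransmissions comes from the description above.

```python
def retransmit_waits(rtt_estimate, retries=5, factor=2.0):
    """Return the wait (in seconds) that precedes each retransmission."""
    waits = []
    wait = rtt_estimate
    for _ in range(retries):
        waits.append(wait)
        # Assumed growth factor; the text only says "exponentially".
        wait *= factor
    return waits


# Hypothetical round trip estimate of half a second.
print(retransmit_waits(0.5))
```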

If the request was lost, the server will see one of the retransmissions.
If the reply was lost, the server will notice that the connection
already handled that request and it will retransmit the reply. Up to
this point everything is pretty predictable.

But if the server is still processing the request, we don't actually
have anything to retransmit, so the server sends back an RPC2_BUSY
response to let the client know that it did get the request and it is
still working on it. All of the retransmit/busy handling happens at the
lowest RPC2 layer on the server, the 'socketlistener' thread. So as long
as the server is handling incoming packets, it should be able to
respond.
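
The three cases above could be modelled roughly like this. It is a toy
model, not the socketlistener code: the Connection class, its fields,
the sequence numbers and the returned strings are all made up for
illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connection:
    last_seq: int = 0                 # newest request number seen so far
    processing: bool = False          # True while that request is being handled
    last_reply: Optional[str] = None  # saved reply, kept for retransmission


def on_request(conn, seq):
    """Decide what to do with an incoming request packet."""
    if seq > conn.last_seq:
        # New request: remember it and hand it to a worker.
        conn.last_seq = seq
        conn.processing = True
        conn.last_reply = None
        return "dispatch"
    if seq < conn.last_seq:
        # Stale duplicate of an older request: nothing useful to do.
        return "drop"
    if conn.processing:
        # Retransmission of a request we are still working on.
        return "busy"
    # Retransmission of a request we already answered: the reply was lost.
    return "resend-reply"


def finish(conn, reply):
    """Called by the worker when the reply is ready."""
    conn.processing = False
    conn.last_reply = reply
```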

When the client receives an RPC2_BUSY response, it immediately bumps the
retransmission period to the maximum RPC2 timeout value, as we know the
request got there and the reply isn't ready yet. If that times out we
resend the request, and get either a retransmitted response, or another
BUSY.
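
On the client side, the effect on the wait time might look like the
following sketch. The function name, the event strings, the doubling
factor and the cap on an ordinary timeout are assumptions; the 15
second maximum is the approximate figure quoted earlier in this thread,
not a value read from the RPC2 sources.

```python
def next_wait(current_wait, event, max_timeout=15.0, factor=2.0):
    """Return how long to wait before the next retransmission."""
    if event == "busy":
        # The server has the request, so jump straight to the maximum.
        return max_timeout
    if event == "timeout":
        # Nothing heard: back off, but (assumed) never past the maximum.
        return min(current_wait * factor, max_timeout)
    raise ValueError("unknown event: %r" % (event,))


wait = 0.5
for event in ["timeout", "busy", "timeout"]:
    wait = next_wait(wait, event)
print(wait)
```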

> Should this ObtainWriteLock somehow timeout? Which thread should
> send the BUSY reply?
> As far as I can see, the client sends the Probe several times on
> the RPC2 level, without getting *any* reply.

Well, the first probe request probably got stuck on that lock, but at
that point the connection state should be S_PROCESS, and any new packets
received by the server on the same connection should automatically get
an RPC2_BUSY response. The only thing I can think of is that it is
sending the busy to a bad address, but I would think that the response
would then have been sent to the wrong place as well.

> >This is actually not a reintegration write lock, this is caused by the
> >fact that there is only a single RPC2 connection from the server to the
> >client, so it can only do one thing at a time. Fetch a file, or send a
> >callback probe.
> 
> You mentioned, that clients should break up their store OPs.
> Can they break up the transfer-size below file-size?
> Or do they have to transmit at least one file completely.
> I think this matters for the ISO-Images.

It happens right before reintegration. If the first CML entry is a large
store, it will be sent in smaller chunks. Then when we reintegrate we
add a handle to the store log entry, which makes the server avoid the
backfetch and use the previously sent data.
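
To illustrate that the chunks can be smaller than the file, here is a
generic sketch of cutting one store into fragments. The function name
and the chunk size are hypothetical; how the client actually picks the
fragment size is not covered in this mail.

```python
def store_fragments(file_size, chunk_size):
    """Yield (offset, length) pairs that cover file_size bytes in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < file_size:
        # The last fragment may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length


# Hypothetical example: a 10 byte file sent in fragments of 4 bytes.
for offset, length in store_fragments(10, 4):
    print(offset, length)
```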

Jan
Received on 2006-03-06 12:33:12