Coda File System

Re: Behaviour of coda with large files

From: Jan Harkes <>
Date: Wed, 1 Mar 2006 21:37:46 -0500
On Mon, Feb 27, 2006 at 01:39:30PM +0100, Martin Ginkel wrote:
> I am using a simple coda-setup and I am trying to store some larger
> files (ISO-Images) >500M < 2G  on a volume.
> The client and server are on different machines and the server is
> non-replicated.
> IMHO strange things happen:
> If I copy a file on the client into coda, the client first writes to
> the local cache and then starts transmitting the file contents, when
> the local op has finished.
> OK, normal. The transmission is quite slow:
> It does by far not use the available bandwidth
> (100Mbit net, coda write traffic uses up to ~7 Mbit).

I wouldn't expect that; my client does about 3MB/s (~30Mbit) when it
writes to a single server, and goes up to 6MB/s for a doubly replicated
group. For a triply replicated group we don't scale linearly anymore; I
only get between 7 and 8MB/s.

Of course we run everything from userspace, so the file transfer is
pretty CPU intensive and as such somewhat sensitive to other workloads
on the system.

I once had reports from people who tried to use Coda on a gigabit
network that they were seeing atrocious performance. However, I have
never been able to reproduce that; we switched to GigE in our lab and I
have not had any problems with it.

> 1) First codacon reports something about a store operation.
> If this finishes successfully, everything is fine.
> If something (like a disconnect) disturbs this "store", the file will 
> apparently never reach the server (see below). During this longer store
> operation (about 15min) the server responds normally to Probe calls and 
> seems to be alive to the clients.

Ehh, disconnections should not happen that easily, we've bumped the RPC2
timeout to a generous 60 seconds. Is there a bad cable/router/switch in
your network that is consistently dropping packets?

Of course the store doesn't reach the servers when we disconnect. We
also realize that few applications actually check for errors on close,
so the store operation is logged in the CML and will be reintegrated
when the servers return.
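A rough sketch of that idea, in Python; the names and structure here
are purely illustrative and are not Coda's actual data structures:

```python
# Hypothetical sketch of the client-modify-log (CML) behavior described
# above: a store during disconnection is logged locally and replayed to
# the server on reconnect. Not Coda's real code.

class Client:
    def __init__(self):
        self.cml = []          # pending operations logged while disconnected
        self.connected = True

    def send_to_server(self, path, data):
        # Connected path: write straight through (stub for illustration).
        return 0

    def store(self, path, data):
        if self.connected:
            return self.send_to_server(path, data)
        # Disconnected: log the store. close() still succeeds locally,
        # since few applications check close() for errors anyway.
        self.cml.append(("store", path, data))
        return 0

    def reintegrate(self, server):
        # On reconnect, replay logged operations to the server in order.
        while self.cml:
            op, path, data = self.cml.pop(0)
            server[path] = data
```

Storing a file while `connected` is False appends a record to `cml`;
calling `reintegrate()` once the server is back drains the log.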

> 2) After some disconnect within the store the client will
> eventually reconnect to the server and start a reintegration
> (reported in codacon). Then a 'backfetch' is reported.

Reintegration probably locks the volume during the complete process,
while a store might only grab the lock for a short time after the data
transfer has completed. I haven't looked at the code so I might be
wrong.

> The speed is the same as above.
> When this operation starts, the server becomes *unreachable* to 'Probe' 
> attempts from any client!

That is unusual: since probes are not tied to a specific volume, if the
server is not responding to Probe calls, the server is ignoring all
RPC2 calls. Backfetches are sent from server to client, and I believe
the client is the active side that does the retransmissions and such.
Not sure why/how it could continue if RPC2 calls are failing.

> The transmitted data is then never reintegrated into the volume on the 
> server, but after sending the whole file, the server will be reachable 
> again, and restart the reintegration, when the respective volume is hit
> on the client. Again and again ...

There is an alternative reintegration method, which is used when the
observed bandwidth is low. Effectively there is a 'hogtime' parameter
which defines how long a client should keep the link/server busy; I
think the default is 30 seconds. The client uses the estimated bandwidth
to figure out how long a file transfer will take. If it would take too
long, the file is first sent to the server in smaller fragments, which
are finally followed by a reintegration of the associated store
operation.
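The decision logic might look roughly like this; the function and
parameter names are mine, not Coda's actual identifiers, and the 30
second default is the value quoted above:

```python
# Sketch of the 'hogtime' fragmentation decision: if a transfer would
# keep the link busy longer than HOGTIME at the estimated bandwidth,
# split it into fragments that each fit within the hog time.

HOGTIME = 30.0  # seconds a client should keep the link/server busy

def plan_transfer(file_size, est_bandwidth):
    """Return the list of fragment sizes (bytes) for one store.

    file_size is in bytes, est_bandwidth in bytes/second. The
    reintegration of the store record follows the final fragment.
    """
    est_time = file_size / est_bandwidth
    if est_time <= HOGTIME:
        return [file_size]              # single transfer, normal path
    frag_size = int(est_bandwidth * HOGTIME)
    frags = [frag_size] * (file_size // frag_size)
    if file_size % frag_size:
        frags.append(file_size % frag_size)
    return frags
```

At the ~7 Mbit (~875KB/s) observed above, a 700MB ISO comfortably
exceeds the hog time and would be cut into many fragments.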

> Questions:
> a) I guess this is not wanted?
>    Why is backfetch blocking the other calls and store is different?

Reintegration does a lot more than just storing data so there might very
well be more constraints on what type of locks it needs on the volume.
In general reintegration is a lot more efficient because it can perform
up to 100 operations in a single transaction.

> b) Is there a way to improve the performance. What is the bottleneck?
>    Are there tuning parameters? Or is it just due to RVM?

I think the bottleneck is in the network. SFTP is somewhat bursty and
will send between 8 and 32 packets at a time. It doesn't adapt as
smoothly as TCP, and some router or switch may not be able to queue
enough packets. If you combine that with a queue that always drops the
last packets to arrive, the server never sees the 'ackme'-flagged
packet. So the client retransmits the window (this time setting the
'ackme' flag on the first packet), but this only adds more congestion
at the switch.
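A toy model of that failure mode; the packet layout and the switch
behavior are hypothetical simplifications, not the real SFTP protocol:

```python
# Toy model of the bursty SFTP behavior described above: a window of
# packets is sent with the 'ackme' flag on the LAST packet; on timeout
# the window is retransmitted with 'ackme' moved to the FIRST packet.

def make_window(seqs, ackme_index):
    return [{"seq": s, "ackme": (i == ackme_index)}
            for i, s in enumerate(seqs)]

def send_window(seqs, network):
    # First attempt: 'ackme' on the last packet of the burst.
    delivered = network(make_window(seqs, len(seqs) - 1))
    if any(p["ackme"] for p in delivered):
        return delivered, 1            # server saw the ackme packet
    # Retransmit: 'ackme' now on the first packet, adding more load.
    delivered = network(make_window(seqs, 0))
    return delivered, 2

def tail_dropping_switch(window, queue_len=4):
    # A shallow queue that drops the tail of each burst: the 'ackme'
    # packet is lost on the first try, forcing a full retransmission.
    return window[:queue_len]
```

With an 8-packet burst through the shallow queue, the first attempt
loses the flagged tail packet and the whole window goes out again,
which is exactly the congestion feedback loop suspected above.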

This would explain why the servers aren't even seeing any of the 5
retries of the Probe RPC2 call (or maybe they do, but the client clearly
doesn't get the response).

Received on 2006-03-01 21:39:21