Coda File System

Re: SpoolVMLogRecord - no space left in volume

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Wed, 6 Jun 2001 10:02:23 -0400
On Wed, Jun 06, 2001 at 06:37:57AM -0700, Ed Kuo wrote:
> 
> Hello
> 
> I have encountered a problem similar to the one in the
> "Big Server" mail on the codadev mailing list.
> 
> After error in making mozilla.....
> ...
> [chris_at_cluster1 /coda]# mkdir 12345
> mkdir: cannot create directory `12345': No space left
> on device

Totally different error, but you're on the right track.

> [chris_at_cluster1 /coda]# volutil setlogparms 0x1000002
> reson 4 logsize 16384
> V_BindToServer: binding to host cluster1

Correct, the server ran out of resolution log entries. Resolution logs
are still kept even for singly replicated volumes. The entries are only
thrown out once the COP2 message (the second phase of the 2-phase
commit) arrives, and since a singly replicated volume never has a
reason to resolve, they tend to hang around forever.
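
To make the knobs explicit, here is roughly how I'd poke at the replica
before and after changing it. This is only a sketch: the replica id
0x1000002 is taken from your own output, 'reson 4' enables resolution
logging and 'reson 0' disables it, logsize is (as far as I remember)
counted in log entries rather than bytes, and I'm assuming 'volutil info'
on the server shows the current log settings.

    # on the server that holds replica 0x1000002
    volutil info 0x1000002                               # check current state (assumption)
    volutil setlogparms 0x1000002 reson 4 logsize 16384  # grow the resolution log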

I've tried to add a hack to createvol_rep that disables resolution for
newly created singly replicated volumes, basically by running
'volutil setlogparms <newvolume> reson 0' after creation. However, it
doesn't seem to work yet.

The volutil command might have failed because the server died when it
ran out of reslog entries and wasn't restarted.
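
For what it's worth, the hack boils down to something like this, run on
the SCM right after the volume has been created (just a sketch;
<new-replica-id> stands for the replica id that createvol_rep prints,
I'm not filling in a real one here):

    # only for singly replicated volumes, where resolution is pointless
    volutil setlogparms <new-replica-id> reson 0   # disable resolution logging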

> [chris_at_cluster1 /coda]#
> (There are about 4000 directories in total under the
> mozilla tree)

4000 directories spread over a tree are no problem; it is 4000-7000
files in a single directory that Coda doesn't handle.

The low limit on the number of directory entries is only really a
problem in a few cases (my maildir-format email directories, or Greg's
RFC mirror). The current directory format isn't that useful for
directories with many entries anyway: Coda uses a simple 128-bucket
hash for directory lookups, so with +/- 7000 entries every hash chain
has an average length of about 54 entries, and IMHO lookup performance
is already starting to become pretty bad at that point.
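
Just to spell out the back-of-the-envelope arithmetic behind that
number (assuming entries hash evenly over the buckets):

    7000 entries / 128 buckets   ~  54-55 entries per chain
    average successful lookup    ~  27 entries scanned (half a chain)
    a failed lookup scans a full chain, ~ 55 entries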

> My coda configuration is:
> cluster1:SCM, cluster3:non-SCM
> Only root volume:coda.root is provided, /vice:300M /vicepa:10G
> created by: "createvol_rep coda.root E0000100 /vicepa"
> after codasrvs on both cluster1 and cluster3 started up 
> and /vice/db/servers and /vice/db/VSGDB modified.

Hmm, if this really is a replicated volume, there must have been some
network flakiness that kept the servers out of sync. The crashed server
has over 4000 logged operations for which it doesn't know whether they
reached the second machine. And the clients never detected any
differences between the replicas, because that would have triggered
resolution, which would have truncated the resolution logs.

> (originally it was RvmLog/RvmData: 30M/315M, before I
> enlarged them trying to solve the no-space-left error)

The RVM log really doesn't have to be that large; our servers typically
run with a log between 2MB and 6MB. The log is only used to record
ongoing transactions, and the servers tend to apply logged modifications
to the data segment pretty often.
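
For comparison, your original numbers already illustrate the point. Note
that the limit you hit was the per-volume resolution log entry count,
not RVM space, so growing RVM was never going to help here:

    RVM log:   30MB  -- already well above the 2-6MB our servers run with
    RVM data: 315MB  -- holds the volume metadata (the resolution log
                        entries live there too, but their number is capped
                        per volume by 'logsize', not by the RVM data size)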

> I wonder if there is anything not set up well. Any
> suggestions? Or is it some limitation problem?
> 
> Help is greatly appreciated.

The first thing would be to extend the resolution log size, like you
were trying to do with 'volutil setlogparms 0x1000002 reson 4 logsize
16384'.

Then, on a client, run 'cfs cs ; cfs strong ; ls -lR /coda'. This should
trigger resolution for the parts of the tree that are out of sync between
the servers.
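
So, putting both steps together (the replica id 0x1000002 comes from
your own volutil output; run the first command on the server holding
that replica, after making sure codasrv is actually up again):

    # on the server
    volutil setlogparms 0x1000002 reson 4 logsize 16384

    # on a client
    cfs cs          # check which servers are reachable
    cfs strong      # force strongly connected mode
    ls -lR /coda    # walk the tree to trigger resolution of diverged replicas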


On Wed, Jun 06, 2001 at 06:37:57AM -0700, Ed Kuo wrote:
> Chris

Identity crisis? ;)

Jan
Received on 2001-06-06 10:02:25