Coda File System

Re: Server Crash Post Mortem

From: Jan Harkes
Date: Thu, 4 Sep 2003 13:42:13 -0400
On Mon, Sep 01, 2003 at 12:22:18PM +0200, Steffen Neumann wrote:
> 	- 10:15:45 random snippet with warnings that sound chinese to me
>         10:15:46 CheckRemoveSemantics: 1000004.5.3, VCP error (198)

I typically associate VCP error 198 with a client that is reintegrating
or repairing something while it has stale data in its local cache. I
suspect this has become somewhat more common since the '..' traversal
fixes went into 6.0.1. Possibly something went wrong with refcounting,
and objects aren't purged or refetched as they should be in all cases.

In this case it looks like this client is trying to remove and create
files in a directory with the fid 1000004.5.3, but it has a stale copy
of the directory in the local cache.

This could be repair or reintegration related.

>         10:16:00 VGetVnode: vnode 1000004.dda8 is not allocated

These are typically resolution related, triggered after the previous
repair or reintegration attempt. However, the parent directory needs to
be resolved, and the client seems intent on not triggering resolution on
an obviously stale directory. As it never revalidates the directory, it
doesn't see that the conflict it is trying to resolve is really
somewhere else.

>         11:55:14 VShutdown:  shutting down on-line volumes...

That looks like a voluntary shutdown, i.e. someone or something sent
the codasrv process a SIGKILL or SIGTERM, or used 'volutil shutdown'. I
don't see any indication of a fatal crash.
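
For what it's worth, the readings of the three log messages so far can be
written down as a small lookup. This is only a sketch: the patterns are
copied from the lines quoted in this thread, the interpretations are the
rough ones given above, and `classify` and `PATTERNS` are made-up names,
not anything that ships with Coda.

```python
import re

# Patterns copied from the log lines quoted in this thread. The meanings are
# the informal readings from the reply, not an official list of server errors.
PATTERNS = [
    (re.compile(r"VCP error \(198\)"),
     "client is probably reintegrating or repairing with stale cached data"),
    (re.compile(r"VGetVnode: vnode \S+ is not allocated"),
     "typically resolution related, after a repair or reintegration attempt"),
    (re.compile(r"VShutdown:"),
     "voluntary shutdown, not a fatal crash"),
]


def classify(line):
    """Return the informal reading for a server log line, or None if unknown."""
    for pattern, meaning in PATTERNS:
        if pattern.search(line):
            return meaning
    return None


if __name__ == "__main__":
    sample = "10:15:46 CheckRemoveSemantics: 1000004.5.3, VCP error (198)"
    print(classify(sample))
```

Anything the lookup doesn't recognise comes back as None, so unknown
messages still have to be read by hand.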

>         11:55:14 VShutdown: Taking volume offline...
>         11:56:25 Callback failed RPC2_NAKED (F) for ws

Some lock (resolution?) is still held on this volume and the server is
waiting for it to be released. If, in such a situation, it looks like
the lock isn't going to be released anytime soon, a kill -9 might be in
order to really bring the server down.
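
The "ask nicely first, then kill -9" approach could be scripted roughly as
follows. This is a sketch under assumptions: `stop_with_escalation`,
`pid_alive`, and the timeout values are invented for illustration, and it
assumes you already know the pid of the codasrv process. The `send` and
`alive` parameters exist only so the logic can be tried out without
signalling a real process.

```python
import os
import signal
import time


def pid_alive(pid):
    """Best-effort liveness check: signal 0 probes the pid without killing it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def stop_with_escalation(pid, grace_seconds=120.0, poll_interval=1.0,
                         send=os.kill, alive=pid_alive):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL."""
    send(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not alive(pid):
            return "exited after SIGTERM"
        time.sleep(poll_interval)
    # Still running, e.g. stuck waiting on a lock: this is the kill -9 case.
    send(pid, signal.SIGKILL)
    return "sent SIGKILL"
```

How long to wait is a judgment call. The 120 seconds here is an arbitrary
placeholder, not a value taken from the server.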

