Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu> Date: Fri, 10 Jan 2003 12:21:07 -0500

On Fri, Jan 10, 2003 at 02:24:45PM +0900, Stephen J. Turnbull wrote:
> >>>>> "Rod" == Rod Van Meter <Rod.VanMeter_at_nokia.com> writes:
>     >> Not sure what 198 is, but reintegration failed. The first entry
>     >> in /usr/coda/spool/500/developers_rdv@_coda_rdv.cml should be
>     >> the operation that is causing the problem.
> 
> I'm seeing this, too, with CVS workspaces in coda (CVS server is
> remote).  (All of Rod's description applies except for the details of
> file naming, including a few files with spaces in the name.  I don't
> think there are any commas.)  I sent you a coda log a week ago or so
> which probably has this problem in it.  Didn't you get it?

You mean the logs you sent Tuesday? I've been looking at them. A lot
looks perfectly normal, but some of it puzzles me.

In any case 'errorcode 198' is EINCOMPATIBLE. It is returned when we're
trying to write (store) data to a file that has been modified, i.e. the
store-id of the original copy on the client doesn't match the store-id
of the file on the server.
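
As a rough illustration of that check (a minimal sketch, not the actual
Coda server code): the StoreId fields and the check_store function below
are hypothetical names invented for this example. Only the error value
198 and its name EINCOMPATIBLE come from this message.

```python
from dataclasses import dataclass

# Error code named in this message.
EINCOMPATIBLE = 198


@dataclass(frozen=True)
class StoreId:
    """Hypothetical stand-in for a store-id; the field names are assumptions."""
    host: int        # assumed: identifies the client that did the store
    uniquifier: int  # assumed: per-client counter


def check_store(client_base: StoreId, server_current: StoreId) -> int:
    """Return 0 if the store may proceed, EINCOMPATIBLE otherwise.

    client_base is the store-id of the copy the client started from;
    server_current is the store-id of the file now on the server.
    """
    if client_base != server_current:
        return EINCOMPATIBLE
    return 0


# The client modified the version it fetched: the ids match, store is accepted.
print(check_store(StoreId(1, 5), StoreId(1, 5)))
# The server copy changed in the meantime: the ids differ, store is refused.
print(check_store(StoreId(1, 5), StoreId(1, 6)))
```

The second call models the situation in this thread: the file on the
server no longer carries the store-id that the client's logged store
operation expects, so reintegration stops at that CML entry.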

Your server log should show a message similar to,
    CheckStoreSemantics: (0x7f00002a.0x76a.0x44d0), VCP error (198)

Looking at the server code, I just noticed that when resolution is
disabled, the code uses a stronger test that would validate the store-id
of directories. And we've been turning resolution off on newly created
volumes that have only a single replica because of some other problem.
It could be that this has made reintegration more susceptible to
failures. This is just one theory, but
    'volutil setlogparms <volume replica id> reson 4'
will turn resolution back on. 

Another interesting fact is that the first entry in your CML is a store.
Perhaps the client got disconnected during the connected store attempt,
and this is essentially a replay of an already committed operation. I
believe that I once noticed that operations were given a new store-id
when the code falls back to logging the operation in the CML after a
failed connected store operation.

>     Rod> Returns nothing.
> 
> Yup, me too.

That is what's puzzling me. We can clearly see the fid replacement in
the log,

[ I(21) : 0000 : 15:17:09 ] k_Replace: ViceFid (7f00002a.25f1.2c16) with ViceFid (ffffffff.ffffffff.ea) in mini-cache
...
[ I(21) : 0000 : 15:17:10 ] k_Replace: ViceFid (7f00002a.bc72.311e) with ViceFid (ffffffff.fffffffe.27b) in mini-cache

So we just 'invented' fake identifiers in the range from 0xea to 0x27b.
But the kernel is asking for fake objects that are clearly out of this
range, which would indicate that the directory data the kernel is
using is incorrect or outdated. This is possibly caused by a process
that is blocking a re-open by having its cwd in the offending location.

[ W(20) : 0000 : 15:17:35 ] fsdb::Get: Locally created fid (0xffffffff.0xfffffffe.0xe9) not found!
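
The range comparison above can be reproduced mechanically. The following
is a minimal sketch that assumes the log line formats are exactly as
quoted in this message; the function names are made up for the example,
and the LOG string holds only the three lines shown here (the entries
elided by "..." are not reconstructed).

```python
import re

# Only the three log lines quoted in this message.
LOG = """\
[ I(21) : 0000 : 15:17:09 ] k_Replace: ViceFid (7f00002a.25f1.2c16) with ViceFid (ffffffff.ffffffff.ea) in mini-cache
[ I(21) : 0000 : 15:17:10 ] k_Replace: ViceFid (7f00002a.bc72.311e) with ViceFid (ffffffff.fffffffe.27b) in mini-cache
[ W(20) : 0000 : 15:17:35 ] fsdb::Get: Locally created fid (0xffffffff.0xfffffffe.0xe9) not found!
"""

# Third field of the replacement (fake) fid in a k_Replace line.
REPLACE_RE = re.compile(
    r"k_Replace: ViceFid \(\S+\) with ViceFid "
    r"\([0-9a-f]+\.[0-9a-f]+\.([0-9a-f]+)\)"
)
# Third field of the fid that fsdb::Get could not find.
MISSING_RE = re.compile(
    r"fsdb::Get: Locally created fid "
    r"\(0x[0-9a-f]+\.0x[0-9a-f]+\.(0x[0-9a-f]+)\) not found"
)


def created_ids(log: str) -> list[int]:
    """Third fields of the fake fids handed out by k_Replace."""
    return [int(m.group(1), 16) for m in REPLACE_RE.finditer(log)]


def missing_ids(log: str) -> list[int]:
    """Third fields of the locally created fids that were not found."""
    return [int(m.group(1), 16) for m in MISSING_RE.finditer(log)]


def out_of_range(log: str) -> list[int]:
    """Missing ids that fall outside the span of ids seen in k_Replace."""
    created = created_ids(log)
    if not created:
        return missing_ids(log)
    lo, hi = min(created), max(created)
    return [u for u in missing_ids(log) if not lo <= u <= hi]


for u in out_of_range(LOG):
    print(f"requested fake fid 0x{u:x} is outside the created range")
```

Run against a full venus log, the same comparison would list every
requested fake fid that was never created in the current replacement
pass.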

Jan

Re: okay, what am I doing wrong?