Coda File System

Re: Input/output error on individual files

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 11 Jul 2003 15:27:21 -0400
On Fri, Jul 11, 2003 at 11:15:33AM +0200, Steffen Neumann wrote:
> I have a couple of (non-vital) files that seem to have 
> propagated (partially) to the server, but are not 
> available for access or removal. There is no 
> conflict in that volume, the entries appear on all
> clients, venus -init has no effect. The rest of the directory 
> is fine.
> 
> 	aipc1(sneumann):laptop_liste>ls -la
> 	ls: mondrian: Input/output error
> 	ls: monet: Input/output error
> 	ls: miro: Input/output error
> 	ls: kandinsky: Input/output error
> 	ls: aipc4: Input/output error

Interesting. Since this happens on all clients, my feeling is that these
are directory entries that point to non-existent vnodes.

> Question: where do they come from,

I'm not sure. If this is a singly replicated volume then we
can at least rule out a resolution-related bug. So it either occurred
when the client was fully connected, or during reintegration.

The more likely of these two would be reintegration.

It might be related to a rename bug where the object we renamed got
removed instead of the one that we renamed over. Something like,
    touch foo ; touch bar
    mv bar foo
    # foo 'object' should be removed, and the name foo should point at
    # the bar 'object'

But that is a fairly common operation, so a general bug there would have
shown up much more often. Maybe it is a corner case where the rename has
to occur before the filehandle is closed (store).
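
The touch/mv sequence above can be checked mechanically. The following is
only a sketch: it runs in a scratch directory on the local filesystem, and
pointing it at a directory under /coda is an assumption, as is the idea
that the inode number reported by stat identifies the underlying object
there. The function name is made up for this illustration.

```python
import os
import tempfile

def rename_keeps_source_object(directory):
    """Repeat 'touch foo ; touch bar ; mv bar foo' and check which object survives."""
    foo = os.path.join(directory, "foo")
    bar = os.path.join(directory, "bar")
    for path in (foo, bar):
        with open(path, "w"):
            pass  # same effect as 'touch' for a file that does not exist yet
    bar_inode = os.stat(bar).st_ino
    os.replace(bar, foo)  # rename over an existing name, like 'mv bar foo'
    # Correct outcome: the name bar is gone and foo refers to bar's old object.
    ok = (not os.path.exists(bar)) and os.stat(foo).st_ino == bar_inode
    os.remove(foo)
    return ok

with tempfile.TemporaryDirectory() as scratch:
    result = rename_keeps_source_object(scratch)
    print("rename kept the right object:", result)
```

If the wrong object were removed, the stat on foo would either report a
different inode or fail with an error, which is the symptom described at
the top of this thread.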

It could also be related to losing CML entries when stores are optimized.

    http://www.coda.cs.cmu.edu/rt2/Ticket/Display.html?id=690

i.e. we reintegrated the first records of the following log,

    op#000  create X
    op#001  store X
    op#002  store mondrian
    op#003  store monet
    ...

But the client got disconnected, and while it was disconnected file X was
updated again. The store optimization removed op#001 and added op#007,

    ...
    op#006 store aipc4
    op#007 store X

Now we retry the reintegration and the server says 'hey I already
reintegrated everything up to op#001'. As a result, the client starts
cancelling CML entries until it finds op#001 which doesn't exist, and we
end up discarding the whole CML, losing the actual store operations for
these files.
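
The scenario above can be replayed with a toy model. This is a sketch under
stated assumptions, not Venus code: the CML is modelled as a list of
(op number, operation, name) tuples, all three function names are invented
here, and the second cancellation strategy is just one illustrative
alternative, not the actual fix. Ops 4 and 5 are elided in the log above and
are simply left out.

```python
def optimize_store(cml, name, next_op):
    """Toy store optimization: drop older stores of name, append a new one."""
    kept = [rec for rec in cml if rec[1:] != ("store", name)]
    kept.append((next_op, "store", name))
    return kept

def cancel_until_found(cml, acked_op):
    """Drop records from the head until the acknowledged record was dropped."""
    remaining = list(cml)
    while remaining:
        if remaining.pop(0)[0] == acked_op:
            break
    return remaining

def cancel_by_number(cml, acked_op):
    """Hypothetical alternative: drop only records numbered <= acked_op."""
    return [rec for rec in cml if rec[0] > acked_op]

cml = [
    (0, "create", "X"),
    (1, "store", "X"),
    (2, "store", "mondrian"),
    (3, "store", "monet"),
    (6, "store", "aipc4"),
]
# X is updated again while disconnected: op#001 disappears, op#007 is added.
cml = optimize_store(cml, "X", 7)

lost = cancel_until_found(cml, 1)   # op#001 is never found
kept = cancel_by_number(cml, 1)

print("cancel until found:", lost)
print("cancel by number:  ", [rec[0] for rec in kept])
```

In this model the first strategy walks off the end of the log and returns
an empty list, while the second keeps the stores for mondrian, monet and
aipc4 together with the new store of X.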

But as far as I can see the vnodes should still exist, because those
were created at the time we added the names to the directory.

> 	  and how to get rid of 'em ?

This is about as complicated as the previous question. I'm not sure.
Maybe the vnodes actually do exist, but are in some 'virgin state'. In
that case restarting the server should create empty container files.
It should be possible to check with volutil whether we have the vnodes,

>        /coda/vol/ai/share/laptop_liste/  FID = 0x7f000004.5.3           VV = [329959 0 0 0 0 0 0 0]    STOREID = 0x81468b62.3f0d06fb  FLAGS = 0x8

According to the above getfid output, this should dump the raw contents of
the laptop_liste directory,

    volutil showvnode 7f000004 5 3

This output will contain entries like:

    thisblob: 16 next: 0, flag 1 fid: (42.22) playground

                                  vnode^  ^unique ^name

So now I can use that information to do a lookup for the actual vnode,

    $ volutil showvnode 7f000004 42 22
    42.22(1), symlink, cloned=0, mode=644, links=1, length=15
    inode=0x3, parent=1.1, serverTime=Mon Aug 24 10:07:54 1998
    author=7456, owner=7456, modifyTime=Mon Aug 24 10:07:41 1998
    , volumeindex = 0{[ 1 0 0 0 0 0 0 0 ] [ 41997777 903450466 ] [ 0 ]}
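
The lookup chain above (FID from getfid, then the directory dump, then one
showvnode per entry) can be scripted roughly as follows. This is a sketch:
the two regular expressions only assume the line layouts quoted in this
message, the function names are invented, and the fields are passed to
volutil unchanged, as in the commands above. It only builds the command
strings and does not run volutil.

```python
import re

# Matches the 'FID = 0xVOLUME.VNODE.UNIQUE' part of a getfid line.
FID_RE = re.compile(r"FID = 0x([0-9a-fA-F]+)\.([0-9a-fA-F]+)\.([0-9a-fA-F]+)")
# Matches 'fid: (VNODE.UNIQUE) NAME' in a directory dump line.
ENTRY_RE = re.compile(r"fid: \(([0-9a-fA-F]+)\.([0-9a-fA-F]+)\)\s+(.+)")

def showvnode_for_fid(getfid_line):
    """Build the showvnode command for the object named in a getfid line."""
    match = FID_RE.search(getfid_line)
    if match is None:
        raise ValueError("no FID found in: " + getfid_line)
    return "volutil showvnode {} {} {}".format(*match.groups())

def showvnode_for_entries(volume, dump_text):
    """Map each name in a directory dump to the showvnode command for it."""
    commands = {}
    for vnode, unique, name in ENTRY_RE.findall(dump_text):
        commands[name.strip()] = f"volutil showvnode {volume} {vnode} {unique}"
    return commands

getfid_line = ("/coda/vol/ai/share/laptop_liste/  FID = 0x7f000004.5.3"
               "  VV = [329959 0 0 0 0 0 0 0]")
dump_text = "thisblob: 16 next: 0, flag 1 fid: (42.22) playground"

dir_cmd = showvnode_for_fid(getfid_line)
entry_cmds = showvnode_for_entries("7f000004", dump_text)

print(dir_cmd)
for name, cmd in entry_cmds.items():
    print(name, "->", cmd)
```

Any name whose showvnode lookup comes back empty would be a candidate for
a directory entry that points to nothing.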


If the vnodes really do not exist, it should be possible to remove the
name entries that point to nothing with norton.

> I have a good few 
> 
> 	grep 0x7f000004 SrvLog
> 	09:20:22 ValidateVolumes: 0x7f000004 failed!
> 	09:46:41 PutReintegrateObjects: stale directory fid 0x7f000004.5.3, num 0, max 50
> 	10:01:38 PutReintegrateObjects: stale directory fid 0x7f000004.5.3, num 0, max 50

The stale directory fid messages really shouldn't matter. They are just an
indication that the server should tell the client to refetch the
directory contents, because the server just reintegrated some operation
and knows from the directory version vector that this client's
view of the directory is outdated.
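
As a rough illustration of that version vector check, here is a simplified
sketch. It assumes a view is stale when the server's vector is ahead in any
slot; the real server logic may differ. The client vector is the VV from
the getfid output quoted earlier, the server vector is a made-up value, and
the function names are invented.

```python
def parse_vv(text):
    """Parse a version vector printed as '[329959 0 0 0 0 0 0 0]'."""
    return [int(field) for field in text.strip("[] ").split()]

def client_view_is_stale(client_vv, server_vv):
    """True if the server has seen an update in any slot the client has not."""
    return any(s > c for c, s in zip(client_vv, server_vv))

client_vv = parse_vv("[329959 0 0 0 0 0 0 0]")

# Hypothetical: the server reintegrated one more operation on this directory.
server_vv = list(client_vv)
server_vv[0] += 1

stale = client_view_is_stale(client_vv, server_vv)
current = client_view_is_stale(client_vv, client_vv)
print("stale after server update:", stale)
print("stale with equal vectors: ", current)
```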

Jan
Received on 2003-07-11 15:30:37