Coda File System

crash recovery

From: Steffen Neumann <sneumann_at_TechFak.Uni-Bielefeld.DE>
Date: 17 Oct 2002 15:34:15 +0200
Hi, 

It has happened: our hard drive hosting the Coda data
had a tiny bit of smoke coming from it ...

We were able to recover most of the /vicep[abc] partitions to a new
hard drive, as well as the rvm partition. Some files on /vicepa 
had read errors and thus are missing on the new partition.

The main question now is: how do we verify that this recovery worked?

The old filesystem had errors on a number of files:

	 ./1/5b/18: Input/output error
	 ./1/5b/19: Input/output error
	[...]
	 ./1/5b/37: Input/output error

How do we find the files we have to restore from the 96-hour-old backup?
We'd like to keep those to a minimum, of course...
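
As a starting point for narrowing this down, here is a minimal Python sketch that walks a partition and lists every file that cannot be read end to end, in the same style as the error listing above. The function name `find_unreadable` and the chunk size are assumptions made for illustration. It only reports the files under the given directory (such as ./1/5b/18); mapping those back to paths under /coda is a separate step that it does not attempt.

```python
import os


def find_unreadable(root, chunk_size=1 << 20):
    """Return sorted (relative_path, reason) pairs for files under root
    that cannot be opened or read completely."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    # Read the whole file in chunks so that a bad sector
                    # anywhere in it raises an error.
                    while fh.read(chunk_size):
                        pass
            except OSError as exc:
                bad.append((os.path.relpath(path, root), exc.strerror))
    return sorted(bad)
```

Running `find_unreadable("/vicepa")` against the old drive would give the candidate list; anything it reports is a file whose contents did not make it to the new partition.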

The server started fine, with the following kind of entries in SrvLog:

----
        13:47:05 Entering DCC(0x1000008)
        13:47:05 DCC: Salvaging Logs for volume 0x1000008
        
        13:47:05 done:  2967 files/dirs,        120930 blocks
        13:47:05 SFS:No Inode summary for volume 0x1000009; skipping full salvage
        13:47:05 SalvageFileSys: Therefore only resetting inUse flag
----
        13:47:05 Entering DCC(0x100000a)
        13:47:05 DCC: Salvaging Logs for volume 0x100000a
        
	13:47:05 done:  823 files/dirs, 108199 blocks
----
	13:48:03 SalvageFileSys:  unclaimed volume header file or no Inodes in volume 10
	13:48:03 SalvageFileSys: Therefore only resetting inUse flag
	13:48:03 SalvageFileSys:  unclaimed volume header file or no Inodes in volume 10
	13:48:03 SalvageFileSys: Therefore only resetting inUse flag
	13:48:03 SalvageFileSys completed on /vicepa

Are both volumes OK?

This is followed by a number of transactions which looked more or less OK:

	13:48:27 recov_vol_log::SalvageLog: bitmaps are not equal

	13:48:27 Log rec at index 84 is unreachable

	    **Server: 0x81468b62 StoreId: 0x3016fa6.29ac 
	    Directory(0x53.bd28)
	    Opcode: Mkdir 
	    index is 84, sequence number 89703, var length is 17
	    . [0x53.bd28] owner 10157
	    ** End of Record **

        13:48:27 Log rec at index 85 is unreachable
        
            **Server: 0x81468b62 StoreId: 0x3016fa6.29ad 
            Directory(0x53.bd28)
            Opcode: Create 
            index is 85, sequence number 89704, var length is 26
            lsR999.tmp [0x1182.befa] owner 0
            ** End of Record **
        
        13:48:27 Log rec at index 87 is unreachable

            **Server: 0x81468b62 StoreId: 0x3016fa6.29b2 
            Directory(0x53.bd28)
            Opcode: Rename 
            index is 87, sequence number 89706, var length is 140
            (src) other dir (0x77.25c) lsR999.tmp (0x1182.befa)[0 0 0 0 0 0 0 0 0x3016fa
         renamed to ls-R
            ** End of Record **


Later there are quite a few entries like these:

	13:49:37 client_GetVenusId: got new host 129.70.139.110:2430
	13:49:37 Building callback conn.
	13:49:37 No idle WriteBack conns, building new one
	13:49:37 Writeback message to 129.70.139.110 port 2430 on conn 3316c35 succeeded
	13:49:40 RevokeWBPermit on conn 3316c35 returned 0
	13:49:40 ValidateVolumes: 0x7f000001 failed!
	[...]
	13:49:40 ValidateVolumes: 0x7f000059 failed!
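
To get an overview instead of scrolling through SrvLog by hand, a small Python sketch like the following could collect the volume ids from the "ValidateVolumes: ... failed!" lines and the indices from the "Log rec at index ... is unreachable" lines. The regular expressions are based only on the excerpts quoted in this mail, and the function name `summarize` is made up for illustration; other message variants in SrvLog would need their own patterns.

```python
import re

# Patterns derived from the SrvLog excerpts quoted in this mail (assumption:
# the real log uses exactly these message texts).
FAILED_RE = re.compile(r"ValidateVolumes: (0x[0-9a-fA-F]+) failed!")
UNREACHABLE_RE = re.compile(r"Log rec at index (\d+) is unreachable")


def summarize(lines):
    """Return (failed_volume_ids, unreachable_log_indices) in log order."""
    failed = []
    unreachable = []
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            failed.append(match.group(1))
            continue
        match = UNREACHABLE_RE.search(line)
        if match:
            unreachable.append(int(match.group(1)))
    return failed, unreachable
```

Passing an open file object for SrvLog as `lines` works, since the function only iterates over it line by line.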



Testing the filesystem with a 

	find /coda 

gave us 

	14:15:10 TryToCover: bogus link length

What does that mean?

I would appreciate any answers,
yours, 
Steffen

Received on 2002-10-17 09:39:35