Coda File System

Re: codasrv: open_log failed RVM_EIO

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 27 May 2003 11:40:02 -0400
On Tue, May 27, 2003 at 02:11:54PM +0100, Malte Grosche wrote:
> Hello,
> 
> after having run coda pretty well for several years, I suddenly started 
> getting this message on starting up the coda server: (SrvLog file)
... 
> log_recover failed.
> do_rvm_options failed
> 13:47:27 rvm_init failed RVM_EIO

Normally, seeing RVM_EIO could be one out of a thousand things. I'm
seeing it on machines that really do not have a lot of memory and when
a server is restarted quickly after it died they crash with EIO in
log_recover. Simply waiting for 30 seconds before restarting a server
totally fixes things for me.

However, thanks to your strace we might be able to dig a bit deeper.
I'm guessing that we're opening the RVM log here.

> open("/dev/hda7", O_RDWR)               = 3
...
> lseek(3, 2615296, SEEK_SET)             = 2615296
> read(3, "\0\0\0\0\0ile server was shut down\0\0\0"..., 512) = 512

Interestingly, part of a SrvLog message ended up in the RVM log. And
this could be a reason that RVM gets pretty confused and fails during
log recovery.

> lseek(3, 2614784, SEEK_SET)             = 2614784
> read(3, "\0\0\0\0\35\0\0\0\350\2\0\0\\{\317>6_\t\0\366[\10\0\34"..., 262144) = 262144
> lseek(3, 2615296, SEEK_SET)             = 2615296
> read(3, "\0\0\0\0\0ile server was shut down\0\0\0"..., 262144) = 262144
> lseek(3, 8192, SEEK_SET)                = 8192
> write(3, "\30\0\0\0\177\223Y\225RVM Interface Version 1."..., 1536) = 1536
> fdatasync(3)                            = 0
...
> open(0, O_RDWR)                         = -1 EFAULT (Bad address)

This open() is very curious, would this be RVM's attempt to open the
data file? Or is this something that goes wrong when we're trying to
clean up state in venus. I'm assuming this is still RVM, trying to get
at the data partition.

Some information I'd like to see is the current log status,

    # rvmutl
    * o /dev/hda7
    * status
    ...
    * quit

Some of the interesting information would be the following, if we're
lucky the log is empty or very recently truncated. And most of the time
with our servers, there is hardly anything in the log. We tend to
truncate the RVM log as soon as possible.

  status last written:   Tue May 27 2003  5:56:26.828922
  last truncation:       Tue May 27 2003  5:50:50.121629
  log head offset:         13666704
  log tail offset:         13929264
  log empty:             false

(There is a manual page for rvmutl,
    http://www.coda.cs.cmu.edu/doc/install/html/rvm_manual-8.html)

Before completely destroying our existing state it is probably wise to
back up the current log and data. If they are files, a simple copy will
work, otherwise 'dd if=/dev/hda7 of=rvmlog bs=1024 count=<logsize in KB>'.

Next step is to try to force a log truncation, which should apply any
pending updates to RVM data,

    # rvmutl
    * o /dev/hda7
    * recover
    * status
    * quit

If that worked without anything crashing, great. If it didn't, well
hopefully the logged but not applied updates weren't all that critical
after all because now we are going to wipe the existing log and lose all
pending changes,

    # rvmutl
    * i /dev/hda7 20M
    * quit

At this point the log is guaranteed to be empty and the server will be
able to load the existing RVM data without crashing. During startup the
server will try to salvage the volumes and if RVM worked the way it was
intended, most of the internal structures should still be consistent and
we're back in business.

Jan
Received on 2003-05-27 11:43:30