Coda File System

Re: volutil rpc2 errors

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 15 May 2001 08:18:41 -0400
On Mon, May 14, 2001 at 10:18:31PM -0500, Ryan M. Lefever wrote:
> Hi,
> 
> I am trying to fix some RPC2 problems that I have when using volutil.
> 
> When I do a "volutil setdebug", the following happens no matter whether I
> do it locally or remotely, or to the SCM or a non-SCM. Also, a
> /vice/srv/CRASH file is created.
> 
> --
> [root_at_nsx srv]# startserver -d 1000
> [root_at_nsx srv]# volutil setdebug 100
> V_BindToServer: binding to host nsx.crhc.uiuc.edu
> VolSetDebug failed with RPC2_DEAD (F)
...
> I looked in the RPC2 manual.  An "RPC2_DEAD" completion code implies that
> "You were waiting for requests on a specific connection and that site has
> been deemed dead or unreachable."  This would seem to indicate that
> setdebug killed the server and then the RPC2 call reported it dead with an
> "RPC2_DEAD" completion code.

That is the correct analysis: the server died while processing the
volutil setdebug RPC2 call and therefore couldn't send back an answer.

> --
> 
> The SrvErr file reads:
> 
> --
> could not open key 2 file: No such file or directory
> Assertion failed: 0, file "srv.cc", line 336
> EXITING! Bye!
> --

This is a generic assertion point where we always end up when a SIGSEGV
is received. If you create the file /vice/srv/ZOMBIFY, the server should
end up in an infinite loop at this point. Then you can easily attach gdb
and get a stacktrace.

    # gdb /usr/sbin/codasrv `pidof codasrv`
    (gdb) bt
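
Put together, the ZOMBIFY route could be scripted roughly as below. This
is only a sketch: VICE_SRV and arm_zombify are names made up for
illustration, and it assumes that an empty file is enough and that the
server may need a restart before it notices the file.

```shell
# Sketch only: VICE_SRV and arm_zombify are illustrative names, not part of Coda.
# VICE_SRV defaults to the directory that holds SrvErr and CRASH.
VICE_SRV="${VICE_SRV:-/vice/srv}"

arm_zombify() {
    # Assumption: the server only checks that the file exists,
    # so an empty file created with touch should do.
    touch "$VICE_SRV/ZOMBIFY" && echo "created $VICE_SRV/ZOMBIFY"
}

# Typical sequence, run by hand as root:
#   arm_zombify
#   startserver                  # restart, in case the file is read at startup
#   volutil setdebug 100         # trigger the crash
#   gdb /usr/sbin/codasrv `pidof codasrv`
```

Remove the ZOMBIFY file again once you have the trace, otherwise the
server will keep hanging instead of exiting on later assertion failures.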

The trace will look a bit odd, because the actual point where the
segfault was triggered won't show up; the stack is clobbered by the
signal handler. However, the caller of the function that crashed will
show up, and from its line number it is possible to figure out at least
which function had the problem.

It will probably be something like,

    #1 coda_assert function where we are waiting
    #2 sigsegv handler
    #3 ???
    #4 function before the segv was received.
        x/x/volutil/vol_setdebug.cc:666


The other (and perhaps easier) way to debug this is to have codasrv
already running under the control of gdb when the segfault happens. That
way the stack trace comes out a lot cleaner.

    # gdb /usr/sbin/codasrv `pidof codasrv`
    (gdb) continue
    /* trigger the volutil setdebug crash */
    SEGV received
    (gdb) bt
    #1 culprit function
        file.cc:line
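
The same session can probably be run without typing at the gdb prompt. A
minimal sketch, assuming a reasonably recent gdb that supports -batch and
-ex; segv_backtrace and DRY_RUN are names made up for illustration:

```shell
# Sketch only: segv_backtrace and DRY_RUN are illustrative, not part of Coda.
# Attaches gdb to a running process, lets it continue until a signal such as
# SIGSEGV stops it, then prints a backtrace and exits.
segv_backtrace() {
    bin="$1"    # path to the server binary
    pid="$2"    # pid of the running server
    set -- gdb -batch -ex continue -ex bt "$bin" "$pid"
    if [ -n "$DRY_RUN" ]; then
        echo "$@"       # only show the command that would be run
    else
        "$@"
    fi
}

# Example, as root, in one terminal:
#   segv_backtrace /usr/sbin/codasrv `pidof codasrv`
# then trigger the crash from another terminal:
#   volutil setdebug 100
```

Setting DRY_RUN=1 prints the gdb command line instead of running it,
which is handy for checking the arguments first.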

Jan
Received on 2001-05-15 08:18:44