Coda File System

Re: client died, now what?

From: Michael Stone <mstone_at_cs.loyola.edu>
Date: Mon, 26 Oct 1998 23:06:50 -0500

Quoting jaharkes_at_cs.cmu.edu (jaharkes_at_cs.cmu.edu):
> > I was writing a bunch of stuff out to the coda directory, and the client
> > (linux 2.1.123, included coda.o, coda 4.6.6) died.  Now, when I start
> > venus, I can see the files that were being copied into /coda, but none of
> > the stuff on the remote side. Eventually, I get something like this:
> 
> This is the first time that I've seen venus crash in the mariner (codacon)
> logging routines, and I'm not sure how reinitializing could help. How many
> codacons are running? Did you exit a codacon process? It looks like the

Never ran it.

> used iterator could result in a bad pointer dereference when dead
> connections are destroyed.
> 
> It is even possible it can continue when you restart venus, since it looks
> like it wanted to print that some part of the reintegration succeeded.
> (Changes are reintegrated in `blobs' of 100; this makes it easier to pick up
> from where it crashed the last time.)

Well, I'm on a different client now, trying to untar the same file (maybe I
should give up :) and I see this on the screen (I saw it on the first client
when it died, too, but this time I'm copying it down):

tar: glibc-pre2.1-2.0.98.orig/sysdeps/unix/sysv/linux/sparc/sparc32/sys/ucontext.h: Could not create file: Is a directory

Odd, it untars fine on a regular partition. I tail /usr/coda/console and see
this: 
22:39:50 Fatal Signal (11); pid 13597 becoming a zombie...
22:39:50 You may use gdb to attach to 13597

#0  0x400dd2a4 in __syscall_sigsuspend ()
#1  0x4010182c in svc_fdset ()
#2  0x80eafdc in FatalSignal (sig=11, code=0, contextPtr=0x0) at sighand.cc:397
#3  0x80eac14 in SEGV (sig=11, code=0, contextPtr=0x0) at sighand.cc:230
#4  0x150fdd9c in ?? ()
#5  0x809453d in fsobj::DisconnectedCreate (this=0x2154798c, Mtime=909459590, 
    vuid=1000, t_fso_addr=0x150fdf24, name=0x828ed90 "if_tr.h", Mode=420, 
    target_pri=62500, Tid=-1) at fso_cfscalls0.cc:1860
#6  0x80951cb in fsobj::Create (this=0x2154798c, name=0x828ed90 "if_tr.h", 
    target_fso_addr=0x150fdf24, vuid=1000, Mode=420, target_pri=62500)
    at fso_cfscalls0.cc:1916
#7  0x8148002 in vproc::create (this=0x8290e08, dvp=0x835a598, 
    name=0x828ed90 "if_tr.h", vap=0x828ed2c, excl=0, mode=33188, 
    vpp=0x150fffa4) at vproc_vfscalls.cc:742
#8  0x814c01a in worker::main (this=0x8290e08, parm=0x813e670) at worker.cc:939
#9  0x813e777 in VprocPreamble (init_lock=0x8290ee8) at vproc.cc:173
#10 0x817d345 in Create_Process_Part2 () at lwp.c:1107
#11 0x8180077 in L1 () at rec_dlist.cc:381
#12 0x805304ec in ?? ()
Cannot access memory at address 0x83e58955.
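
For reading the trace: gdb prints the numeric arguments in decimal, so Mode=420
and mode=33188 are easier to interpret in octal, and Mtime is seconds since the
epoch. A small Python sketch (not part of Coda; the -0500 offset is taken from
the Date header of this message and is only assumed to apply to the console
timestamps as well) decodes them:

```python
import stat
from datetime import datetime, timedelta, timezone

# Values copied from the backtrace above.
mode_fsobj_create = 420     # Mode= in frames #5 and #6
mode_vproc_create = 33188   # mode= in frame #7
mtime = 909459590           # Mtime= in frame #5

# Permission bits are conventionally read in octal.
perm_bits = oct(mode_fsobj_create)

# The larger value carries the file type in its high bits.
is_regular_file = stat.S_ISREG(mode_vproc_create)
perm_bits_vproc = oct(stat.S_IMODE(mode_vproc_create))
ls_style = stat.filemode(mode_vproc_create)

# Assumption: the client's local time zone is UTC-5, as in the Date header.
local_zone = timezone(timedelta(hours=-5))
mtime_local = datetime.fromtimestamp(mtime, tz=local_zone)
mtime_text = mtime_local.strftime("%Y-%m-%d %H:%M:%S")

print(perm_bits, perm_bits_vproc, ls_style, is_regular_file, mtime_text)
```

Decoded this way, the create was for a regular file with 0644 permissions, and
Mtime falls at 22:39:50 local time, the same second the console logged the
fatal signal.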

Looks like venus stopped talking to the server: fsobj::Create went down the
fsobj::DisconnectedCreate path (GetOperationState in fsobj::Create shouldn't
have been 0, right?), but the server's fine. (Or, since I've nuked two clients
now, maybe it's not...)

Mike Stone
Received on 1998-10-26 23:09:16