Coda File System

cfs lv hangs on NetBSD/sparc, NetBSD/sparc64

From: Sean Caron <caron.sean_at_gmail.com>
Date: Wed, 26 Jul 2006 21:39:52 -0400
Hi all,

Running coda 6.0.14 on NetBSD/sparc64 back-end servers and a mix of
NetBSD/sparc, NetBSD/sparc64, and NetBSD/macppc clients (well
described in the list archives). I have recently taken note of the
fact that the 'cfs lv' command does not work on any of the SPARC
systems -- perhaps an endianness bug?

On my macppc system, I get the expected behavour,

# cfs lv /coda/diablonet.net/tmp
  Status of volume 7f00000b (2130706443) named "diablonet.tmp"
  Volume type is ReadWrite
  Connection State is Connected
  Reintegration age: 4294967295 sec, hogtime 4294967.295 sec
  Minimum quota is 0, maximum quota is unlimited
  Current blocks used are 49
  The partition has 51484224 blocks available out of 63091568

but on any of the SPARC (sun4m or sun4u) systems, I just get something like,

sonnet% cfs lv /coda/diablonet.net/tmp
(just sits forever)
^C/coda/diablonet.net/tmp: Interrupted system call
sonnet%

(this happens both against my own servers, and against
testserver.coda.cs.cmu.edu so I know its not anything wrong with my
server configuration)

After this point, venus is completely hosed -- if you go to CODA-space
and try to do anything like list a directory, read/write a file,
whatever, it is just hung up and will sit, until you kill it, always
with the interrupted system call error. You have to kill venus off and
re-invoke with venus -init to make it work again.

If I ktruss it, I see it is hanging up on the system call,

   803 cfs      open("/coda/.CONTROL", 0, 0)       = 3

and cranking venus up with debuglevel -d 100, I see this in the logs:

[ W(15) : 0000 : 21:21:03 ] fsobj::Lookup: (diablonet.net/tmp), uid = 0
[ W(15) : 0000 : 21:21:03 ] fsobj::Access : (diablonet.net, 8, 0), uid = 0
[ W(15) : 0000 : 21:21:03 ] Realm::GetUser local uid '0' for realm 'diablonet.ne
t'
[ W(15) : 0000 : 21:21:03 ] PutUser:
[ W(15) : 0000 : 21:21:03 ] fsdb::Get-mre: key = (5043bac8.7f000000.84.10a), uid
 = 0, rights = 3, comp = tmp
[ W(15) : 0000 : 21:21:03 ] repvol::IsUnderRepair: vol = 7f000000, uid = -1
[ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f000000, /), refcnt = 3
[ W(15) : 0000 : 21:21:03 ] vdb::Get: vid = 5043bac8.7f000000
[ W(15) : 0000 : 21:21:03 ] fsobj::fsobj: fid = (5043bac8.7f000000.84.10a), comp
 = tmp
[ W(15) : 0000 : 21:21:03 ] fsobj::ClearAcRights: (0.0.0.0), uid = -1
[ W(15) : 0000 : 21:21:03 ] fsobj::ClearRcRights: (5043bac8.7f000000.84.10a), ri
ghts = 0
[ W(15) : 0000 : 21:21:03 ] fsobj::DemoteAcRights: (5043bac8.7f000000.84.10a), u
id = -1
[ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f000000, /), refcnt = 4
[ W(15) : 0000 : 21:21:03 ] fsobj::GetAttr: (tmp), uid = 0
[ W(15) : 0000 : 21:21:03 ] vsgent::GetMgrp 0x120900 0 1
[ W(15) : 0000 : 21:21:03 ] mgrpent::GetHostSet: 0x10dc00, uid = 0, mid = 1
[ W(15) : 0000 : 21:21:03 ] volent::FlushCOP2(Piggy): vol = 7f000000
[ W(15) : 0000 : 21:21:03 ] RepOpCommCtxt::AllReplicasSupportSHA(): 0x10dc30
[ W(15) : 0000 : 21:21:03 ] fsobj::GetAttr: ViceGetAttrPlusSHA(0x7f000000.84.10a
)
[ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1
[ W(15) : 0000 : 21:21:03 ] (Multi)GetAttrPlusSHA: start
[ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0
[ W(15) : 0000 : 21:21:03 ] (Multi)GetAttrPlusSHA: code = -2001, elapsed = 1.1
[ W(15) : 0000 : 21:21:03 ] mgrpent::CheckNonMutating: acode = -2001
                hosts = [0x45123a78 0x45123a70 0x45123a68 0x45123a60 0x45123a58
0x45123a50 0x45123a48 0x45123a40],
                retcodes = [0 -2002 -2002 -2002 -2002 -2002 -2002 -2002]
                0: [ 1 0 0 0 0 0 0 0 ] [ 50167af.0 ] [ 0 ]
[ W(15) : 0000 : 21:21:03 ] fsobj::SetRcRights: (5043bac8.7f000000.84.10a), righ
ts = 3
[ W(15) : 0000 : 21:21:03 ] fsobj::UpdateStatus: (5043bac8.7f000000.84.10a), uid
 = 0
[ W(15) : 0000 : 21:21:03 ] fsobj::Matriculate: (5043bac8.7f000000.84.10a)
[ W(15) : 0000 : 21:21:03 ] fsobj::AttachChild: (5043bac8.7f000000.1.1), (5043ba
c8.7f000000.84.10a)
[ W(15) : 0000 : 21:21:03 ] mgrpent::Put 0x10dc00, uid = 0, mid = 1, auth = 1, r
efcount = 2, detached = 0
[ W(15) : 0000 : 21:21:03 ] mgrpent::PutHostSet: 0x10dc00
[ W(15) : 0000 : 21:21:03 ] fsobj::Reference: (5043bac8.7f000000.84.10a), old =
0, new = 5
[ W(15) : 0000 : 21:21:03 ] fsdb::Put: (5043bac8.7f000000.84.10a), refcnt = 1, r
eaders = 1, writers = 0, openers = 0
[ W(15) : 0000 : 21:21:03 ] fsdb::Get-mre: key = (5043bac8.7f000000.84.10a), uid
 = 0, rights = 15, comp = tmp
[ W(15) : 0000 : 21:21:03 ] repvol::IsUnderRepair: vol = 7f000000, uid = -1
[ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f000000, /), refcnt = 4
[ W(15) : 0000 : 21:21:03 ] The advice was to ReadDiscFetch --> Fetching.
[ W(15) : 0000 : 21:21:03 ] fsobj::Fetch: (tmp), uid = 0
[ W(15) : 0000 : 21:21:03 ] vsgent::GetMgrp 0x120900 0 1
[ W(15) : 0000 : 21:21:03 ] mgrpent::GetHostSet: 0x10dc00, uid = 0, mid = 1
[ W(15) : 0000 : 21:21:03 ] volent::FlushCOP2(Piggy): vol = 7f000000
[ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1
[ W(15) : 0000 : 21:21:03 ] (Multi)Fetch: start
[ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0
[ W(15) : 0000 : 21:21:03 ] (Multi)Fetch: code = -2001, elapsed = 1.3
[ W(15) : 0000 : 21:21:03 ] mgrpent::CheckNonMutating: acode = -2001
                hosts = [0x45125f10 0x45125f08 0x45125f00 0x45125ef8 0x45125ef0
0x45125ee8 0x45125ee0 0x45125ed8],
                retcodes = [0 -2002 -2002 -2002 -2002 -2002 -2002 -2002]
                0: [ 1 0 0 0 0 0 0 0 ] [ 50167af.0 ] [ 0 ]
[ W(15) : 0000 : 21:21:03 ] (Multi)ViceFetch: fetched 15 bytes
[ W(15) : 0000 : 21:21:03 ] fsobj::UpdateStatus: (5043bac8.7f000000.84.10a), uid
 = 0
[ W(15) : 0000 : 21:21:03 ] mgrpent::Put 0x10dc00, uid = 0, mid = 1, auth = 1, r
efcount = 2, detached = 0
[ W(15) : 0000 : 21:21:03 ] mgrpent::PutHostSet: 0x10dc00
[ W(15) : 0000 : 21:21:03 ] fsobj::SetRcRights: (5043bac8.7f000000.84.10a), righ
ts = 15
[ W(15) : 0000 : 21:21:03 ] fsobj::Reference: (5043bac8.7f000000.84.10a), old =
5, new = 6
[ W(15) : 0000 : 21:21:03 ] fsobj::TryToCover: fid = (5043bac8.7f000000.84.10a)
[ W(15) : 0000 : 21:21:03 ] vdb::Get: volname = diablonet.tmp_at_diablonet.net
[ W(15) : 0000 : 21:21:03 ] GetAdmConn: diablonet.net
[ W(15) : 0000 : 21:21:03 ] GetServer: host = 64.7.160.150
[ W(15) : 0000 : 21:21:03 ] srvent::GetConn: host = harvest.diablonet.net, uid =
 -1, force = 0
[ W(15) : 0000 : 21:21:03 ] PutConn: host = harvest.diablonet.net, uid = -1, cid
 = 415605628, auth = 0
[ W(15) : 0000 : 21:21:03 ] PutServer: harvest.diablonet.net
[ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1
[ W(15) : 0000 : 21:21:03 ] GetVolumeInfo: start
[ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0
[ W(15) : 0000 : 21:21:03 ] GetVolumeInfo: code = 0, elapsed = 1.0
[ W(15) : 0000 : 21:21:03 ] connent::CheckResult: code = 0, vid = 0
[ W(15) : 0000 : 21:21:03 ] PutConn: host = harvest.diablonet.net, uid = -1, cid
 = 415605628, auth = 0
[ W(15) : 0000 : 21:21:03 ] vdb::Get: vid = 5043bac8.1000007
[ W(15) : 0000 : 21:21:03 ] vdb::Get: volname = 16777223_at_diablonet.net
[ W(15) : 0000 : 21:21:03 ] GetAdmConn: diablonet.net
[ W(15) : 0000 : 21:21:03 ] GetServer: host = 64.7.160.151
[ W(15) : 0000 : 21:21:03 ] srvent::GetConn: host = blossom.diablnet.net, uid =
-1, force = 0
[ W(15) : 0000 : 21:21:03 ] PutConn: host = blossom.diablnet.net, uid = -1, cid
= 391227550, auth = 0
[ W(15) : 0000 : 21:21:03 ] PutServer: blossom.diablnet.net
[ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1
[ W(15) : 0000 : 21:21:03 ] GetVolumeInfo: start
[ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0
[ W(15) : 0000 : 21:21:03 ] GetVolumeInfo: code = 0, elapsed = 1.0
[ W(15) : 0000 : 21:21:03 ] connent::CheckResult: code = 0, vid = 0
[ W(15) : 0000 : 21:21:03 ] PutConn: host = blossom.diablnet.net, uid = -1, cid
= 391227550, auth = 0
[ W(15) : 0000 : 21:21:03 ] volent::volent: (1000007, 16777223)
[ W(15) : 0000 : 21:21:03 ] volrep::volrep: host: 64.7.160.151 readonly: 0 paren
t: 0x7f00000b)
[ W(15) : 0000 : 21:21:03 ] GetServer: host = 64.7.160.151
[ W(15) : 0000 : 21:21:03 ] volent::volent: (7f00000b, diablonet.tmp)
[ W(15) : 0000 : 21:21:03 ] repvol::repvol 5043a4c8 00000000 00000000 00000000 0
0000000 00000000 00000000 00000000
[ W(15) : 0000 : 21:21:03 ] vsgdb::GetVSG 451276c8 451276c0 451276b8 451276b0 45
1276a8 451276a0 45127698 45127690
[ W(15) : 0000 : 21:21:03 ] vdb::Put: (1000007, 16777223), refcnt = 2
[ W(15) : 0000 : 21:21:03 ] fsdb::Get-mre: key = (5043bac8.7f00000b.1.1), uid =
0, rights = 3, comp = tmp
[ W(15) : 0000 : 21:21:03 ] repvol::IsUnderRepair: vol = 7f00000b, uid = -1
[ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f00000b, diablonet.tmp), refcnt = 2
[ W(15) : 0000 : 21:21:03 ] vproc::Begin_VFS(Vget): vid = 5043bac8.7f00000b, u.u
_vol = 0, mode = -1
[ W(15) : 0000 : 21:21:03 ] vdb::Get: vid = 5043bac8.7f00000b
[ W(15) : 0000 : 21:21:03 ] volent::TakeTransition: diablonet.tmp, state = Hoard
ing, |AVSG| = 1, logv = 0
[ W(15) : 0000 : 21:21:03 ]     CML = 0, Res = 0
[ W(15) : 0000 : 21:21:03 ] fsdb::Get-mre: key = (5043bac8.7f00000b.1.1), uid =
0, rights = 3, comp = tmp
[ W(15) : 0000 : 21:21:03 ] repvol::IsUnderRepair: vol = 7f00000b, uid = -1
[ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f00000b, diablonet.tmp), refcnt = 3
[ W(15) : 0000 : 21:21:03 ] vdb::Get: vid = 5043bac8.7f00000b
[ W(15) : 0000 : 21:21:03 ] fsobj::fsobj: fid = (5043bac8.7f00000b.1.1), comp =
tmp
[ W(15) : 0000 : 21:21:03 ] fsobj::ClearAcRights: (0.0.0.0), uid = -1
[ W(15) : 0000 : 21:21:03 ] fsobj::ClearRcRights: (5043bac8.7f00000b.1.1), right
s = 0
[ W(15) : 0000 : 21:21:03 ] fsobj::DemoteAcRights: (5043bac8.7f00000b.1.1), uid
= -1
[ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f00000b, diablonet.tmp), refcnt = 4
[ W(15) : 0000 : 21:21:03 ] fsobj::GetAttr: (tmp), uid = 0
[ W(15) : 0000 : 21:21:03 ] vsgent::GetMgrp 0x120900 0 1
[ W(15) : 0000 : 21:21:03 ] mgrpent::GetHostSet: 0x10dc00, uid = 0, mid = 1
[ W(15) : 0000 : 21:21:03 ] volent::FlushCOP2(Piggy): vol = 7f00000b
[ W(15) : 0000 : 21:21:03 ] RepOpCommCtxt::AllReplicasSupportSHA(): 0x10dc30
[ W(15) : 0000 : 21:21:03 ] fsobj::GetAttr: ViceGetAttrPlusSHA(0x7f00000b.1.1)
[ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1
[ W(15) : 0000 : 21:21:03 ] (Multi)GetAttrPlusSHA: start
[ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0
[ W(15) : 0000 : 21:21:03 ] (Multi)GetAttrPlusSHA: code = -2001, elapsed = 1.1
[ W(15) : 0000 : 21:21:03 ] mgrpent::CheckNonMutating: acode = -2001
                hosts = [0x451234e0 0x451234d8 0x451234d0 0x451234c8 0x451234c0
0x451234b8 0x451234b0 0x451234a8],
                retcodes = [0 -2002 -2002 -2002 -2002 -2002 -2002 -2002]
                0: [ 57 0 0 0 0 0 0 0 ] [ ebcd120.0 ] [ 0 ]
[ W(15) : 0000 : 21:21:03 ] fsobj::SetRcRights: (5043bac8.7f00000b.1.1), rights
= 3
[ W(15) : 0000 : 21:21:03 ] fsobj::UpdateStatus: (5043bac8.7f00000b.1.1), uid =
0
[ W(15) : 0000 : 21:21:03 ] fsobj::Matriculate: (5043bac8.7f00000b.1.1)
[ W(15) : 0000 : 21:21:03 ] Realm::GetUser local uid '0' for realm 'diablonet.ne
t'
[ W(15) : 0000 : 21:21:03 ] fsobj::SetAcRights: (5043bac8.7f00000b.1.1), uid = 0
, my_rights = 127, any_rights = 9
[ W(15) : 0000 : 21:21:03 ] PutUser:
[ W(15) : 0000 : 21:21:03 ] mgrpent::Put 0x10dc00, uid = 0, mid = 1, auth = 1, r
efcount = 2, detached = 0
[ W(15) : 0000 : 21:21:03 ] mgrpent::PutHostSet: 0x10dc00
[ W(15) : 0000 : 21:21:03 ] fsobj::Reference: (5043bac8.7f00000b.1.1), old = 0,
new = 7
[ W(15) : 0000 : 21:21:03 ] vproc::End_VFS(Vget): code = 0
[ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f00000b, diablonet.tmp), refcnt = 3
[ W(15) : 0000 : 21:21:03 ] fsobj::MountRoot: fid = 5043bac8.7f00000b.1.1, mtptf
id = 5043bac8.7f000000.84.10a
[ W(15) : 0000 : 21:21:03 ] k_Purge: fid = (5043bac8.7f00000b.1.1), severely = 0
[ W(15) : 0000 : 21:21:03 ] k_Purge: CODA_ZAPDIR, returns 0
[ W(15) : 0000 : 21:21:03 ] fsobj::CoverMtPt: fid = (5043bac8.7f000000.84.10a),
rootfid = (5043bac8.7f00000b.1.1)
[ W(15) : 0000 : 21:21:03 ] k_Purge: fid = (5043bac8.7f000000.84.10a), severely
= 1
[ W(15) : 0000 : 21:21:03 ] k_Purge: CODA_PURGEFID, returns 0
[ W(15) : 0000 : 21:21:03 ] fsdb::Put: (5043bac8.7f00000b.1.1), refcnt = 1, read
ers = 0, writers = 1, openers = 0
[ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f00000b, diablonet.tmp), refcnt = 2
[ W(15) : 0000 : 21:21:03 ] fsobj::Lookup: crossing mount point
[ W(15) : 0000 : 21:21:03 ] fsdb::Put: (5043bac8.7f000000.84.10a), refcnt = 1, r
eaders = 1, writers = 0, openers = 0
[ W(15) : 0000 : 21:21:03 ] fsdb::Put: (5043bac8.7f000000.1.1), refcnt = 1, read
ers = 1, writers = 0, openers = 0
[ W(15) : 0000 : 21:21:03 ] fsdb::Put: (5043bac8.7f00000b.1.1), refcnt = 1, read
ers = 1, writers = 0, openers = 0
[ W(15) : 0000 : 21:21:03 ] vproc::End_VFS(Lookup): code = 0
[ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f000000, /), refcnt = 3
[ W(15) : 0000 : 21:21:03 ] Lookup : returns SUCCESS, elapsed = 30.3 msec

[ X(00) : 0000 : 21:21:03 ] MarinerMux: mask = 0xffffc7e0

[ W(15) : 0000 : 21:21:03 ] CODA_IOCTL: u.u_pid = 20088 u.u_pgid = 20088
[ W(15) : 0000 : 21:21:03 ] vproc::ioctl(0): fid = 5043bac8.7f00000b.1.1, com =
Get VolStat
[ W(15) : 0000 : 21:21:03 ] vdb::Get: vid = 5043bac8.7f00000b
[ W(15) : 0000 : 21:21:03 ] volent::GetVolStat: vid = 7f00000b, uid = 0
[ W(15) : 0000 : 21:21:03 ] vsgent::GetMgrp 0x120900 0 1
[ W(15) : 0000 : 21:21:03 ] mgrpent::GetHostSet: 0x10dc00, uid = 0, mid = 1
[ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1
[ W(15) : 0000 : 21:21:03 ] (Multi)GetVolumeStatus: start
[ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0
[ W(15) : 0000 : 21:21:03 ] (Multi)GetVolumeStatus: code = -2001, elapsed = 1.2
[ W(15) : 0000 : 21:21:03 ] mgrpent::CheckNonMutating: acode = -2001
                hosts = [0x45120ed0 0x45120ec8 0x45120ec0 0x45120eb8 0x45120eb0
0x45120ea8 0x45120ea0 0x45120e98],
                retcodes = [0 -2002 -2002 -2002 -2002 -2002 -2002 -2002]
[ W(15) : 0000 : 21:21:03 ] mgrpent::Put 0x10dc00, uid = 0, mid = 1, auth = 1, r
efcount = 2, detached = 0
[ W(15) : 0000 : 21:21:03 ] mgrpent::PutHostSet: 0x10dc00
[ W(15) : 0000 : 21:21:03 ] *****  FATAL SIGNAL (10) *****

and GDB output from the hung venus process,

0x0008b7f0 in SigChoke (sig=1158837200) at sighand.cc:241
241                 sigsuspend(&mask);
(gdb) list
236         {
237             int       living_dead = 1;
238             sigset_t  mask;
239             sigemptyset(&mask);
240             while (living_dead) {
241                 sigsuspend(&mask);
242             }
243         }
244     #endif
245
(gdb) bt
#0  0x0008b7f0 in SigChoke (sig=1158837200) at sighand.cc:241
#1  <signal handler called>
(gdb) up
#1  <signal handler called>
(gdb) list
241                 sigsuspend(&mask);
242             }
243         }
244     #endif
245
246         SigExit(sig);
247     }
248
249     static void SigExit(int sig)
250     {
(gdb) i frame
Stack level 1, frame at 0x451274d0:
 pc = 0x403c9250 in __sigtramp_sigcontext_1; saved pc 0x82d68
 caller of frame at 0x451273e8
 Arglist at 0x451274d0, args:
 Locals at 0x451274d0, Previous frame's sp in sp

Hopefully this all might be useful to someone in suggesting where the
fault might be? If anyone would like to see more data, please let me
know and I will provide it ASAP.

Thanks, Sean
scaron_at_umich.edu
Received on 2006-07-26 23:36:30