Coda File System

From: Ivan Popov <pin_at_math.chalmers.se>
Date: Tue, 15 Jan 2002 08:11:38 +0100 (MET)

Hello Jan!

> > It is no problem to set up a new non-scm machine, even create volumes on
> > it (shared with the old server, too), as long as I don't add an entry
> > for it in the "servers" file.
>
> How did you manage to create volumes on the non-scm machine if it isn't
> in the servers file? I can only see possible corruption resulting
> from that.

Well, now that you say it, and since I can't test it again, I cannot argue.
Maybe I managed to create volumes only while having both entries in
"servers"? I realize I would need somewhere to pick the new server
number from, but that information was present in "servers" on the
non-scm machine (a single line, "<host> 2").

> As the VRDB and VLDB files are created by the SCM, this all depends
> greatly on whether the SCM managed to correctly resolve all servers in
> the servers file to ip-addresses; any failed resolves will result in a
> 0.0.0.0 address (I believe Shafeeq added a test to 5.3.17 to block a
> server from starting up when that happens).

Yes, the test *is* there, and I have far more than double-checked the
name resolution :)

Well, could it be my missing domain name that causes harm?
In this cell I have all of the machines in /etc/hosts, with private IP
numbers and no domain names. That shouldn't really matter, should it?
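
The resolution check discussed above can be reproduced outside the server. What follows is only a minimal sketch, not what codasrv itself does: the function names are made up for illustration, and it assumes a plain IPv4 lookup through the system resolver (so /etc/hosts is consulted as usual). It reports names that fail to resolve (or come back as 0.0.0.0) and names that end up on the same address.

```python
import socket
from collections import defaultdict


def resolve_hosts(hosts, resolver=socket.gethostbyname):
    """Map each host name to its IPv4 address, or None if the lookup fails."""
    result = {}
    for host in hosts:
        try:
            result[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            result[host] = None
    return result


def resolution_problems(addresses):
    """Return warnings for a host -> address mapping (hypothetical helper)."""
    problems = []
    by_address = defaultdict(list)
    for host, addr in addresses.items():
        if addr is None or addr == "0.0.0.0":
            problems.append(f"{host}: does not resolve to a usable address")
        else:
            by_address[addr].append(host)
    # Two server names on one address would make them indistinguishable.
    for addr, names in by_address.items():
        if len(names) > 1:
            problems.append(f"{addr}: shared by {', '.join(sorted(names))}")
    return problems
```

Run on each server with the names from "servers", e.g. resolution_problems(resolve_hosts(["scm", "nonscm"])), where the two host names are placeholders. An empty list means every name resolved to its own address on that machine.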

> When there are 2 servers with the same serverid in the "servers" file,
> both 'replicas' are pointing to the same machine after the volume was
> created (although both replicas are created on different machines).

Never had it that way.

> Similarly when the same server is listed twice with different ids some
> parts of the code might find the first serverid, while others find the

Nor that one.

> second serverid, leading to considerable confusion about volume
> location (I've seen both cases happen).

I have previously run with two *identical* lines in "servers" (probably
created by some damaged vice-setup rerun), but that is probably totally
irrelevant.

Anyway, despite all the experiments, the system works as usual as long as
there is no mention of the second server in "servers".

> So right after installing the non-scm RPM, the steps would be something
> like,
> - Create the entry in the servers file, make sure that the id is unique,
>   and not any of 0, 127 or 255 (0, the prefix for replicated volumes, or
>   (unsigned char)-1).

The first one is 1, the second one is 2.
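
The constraints on "servers" mentioned in this thread (unique ids, no host listed twice, none of 0, 127 or 255) can be checked mechanically. This is a minimal sketch under stated assumptions: the file is taken to contain "<host> <id>" lines as quoted earlier, ids are assumed to fit in one byte (inferred from the "(unsigned char)-1" remark), and check_servers is a hypothetical name, not a Coda tool.

```python
RESERVED_IDS = {0, 127, 255}  # values the thread says to avoid


def check_servers(text):
    """Return a list of problems found in the contents of a servers file."""
    problems = []
    seen_ids = {}    # server id -> host that first used it
    seen_hosts = {}  # host -> server id first given for it
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            host, server_id = fields[0], int(fields[1])
        except (IndexError, ValueError):
            problems.append(f"line {lineno}: expected '<host> <id>'")
            continue
        if len(fields) != 2:
            problems.append(f"line {lineno}: expected '<host> <id>'")
            continue
        # Assumption: ids are a single byte, so only 0..255 is representable.
        if server_id in RESERVED_IDS or not 0 <= server_id <= 255:
            problems.append(f"line {lineno}: id {server_id} is not allowed")
        if server_id in seen_ids:
            problems.append(
                f"line {lineno}: id {server_id} already used by {seen_ids[server_id]}")
        if host in seen_hosts:
            problems.append(
                f"line {lineno}: host {host} already listed with id {seen_hosts[host]}")
        seen_ids.setdefault(server_id, host)
        seen_hosts.setdefault(host, server_id)
    return problems
```

Feeding it the text of /vice/db/servers should return an empty list for a two-line file with ids 1 and 2. Two identical lines are reported twice, once for the repeated id and once for the repeated host.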

> - Create an entry in the VSGDB for both the SCM and the non-scm server.
>   Of course with a unique VSG number (e.g. E0000101 SCM non-scm)

I created an entry for non-scm
and another one for both scm and non-scm, of course both unique.

Just to point out: I don't have to create replicated volumes to get into
trouble. In fact, I don't have to create *any* new volumes to get into
trouble. Just adding a line to "servers" is enough.

> - Possibly create a vicetab entry for the non-scm's /vicepa, etc. (if
>   you want to use a centralized vicetab), or make sure that vicetab is
>   _not_ listed in /vice/db/files to prevent updateclnt from "updating"
>   this file.

No, no, tried it, it breaks :-/ Another symptom of the same problem?

> - Restart the SCM codasrv.

Here the clients lose the ability to use the SCM server!
venus dies badly; after a restart a parent murder occurs...

My first try was to set up the second machine and "servers" on the scm and
do all the steps, including creating volumes, but without restarting the scm.
Then no crashes occurred, but the new volumes were not mountable.

(This means I *may* have had "servers" updated at the times when I created
the volumes.)

> - Run the vice-setup script on the non-scm and go through the setup.
> - Run updateclnt -h `cat /vice/db/scm`
> - Check whether the files in /vice/db are in sync with those on the SCM.

Yes, they are.
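
One way to make the "in sync" check above repeatable is to compare checksums. This is only a sketch: it assumes a copy of the SCM's /vice/db has been placed in a local directory by some other means (scp, for instance), and digest_dir and compare_db are hypothetical helper names. Files that are deliberately kept local, such as a non-centralized vicetab, can be left out via the names argument.

```python
import hashlib
from pathlib import Path


def digest_dir(directory):
    """Return {file name: SHA-256 hex digest} for regular files in directory."""
    digests = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_db(scm_dir, other_dir, names=None):
    """List files that are missing or differ between two /vice/db copies."""
    scm, other = digest_dir(scm_dir), digest_dir(other_dir)
    # Default to every file seen on either side; pass names to restrict.
    names = sorted(names if names is not None else set(scm) | set(other))
    report = []
    for name in names:
        if name not in scm or name not in other:
            report.append(f"{name}: missing on one side")
        elif scm[name] != other[name]:
            report.append(f"{name}: contents differ")
    return report
```

For example, compare_db("/tmp/scm-db", "/vice/db", names=["servers", "VSGDB"]) on the non-scm machine, where /tmp/scm-db is an assumed location for the copied files. An empty list means the selected files are byte-for-byte identical.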

> - Start the non-scm codasrv process.

> - Try to create a replicated volume.

I cannot tell for sure whether it succeeds at this point. The odds are good
that it does, but it does not help :-)

> Technically the only difference between the SCM and non-SCM servers is
> that the SCM server is running the updatesrv and rpc2portmap processes
> and that all the other servers are using those to make local copies of
> the files in /vice/db. So adding users and creating volumes should be

It looks like file contents propagation works without any problems.

> Interesting. The first entry in the vicetab identifies for which server
> that line applies. So if you have,
>
> SCM 	/vicepa ftree width=64,depth=3
> non-SCM	/vicepa ftree width=64,depth=3
>
> and you see a complaint like that, both of your servers seem to be
> resolving to the same ip-address (again here we do the comparison by

They of course have different names and IP addresses, and I have no problems
using the network or running e.g. ssh, even between them.

> > Two "unusual" things about my setup:

Well, there might be one more, rather too obvious thing: the servers run
different versions (scm 5.3.15, the new one 5.3.17).

But the problem arises as soon as I modify "servers" on the scm, even if the
non-scm is down. Puzzled. If I hadn't read about other people having the same
problems, I would suspect I'm making some really stupid mistakes...
Maybe I am? :-)

Coda is great, we just have to learn how to make it work! :)

Best regards,
--
Ivan

Re: add servers to a Coda cell