Coda File System

Re: Heartbeat Failover

From: Tim Hasson <>
Date: Mon, 14 Jul 2003 13:00:29 -0700
Quoting Jan Harkes <>:

> On Sat, Jul 12, 2003 at 12:44:05PM -0700, Tim Hasson wrote:
> > What if you had a 2nd 100mbps NIC in each of the SCM and the NON-SCM, 
> > dedicated to CODA rep, with their own private network (say, and
> > linked by a crossover cable and /etc/hosts would look like
> > in both machines:
> >      SCM
> >      NonSCM
> > and you setup Coda on both machines using those priv hostnames.
> Then when a client asks for the location of a volume, it will get the
> answer 'oh, it's replicated at and', which is
> pretty useless for clients that are not on your private backbone
> network.

Exactly! That is intentional, because I am paranoid about security. I don't 
want separate clients, but rather a server/client <---> server/client type of 
setup. Both the SCM and the non-SCM run a qmail/vpopmail/courier toaster, 
Apache (with modules), pureftpd, and a venus client. All replicated data would 
be reached through symbolic links to a mount point or directory in /coda. For 
example, /www/vhost_files points to /coda/www.vhosts.
I read about tokentool for long-expiring tokens, but I think a cron job 
renewing the tokens for each daemon every 2 to 4 hours should be more secure 
than long-lived tokens. Correct me if I am wrong.
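
To make the renewal idea concrete, here is a rough sketch in Python that only builds the crontab entry. The wrapper script path and the "apache" argument are hypothetical names used for illustration; the wrapper itself (whatever actually obtains the new token) is not shown and is not part of Coda.

```python
# Sketch only: builds a crontab entry that re-runs a token renewal command
# every few hours. The command passed in is an assumption, not a Coda tool.

def cron_line(every_hours: int, command: str) -> str:
    """Return a crontab line that runs `command` at minute 0 every N hours."""
    if not 1 <= every_hours <= 23:
        raise ValueError("every_hours must be between 1 and 23")
    return f"0 */{every_hours} * * * {command}"


if __name__ == "__main__":
    # Hypothetical wrapper that would fetch a fresh token for one daemon user.
    print(cron_line(3, "/usr/local/sbin/renew-coda-token apache"))
```

Intervals of 2, 3 or 4 hours divide the day evenly, so the renewals stay equally spaced across midnight.
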
I noticed that the machine's hostname has to be the name that resolves to the 
private IP in /etc/hosts (the one I gave to the vice-setup scripts), or codasrv 
will complain. This doesn't matter much to me, since I can also add the real 
hostname -> real IP mapping to /etc/hosts.
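
A quick way to sanity-check which address a name maps to is to parse the hosts file directly. This is only a sketch of that check, not anything from the Coda tools, and every name and address in the sample is made up.

```python
# Sketch only: parse /etc/hosts-style text and look up what a name maps to.
# The hostnames and addresses in the sample below are hypothetical.

def parse_hosts(text: str) -> dict:
    """Map each hostname or alias to the first IP address listed for it."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # keep the first entry for a name
    return table


if __name__ == "__main__":
    sample = """
    10.0.0.1      scm-priv        # hypothetical private address
    10.0.0.2      nonscm-priv
    192.0.2.10    www.example.com # hypothetical public address
    """
    hosts = parse_hosts(sample)
    print(hosts["scm-priv"], hosts["www.example.com"])
```

On a real machine the text would come from reading /etc/hosts, and the name to look up would be the one given to the setup scripts.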

> > 1. LVS-DR (Direct Routing) for load balancing and failover to the real
> > servers (the coda servers) using 1 or more LVS Directors (Linux only
> > for now)
> Bad idea, in contrast with stateless NFS servers, Coda servers do keep
> state around. Any (authenticated) connection is bound to a specific
> server. When you redirect the packets to another Coda server, it will
> respond with a NAK and the client will have to reconnect and revalidate
> its local cache with the new server.

I am sorry if I wasn't clear on this one. I am load balancing services such 
as http, ftp, smtp, etc. I am not talking to Coda directly, but indirectly 
through the daemons running on the same machines as the Coda servers; those 
daemons have an authenticated session and privileges to read/write particular 
volumes.

> > 2. IP-Takeover, if you wanted 1 coda server to take over the other's ip
> > if it goes down by sharing a VirtualIP (LVS stuff) on the lo interface.
> > The machine taking over an ip will watch if the other machine goes
> > back up and change its IP back to the original.
> No need, a client always sends all requests to all available replicas in
> parallel, a dead server simply causes a 15 second 'stall' while we make
> sure the request wasn't simply dropped by the network. At that point,
> the server is marked dead and we perform an occasional background probe
> to check whether the server came back.

That's awesome. This means I can just watch my http/ftp/etc. services with 
mon or keepalived from the LVS-Director, and if they die on one machine it 
will direct traffic only to the one that is still alive.
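
The kind of check I have in mind is just "does the service port still accept connections". The sketch below is a simplified stand-in for that idea in Python; it is not how mon or keepalived are implemented, and the function names are my own.

```python
import socket


def service_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the service as down.
        return False


def live_servers(servers, port):
    """Keep only the real servers whose service port accepts connections."""
    return [host for host in servers if service_alive(host, port)]
```

A director-side script would call live_servers() with the list of real servers and a port such as 80, and send traffic only to the hosts it returns.
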
But I understood from what you said that I cannot edit the user db if the SCM 
crashes, until I build a new SCM or promote one of the replicas to SCM?
I was thinking about automating user creation with pdbtool and an expect 
script, plus a cfs mkm of one volume per user.
From some reading on the mailing list I understood that it's better to have 
many smaller volumes than one big volume. What is the 16M files limit? Is that 
per volume or Coda-wide? What does this have to do with the architecture?
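
For the per-user mount step, something like the following Python sketch is what I mean. It only builds the "cfs mkmount <directory> <volume>" command lines and does not run them. The /coda/users root and the "user." volume prefix are hypothetical naming choices, and it assumes the users and volumes have already been created on the server side.

```python
# Sketch only: build (but do not run) one mount command per user.
# The directory root and volume-name prefix are assumptions for illustration.

def mkmount_commands(users, root="/coda/users", prefix="user."):
    """Return one ['cfs', 'mkmount', <dir>, <volume>] argv list per user."""
    return [["cfs", "mkmount", f"{root}/{name}", f"{prefix}{name}"]
            for name in users]


if __name__ == "__main__":
    for argv in mkmount_commands(["alice", "bob"]):
        print(" ".join(argv))
```

Keeping the commands as argv lists makes it easy to review them first and pass them to subprocess.run later.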

> > I was able to finally create a 1GB RVM partition (and 30M Log) on both 
> > machines (thanks to Jan for notes on how to fit it), using the following 
> > options with rdsinit:
> ...
> > On a P3 800Mhz, 256M RAM, 1.4GB SWAP, the fileserver startup time was about
> > 2min - 2.5mins, which I think isn't very bad. Is that Ok?
> How much is the system pushed into swap at that point? If private mmap
> is enabled, the server should be mostly paging parts of RVM, only dirty
> page are pushed to swap. If you have close to a GB of swap usage this
> was a 'slow' startup :) 2 minutes sounds short for a server that has to
> push a Gig to swap, but FreeBSD's MM system is pretty fast so it is
> possible.

server.conf was generated automatically, I guess by the vice-setup-rvm script, 
which I tried to use first (private mmap is set to "1", which I guess means 
enabled). Entering "1G" at the data RVM size prompt didn't work, or rather 
rdsinit said unmap failed, although the vice script itself succeeded.
So I just recreated the same devices manually using rdsinit.
With codasrv and venus both running, the two machines show:

CPU states:  0.4% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.6% idle
Mem: 173M Active, 26M Inact, 34M Wired, 13M Cache, 35M Buf, 1496K Free
Swap: 1400M Total, 896M Used, 503M Free, 64% Inuse

CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 53M Active, 22M Inact, 27M Wired, 7604K Cache, 22M Buf, 11M Free
Swap: 1500M Total, 1000M Used, 500M Free, 66% Inuse

The startup time on the non-SCM was actually more like 3 minutes, maybe 
because it only had 128MB of physical RAM at the time I tested it. I am in the 
process of upgrading both machines to 1GB RAM. Would that help Coda much?
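
To compare swap pressure between runs without eyeballing it, the "Swap:" lines above can be parsed with a few lines of Python. This is only a sketch that assumes the exact line layout pasted above, with sizes given in megabytes.

```python
import re


def swap_usage(line: str):
    """Parse a top-style 'Swap:' line into (total_mb, used_mb, percent_used)."""
    m = re.search(r"Swap:\s*(\d+)M Total,\s*(\d+)M Used", line)
    if m is None:
        raise ValueError("not a recognised Swap line")
    total, used = int(m.group(1)), int(m.group(2))
    return total, used, 100.0 * used / total


if __name__ == "__main__":
    for line in (
        "Swap: 1400M Total, 896M Used, 503M Free, 64% Inuse",
        "Swap: 1500M Total, 1000M Used, 500M Free, 66% Inuse",
    ):
        total, used, pct = swap_usage(line)
        print(f"{used}M of {total}M swap in use ({pct:.1f}%)")
```
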

> >   starting address of rvm: 0x70000000 (1879048192)
> Why did you move the start address up from 0x50000000? Was there a
> conflict with shared libraries or something?

I don't recall getting errors because of the starting address, but I think 
vice-setup-rvm said something about 0x50000000 and 0x70000000 both being used 
at CMU as starting addresses on BSD systems, so I just picked one at random.
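
For reference, the address arithmetic can be checked with a short Python sketch. It assumes the 1GB data segment is exactly 2^30 bytes and is mapped contiguously from the start address; both are assumptions on my part, not something rdsinit printed.

```python
GIB = 1 << 30  # assumed size of the 1GB RVM data segment, in bytes


def rvm_range(start: int, data_bytes: int):
    """Return (start, end) of the mapped data segment, end exclusive."""
    return start, start + data_bytes


if __name__ == "__main__":
    for start in (0x50000000, 0x70000000):
        lo, hi = rvm_range(start, GIB)
        print(f"start {lo:#010x} ({lo}), end {hi:#010x}")
```

The decimal value for 0x70000000 matches the 1879048192 quoted above.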

> Jan
Received on 2003-07-14 16:12:03