Coda File System

Re: Need some enlightenment

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 19 Mar 2001 14:56:16 -0500
On Mon, Mar 19, 2001 at 05:41:16PM +0100, Emmanuel Varagnat wrote:
> 
> I've got some questions:
>  - How does the Coda client choose the server to speak with?
>    Is there some sort of load-balancing mechanism, or does
>    every client always talk to the same server for as long
>    as it is reachable?

In venus.conf there is a list of 'rootservers'. These servers are
queried to find out which volume is the root volume of the cell,
and they are also the servers used for all volume replication and
location information requests.
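
For example, a client's venus.conf might contain something like
the following (hostnames invented, and the exact list syntax may
differ between versions, so check the venus.conf template that
ships with your release):

# hypothetical entry in /etc/coda/venus.conf
rootservers=server1.coda.example.org,server2.coda.example.org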

Once a client knows which server group a volume is hosted on, it
connects to all servers in the group. Effectively, all requests are
sent to every replica of the volume; however, bulk data transfers
(i.e. file/directory fetches) are only sent back from the strongest
host, where "strongest" means the host for which the bandwidth
estimate was highest at the end of the previous operation.
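
To make the selection concrete, here is a minimal C sketch of
"fetch from the strongest replica". The struct and field names are
invented for illustration; they are not Venus's actual internals:

struct replica {
    const char *hostname;
    long        bw_estimate;   /* bytes/sec, updated after each op */
};

/* Return the replica with the highest bandwidth estimate,
 * i.e. the one bulk data will be fetched from. */
static struct replica *strongest(struct replica *r, int n)
{
    struct replica *best = &r[0];
    for (int i = 1; i < n; i++)
        if (r[i].bw_estimate > best->bw_estimate)
            best = &r[i];
    return best;
}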

Coda's replication strategy is a modified write-all/read-any. Because we
still allow writes to succeed when not all servers in a replication
group are available, we need to validate the version vector with all
available servers to detect possible server-server conflicts. However,
once we know there is no conflict, the actual data can be pulled from
any of the available servers.
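
To illustrate the check: each replica carries a version vector with
one update counter per server in the group, and two replicas are in
conflict when neither vector dominates the other. A rough C sketch
(Coda's real version vectors carry more state than this):

/* Compare version vectors a and b of length n.
 * Returns 1 if a dominates, -1 if b dominates, 0 if equal,
 * and 2 if the vectors are incomparable, i.e. a server-server
 * conflict caused by concurrent partitioned updates. */
static int vv_compare(const int *a, const int *b, int n)
{
    int a_bigger = 0, b_bigger = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > b[i]) a_bigger = 1;
        if (b[i] > a[i]) b_bigger = 1;
    }
    if (a_bigger && b_bigger) return 2;
    if (a_bigger) return 1;
    if (b_bigger) return -1;
    return 0;
}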

>  - I'm not sure I understand the way Coda shares data.
>    Within a cell, every server shares the same data, so
>    there is no way to extend the total amount of data
>    available by adding a new server? Is the only way to add
>    a disk or a partition to each server in the cell?

No, within a Coda cell there are many "replicated volumes". These
volumes are replicated across a "volume storage group" (VSG). The
VSG is a group of servers, defined in the VSGDB at volume-creation
time, each of which is responsible for holding one "volume replica".

Or, in other words, a replicated volume is a logical entity that
actually consists of one or more volume replicas stored on servers.
Which servers hold the volume replicas is determined by the volume
storage group at the time the volume is created. There are still
some places in the code where the VSG is consulted later on, so at
the moment it isn't too smart to modify an existing VSG definition
once any volumes have been created for it.

So a cell could have six servers that are members of several VSGs,
e.g. in the VSGDB:

E000100 server1 server2 server3 	# VSG for triply replicated volumes
E000101 server4 server5			# VSG for doubly replicated volumes
E000102 server6				# VSG for singly replicated volumes
E000200 server1 server4 server6		# another triply replicated VSG

Now any volume created with "createvol_rep volume E000100" is
stored on server1, server2, and server3; volumes created in the
other VSGs are placed on their member servers accordingly.
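
For instance, using the VSGs above (the volume names are made up):

createvol_rep software E000100		# replicas on server1-3
createvol_rep scratch E000102		# single replica on server6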

Adding a partition is no real problem; the VSGs stay the same, and
newly created volumes just need to be given the correct partition
identifier (the default is /vicepa, the new one will typically be
/vicepb).
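
If I remember the syntax correctly, the partition is just an extra
argument to createvol_rep (check the man page for your version):

createvol_rep bigvolume E000101 /vicepb	# use /vicepb on server4/5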

Adding a new server implies creating new VSGs that incorporate this
server.

E000300 server7				# VSG for just the new server
E000301 server6 server7			# VSG for volumes replicated
					# across servers #6 and #7

etc.

Jan
Received on 2001-03-19 14:59:28