[MPICH] how to start up mpd conveniently

Martin Kleinschmidt mk at theochem.uni-duesseldorf.de
Tue Sep 26 05:08:57 CDT 2006


Hi,

Recently, I switched from mpich1 to mpich2 1.0.4p1.
So far, everything is running quite smoothly, but I'm unsure how to
set up mpd correctly.
Some facts first:
our cluster consists of 24 dual-processor nodes and one master server
(all running Fedora Core 3), which are all connected by a 100 Mbit
network (192.168.101.X, hostnames master, node1, node2, ...). The 24
nodes are additionally connected via a 1 Gbit network (192.168.103.X,
hostnames gnode1, gnode2, ...).
Of course, parallel communication should use the Gbit network.
My mpd.hosts:

gnode1:2 ifhn=gnode1
gnode2:2 ifhn=gnode2
gnode3:2 ifhn=gnode3
[...]

and I start mpd with

mpdboot --ifhn=gnode1 -n 24 --rsh=rsh

on one of the nodes, and do 

mpdtrace -l

the output is:

node1_60078 (192.168.103.2)
node2_50719 (192.168.103.3)
[...]

which is a little confusing, because it reports the wrong hostname
(nodeX instead of gnodeX) but the right interface (192.168.103.X, not
192.168.101.X). Speed tests confirm that the Gbit interface is indeed
in use.
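
To check this for all 24 entries without reading them by eye, the mpdtrace -l
output could be parsed with a short script. The following is only a sketch:
it assumes every line has the form hostname_port (address) as in the two
lines shown above, and that the Gbit network is 192.168.103.0/24 (the /24
netmask is my assumption). The function names are made up for illustration.

```python
import ipaddress
import re

# One line of `mpdtrace -l` output in the format shown above, e.g.
# "node1_60078 (192.168.103.2)": hostname, listening port, interface address.
ENTRY = re.compile(r"^(?P<host>\S+)_(?P<port>\d+)\s+\((?P<ip>[\d.]+)\)$")


def parse_mpdtrace(text):
    """Return (host, port, ip) tuples; lines that do not match are skipped."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            entries.append((m.group("host"), int(m.group("port")), m.group("ip")))
    return entries


def on_network(ip, network="192.168.103.0/24"):
    """True if ip lies in the given network (default: assumed Gbit subnet)."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(network)


if __name__ == "__main__":
    sample = "node1_60078 (192.168.103.2)\nnode2_50719 (192.168.103.3)\n"
    for host, port, ip in parse_mpdtrace(sample):
        print(host, port, ip, "Gbit" if on_network(ip) else "NOT Gbit")
```

In practice the sample string would be replaced by the captured output of
mpdtrace -l on one of the nodes.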

The mpds on all machines are brought up by the root user executing
mpdboot on one of the nodes.

Now to my question:
if one of the nodes goes down or has to reboot for whatever reason, how
do I integrate it back into the ring of still-running mpds without
affecting the mpds on the other nodes?
mpdallexit followed by mpdboot is not an option, because there will
usually be parallel applications still running on the other nodes.
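
One approach that might work, untested on this cluster: start a single mpd by
hand on the rebooted node and point it at a host and port of an mpd that is
still in the ring, as reported by mpdtrace -l. The sketch below only builds
such a command line and does not run it. The option names (--host, --port,
--ifhn, --ncpus, --daemon) are as I understand them from the mpd usage text
and should be checked against the installed version; gnode5 is a hypothetical
rebooted node, and the function name is made up.

```python
def rejoin_command(ring_host, ring_port, ifhn, ncpus=2):
    """Build an argument list for starting one mpd that joins an existing ring.

    ring_host, ring_port: a running mpd, taken from `mpdtrace -l`.
    ifhn: interface hostname the new mpd should use (the Gbit name here).
    ncpus: processors on the node (2 for the dual-processor nodes).
    """
    return [
        "mpd",
        "--host=%s" % ring_host,
        "--port=%d" % ring_port,
        "--ifhn=%s" % ifhn,
        "--ncpus=%d" % ncpus,
        "--daemon",
    ]


if __name__ == "__main__":
    # Example: hypothetical node gnode5 rejoins via the mpd on gnode1,
    # which listens on port 60078 according to the trace output.
    print(" ".join(rejoin_command("gnode1", 60078, "gnode5")))
```

The printed command would then be run on the rebooted node as the same user
that owns the ring (root here). Whether the ring accepts the new mpd without
disturbing running jobs is exactly what I am unsure about.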

thanks for reading my lengthy post

   ...martin



