[MPICH] how to start up mpd conveniently
Galton, Simon
galtons at aecl.ca
Tue Sep 26 08:38:03 CDT 2006
I worked out a process to allow nodes to join an existing ring in our Linux
cluster with dual-cpu cluster nodes:
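#!/bin/sh
# Join this node to the mpd ring already running on $headnode.
headnode=node1
printf 'Joining mpd ring hosted by %s: ' "$headnode"
# Look up the TCP port that the head node's mpd (a python2 process) listens on.
port=$(ssh "$headnode" lsof | grep python2 | grep TCP | grep '\*' | cut -d: -f2 | cut -d' ' -f1)
if [ -z "$port" ]
then
    echo "$headnode is not running the ring, cannot join"
    # exit, not return: this is a standalone script, not a function
    exit 1
else
    python2 /usr/local/mpich2/bin/mpd.py -h "$headnode" -p "$port" -d -e --ncpus=2
fi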
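The port lookup is the fragile step in the script above, so here is a minimal sketch that pulls it into a function you can try on captured lsof output without touching the cluster. The helper name extract_port and the sample lsof line (PID, device number, column layout) are my own illustrative assumptions; check them against what lsof prints on your head node.

```shell
#!/bin/sh
# Sketch only: extract_port is a hypothetical helper, not part of MPICH2.
extract_port() {
    # Keep python2 TCP sockets bound to all interfaces ("*:PORT") and
    # print the PORT of the first match; prints nothing if none is found.
    grep python2 | grep TCP | sed -n 's/.*\*:\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Illustrative lsof line (assumed layout, made-up PID and device number).
sample='python2 3021 root 5u IPv4 8712 TCP *:60078 (LISTEN)'
printf '%s\n' "$sample" | extract_port

# Intended use on a compute node (needs ssh access to the head node):
#   port=$(ssh "$headnode" lsof | extract_port)
```

Matching on the digits after "*:" avoids depending on how many spaces follow the port number, which the cut-based pipeline is sensitive to.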
-----Original Message-----
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Martin Kleinschmidt
Sent: September 26, 2006 6:09 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] how to start up mpd conveniently
Hi,
recently, I switched from mpich1 to mpich2 1.0.4p1.
so far, everything is running quite smoothly, but I'm unsure of how to set up
mpd correctly.
some facts first:
our cluster consists of 24 dual-processor nodes and one master server (all
running Fedora Core 3), which are all connected by a 100 Mbit network
(192.168.101.X, hostnames master, node1, node2, ...). The 24 nodes are
additionally connected via a 1 Gbit network (192.168.103.X, hostnames gnode1,
gnode2, ...). Of course parallel communication should use the Gbit network.
My mpd.hosts:
gnode1:2 ifhn=gnode1
gnode2:2 ifhn=gnode2
gnode3:2 ifhn=gnode3
[...]
and I start mpd with
mpdboot --ifhn=gnode1 -n 24 --rsh=rsh
on one of the nodes, and do
mpdtrace -l
the output is:
node1_60078 (192.168.103.2)
node2_50719 (192.168.103.3)
[...]
which is a little bit confusing, because it states the wrong hostname (nodeX
instead of gnodeX), and the right interface (192.168.103.X, not
192.168.101.X), but speed tests indicate that indeed, the Gbit interface is
in use.
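For reference, each mpdtrace -l line can be split into its hostname, port and interface address, which makes it easy to confirm that every entry sits on the 192.168.103.X network. The sketch below does that with sed; the helper name parse_mpdtrace is hypothetical, and reading the number after the underscore as the port that mpd listens on is my understanding of the format rather than something stated in this thread.

```shell
#!/bin/sh
# Sketch only: parse_mpdtrace is a hypothetical helper, not an MPICH2 tool.
parse_mpdtrace() {
    # Turn "host_port (address)" into "host port address", one entry per line.
    # The greedy host match keeps any underscores inside the hostname itself.
    sed -n 's/^\(.*\)_\([0-9][0-9]*\) (\(.*\))$/\1 \2 \3/p'
}

# The two lines quoted above, used as test input.
printf '%s\n' 'node1_60078 (192.168.103.2)' 'node2_50719 (192.168.103.3)' | parse_mpdtrace

# On a node that is part of the ring:
#   mpdtrace -l | parse_mpdtrace
```

If the port reading is right, the second field of the head node's entry is the same value a joining node needs for the -p option of mpd.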
The mpd on all machines is brought up by the root user executing mpdboot on
one of the nodes.
Now to my question:
if one of the nodes goes down/has to reboot for whatever reason - how do I
integrate it back into the ring of still-running mpds without affecting the
mpds on other nodes?
mpdallexit followed by mpdboot is not an option, because there will usually
be parallel applications still running on the other nodes.
thanks for reading my lengthy post
...martin