[MPICH] how to start up mpd conveniently

Galton, Simon galtons at aecl.ca
Tue Sep 26 08:38:03 CDT 2006


I worked out a process to allow nodes to join an existing ring in our Linux
cluster with dual-cpu cluster nodes:

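#!/bin/sh
# Join this node to the mpd ring already running on $headnode.
headnode=node1
printf 'Joining mpd ring hosted by %s: ' "$headnode"
# Find the TCP port that the head node's mpd (a python2 process) listens on.
port=`ssh $headnode lsof | grep python2 | grep TCP | grep '\*' | cut -d: -f2 |
cut -d' ' -f1`
if [ -z "$port" ]
then
	echo "$headnode is not running the ring, cannot join"
	exit 1
else
	# -d runs mpd as a daemon, -e echoes its port; each node has 2 CPUs.
	python2 /usr/local/mpich2/bin/mpd.py -h $headnode -p $port -d -e --ncpus=2
fi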
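
The port lookup in the script can be pulled into a small function, so that it
can be checked against captured lsof output without touching the cluster. This
is only a sketch: the name extract_mpd_port is made up for illustration, and it
assumes that lsof shows the listener as "*:PORT" on a line containing "python2"
and "TCP", which is what the pipeline above relies on.

```shell
#!/bin/sh
# extract_mpd_port (hypothetical helper): read lsof output on stdin and
# print the port of the first python2 TCP socket bound to all interfaces.
extract_mpd_port() {
	awk '/python2/ && /TCP/ {
		for (i = 1; i <= NF; i++)
			if ($i ~ /^\*:[0-9]+$/) {
				sub(/^\*:/, "", $i)   # drop the leading "*:"
				print $i
				exit                  # stop at the first match
			}
	}'
}
```

On the joining node it would be used as port=`ssh $headnode lsof |
extract_mpd_port`. Stopping at the first match avoids passing several ports to
mpd if more than one python2 listener is open on the head node.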

-----Original Message-----
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Martin Kleinschmidt
Sent: September 26, 2006 6:09 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] how to start up mpd conveniently


Hi,

recently, I switched from mpich1 to mpich2 1.0.4p1.
so far, everything is running quite smoothly, but I'm unsure of how to set up
mpd correctly.
some facts first:
our cluster consists of 24 dual-processor nodes and one master server (all
running Fedora Core 3), which are all connected by a 100 Mbit network
(192.168.101.X, hostnames master, node1, node2, ...). The 24 nodes are
additionally connected via a 1 Gbit network (192.168.103.X, hostnames gnode1,
gnode2, ...). Of course parallel communication should use the Gbit network.
My mpd.hosts:

gnode1:2 ifhn=gnode1
gnode2:2 ifhn=gnode2
gnode3:2 ifhn=gnode3
[...]

and I start mpd with

mpdboot --ifhn=gnode1 -n 24 --rsh=rsh

on one of the nodes, and do 

mpdtrace -l

the output is:

node1_60078 (192.168.103.2)
node2_50719 (192.168.103.3)
[...]

which is a little bit confusing, because it states the wrong hostname (nodeX
instead of gnodeX) but the right interface (192.168.103.X, not
192.168.101.X). Speed tests indicate that the Gbit interface is indeed in
use.
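
Each entry in the trace above has the form host_port (address). Assuming the
number after the underscore is the port that mpd listens on, a script could
read that port from the output of mpdtrace -l on any node still in the ring.
A rough sketch; the name port_from_trace is made up for illustration:

```shell
#!/bin/sh
# port_from_trace HOST (hypothetical helper): read `mpdtrace -l` output on
# stdin and print the port recorded for HOST, or nothing if HOST is absent.
port_from_trace() {
	# Match "HOST_<digits> (...)" and keep only the digits.
	sed -n "s/^${1}_\([0-9][0-9]*\) .*/\1/p" | head -n 1
}
```

For example, mpdtrace -l | port_from_trace node1 would print 60078 for the
trace shown above. Note that the lookup has to use the name that mpdtrace
reports (nodeX), not the gnodeX name from mpd.hosts.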

The mpd on all machines is brought up by the root user executing mpdboot on
one of the nodes.

Now to my question:
if one of the nodes goes down or has to reboot for whatever reason, how do I
integrate it into the ring of still-running mpds without affecting the mpds on
other nodes?
mpdallexit followed by mpdboot is not an option, because there will usually
be parallel applications still running on the other nodes.

thanks for reading my lengthy post

   ...martin
