[mpich-discuss] mpdboot not working!
Madhurjya P Bora
madhurjyap.bora at gmail.com
Fri May 30 21:28:25 CDT 2008
Hi All!
I am using ROCKS (4.2.1) on my Itanium cluster and have
successfully installed mpich2 on all nodes.
All the tests suggested by the install guide ran smoothly. I could
cross-communicate (as server and clients).
I can run mpd on all nodes independently with
mpd -h master port_no
but when I run mpdboot, it fails with the message
-----------
mpdboot_plasma-gate.physics.guniv.ernet.in (handle_mpd_output 406):
from mpd on compute-0-0, invalid port info:
no_port
----------
My network configs are :
master (plasma-gate) : 10.1.1.1/255.255.255.0 (eth0)
                     : 10.10.136.70/255.255.255.0 (eth1)
compute nodes        : 10.1.1.254 onwards/255.255.255.0 (eth0)
The output of mpdcheck -f mpd.hosts -v on the master-node is :
------------------
obtaining hostname via gethostname and getfqdn
gethostname gives plasma-gate.physics.guniv.ernet.in
getfqdn gives plasma-gate.physics.guniv.ernet.in
checking out unqualified hostname; make sure is not "localhost", etc.
checking out qualified hostname; make sure is not "localhost", etc.
obtain IP addrs via qualified and unqualified hostnames; make sure
other than 127.0.0.1
gethostbyname_ex: ('plasma-gate.physics.guniv.ernet.in', [], ['10.10.136.70'])
gethostbyname_ex: ('plasma-gate.physics.guniv.ernet.in', [], ['10.10.136.70'])
checking that IP addrs resolve to same host
now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
checking gethostbyXXX for unqualified compute-0-0
gethostbyname_ex: ('compute-0-0.local', ['compute-0-0', 'c0-0'], ['10.1.1.252'])
checking gethostbyXXX for qualified compute-0-0
gethostbyname_ex: ('compute-0-0.local', ['compute-0-0', 'c0-0'], ['10.1.1.252'])
---------------------------------
You can see that gethostbyname_ex returns 10.10.136.70 for the master,
which I believe should be 10.1.1.1.
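To make that comparison explicit, here is a rough sketch of the same lookup that mpdcheck reports, with a check of whether each returned address falls inside the private cluster network. The 10.1.1.0/24 network is taken from the configuration listed above; the helper names (on_private_net, resolve, report) are made up for illustration and are not part of mpich2. It uses the ipaddress module, so it assumes Python 3.

```python
import ipaddress
import socket

# Assumption: the cluster-private network is the eth0 subnet from the post.
PRIVATE_NET = ipaddress.ip_network("10.1.1.0/24")


def on_private_net(ip, net=PRIVATE_NET):
    """Return True if the dotted-quad address lies inside the given subnet."""
    return ipaddress.ip_address(ip) in net


def resolve(hostname):
    """Return the IPv4 addresses the local resolver gives for hostname."""
    _name, _aliases, addrs = socket.gethostbyname_ex(hostname)
    return addrs


def report(hostname):
    """Print each resolved address and whether it is on the private subnet."""
    for ip in resolve(hostname):
        status = "private" if on_private_net(ip) else "NOT on private net"
        print(f"{hostname} -> {ip} ({status})")
```

Running report(socket.getfqdn()) on the master, and report("compute-0-0") for a compute node, would show at a glance which names resolve outside the private subnet.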
Further, the output of mpdcheck -f mpd.hosts -ssh on the master node is:
----------------------------------
client on compute-0-0 failed to access the server
here is the output:
/usr/local/bin/mpdcheck.py: Command not found.
-----------------------------------
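The "Command not found" line suggests that /usr/local/bin/mpdcheck.py could not be run on compute-0-0, perhaps because the file (or the interpreter it names) is not present at that path there. A small sketch for checking this over ssh follows; the function names are hypothetical, and it assumes passwordless ssh to the node and a standard "test" utility on the remote side.

```python
import subprocess


def build_check_cmd(host, path):
    """Build an ssh command that tests whether path is executable on host."""
    return ["ssh", host, "test", "-x", path]


def is_executable_on(host, path):
    """Run the remote check; True if 'test -x' exits with status 0."""
    return subprocess.run(build_check_cmd(host, path)).returncode == 0
```

For example, is_executable_on("compute-0-0", "/usr/local/bin/mpdcheck.py") would tell whether the script is reachable at the same path as on the master.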
I suspect something is wrong with my network configuration. As I have
not been able to figure it out myself, could anyone kindly help me out?
--- Madhurjya
------------------------------------------------------
Dr Madhurjya P Bora
Physics Department, Gauhati University
Guwahati 781 014, India.
------------------------------------------------------