[mpich-discuss] mpdboot not working!

Madhurjya P Bora madhurjyap.bora at gmail.com
Fri May 30 21:28:25 CDT 2008


Hi All!

I am using ROCKS (4.2.1) on my Itanium cluster and have
successfully installed mpich2 on all nodes.
All the tests suggested by the install guide ran smoothly, and the nodes
could communicate with each other (as server and clients).
I can run mpd on all nodes independently with

mpd -h master port_no

but when I run mpdboot, it fails with the message

-----------
mpdboot_plasma-gate.physics.guniv.ernet.in (handle_mpd_output 406):
from mpd on compute-0-0, invalid port info:
no_port
----------

My network configuration is:

master (plasma-gate) : 10.1.1.1/255.255.255.0 (eth0)
                     : 10.10.136.70/255.255.255.0 (eth1)
compute nodes        : 10.1.1.254 onwards/255.255.255.0 (eth0)

The output of mpdcheck -f mpd.hosts -v on the master-node is :
------------------
obtaining hostname via gethostname and getfqdn
gethostname gives  plasma-gate.physics.guniv.ernet.in
getfqdn gives  plasma-gate.physics.guniv.ernet.in
checking out unqualified hostname; make sure is not "localhost", etc.
checking out qualified hostname; make sure is not "localhost", etc.
obtain IP addrs via qualified and unqualified hostnames;  make sure other than 127.0.0.1
gethostbyname_ex:  ('plasma-gate.physics.guniv.ernet.in', [], ['10.10.136.70'])
gethostbyname_ex:  ('plasma-gate.physics.guniv.ernet.in', [], ['10.10.136.70'])
checking that IP addrs resolve to same host
now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
checking gethostbyXXX for unqualified compute-0-0
gethostbyname_ex:  ('compute-0-0.local', ['compute-0-0', 'c0-0'], ['10.1.1.252'])
checking gethostbyXXX for qualified compute-0-0
gethostbyname_ex:  ('compute-0-0.local', ['compute-0-0', 'c0-0'], ['10.1.1.252'])
---------------------------------
You can see that gethostbyname_ex returns 10.10.136.70 (the eth1 address)
for the master, whereas I believe it should return 10.1.1.1 (the eth0 address).
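
The lookups reported above can be reproduced outside mpdcheck with a short
script. This is only a minimal sketch: it assumes Python 3 (for the ipaddress
module) and assumes the private cluster network is 10.1.1.0/24, as implied by
the eth0 settings listed earlier. The names CLUSTER_NET, in_subnet and report
are made up for illustration and are not part of mpich2.

```python
import ipaddress
import socket

# Assumption: the private cluster network is 10.1.1.0/24 (eth0 settings above).
CLUSTER_NET = "10.1.1.0/24"


def in_subnet(ip, cidr=CLUSTER_NET):
    """Return True if ip lies inside the network given in CIDR form."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


def report(name):
    """Resolve name and say whether each address is on the cluster network."""
    try:
        canonical, aliases, addrs = socket.gethostbyname_ex(name)
    except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
        print(f"{name}: lookup failed ({exc})")
        return
    for addr in addrs:
        where = "inside" if in_subnet(addr) else "outside"
        print(f"{name} -> {canonical} {addr} ({where} {CLUSTER_NET})")


if __name__ == "__main__":
    # The same two names that mpdcheck obtains via gethostname and getfqdn.
    report(socket.gethostname())
    report(socket.getfqdn())
```

Run on the master, it would show whether the hostname resolves to an address
on the cluster network or to the other interface.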

Further, the output of mpdcheck -f mpd.hosts -ssh on the master-node is :
----------------------------------
client on compute-0-0 failed to access the server
here is the output:
/usr/local/bin/mpdcheck.py: Command not found.
-----------------------------------
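
One thing that can be checked directly on compute-0-0 is whether the path
named in that message exists there and is executable. The snippet below is a
rough sketch for that check only; the function name check_script is made up
for illustration, and it covers nothing beyond file existence and the execute
permission.

```python
import os


def check_script(path):
    """Classify path as 'missing', 'not a regular file', 'not executable' or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isfile(path):
        return "not a regular file"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Path taken verbatim from the error message above.
    print(check_script("/usr/local/bin/mpdcheck.py"))
```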

I suspect something is wrong with my network configuration. As I have not
been able to figure it out myself, could anyone kindly help me out?

--- Madhurjya

------------------------------------------------------
Dr Madhurjya P Bora
Physics Department, Gauhati University
Guwahati 781 014, India.
------------------------------------------------------



