[mpich-discuss] Hydra
Mark Beauharnois
mark at asrc.cestm.albany.edu
Wed Dec 15 11:09:22 CST 2010
Hi,
We just installed v1.3.1 of MPICH2 and are having a problem clustering two
nodes together. Both nodes (minserv1 and minserv2) are linked together via
a private Gigabit Ethernet, and minserv1 has a local Ethernet connection we
use to log in to that system.
On 'minserv1' we execute:
[minserv1 ~]$ mpdboot --totalnum=2 --ncpus=8 --ifhn=minserv1-gig -f ${HOME}/mpd.hosts
and 'mpdtrace -l' shows both servers communicating using their gigabit
interfaces.
We then try to execute:
[minserv1 ~]$ mpiexec -np 16 -machinefile ./mf hostname
Where the machine file 'mf' contains:
minserv1-gig:8
minserv2-gig:8
with the following results:
minserv1
minserv1
minserv1
minserv1
minserv1
minserv1
minserv1
minserv1
[proxy:0:1@minserv2] HYDU_sock_connect (./utils/sock/sock.c:138): unable to get host address (Success)
[proxy:0:1@minserv2] main (./pm/pmiserv/pmip.c:208): unable to connect to server minserv1 at port 55501 (check for firewalls!)
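For reference, the machine file above can be summarised with a short script to confirm how the 16 requested processes map onto the two hosts. This is only a sketch: `summarize_machinefile` is a hypothetical helper name, and counting a host listed without `:slots` as one slot is an assumption made here, not something taken from the MPICH2 documentation.

```shell
#!/bin/sh
# Summarise a "host:slots" machine file: one line per host, then the total.
# A host listed without ":slots" is counted as 1 slot here (assumption).
summarize_machinefile() {
    awk -F: '
        NF == 0 || /^#/ { next }          # skip blank lines and comments
        { slots = ($2 == "" ? 1 : $2)     # default to 1 slot if none is given
          printf "%s %d\n", $1, slots
          total += slots }
        END { printf "total %d\n", total }
    ' "$1"
}

# Recreate the machine file from this post in a temporary location.
mf=$(mktemp)
printf 'minserv1-gig:8\nminserv2-gig:8\n' > "$mf"
summarize_machinefile "$mf"
rm -f "$mf"
```

The total of 16 matches '-np 16', with 8 slots on each host. That is consistent with eight 'minserv1' lines being printed before the proxy on minserv2 reports its error.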
We're confused about why 'minserv2' appears to be trying to connect to
'minserv1' (i.e., using the non-gigabit interface) when we started mpd
with the option to explicitly use the gigabit interface.
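One check that may help narrow this down: the "unable to get host address" message names 'minserv1' rather than 'minserv1-gig', which suggests that the plain name may not resolve on minserv2. The sketch below reports which names resolve on the machine where it is run. It assumes getent(1) is available (as on glibc-based Linux systems). The function names `lookup` and `check_names` and the script name `check_names.sh` are hypothetical.

```shell
#!/bin/sh
# Look up one name through the system resolver (/etc/hosts, DNS, ...).
# Assumes getent(1) is available, as on glibc-based Linux systems.
lookup() {
    getent hosts "$1"
}

# Print OK/FAIL for every name given as an argument.
check_names() {
    for name in "$@"; do
        if addr=$(lookup "$name"); then
            # getent prints "address  name ..."; keep only the address.
            printf 'OK   %s -> %s\n' "$name" "${addr%% *}"
        else
            printf 'FAIL %s does not resolve\n' "$name"
        fi
    done
}

# Names are passed on the command line; with no arguments nothing is checked.
check_names "$@"
```

Saved as check_names.sh, it could be run on minserv2 as 'sh check_names.sh minserv1 minserv1-gig minserv2 minserv2-gig'. A FAIL line for 'minserv1' would be consistent with the error shown above, although it would not by itself explain why that name is being used.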
Help and guidance will be sincerely appreciated.
Thanks very much.
Mark Beauharnois
Atmospheric Sciences Research Center
State University of New York at Albany