[mpich-discuss] Hydra

Mark Beauharnois mark at asrc.cestm.albany.edu
Wed Dec 15 11:09:22 CST 2010


Hi,

We just installed v1.3.1 of MPICH2 and are having a problem clustering two
nodes together.  Both nodes (minserv1 and minserv2) are linked via a private
Gigabit Ethernet network, and minserv1 also has a local Ethernet connection
that we use to log in to that system.

 

On 'minserv1' we execute:

[minserv1 ~]$ mpdboot --totalnum=2 --ncpus=8 --ifhn=minserv1-gig -f ${HOME}/mpd.hosts

 

and 'mpdtrace -l' shows both servers communicating using their gigabit
interfaces.

 

We then try to execute:

[minserv1 ~]$ mpiexec -np 16 -machinefile ./mf hostname

 

where the machine file 'mf' contains:

minserv1-gig:8
minserv2-gig:8

 

with the following results:

minserv1
minserv1
minserv1
minserv1
minserv1
minserv1
minserv1
minserv1
[proxy:0:1@minserv2] HYDU_sock_connect (./utils/sock/sock.c:138): unable to get host address (Success)
[proxy:0:1@minserv2] main (./pm/pmiserv/pmip.c:208): unable to connect to server minserv1 at port 55501 (check for firewalls!)

 

We're confused as to why 'minserv2' appears to be trying to connect to
'minserv1' (that is, using the non-gigabit interface) when we started mpd
with an explicit specification to use the gigabit interface.
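
Since the first error is "unable to get host address", one thing that can be
checked on each node is whether the names involved resolve there at all. The
following is only a rough sketch: the function name check_hosts is made up
for illustration, and it assumes 'getent' is available and that the machine
file uses the "host:slots" form shown above.

```shell
# Hypothetical helper: report whether each host in a machine file resolves.
# Assumes "host:slots" lines, as in 'mf', and that getent is installed.
check_hosts() {
    while IFS= read -r line; do
        [ -z "$line" ] && continue      # skip blank lines
        host=${line%%:*}                # drop the ":8" slot count
        if getent hosts "$host" >/dev/null; then
            echo "$host resolves"
        else
            echo "$host does NOT resolve"
        fi
    done < "$1"
}
```

Running 'check_hosts ./mf' on both minserv1 and minserv2, and also
'getent hosts minserv1' on minserv2 (the name that appears in the error),
would show whether the failure is a name lookup problem rather than a
firewall.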

 

Help and guidance will be sincerely appreciated.

 

Thanks very much.

Mark Beauharnois

Atmospheric Sciences Research Center

State University of New York at Albany

 
