[mpich-discuss] Hydra

Reuti reuti at staff.uni-marburg.de
Wed Dec 15 11:40:27 CST 2010


Hi,

On 15.12.2010 at 18:09, Mark Beauharnois wrote:

> We just installed v1.3.1 of MPICH2 and are having a problem clustering two nodes together.  Both nodes (minserv1 and minserv2) are linked together via a private Gigabit Ethernet, and minserv1 has a local Ethernet connection we use to log in to that system.
> 
> On ‘minserv1’ we execute:
> 
> [minserv1 ~]$ mpdboot --totalnum=2 --ncpus=8 --ifhn=minserv1-gig -f ${HOME}/mpd.hosts

With Hydra there is no mpdboot step any longer. It was only necessary for the mpd startup method.


>  We then try to execute:
> [minserv1 ~]$ mpiexec -np 16 -machinefile ./mf hostname
>  
> 
> Where the machine file ‘mf’ contains:
> minserv1-gig:8
> minserv2-gig:8

What about:

minserv1-gig:8 ifhn=minserv1-gig
minserv2-gig:8

-- Reuti


> with the following results:
>  
> minserv1
> minserv1
> minserv1
> minserv1
> minserv1
> minserv1
> minserv1
> [proxy:0:1@minserv2] HYDU_sock_connect (./utils/sock/sock.c:138): unable to get host address (Success)
> [proxy:0:1@minserv2] main (./pm/pmiserv/pmip.c:208): unable to connect to server minserv1 at port 55501 (check for firewalls!)
> We’re unsure why ‘minserv2’ appears to be trying to connect to ‘minserv1’ (using the non-gigabit interface) when we’ve started mpd with the specification to explicitly use the gigabit interface.
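
The "unable to get host address" line suggests that the proxy on minserv2 could not resolve the name it was handed for the server (the plain "minserv1", not "minserv1-gig"). A quick way to check this is to run a small name-lookup test on each node. The following is only a sketch and not part of Hydra or MPICH2: the function names parse_machinefile and check_hosts are made up for illustration, and it assumes the simple "host:slots" layout of the machine file shown above.

```python
import socket


def parse_machinefile(text):
    """Return (host, slots) pairs from 'host:slots' lines; slots defaults to 1."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        spec = line.split()[0]  # ignore anything after the first field
        host, _, slots = spec.partition(":")
        entries.append((host, int(slots) if slots else 1))
    return entries


def check_hosts(entries, resolver=socket.gethostbyname):
    """Map each host name to its resolved address, or None if the lookup fails."""
    result = {}
    for host, _ in entries:
        try:
            result[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            result[host] = None
    return result


if __name__ == "__main__":
    # Names from the machine file, plus the plain name seen in the error message.
    mf = "minserv1-gig:8\nminserv2-gig:8\nminserv1\n"
    for host, addr in check_hosts(parse_machinefile(mf)).items():
        print(host, "->", addr or "UNRESOLVED")
```

If "minserv1" shows up as unresolved on minserv2 while "minserv1-gig" resolves, that would be consistent with the error above, and the name lookup (for example /etc/hosts on minserv2) would be the first thing to look at.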



