[mpich-discuss] Hydra
Reuti
reuti at staff.uni-marburg.de
Wed Dec 15 11:40:27 CST 2010
Hi,
On 15.12.2010 at 18:09, Mark Beauharnois wrote:
> We just installed v1.3.1 of MPICH2 and are having a problem clustering two nodes together. Both nodes (minserv1 and minserv2) are linked together via a private Gigabit Ethernet, and minserv1 has a local Ethernet connection we use to log in to that system.
>
> On ‘minserv1’ we execute:
>
> [minserv1 ~]$ mpdboot --totalnum=2 --ncpus=8 --ifhn=minserv1-gig -f ${HOME}/mpd.hosts
With Hydra, there is no mpdboot any longer. It was only necessary for the mpd startup method.
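As a rough sketch of what a Hydra launch without any mpd ring might look like: the snippet below writes the machine file from your mail and prints an mpiexec command line. The `-f` and `-iface` options are what I would expect Hydra's mpiexec to accept, but please check `mpiexec -help` on your 1.3.1 installation. The interface name `eth1` is purely an assumption about your private gigabit network.

```shell
# Write the machine file (same contents as in the original post).
cat > ./mf <<'EOF'
minserv1-gig:8
minserv2-gig:8
EOF

# No mpdboot step: Hydra's mpiexec starts its proxies on the listed hosts itself.
# Assumption: "eth1" is the private gigabit interface; adjust to your setup.
cmd="mpiexec -f ./mf -iface eth1 -n 16 hostname"

# Print the command for review instead of running it blindly.
echo "$cmd"
```

If the printed command looks right for your machines, run it by hand from the directory containing `mf`.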
> We then try to execute:
> [minserv1 ~]$ mpiexec -np 16 -machinefile ./mf hostname
>
>
> Where the machine file ‘mf’ contains:
> minserv1-gig:8
> minserv2-gig:8
What about:
minserv1-gig:8 ifhn=minserv1-gig
minserv2-gig:8
-- Reuti
> with the following results:
>
> minserv1
> minserv1
> minserv1
> minserv1
> minserv1
> minserv1
> minserv1
> [proxy:0:1@minserv2] HYDU_sock_connect (./utils/sock/sock.c:138): unable to get host address (Success)
> [proxy:0:1@minserv2] main (./pm/pmiserv/pmip.c:208): unable to connect to server minserv1 at port 55501 (check for firewalls!)
> We’re wondering why ‘minserv2’ appears to be trying to connect to ‘minserv1’ (using the non-gigabit interface) when we’ve started mpd with the specification to explicitly use the gigabit interface?
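The "unable to get host address" line suggests that the proxy on minserv2 could not resolve the name `minserv1` at all, so it might be worth checking name resolution on minserv2 before looking at firewalls. A minimal sketch, assuming `getent` is available there (`check_host` is just a helper name I made up for illustration):

```shell
# Report whether a host name resolves on the machine this runs on.
# check_host is a hypothetical helper, not part of MPICH2 or Hydra.
check_host() {
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "$1: resolves"
    else
        echo "$1: does not resolve"
    fi
}

# Run this on minserv2: the proxy there has to reach the server by one of these names.
for h in minserv1 minserv1-gig; do
    check_host "$h"
done
```

If `minserv1` does not resolve on minserv2 while `minserv1-gig` does, that would fit the error shown above.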