[MPICH] Using non-default ethernet interfaces

Jeff Squyres jsquyres at cisco.com
Thu May 17 09:09:03 CDT 2007


That worked like a champ.  Thanks!

This is now a moot point, but I'm curious: any idea why mpdboot  
failed with the hostfile containing my alternate IP addresses?



On May 16, 2007, at 5:14 PM, Anthony Chan wrote:

>
> AFAIK, the hostfile is for mpd only, so the hostfile can contain the
> "standard" machine names like svbu-mpi001 and svbu-mpi002 (which
> correspond to the ethernet hostnames); then run mpdboot with this hostfile.
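A minimal sketch of the mpd side of that advice, assuming the two ethernet hostnames from this thread. The file name mpd.hosts is a hypothetical choice; the `mpdboot -n ... -f ...` form is the one used later in the thread. The boot is only attempted where mpdboot is installed.

```shell
#!/bin/sh
# Sketch: write an mpd hostfile that lists only the "standard" (eth0)
# hostnames, then boot the mpd ring from it.
# Assumption: the file name mpd.hosts is hypothetical, chosen for this example.
set -eu

hostfile=mpd.hosts

# One ethernet hostname per line; these should resolve to the eth0 addresses.
cat > "$hostfile" <<'EOF'
svbu-mpi001
svbu-mpi002
EOF

# Count the hosts so that -n matches the number of lines in the file.
n=$(wc -l < "$hostfile" | tr -d ' ')

# Only try to boot the ring where mpdboot is actually installed.
if command -v mpdboot >/dev/null 2>&1; then
    mpdboot -n "$n" -f "$hostfile"
else
    echo "mpdboot not found; would run: mpdboot -n $n -f $hostfile"
fi
```

The point of this design is that mpd only ever sees the eth0 names. The choice of IPoIB interface is deferred to the machinefile passed to mpiexec.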
>
> Now create a machinefile, e.g. machinefile.txt, that maps each ethernet
> hostname to its IPoIB hostname:
>
>> cat machinefile.txt
> svbu-mpi001 ifhn=svbu-mpi001-ib
> svbu-mpi002 ifhn=svbu-mpi002-ib
> ...
>
> Now launch your MPI job as
>
>> mpiexec -machinefile machinefile.txt -n 2 <your_benchmark_program>
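For more than a couple of nodes, the machinefile can be generated instead of written by hand. This is a sketch that makes two hypothetical assumptions: hosts.txt is an input file with one ethernet hostname per line, and every IPoIB hostname is the ethernet hostname plus an "-ib" suffix, as in the svbu-mpi001 / svbu-mpi001-ib pair above. Only the `<host> ifhn=<name>` line format comes from the message.

```shell
#!/bin/sh
# Sketch: build an mpiexec machinefile that maps each ethernet hostname
# to its IPoIB hostname with "ifhn=".
# Assumptions (hypothetical): hosts.txt holds one ethernet hostname per
# line, and each IPoIB hostname is the ethernet name plus "-ib".
set -eu

suffix=-ib

# Hypothetical input: the same ethernet names used in the mpd hostfile.
printf '%s\n' svbu-mpi001 svbu-mpi002 > hosts.txt

# Start from an empty machinefile, then emit "<eth-name> ifhn=<eth-name><suffix>".
: > machinefile.txt
while IFS= read -r host; do
    # Skip blank lines so they do not produce a bogus "ifhn=-ib" entry.
    [ -n "$host" ] || continue
    printf '%s ifhn=%s%s\n' "$host" "$host" "$suffix" >> machinefile.txt
done < hosts.txt

# Prints:
# svbu-mpi001 ifhn=svbu-mpi001-ib
# svbu-mpi002 ifhn=svbu-mpi002-ib
cat machinefile.txt
```

With the file in place, the job is launched exactly as in the message, `mpiexec -machinefile machinefile.txt -n 2 <your_benchmark_program>`. Before launching, it is worth confirming that each `ifhn=` name resolves on every node (for example with `getent hosts svbu-mpi001-ib`).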
>
> Hope this helps.
>
> A.Chan
>
> On Wed, 16 May 2007, Jeff Squyres wrote:
>
>> I forgot to mention that I tried the usual trick of giving a hostfile
>> with hostnames that correspond to the IPoIB IP addresses.
>> Unfortunately, mpdboot fails on it for some reason, even though all
>> the interfaces are up and I am able to ping them:
>>
>> -----
>> [16:29] svbu-mpi001:~ % cat h
>> svbu-mpi001-ib0
>> svbu-mpi002-ib0
>> [16:29] svbu-mpi001:~ % ping svbu-mpi001-ib0
>> PING svbu-mpi001-ib.cisco.com (192.168.0.1) 56(84) bytes of data.
>> 64 bytes from svbu-mpi001-ib.cisco.com (192.168.0.1): icmp_seq=0 ttl=64 time=0.049 ms
>> 64 bytes from svbu-mpi001-ib.cisco.com (192.168.0.1): icmp_seq=1 ttl=64 time=0.022 ms
>>
>> --- svbu-mpi001-ib.cisco.com ping statistics ---
>> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
>> rtt min/avg/max/mdev = 0.022/0.035/0.049/0.014 ms, pipe 2
>> [16:29] svbu-mpi001:~ % ping svbu-mpi002-ib0
>> PING svbu-mpi002-ib.cisco.com (192.168.0.2) 56(84) bytes of data.
>> 64 bytes from svbu-mpi002-ib.cisco.com (192.168.0.2): icmp_seq=0 ttl=64 time=1.79 ms
>> 64 bytes from svbu-mpi002-ib.cisco.com (192.168.0.2): icmp_seq=1 ttl=64 time=0.107 ms
>>
>> --- svbu-mpi002-ib.cisco.com ping statistics ---
>> 2 packets transmitted, 2 received, 0% packet loss, time 1001ms
>> rtt min/avg/max/mdev = 0.107/0.950/1.793/0.843 ms, pipe 2
>> [16:29] svbu-mpi001:~ % mpdboot -n 2 -f h
>> mpdboot_svbu-mpi001.cisco.com (handle_mpd_output 374): failed to ping
>> mpd on svbu-mpi001-ib0; recvd output={}
>>
>> [16:29] svbu-mpi001:~ %
>> -----
>>
>> Did I do something wrong?
>>
>> Many thanks.
>>
>>
>>
>> On May 16, 2007, at 3:34 PM, Jeff Squyres wrote:
>>
>>> Greetings.  I'm trying to run MVAPICH2 over ethernet to do some
>>> performance comparisons, but I'm having a heck of a time trying to
>>> figure out how to use a non-default TCP interface.
>>>
>>> Specifically, eth0 is my "normal" gigE network (the IP address
>>> associated with the hostname).  But I want to run an MVAPICH2 job
>>> over ib0 -- my IPoIB interface.
>>>
>>> I looked through the user documentation and didn't see anything
>>> about how to do this -- did I miss it?  Pointers would be greatly
>>> appreciated.
>>>
>>> Thanks.
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>


-- 
Jeff Squyres
Cisco Systems