[MPICH] Using non-default ethernet interfaces

Ralph Butler rbutler at mtsu.edu
Thu May 17 09:32:53 CDT 2007


The reason your original failed was that you did not have the --ifhn option
on the cmd-line and thus one mpd ran with the slow interfaces and others ran
with the fast interfaces.  You are right about the principle of least
surprise and the fact that perhaps mpdboot should be able to just figure
things out.  The problem arises from the feature-creep that has plagued mpd
and mpdboot.  We end up with a plethora of options many of which not only
don't fit well, but even directly conflict in some cases.  Further,
sometimes folks like for it to be obvious that different IPs are really the
same host, and yet others like them to seem separate.  We re-implemented
some of those user interfaces several times.  Maybe we finally arrived at a
set that makes everyone unhappy.  :-)

On May 17, 2007, at 9:21 AM, Jeff Squyres wrote:

> Thanks Ralph; that also worked (I used Anthony's solution before  
> reading yours).  Your solution allowed mpdboot to use my alternate  
> hostnames without barfing.
>
> I had originally tried having a hostfile like this:
>
> m1$ cat hostfile
> m1-fast
> m2-fast
> m1$ mpdboot -n 2 -f hostfile
>
> And got a weird error (see my prior e-mail).  Shouldn't mpdboot be smart
> enough to be able to figure out that it is on the same host as m1-fast,
> and not require the --ifhn argument on the command line?  Based on the
> behavior of prior versions of MPICH (and LAM/MPI), I had expected that
> simply supplying the -fast hostnames in the hostfile a) would allow
> mpdboot to work properly, and b) MPI apps started via the mpd would
> "inherit" the -fast hostnames and use them for MPI pt2pt communication.
>
> This is somewhat of a moot point since your method works; I mention  
> it simply as a usability / Law of Least Astonishment issue.
>
> (BTW, messages sent through this listserver seem to take quite a  
> while to be delivered; 30+ minutes sometimes...)
>
>
>
> On May 16, 2007, at 6:29 PM, Ralph Butler wrote:
>
>> This can be a bit tricky to get right no matter which way you try to do
>> it, and there are multiple ways to pull it off.  You can set the interface
>> when mpds are started and then the user pgms will use that same interface.
>> Or, you can let the mpds use one interface and then have your program use
>> a different interface at runtime; this requires using the -machinefile
>> option on mpiexec.
>>
>> I will assume for the present that you want to set the fast interface for
>> the mpds and then have the pgm inherit that interface.  From your email, I
>> assume you are also planning to start the mpds via mpdboot; that is
>> actually the goofiest part.  Here's how it goes.
>>
>> Let's assume that some machines are named m1, m2, and m3.  Let's also
>> assume that each host has a second name associated with a fast interface,
>> e.g. m1-fast, m2-fast, and m3-fast.  Finally, we will assume that you are
>> going to run mpdboot from m1 and that you want to use the fast interfaces.
>> Then you might create a hosts file named myhosts that contains:
>>     m2-fast
>>     m3-fast
>> Since we are being picky about interfaces, I would leave m1/m1-fast out of
>> the file and handle it via the cmd-line.  It can get very confusing for
>> mpdboot to know which IP to associate with a given host.  Ultimately, it
>> assumes that info about the local host is reliable from the cmd-line.
>> So, you do this:
>>     mpdboot -v -f myhosts -n 3 --ifhn=m1-fast
>> That --ifhn arg is the tricky part.  It is the ifhn to be associated with
>> the local host.  It will NOT be obtained from myhosts.  mpdboot will start
>> one mpd on the local machine (m1 using the ifhn m1-fast) and then will
>> start 2 others on m2-fast and m3-fast.  Now, mpich2 pgms should also use
>> those interfaces.
>>
>> --ralph
>>
>> On May 16, 2007, at 5:34 PM, Jeff Squyres wrote:
>>
>>> Greetings.  I'm trying to run MVAPICH2 over ethernet to do some  
>>> performance comparisons, but I'm having a heck of a time trying  
>>> to figure out how to use a non-default TCP interface.
>>>
>>> Specifically, eth0 is my "normal" gigE network (the IP address  
>>> associated with the hostname).  But I want to run an MVAPICH2 job  
>>> over ib0 -- my IPoIB interface.
>>>
>>> I looked through the user documentation and didn't see anything  
>>> about how to do this -- did I miss it?  Pointers would be greatly  
>>> appreciated.
>>>
>>> Thanks.
>>>
>>> -- 
>>> Jeff Squyres
>>> Cisco Systems
>>>
>
>
> -- 
> Jeff Squyres
> Cisco Systems
>
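
For reference, the recipe quoted above (remote fast names in the hosts file,
local fast name on the cmd-line) might be sketched as a small shell script.
The host names and the myhosts file name are the examples from the thread,
and the mpdboot options are only the ones shown there.  DRY_RUN is a
hypothetical switch added for this sketch so the command can be inspected
before any mpds are actually started.

```shell
#!/bin/sh
# Sketch only: builds the hosts file and mpdboot command described in the
# thread.  m1-fast, m2-fast, m3-fast and "myhosts" are the thread's example
# names; DRY_RUN is a hypothetical switch, not an mpdboot feature.

LOCAL_IFHN="m1-fast"            # fast name of the host where mpdboot is run
REMOTE_HOSTS="m2-fast m3-fast"  # fast names of the other hosts
HOSTS_FILE="myhosts"
DRY_RUN="${DRY_RUN:-1}"         # default: print the command instead of running it

# Write one remote host per line.  The local host is deliberately left out
# of the file and is supplied via --ifhn instead.
: > "$HOSTS_FILE"
NUM_MPDS=1                      # start at 1 to count the local mpd
for h in $REMOTE_HOSTS; do
    printf '%s\n' "$h" >> "$HOSTS_FILE"
    NUM_MPDS=$((NUM_MPDS + 1))
done

MPDBOOT_CMD="mpdboot -v -f $HOSTS_FILE -n $NUM_MPDS --ifhn=$LOCAL_IFHN"

if [ "$DRY_RUN" = "1" ]; then
    echo "$MPDBOOT_CMD"         # prints: mpdboot -v -f myhosts -n 3 --ifhn=m1-fast
else
    $MPDBOOT_CMD
fi
```

Running it with DRY_RUN=0 on m1 would issue the same mpdboot command given in
the quoted message; the -n value is simply the number of entries in myhosts
plus one for the local host.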



