[MPICH] Selecting which ethernet interface to use
Jeff Squyres
jsquyres at cisco.com
Fri Aug 24 06:53:22 CDT 2007
On Aug 23, 2007, at 12:11 PM, William Gropp wrote:
> This is why there is a rank-specific interface name:
> MPICH_INTERFACE_HOSTNAME_Rnnn . So this isn't really scalable, but
> it does give you a route for now.
Good enough for the moment; thanks.
> Your note does point out a need for a more implicit specification,
> such as a regular pattern that could be used in a well-designed
> cluster.
Perhaps something as simple as MPICH_INTERFACE=ib0 (meaning: use
"ib0" for *all* MPI processes, perhaps unless overridden by the _R<x>
or _HOSTNAME variants).
Just my $0.02.
> Bill
>
> On Aug 23, 2007, at 7:07 AM, Jeff Squyres wrote:
>
>> On Aug 22, 2007, at 10:41 PM, Darius Buntinas wrote:
>>
>>> I don't know much about slurm, but looking at the docs, I bet it
>>> doesn't support that.
>>>
>>> But if you can set environment variables for each process in
>>> slurm, you can set the address other processes will use to
>>> connect to a process by setting the MPICH_INTERFACE_HOSTNAME
>>> environment variable for that process.
>>>
>>> E.g., if the address for a node is bb01 and the address for the
>>> ib interface is bb01-ib, set MPICH_INTERFACE_HOSTNAME=bb01-ib for
>>> any processes on that node.
>>
>> Hmm. Is there a way to *not* specify the hostname? SLURM can
>> distribute environment variables to the launched processes, but it
>> usually sends the same values to all processes. E.g.:
>>
>> setenv FOO bar
>> srun -N 4 env | grep FOO
>>
>> results in
>>
>> FOO=bar
>> FOO=bar
>> FOO=bar
>> FOO=bar
>>
>> (where each of those came from a different node)
>>
>> So it would be pretty messy to set up an MPICH_INTERFACE_HOSTNAME
>> specific for each host. The desired ethernet interface is the
>> same on all of my hosts (ib0); is there a way to tell all MPICH2
>> processes to use ib0?
>>
>> FWIW: I tried setting MPICH_INTERFACE_HOSTNAME to "ib0" and that
>> didn't work; for the heckuvit I also tried setting MPICH_INTERFACE
>> to "ib0" and that didn't work either (on the long shot that
>> MPICH_INTERFACE was a host-unspecific variant of
>> MPICH_INTERFACE_HOSTNAME).
>>
>> Also, it seems a little odd that you specify
>> "<hostname>-<interface>" when the name of the variable is
>> MPICH_INTERFACE_HOSTNAME -- shouldn't it be
>> MPICH_HOSTNAME_INTERFACE to match the ordering? Just a nit. :-)
>>
>>> Another way, if you can use mpd, would be to create a machinefile
>>> that looks like:
>>
>> I don't really want to use mpd -- kinda the point of SLURM
>> exporting a PMI interface is to avoid using mpd and directly
>> launch MPI processes through the SLURM interface itself (i.e., I
>> don't have to write a script -- I can just srun my MPI processes
>> directly).
>>
>> Plus, if I used mpd, I'd have to glean the hosts that were
>> allocated to me from SLURM to create a hostfile, then make the
>> translations in that hostfile for what the corresponding public
>> ethernet interface name is in the ifhn clause, etc. It would be
>> much simpler if I could just setenv a variable that says what
>> ethernet interface to use on every host...
>>
>> Am I stuck? Do I need to go this route (use mpd/create a
>> hostfile) to use something other than eth0?
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
--
Jeff Squyres
Cisco Systems
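
One possible stopgap for the per-host problem discussed above is a small wrapper that each launched process runs first, so that MPICH_INTERFACE_HOSTNAME is computed on the node itself rather than distributed by SLURM. This is only a sketch under stated assumptions: the wrapper name (ib-wrap.sh), the helper ib_hostname, and the "-ib" naming convention (taken from the bb01 / bb01-ib example in the thread) are illustrative and would need to match the actual cluster's naming.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as ib-wrap.sh (the name is an assumption).
# Assumption: every node's IB address is its short hostname plus "-ib",
# as in the bb01 -> bb01-ib example from the thread.

# ib_hostname NODE [SUFFIX]
# Print the interface hostname for NODE: strip any domain part,
# then append SUFFIX (defaults to "-ib").
ib_hostname() {
    printf '%s%s\n' "${1%%.*}" "${2:--ib}"
}

# Compute the value locally on whichever node this process landed on.
MPICH_INTERFACE_HOSTNAME=$(ib_hostname "$(uname -n)")
export MPICH_INTERFACE_HOSTNAME

# Replace the wrapper with the real program, if one was given.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Usage would be along the lines of "srun -N 4 ./ib-wrap.sh ./my_mpi_app", where my_mpi_app stands in for the real MPI binary. This does mean writing a script, which the direct-srun approach was meant to avoid, so it is a workaround rather than the implicit specification proposed above.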