[mpich2-dev] mpiexec to call launcher for all names in machines file without hostname resolution (was -nolocal option)

Pavan Balaji balaji at mcs.anl.gov
Fri Jul 8 21:31:28 CDT 2011


Hi John,

Can you try this patch: http://pastebin.com/09iC9PdD

There were two reasons for the local host check: (1) to figure out in
which cases we can avoid doing an ssh and instead just use a fork, and
(2) to work around cases where the default hostname of a node is not
accessible over the network (in this case we try to find the "local
hostname" in the host list passed by the user).
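
As a rough illustration of reason (1), the launch decision looks
something like the sketch below. This is not Hydra's actual code: the
function name, the "proxy" command, and the default launcher are
placeholders chosen for the example.

```python
def build_launch_argv(host, is_local, proxy_argv, launcher="ssh"):
    """Return the argv used to start a proxy for `host`.

    Hypothetical helper, not part of Hydra.
    """
    if is_local:
        # Local node: no remote launcher needed, just fork/exec the proxy.
        return list(proxy_argv)
    # Remote node: hand the name to the launcher exactly as given.
    return [launcher, host] + list(proxy_argv)


# Example: a remote entry goes through the launcher, a local one does not.
remote_argv = build_launch_argv("01", False, ["proxy", "--id", "1"])
local_argv = build_launch_argv("00", True, ["proxy", "--id", "0"])
```

Treating a local node as remote only costs an extra launcher call,
which is why this part is purely an optimization.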

While the first one is only a performance optimization, the second is a 
correctness issue. So we can't just disable the local test the way you 
did without breaking Hydra on some platforms. Instead, I have tried to 
handle this issue by not throwing a failure when a hostname doesn't 
resolve, and instead just assuming that it's not the local host.

This is not a perfect solution: it will not work when the second point
above applies and the user also wants to use aliases instead of regular
host names. But it should be OK whenever at least one of those two
conditions is false.

  -- Pavan

On 07/08/2011 05:03 PM, John Marshall wrote:
> On 07/08/2011 05:35 PM, Dave Goodell wrote:
>> Was that the option to MPD's mpiexec that said "don't launch any processes on the local node, even though the local node is in the MPD ring"?
>>
>> If so, then hydra just doesn't need such an option.  Simply don't include the local/head node in the machinefile and hydra won't launch any processes there.
>>
>> Or are you trying to obtain the effect of setting the "MPICH_NOLOCAL" environment variable to "1"?  That says don't use shared memory to communicate between processes on the same node.
> The mpd option is closer to what I am looking for but still not it because I do want to be able to start up a process on the
> local node also.
>
> For example, with a machines file:
>
> 00
> 01
> 02
>
> I want mpiexec to blindly call my launcher with the machine names of 00, 01, and 02 without trying to resolve the names (of
> course, 00, 01, 02 are not hostnames). So, in effect, my machine names are really just labels which the launcher will interpret.
> The problem is, mpiexec wants to resolve the entries in the machines file, expecting that they are hostnames.
>
> My change simply forces an is_local = 0 for all names. Is there an alternative?
>
> Thanks,
> John
>
>> -Dave
>>
>> On Jul 8, 2011, at 4:29 PM CDT, John Marshall wrote:
>>
>>> Hi,
>>>
>>> From what I can tell, there is no longer a nolocal option. For what I am doing, I currently need this kind of functionality since the entries in my "machines" list are not actual machine names but labels. I have made a quick change to src/pm/hydra/utils/sock/sock.c so that if an env var is set, all machines are treated as non-local (*is_local = 0).
>>>
>>> I know I'm late to the party on this, but can someone explain why the -nolocal option was removed? Or maybe I have missed some way to get this functionality, i.e., to pass the machine name/label to the launcher as is, without any complaints/errors, and let the launcher interpret it.
>>>
>>> Thanks,
>>> John
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

