[mpich2-dev] mpiexec to call launcher for all names in machines file without hostname resolution (was -nolocal option)
John Marshall
John.Marshall at ec.gc.ca
Fri Jul 8 21:56:13 CDT 2011
On 07/08/2011 10:31 PM, Pavan Balaji wrote:
> Hi John,
>
> Can you try this patch: http://pastebin.com/09iC9PdD
>
> There were two reasons for the local host check: (1) to figure out which cases we can avoid doing an ssh and instead just use
> a fork, and (2) workaround for cases where the default hostname of a node is not accessible over the network (in this case we
> try to find the "local hostname" in the host list passed by the user).
>
> While the first one is only a performance optimization, the second is a correctness issue. So we can't just disable the local
> test the way you did without breaking Hydra on some platforms. Instead, I have tried to handle this issue by not throwing a
> failure when a hostname doesn't resolve, and instead just assuming that it's not the local host.
>
> This is not a perfect solution as this will not work in cases where the second point above is true && the user wants to use
> aliases instead of regular host names. But it might be OK for cases where one or both of the above two conditions is false.
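For reference, the fallback described above (a name that does not resolve is treated as "not local" instead of raising an error) could be sketched roughly as follows. This is only an illustration, not the contents of the pastebin patch: the function name sketch_is_local is made up, and the loopback-only locality test is a simplifying assumption standing in for whatever comparison Hydra really performs.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not Hydra's actual function.
 * Always returns 0 (success). A name that fails to resolve is reported
 * as non-local rather than as an error. */
int sketch_is_local(const char *host, int *is_local)
{
    struct addrinfo hints, *res = NULL, *ai;

    *is_local = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    /* Resolution failure: assume the name refers to some other node. */
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return 0;

    /* Simplifying assumption: only loopback addresses count as local. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET) {
            const struct sockaddr_in *sin =
                (const struct sockaddr_in *) ai->ai_addr;
            if ((ntohl(sin->sin_addr.s_addr) >> 24) == 127) {
                *is_local = 1;
                break;
            }
        } else if (ai->ai_family == AF_INET6) {
            const struct sockaddr_in6 *sin6 =
                (const struct sockaddr_in6 *) ai->ai_addr;
            if (IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr)) {
                *is_local = 1;
                break;
            }
        }
    }

    freeaddrinfo(res);
    return 0;
}
```

With this shape, a label that the resolver rejects simply falls through to the launcher path, which matches the limitation noted above: it does nothing for a label that happens to resolve to something unintended.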
Hi,
I suspect that this is a worthwhile change to make in general. However, I'd want to go even further and be able to tell
Hydra explicitly not to do any hostname resolution. That way there is no question about what to do.
I have something like the following (in the same sock.c file you mention):
530,538d529
<     /* JM - start */
<     char *HYDRA_NO_LOCAL_ENV;
<     HYDRA_NO_LOCAL_ENV = getenv("HYDRA_NO_LOCAL");
<     if ((HYDRA_NO_LOCAL_ENV != NULL) && (strcmp(HYDRA_NO_LOCAL_ENV, "1") == 0)) {
<         *is_local = 0;
<         goto fn_exit;
<     }
<     /* JM - end */
<
With 'export HYDRA_NO_LOCAL=1' set, all machine names are treated as non-local and are therefore passed on to the launcher.
Would this actually mess anything up elsewhere in the code?
Given that -nolocal does not mean what I thought it did, a name other than HYDRA_NO_LOCAL would be in order.
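As a standalone sketch of the same check under a different name: HYDRA_SKIP_RESOLVE below is purely a placeholder I am using for illustration, not an existing Hydra variable, and the helper names are likewise hypothetical. The idea is the same as the strcmp test above, just split so the value test can be exercised on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Placeholder name for illustration only; not an existing Hydra variable. */
#define SKIP_RESOLVE_ENV "HYDRA_SKIP_RESOLVE"

/* Returns 1 only when the value is exactly "1"; unset (NULL) or any
 * other string counts as disabled. */
int flag_enabled(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Hypothetical wrapper: when the variable is set to "1", force
 * *is_local to 0 and return 1 so the caller can skip hostname
 * resolution entirely. Otherwise leave *is_local alone and return 0. */
int force_non_local(int *is_local)
{
    if (flag_enabled(getenv(SKIP_RESOLVE_ENV))) {
        *is_local = 0;
        return 1;
    }
    return 0;
}
```

A caller would test the return value first and only fall back to the normal resolution path when it is 0.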
John
>
> -- Pavan
>
> On 07/08/2011 05:03 PM, John Marshall wrote:
>> On 07/08/2011 05:35 PM, Dave Goodell wrote:
>>> Was that the option to MPD's mpiexec that said "don't launch any processes on the local node, even though the local node is
>>> in the MPD ring"?
>>>
>>> If so, then hydra just doesn't need such an option. Simply don't include the local/head node in the machinefile and hydra
>>> won't launch any processes there.
>>>
>>> Or are you trying to obtain the effect of setting the "MPICH_NOLOCAL" environment variable to "1"? That says don't use
>>> shared memory to communicate between processes on the same node.
>> The mpd option is closer to what I am looking for but still not it because I do want to be able to start up a process on the
>> local node also.
>>
>> For example, with a machines file:
>>
>> 00
>> 01
>> 02
>>
>> I want mpiexec to blindly call my launcher with the machine names of 00, 01, and 02 without trying to resolve the names (of
>> course, 00, 01, 02 are not hostnames). So, in effect, my machine names are really just labels which the launcher will interpret.
>> The problem is, mpiexec wants to resolve the entries in the machines file, expecting that they are hostnames.
>>
>> My change simply forces an is_local = 0 for all names. Is there an alternative?
>>
>> Thanks,
>> John
>>
>>> -Dave
>>>
>>> On Jul 8, 2011, at 4:29 PM CDT, John Marshall wrote:
>>>
>>>> Hi,
>>>>
>>>> From what I can tell, there is no longer a nolocal option. For what I am doing, I currently need this kind of
>>>> functionality since the entries in my "machines" list are not actual machine names but labels. I have made a quick change
>>>> to src/pm/hydra/utils/sock/sock.c so that if an env var is set, all machines are treated as non-local (*is_local = 0).
>>>>
>>>> I know I'm late to the party on this, but can someone explain why the -nolocal option was removed? Or maybe I have missed
>>>> some way to get this functionality, i.e., to pass the machine name/label to the launcher as is, without any
>>>> complaints/errors, and let the launcher interpret it.
>>>>
>>>> Thanks,
>>>> John
>>
>