[mpich2-dev] mpiexec to call launcher for all names in machines file without hostname resolution (was -nolocal option)

John Marshall John.Marshall at ec.gc.ca
Fri Jul 8 23:11:15 CDT 2011


On 07/08/2011 11:30 PM, Pavan Balaji wrote:
> John,
>
> Hydra will still pass the hostnames directly to the launcher; nothing gets changed by Hydra. The resolution of the hostname is 
> used for its internal purposes.
>
> And a machinefile containing "localhost" will work fine.
I tried your patch and it does not work. When localhost is in the list of machine names, my launcher is not called. From what I 
can tell:

    * ht = gethostbyname("localhost") succeeds
    * ht resolves to a local interface
    * *is_local = 1 is executed
    * goto fn_exit is executed

Perhaps there is some confusion here, but I don't see how my launcher could be called since *is_local == 1.
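
To make the flow concrete, here is a minimal sketch in C of the decision order as I understand it from this thread. It is not the actual sock.c code: host_is_local, resolve_fn, and toy_resolve are hypothetical names used only for illustration, and the resolver is a stand-in for the gethostbyname() lookup plus the local-interface comparison. The no_local argument stands for the value of an override variable such as HYDRA_NO_LOCAL, which a caller would presumably obtain with getenv().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical resolver callback (an assumption, not a Hydra API):
 * returns  1 if `host` resolves to an address on a local interface,
 *          0 if it resolves to some other address,
 *         -1 if it does not resolve at all. */
typedef int (*resolve_fn)(const char *host);

/* Sketch of the decision order discussed in this thread.
 * `no_local` is the value of the override variable, or NULL if unset.
 * Returns 1 when the name is treated as the local host (no launcher
 * call), 0 when the name would be handed to the launcher. */
static int host_is_local(const char *host, const char *no_local,
                         resolve_fn resolve)
{
    assert(host != NULL && resolve != NULL);

    /* Override requested: skip resolution, treat every name as non-local. */
    if (no_local != NULL && strcmp(no_local, "1") == 0)
        return 0;

    int r = resolve(host);

    /* Unresolvable name: assume non-local instead of failing. */
    if (r < 0)
        return 0;

    return r;
}

/* Toy resolver for illustration only: "localhost" is local, the label
 * "00" does not resolve, and anything else resolves to a remote host. */
static int toy_resolve(const char *host)
{
    if (strcmp(host, "localhost") == 0)
        return 1;
    if (strcmp(host, "00") == 0)
        return -1;
    return 0;
}
```

With no override, host_is_local("localhost", NULL, toy_resolve) yields 1, which is the case where my launcher is skipped. Only the override branch yields 0 for a label that happens to resolve locally.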

John
>
>  -- Pavan
>
> On 07/08/2011 10:28 PM, John Marshall wrote:
>> On 07/08/2011 11:07 PM, Pavan Balaji wrote:
>>> Hi John,
>>>
>>> What's the benefit of allowing the user to disable this? If Hydra cannot resolve a hostname, it'll anyway consider it to be
>>> non-local. Note that your fix still doesn't solve the issue for the case I mentioned below.
>>>
>>> Btw, did you try the patch I provided?
>> I looked at your patch and for the case you mention it works. But it does not address my need because it still tests
>> gethostbyname(host). I don't want hydra to even bother checking that the machine names are valid hostnames. For example, with a
>> machines file (intentionally meant to thwart the patch) of:
>>
>> localhost
>>
>> your patch will resolve the name, but I need that name to be passed to the launcher-exec, no questions asked. The names in the
>> machines file that I want to use are simply labels, they are not hostnames. The possibility that these labels may match the
>> names of some hosts on the network is unintended and meant to be irrelevant.
>>
>> Thanks,
>> John
>>
>>>
>>>   -- Pavan
>>>
>>> On 07/08/2011 09:56 PM, John Marshall wrote:
>>>> On 07/08/2011 10:31 PM, Pavan Balaji wrote:
>>>>> Hi John,
>>>>>
>>>>> Can you try this patch: http://pastebin.com/09iC9PdD
>>>>>
>>>>> There were two reasons for the local host check: (1) to figure out which cases we can avoid doing an ssh and instead just use
>>>>> a fork, and (2) workaround for cases where the default hostname of a node is not accessible over the network (in this case we
>>>>> try to find the "local hostname" in the host list passed by the user).
>>>>>
>>>>> While the first one is only a performance optimization, the second is a correctness issue. So we can't just disable the local
>>>>> test the way you did without breaking Hydra on some platforms. Instead, I have tried to handle this issue by not throwing a
>>>>> failure when a hostname doesn't resolve, and instead just assuming that it's not the local host.
>>>>>
>>>>> This is not a perfect solution as this will not work in cases where the second point above is true && the user wants to use
>>>>> aliases instead of regular host names. But it might be OK for cases where one or both of the above two conditions is false.
>>>> Hi,
>>>>
>>>> I suspect that this is a worthwhile change to make in general. However, I'd want to go even further by being able to expressly
>>>> tell hydra to not do any hostname resolution. That way there is no question about what to do.
>>>>
>>>> I have something like the following (in the same sock.c file you mention):
>>>>
>>>> 530,538d529
>>>> <    /* JM - start */
>>>> <        char *HYDRA_NO_LOCAL_ENV;
>>>> <        HYDRA_NO_LOCAL_ENV = getenv("HYDRA_NO_LOCAL");
>>>> <        if ((HYDRA_NO_LOCAL_ENV != NULL) && (strcmp(HYDRA_NO_LOCAL_ENV, "1") == 0)) {
>>>> <            *is_local = 0;
>>>> <            goto fn_exit;
>>>> <        }
>>>> <    /* JM - end */
>>>> <
>>>>
>>>> with an 'export HYDRA_NO_LOCAL=1', all machine names are treated as not the local host, and thereby passed on to the launcher.
>>>> Would this actually mess anything up elsewhere in the code?
>>>>
>>>> Given that -nolocal is not what I thought, a name other than HYDRA_NO_LOCAL would be in order.
>>>>
>>>> John
>>>>
>>>>>
>>>>>    -- Pavan
>>>>>
>>>>> On 07/08/2011 05:03 PM, John Marshall wrote:
>>>>>> On 07/08/2011 05:35 PM, Dave Goodell wrote:
>>>>>>> Was that the option to MPD's mpiexec that said "don't launch any processes on the local node, even though the local node is
>>>>>>> in the MPD ring"?
>>>>>>>
>>>>>>> If so, then hydra just doesn't need such an option.  Simply don't include the local/head node in the machinefile and hydra
>>>>>>> won't launch any processes there.
>>>>>>>
>>>>>>> Or are you trying to obtain the effect of setting the "MPICH_NOLOCAL" environment variable to "1"?  That says don't use
>>>>>>> shared memory to communicate between processes on the same node.
>>>>>> The mpd option is closer to what I am looking for but still not it because I do want to be able to start up a process on the
>>>>>> local node also.
>>>>>>
>>>>>> For example, with a machines file:
>>>>>>
>>>>>> 00
>>>>>> 01
>>>>>> 02
>>>>>>
>>>>>> I want mpiexec to blindly call my launcher with the machine names of 00, 01, and 02 without trying to resolve the names (of
>>>>>> course, 00, 01, 02 are not hostnames). So, in effect, my machine names are really just labels which the launcher will
>>>>>> interpret.
>>>>>> The problem is, mpiexec wants to resolve the entries in the machines file, expecting that they are hostnames.
>>>>>>
>>>>>> My change simply forces an is_local = 0 for all names. Is there an alternative?
>>>>>>
>>>>>> Thanks,
>>>>>> John
>>>>>>
>>>>>>> -Dave
>>>>>>>
>>>>>>> On Jul 8, 2011, at 4:29 PM CDT, John Marshall wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>     From what I can tell, there is no longer a nolocal option. For what I am doing, I currently need this kind of
>>>>>>>> functionality since the entries in my "machines" list are not actual machine names but labels. I have made a quick change
>>>>>>>> to src/pm/hydra/utils/sock/sock.c so that if an env var is set, all machines are treated as non-local (*is_local = 0).
>>>>>>>>
>>>>>>>> I know I'm late to the party on this, but can someone explain why the -nolocal option was removed? Or, maybe I have missed
>>>>>>>> something to get this functionality, i.e., to pass the machine name/label to the launcher as is without any
>>>>>>>> complaints/errors and let the launcher interpret.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> John
>>>>>>
>>>>>
>>>>
>>>
>>
>
