[mpich2-dev] mpiexec to call launcher for all names in machines file without hostname resolution (was -nolocal option)

Pavan Balaji balaji at mcs.anl.gov
Fri Jul 8 22:30:57 CDT 2011


John,

Hydra will still pass the hostnames directly to the launcher; nothing 
gets changed by Hydra. The resolution of the hostname is used for its 
internal purposes.

And a machinefile containing "localhost" will work fine.
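The point above is that resolution is internal bookkeeping and never rewrites the name. Here is a minimal sketch of that separation. `struct host_entry` and `build_launcher_argv()` are hypothetical names, not Hydra's actual code. The name is stored exactly as the user wrote it, the locality result lives in a separate field, and only the verbatim name ends up on the launcher's command line. The launcher path and the `proxy --flag` command are placeholders.

```c
#include <stddef.h>

/* Hypothetical host entry: the machinefile text exactly as written,
 * plus the result of the internal locality check. The two are kept
 * separate, so resolving the name never rewrites it. */
struct host_entry {
    const char *name;   /* verbatim entry, e.g. "localhost" or "00" */
    int is_local;       /* internal bookkeeping only; never replaces name */
};

/* Build a NULL-terminated argv for one remote launch:
 *   launcher, verbatim host string, then the command to run there.
 * argv must have room for 3 + ncmd slots; returns the number of
 * arguments stored (excluding the NULL), or -1 if argv is too small. */
static int build_launcher_argv(const char *launcher,
                               const struct host_entry *h,
                               const char *const cmd[], int ncmd,
                               const char **argv, int max)
{
    if (3 + ncmd > max)
        return -1;
    int i = 0;
    argv[i++] = launcher;
    argv[i++] = h->name;            /* passed through unchanged */
    for (int k = 0; k < ncmd; k++)
        argv[i++] = cmd[k];
    argv[i] = NULL;
    return i;
}
```

With a machinefile entry of `00`, the resulting argv would be `/path/to/launcher 00 proxy --flag`.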

  -- Pavan

On 07/08/2011 10:28 PM, John Marshall wrote:
> On 07/08/2011 11:07 PM, Pavan Balaji wrote:
>> Hi John,
>>
>> What's the benefit of allowing the user to disable this? If Hydra cannot resolve a hostname, it will consider it to be
>> non-local anyway. Note that your fix still doesn't solve the issue for the case I mentioned below.
>>
>> Btw, did you try the patch I provided?
> I looked at your patch and for the case you mention it works. But it does not address my need because it still tests
> gethostbyname(host). I don't want hydra to even bother checking that the machine names are valid hostnames. For example, with a
> machines file (intentionally meant to thwart the patch) of:
>
> localhost
>
> your patch will resolve the name, but I need that name to be passed to the launcher-exec, no questions asked. The names in the
> machines file that I want to use are simply labels, they are not hostnames. The possibility that these labels may match the
> names of some hosts on the network is unintended and meant to be irrelevant.
>
> Thanks,
> John
>
>>
>>   -- Pavan
>>
>> On 07/08/2011 09:56 PM, John Marshall wrote:
>>> On 07/08/2011 10:31 PM, Pavan Balaji wrote:
>>>> Hi John,
>>>>
>>>> Can you try this patch: http://pastebin.com/09iC9PdD
>>>>
>>>> There were two reasons for the local host check: (1) to figure out in which cases we can avoid doing an ssh and instead just use
>>>> a fork, and (2) to work around cases where the default hostname of a node is not accessible over the network (in this case we
>>>> try to find the "local hostname" in the host list passed by the user).
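Reason (2) amounts to a search over the user's host list for the entry that refers to this node. The sketch below illustrates that idea and is not Hydra's implementation. `find_local_alias()` and the `is_local_fn` callback type are hypothetical, and the callback stands in for whatever locality check is used.

```c
#include <stddef.h>

/* Hypothetical locality check: returns 0 on success and sets *is_local,
 * or returns nonzero if the name could not be checked (e.g. unresolvable). */
typedef int (*is_local_fn)(const char *host, int *is_local);

/* Return the first entry in the user's host list that refers to this
 * node, or NULL if none does. The returned string is the user's own
 * spelling, so it can stand in for a default hostname that is not
 * reachable over the network. Entries whose check fails are skipped. */
static const char *find_local_alias(const char *const hosts[], size_t n,
                                    is_local_fn check)
{
    for (size_t i = 0; i < n; i++) {
        int local = 0;
        if (check(hosts[i], &local) == 0 && local)
            return hosts[i];
    }
    return NULL;
}
```

Because an entry whose check fails is skipped, a host list made entirely of labels yields NULL. That is the combination in which, as the next paragraphs explain, this workaround cannot help.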
>>>>
>>>> While the first one is only a performance optimization, the second is a correctness issue. So we can't just disable the local
>>>> test the way you did without breaking Hydra on some platforms. Instead, I have tried to handle this issue by not throwing a
>>>> failure when a hostname doesn't resolve, and instead just assuming that it's not the local host.
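The behaviour described above can be sketched as follows. This is not the patch at the pastebin link. It resolves the name with `getaddrinfo()` and compares each IPv4 result against the loopback network and the node's interface addresses from `getifaddrs()`. A resolution failure is treated as "not local" instead of as an error. `check_is_local()` and `addr_is_local()` are hypothetical names. IPv6 is ignored to keep the sketch short, and `getifaddrs()` is a Linux/BSD extension rather than POSIX.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Does the IPv4 address `a` belong to this node? */
static int addr_is_local(struct in_addr a)
{
    /* Anything in 127.0.0.0/8 is the loopback network. */
    if ((ntohl(a.s_addr) >> 24) == 127)
        return 1;

    struct ifaddrs *ifs, *p;
    int found = 0;
    if (getifaddrs(&ifs) != 0)
        return 0;
    for (p = ifs; p != NULL && !found; p = p->ifa_next) {
        if (p->ifa_addr != NULL && p->ifa_addr->sa_family == AF_INET) {
            struct sockaddr_in *sin = (struct sockaddr_in *)p->ifa_addr;
            found = (sin->sin_addr.s_addr == a.s_addr);
        }
    }
    freeifaddrs(ifs);
    return found;
}

/* A name that does not resolve is not an error; it is simply reported
 * as non-local. This sketch always returns 0. */
static int check_is_local(const char *host, int *is_local)
{
    struct addrinfo hints, *res, *ai;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only in this sketch */
    hints.ai_socktype = SOCK_STREAM;

    *is_local = 0;
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return 0;                     /* unresolvable: assume not local */

    for (ai = res; ai != NULL && !*is_local; ai = ai->ai_next) {
        struct sockaddr_in *sin = (struct sockaddr_in *)ai->ai_addr;
        *is_local = addr_is_local(sin->sin_addr);
    }
    freeaddrinfo(res);
    return 0;
}
```

A label that happens to be a real name, such as `localhost`, still resolves here. John raises this case in his reply.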
>>>>
>>>> This is not a perfect solution as this will not work in cases where the second point above is true && the user wants to use
>>>> aliases instead of regular host names. But it might be OK for cases where one or both of the above two conditions is false.
>>> Hi,
>>>
>>> I suspect that this is a worthwhile change to make in general. However, I'd want to go even further by being able to expressly
>>> tell hydra to not do any hostname resolution. That way there is no question about what to do.
>>>
>>> I have something like the following (in the same sock.c file you mention):
>>>
>>> 530,538d529
>>> <    /* JM - start */
>>> <        char *HYDRA_NO_LOCAL_ENV;
>>> <        HYDRA_NO_LOCAL_ENV = getenv("HYDRA_NO_LOCAL");
>>> <        if ((HYDRA_NO_LOCAL_ENV != NULL) && (strcmp(HYDRA_NO_LOCAL_ENV, "1") == 0)) {
>>> <            *is_local = 0;
>>> <            goto fn_exit;
>>> <        }
>>> <    /* JM - end */
>>> <
>>>
>>> with an 'export HYDRA_NO_LOCAL=1', all machine names are treated as not the local host, and thereby passed on to the launcher.
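The override described above can be sketched as a wrapper around the normal check. `HYDRA_NO_LOCAL` comes from John's patch and is not an official Hydra variable; he suggests renaming it. `is_local_with_override()` and the `local_check_fn` callback type are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L   /* expose POSIX declarations such as setenv() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for the normal locality check (which may resolve names). */
typedef int (*local_check_fn)(const char *host, int *is_local);

/* When HYDRA_NO_LOCAL is exactly "1", every name is reported as
 * non-local and the normal check is skipped entirely, so no
 * resolution is attempted. Otherwise defer to the normal check. */
static int is_local_with_override(const char *host, int *is_local,
                                  local_check_fn normal_check)
{
    const char *env = getenv("HYDRA_NO_LOCAL");
    if (env != NULL && strcmp(env, "1") == 0) {
        *is_local = 0;
        return 0;
    }
    return normal_check(host, is_local);
}
```

As Pavan notes, forcing every host to be non-local also bypasses the search for the local hostname in the user's host list. This trades correctness on some platforms for predictable pass-through.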
>>> Would this actually mess anything up elsewhere in the code?
>>>
>>> Given that the -nolocal option is not what I thought, a name other than HYDRA_NO_LOCAL would be in order.
>>>
>>> John
>>>
>>>>
>>>>    -- Pavan
>>>>
>>>> On 07/08/2011 05:03 PM, John Marshall wrote:
>>>>> On 07/08/2011 05:35 PM, Dave Goodell wrote:
>>>>>> Was that the option to MPD's mpiexec that said "don't launch any processes on the local node, even though the local node is
>>>>>> in the MPD ring"?
>>>>>>
>>>>>> If so, then hydra just doesn't need such an option.  Simply don't include the local/head node in the machinefile and hydra
>>>>>> won't launch any processes there.
>>>>>>
>>>>>> Or are you trying to obtain the effect of setting the "MPICH_NOLOCAL" environment variable to "1"?  That says don't use
>>>>>> shared memory to communicate between processes on the same node.
>>>>> The mpd option is closer to what I am looking for but still not it because I do want to be able to start up a process on the
>>>>> local node also.
>>>>>
>>>>> For example, with a machines file:
>>>>>
>>>>> 00
>>>>> 01
>>>>> 02
>>>>>
>>>>> I want mpiexec to blindly call my launcher with the machine names of 00, 01, and 02 without trying to resolve the names (of
>>>>> course, 00, 01, 02 are not hostnames). So, in effect, my machine names are really just labels which the launcher will
>>>>> interpret.
>>>>> The problem is, mpiexec wants to resolve the entries in the machines file, expecting that they are hostnames.
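John wants each line of the machines file treated as an opaque label. That can be sketched as a reader that only trims whitespace and never resolves anything. `read_labels()` and `MAX_LABEL` are hypothetical. The sketch treats every non-blank line as one label and ignores any other machinefile syntax Hydra may accept.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_LABEL 64

/* Collect each non-blank line of a machines file as an opaque label,
 * trimming surrounding whitespace but never interpreting the text.
 * Lines longer than the buffer are split; labels longer than
 * MAX_LABEL - 1 are truncated. Returns the number of labels stored. */
static int read_labels(FILE *fp, char labels[][MAX_LABEL], int max)
{
    char line[256];
    int n = 0;
    while (n < max && fgets(line, sizeof(line), fp) != NULL) {
        char *start = line;
        while (isspace((unsigned char)*start))
            start++;
        char *end = start + strlen(start);
        while (end > start && isspace((unsigned char)end[-1]))
            end--;
        *end = '\0';
        if (*start == '\0')
            continue;                 /* skip blank lines */
        snprintf(labels[n], MAX_LABEL, "%s", start);
        n++;
    }
    return n;
}
```

Each label would then be handed to the launcher unchanged, e.g. `launcher 00 ...`.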
>>>>>
>>>>> My change simply forces an is_local = 0 for all names. Is there an alternative?
>>>>>
>>>>> Thanks,
>>>>> John
>>>>>
>>>>>> -Dave
>>>>>>
>>>>>> On Jul 8, 2011, at 4:29 PM CDT, John Marshall wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> From what I can tell, there is no longer a nolocal option. For what I am doing, I currently need this kind of
>>>>>>> functionality since the entries in my "machines" list are not actual machine names but labels. I have made a quick change
>>>>>>> to src/pm/hydra/utils/sock/sock.c so that if an env var is set, all machines are treated as non-local (*is_local = 0).
>>>>>>>
>>>>>>> I know I'm late to the party on this, but can someone explain why the -nolocal option was removed? Or maybe I have missed
>>>>>>> some way to get this functionality, i.e., to pass the machine name/label to the launcher as is, without any
>>>>>>> complaints/errors, and let the launcher interpret it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> John
>>>>>
>>>>
>>>
>>
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

