[mpich-discuss] Fix for Hydra LSF resources query

Yauheni Zelenko zelenko at cadence.com
Mon Aug 2 15:36:18 CDT 2010


Hi, Pavan!

I tested the new build for this feature, and also the number of CPUs given in -hosts, and everything looks OK.

Eugene.
________________________________________
From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Pavan Balaji [balaji at mcs.anl.gov]
Sent: Monday, August 02, 2010 8:25 AM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Fix for Hydra LSF resources query

The fix is committed in r6972
[http://trac.mcs.anl.gov/projects/mpich2/changeset/6972]. The latest
nightly snapshot
[http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/trunk]
should have it.

  -- Pavan

On 08/01/2010 03:19 PM, Pavan Balaji wrote:
> Hello,
>
> Thanks for the patch. I don't think duplicating the string makes much
> difference from a correctness perspective, since the environment
> propagation mechanism is completely different in Hydra, and reads the
> environment before modifying it. However, I do agree that it's a better
> programming practice in general to not corrupt the environment. I've
> fixed it in my local git repository and will commit it to the svn as
> soon as it's up.
>
>    -- Pavan
>
> On 07/30/2010 08:02 PM, Yauheni Zelenko wrote:
>> Hi!
>>
>> I want to propose a fix for HYDT_bscd_lsf_query_node_list(). Currently the tokenizer operates directly on the environment string and corrupts it (the application gets a truncated LSB_MCPU_HOSTS). So it's necessary to create a copy of the string and tokenize the copy instead.
>>
>> I suggest the following implementation (based on r6885):
>>
>> HYD_status HYDT_bscd_lsf_query_node_list(struct HYD_node **node_list)
>> {
>>       char *hosts;
>>       HYD_status status = HYD_SUCCESS;
>>
>>       HYDU_FUNC_ENTER();
>>
>>       if (MPL_env2str("LSB_MCPU_HOSTS", (const char **)&hosts) == 0)
>>           hosts = NULL;
>>
>>       if (hosts == NULL) {
>>           *node_list = NULL;
>>           HYDU_ERR_SETANDJUMP(status, HYD_INTERNAL_ERROR, "No LSF node list found\n");
>>       }
>>       else {
>>           char *hosts_copy = HYDU_strdup(hosts);
>>           char *hostname = strtok(hosts_copy, " ");
>>
>>           while (1) {
>>               char *num_procs_str;
>>               int num_procs;
>>
>>               if (hostname == NULL)
>>                   break;
>>
>>               /* the even fields in the list should be the number of
>>                * cores */
>>               num_procs_str = strtok(NULL, " ");
>>               HYDU_ASSERT(num_procs_str, status);
>>
>>               num_procs = atoi(num_procs_str);
>>
>>               status = HYDU_add_to_node_list(hostname, num_procs, node_list);
>>               HYDU_ERR_POP(status, "unable to add to node list\n");
>>
>>               hostname = strtok(NULL, " ");
>>           }
>>           HYDU_free(hosts_copy);
>>       }
>>
>> fn_exit:
>>       HYDU_FUNC_EXIT();
>>       return status;
>>
>> fn_fail:
>>       goto fn_exit;
>> }
>>
>> Please review it and include it in the Hydra code.
>>
>> Eugene.
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji

