[mpich-discuss] Fix for Hydra LSF resources query

Pavan Balaji balaji at mcs.anl.gov
Mon Aug 2 10:25:20 CDT 2010


The fix is committed in r6972 
[http://trac.mcs.anl.gov/projects/mpich2/changeset/6972]. The latest 
nightly snapshot 
[http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/trunk] 
should have it.

  -- Pavan

On 08/01/2010 03:19 PM, Pavan Balaji wrote:
> Hello,
>
> Thanks for the patch. I don't think duplicating the string makes much
> difference from a correctness perspective, since the environment
> propagation mechanism is completely different in Hydra, and reads the
> environment before modifying it. However, I do agree that it's a better
> programming practice in general to not corrupt the environment. I've
> fixed it in my local git repository and will commit it to the svn as
> soon as it's up.
>
>    -- Pavan
>
> On 07/30/2010 08:02 PM, Yauheni Zelenko wrote:
>> Hi!
>>
>> I want to propose a fix for HYDT_bscd_lsf_query_node_list(). Currently the tokenizer operates directly on the environment string and corrupts it (the application gets a truncated LSB_MCPU_HOSTS). So it's necessary to create a copy of the environment value and work on that.
>>
>> I suggest the following implementation (based on r6885):
>>
>> HYD_status HYDT_bscd_lsf_query_node_list(struct HYD_node **node_list)
>> {
>>       char *hosts;
>>       HYD_status status = HYD_SUCCESS;
>>
>>       HYDU_FUNC_ENTER();
>>
>>       if (MPL_env2str("LSB_MCPU_HOSTS", (const char **)&hosts) == 0)
>>           hosts = NULL;
>>
>>       if (hosts == NULL) {
>>           *node_list = NULL;
>>           HYDU_ERR_SETANDJUMP(status, HYD_INTERNAL_ERROR, "No LSF node list found\n");
>>       }
>>       else {
>>           char* hosts_copy = HYDU_strdup(hosts);
>>           char* hostname = strtok(hosts_copy, " ");
>>
>>           while (1) {
>>               char* num_procs_str;
>>               int num_procs;
>>
>>               if (hostname == NULL)
>>                   break;
>>
>>               /* the even fields in the list should be the number of
>>                * cores */
>>               num_procs_str = strtok(NULL, " ");
>>               HYDU_ASSERT(num_procs_str, status);
>>
>>               num_procs = atoi(num_procs_str);
>>
>>               status = HYDU_add_to_node_list(hostname, num_procs, node_list);
>>               HYDU_ERR_POP(status, "unable to add to node list\n");
>>
>>               hostname = strtok(NULL, " ");
>>           }
>>           HYDU_free(hosts_copy);
>>       }
>>
>> fn_exit:
>>       HYDU_FUNC_EXIT();
>>       return status;
>>
>> fn_fail:
>>       goto fn_exit;
>> }
>>
>> Please review it and include it in the Hydra code.
>>
>> Eugene.
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
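For readers who want to see the effect outside Hydra, here is a minimal standalone sketch of the issue discussed in this thread. It is not the Hydra code: sum_cores() and sum_cores_from_env() are hypothetical helpers, and plain strdup()/free() stand in for HYDU_strdup()/HYDU_free(). It shows that strtok() overwrites delimiters in the buffer it is given, which is why tokenizing a private copy leaves the environment value intact.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sums the core counts in a "host n host n ..." list.
 * strtok() writes '\0' over each delimiter it finds, so the caller's
 * buffer is modified as a side effect. */
int sum_cores(char *list)
{
    int total = 0;
    char *host;

    assert(list != NULL);

    host = strtok(list, " ");
    while (host != NULL) {
        /* every host name must be followed by a core count */
        char *count = strtok(NULL, " ");
        if (count == NULL)
            return -1;          /* malformed list */
        total += atoi(count);
        host = strtok(NULL, " ");
    }
    return total;
}

/* Hypothetical helper: reads an environment variable and tokenizes a
 * private copy, so the value seen by later getenv() calls is unchanged. */
int sum_cores_from_env(const char *name)
{
    const char *value = getenv(name);
    char *copy;
    int total;

    if (value == NULL)
        return -1;              /* variable not set */

    copy = strdup(value);
    if (copy == NULL)
        return -1;              /* out of memory */

    total = sum_cores(copy);
    free(copy);                 /* released on every path after strdup() */
    return total;
}
```

With LSB_MCPU_HOSTS set to "hostA 4 hostB 8", sum_cores_from_env("LSB_MCPU_HOSTS") returns 12 and a later getenv() still sees the full string. Calling sum_cores() directly on a writable buffer holding the same text also returns 12, but the buffer then reads as just "hostA", which is the kind of truncation reported above.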

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

