[mpich-discuss] Fix for Hydra LSF resources query

Pavan Balaji balaji at mcs.anl.gov
Sun Aug 1 15:19:03 CDT 2010


Hello,

Thanks for the patch. I don't think duplicating the string makes much
difference from a correctness perspective, since Hydra's environment
propagation mechanism is completely different and reads the
environment before modifying it. However, I do agree that not
corrupting the environment is better programming practice in general.
I've fixed it in my local git repository and will commit it to svn as
soon as it's up.
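
For readers following the thread, here is a minimal standalone sketch of the copy-before-tokenize idea, not the Hydra code itself. The names parse_host_list and struct host_entry are hypothetical stand-ins for the Hydra node-list helpers, and the sketch uses only standard C library calls. strtok() writes NUL bytes into the buffer it scans, so calling it on the pointer returned by getenv() would leave later readers of the variable with only the first token. Tokenizing a private copy avoids that, and freeing the copy at a single point covers both the success and the malformed-input paths.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a node-list entry: one host and its slot count. */
struct host_entry {
    char name[64];
    int num_procs;
};

/* Parse "host1 n1 host2 n2 ..." from the environment variable `var`
 * into `out` (at most `max` entries). Returns the number of entries
 * parsed, or -1 on error. The environment string is left intact
 * because strtok() only ever sees a private copy of it. */
static int parse_host_list(const char *var, struct host_entry *out, int max)
{
    const char *env = getenv(var);
    char *copy, *hostname, *count;
    size_t len;
    int n = 0;

    assert(out != NULL);
    if (env == NULL)
        return -1;

    /* Duplicate the string so that tokenizing cannot modify the environment. */
    len = strlen(env) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, env, len);

    for (hostname = strtok(copy, " "); hostname != NULL;
         hostname = strtok(NULL, " ")) {
        /* Every host name must be followed by a process count. */
        count = strtok(NULL, " ");
        if (count == NULL || n == max) {
            n = -1;
            break;
        }
        snprintf(out[n].name, sizeof(out[n].name), "%s", hostname);
        out[n].num_procs = atoi(count);
        n++;
    }

    free(copy);  /* single cleanup point, reached on success and on error */
    return n;
}
```

With LSB_MCPU_HOSTS set to "hostA 4 hostB 8", calling parse_host_list("LSB_MCPU_HOSTS", entries, 4) fills two entries, and a later getenv("LSB_MCPU_HOSTS") still returns the full string.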

  -- Pavan

On 07/30/2010 08:02 PM, Yauheni Zelenko wrote:
> Hi!
>
> I want to propose a fix for HYDT_bscd_lsf_query_node_list(). Currently the tokenizer operates directly on the environment string and corrupts it (the application gets a truncated LSB_MCPU_HOSTS). So it's necessary to create a copy of the string and work on that instead.
>
> I suggest the following implementation (based on r6885):
>
> HYD_status HYDT_bscd_lsf_query_node_list(struct HYD_node **node_list)
> {
>      char *hosts;
>      HYD_status status = HYD_SUCCESS;
>
>      HYDU_FUNC_ENTER();
>
>      if (MPL_env2str("LSB_MCPU_HOSTS", (const char **)&hosts) == 0)
>          hosts = NULL;
>
>      if (hosts == NULL) {
>          *node_list = NULL;
>          HYDU_ERR_SETANDJUMP(status, HYD_INTERNAL_ERROR, "No LSF node list found\n");
>      }
>      else {
>          char *hosts_copy = HYDU_strdup(hosts);
>          char *hostname = strtok(hosts_copy, " ");
>
>          while (1) {
>              char *num_procs_str;
>              int num_procs;
>
>              if (hostname == NULL)
>                  break;
>
>              /* the even fields in the list should be the number of
>               * cores */
>              num_procs_str = strtok(NULL, " ");
>              HYDU_ASSERT(num_procs_str, status);
>
>              num_procs = atoi(num_procs_str);
>
>              status = HYDU_add_to_node_list(hostname, num_procs, node_list);
>              HYDU_ERR_POP(status, "unable to add to node list\n");
>
>              hostname = strtok(NULL, " ");
>          }
>          HYDU_free(hosts_copy);
>      }
>
> fn_exit:
>      HYDU_FUNC_EXIT();
>      return status;
>
> fn_fail:
>      goto fn_exit;
> }
>
> Please review it and include it in the Hydra code.
>
> Eugene.
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji