[mpich-discuss] Fix for Hydra LSF resources query

Yauheni Zelenko zelenko at cadence.com
Fri Jul 30 20:02:31 CDT 2010


Hi!

I want to propose fix for HYDT_bscd_lsf_query_node_list(). Currently tokenizer operates directly on environment and corrupt it (application gets truncated LSB_MCPU_HOSTS). So it's necessary to create copy of environment and work with it.

I suggest next implementation (based on r6885):

HYD_status HYDT_bscd_lsf_query_node_list(struct HYD_node **node_list)
{
    char *hosts;
    HYD_status status = HYD_SUCCESS;

    HYDU_FUNC_ENTER();

    if (MPL_env2str("LSB_MCPU_HOSTS", (const char **) &hosts) == 0)
        hosts = NULL;

    if (hosts == NULL) {
        *node_list = NULL;
        HYDU_ERR_SETANDJUMP(status, HYD_INTERNAL_ERROR, "No LSF node list found\n");
    }
    else {
	char* hosts_copy = HYDU_strdup(hosts);
        char* hostname = strtok(hosts_copy, " ");

        while (1) {
	    char* num_procs_str;
	    int num_procs;

            if (hostname == NULL)
                break;

            /* the even fields in the list should be the number of
             * cores */
            num_procs_str = strtok(NULL, " ");
            HYDU_ASSERT(num_procs_str, status);

            num_procs = atoi(num_procs_str);

            status = HYDU_add_to_node_list(hostname, num_procs, node_list);
            HYDU_ERR_POP(status, "unable to add to node list\n");

            hostname = strtok(NULL, " ");
        }
	HYDU_free(hosts_copy);
    }

fn_exit:
    HYDU_FUNC_EXIT();
    return status;

fn_fail:
    goto fn_exit;
}

Please review it and include to Hydra code.

Eugene.


More information about the mpich-discuss mailing list