[mpich-discuss] Hydra: Sorting node list by number of CPUs
Yauheni Zelenko
zelenko at cadence.com
Thu Nov 4 20:17:29 CDT 2010
Hi!
I implemented simple node list sorting, and it appears to work as intended with LSF.
The implementation is based on 1.3. I didn't add command-line option processing, because Hydra appears to lack support for command-line options that take no parameter. There are also some debug prints in HYD_sort_node_list().
Please review my code and include it in the trunk version with any necessary modifications.
There is also a minor memory leak:
==10665== 5 bytes in 1 blocks are definitely lost in loss record 2 of 3
==10665== at 0x48C0C4A: malloc (vg_replace_malloc.c:236)
==10665== by 0x3A48DF: strdup (in /lib/tls/libc-2.3.4.so)
==10665== by 0x80679CF: HYDT_bind_init (bind.c:40)
==10665== by 0x804FBE3: HYD_pmci_launch_procs (pmiserv_pmci.c:293)
==10665== by 0x804AFDB: main (mpiexec.c:344)
Eugene.
/* qsort() comparator: orders HYD_node pointers by ascending core_count. */
static int node_list_compare(const void *_a, const void *_b)
{
    const struct HYD_node *a = *((struct HYD_node *const *) _a);
    const struct HYD_node *b = *((struct HYD_node *const *) _b);
    return ((a->core_count > b->core_count) ? 1 : ((a->core_count == b->core_count) ? 0 : -1));
}
static HYD_status HYD_sort_node_list(void)
{
    HYD_status status = HYD_SUCCESS;
    struct HYD_node *node;
    int node_count = 0;
    struct HYD_node **nodes;
    int i;

    /* Debug output */
    printf("Before sorting\n");
    for (node = HYD_handle.node_list; node; node = node->next)
        printf("%s\t%d\n", node->hostname, node->core_count);

    for (node = HYD_handle.node_list; node; node = node->next)
        ++node_count;

    /* Collect the node pointers into an array so that qsort() can be used */
    HYDU_MALLOC(nodes, struct HYD_node **, (node_count * sizeof(struct HYD_node *)), status);
    i = 0;
    for (node = HYD_handle.node_list; node; node = node->next)
    {
        nodes[i] = node;
        ++i;
    }

    qsort(nodes, node_count, sizeof(struct HYD_node *), node_list_compare);

    /* Rebuild the list by prepending the nodes in ascending order, so the
     * node with the most cores ends up at the head of the list */
    HYD_handle.node_list = NULL;
    for (i = 0; i < node_count; ++i)
    {
        nodes[i]->next = HYD_handle.node_list;
        HYD_handle.node_list = nodes[i];
    }

    /* Debug output */
    printf("After sorting\n");
    for (node = HYD_handle.node_list; node; node = node->next)
        printf("%s\t%d\n", node->hostname, node->core_count);

    HYDU_FREE(nodes);

  fn_exit:
    return status;

  fn_fail:
    goto fn_exit;
}
int main(int argc, char **argv)
{
...
    if (HYD_handle.node_list == NULL) {
        /* Node list is not created yet. The user might not have
         * provided the host file. Query the RMK. */
        status = HYDT_rmki_query_node_list(&HYD_handle.node_list);
        HYDU_ERR_POP(status, "unable to query the RMK for a node list\n");

        if (HYD_handle.node_list == NULL) {
            /* didn't get anything from the RMK; try the bootstrap server */
            status = HYDT_bsci_query_node_list(&HYD_handle.node_list);
            HYDU_ERR_POP(status, "bootstrap returned error while querying node list\n");
        }

        if (HYD_handle.node_list == NULL) {
            /* The RMK and bootstrap didn't give us anything back; use localhost */
            status = HYDU_add_to_node_list("localhost", 1, &HYD_handle.node_list);
            HYDU_ERR_POP(status, "unable to add to node list\n");
        }
    }

    /* Reset the host list to use only the number of processes per
     * node as specified by the ppn option. */
    if (HYD_handle.ppn != -1)
        for (node = HYD_handle.node_list; node; node = node->next)
            node->core_count = HYD_handle.ppn;

    HYD_handle.global_core_count = 0;

    /* Sort the node list by core count; check the status like the calls above */
    status = HYD_sort_node_list();
    HYDU_ERR_POP(status, "unable to sort node list\n");
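
For anyone who wants to try the approach outside Hydra, here is a minimal standalone sketch of the same idea: copy the list pointers into an array, qsort() it, and relink. It is not Hydra code. `struct node`, `struct slot`, `slot_compare` and `sort_node_list` are hypothetical names used only for illustration. One difference from the function above is an assumption about what is wanted: because qsort() does not guarantee the relative order of equal elements, the sketch breaks ties on the original list position so that hosts with the same core count keep their order.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct HYD_node; only the fields used here. */
struct node {
    const char *hostname;
    int core_count;
    struct node *next;
};

/* Pairs a node with its original list position so ties keep their order. */
struct slot {
    struct node *node;
    size_t pos;
};

/* Descending by core_count; equal counts fall back to original position. */
static int slot_compare(const void *pa, const void *pb)
{
    const struct slot *a = pa;
    const struct slot *b = pb;

    if (a->node->core_count != b->node->core_count)
        return (a->node->core_count < b->node->core_count) ? 1 : -1;
    return (a->pos > b->pos) - (a->pos < b->pos);
}

/* Sorts the list in place; returns 0 on success, -1 if allocation fails. */
static int sort_node_list(struct node **head)
{
    struct node *n;
    struct slot *slots;
    size_t count = 0, i;

    for (n = *head; n; n = n->next)
        ++count;
    if (count < 2)
        return 0;               /* nothing to reorder */

    slots = malloc(count * sizeof(*slots));
    if (slots == NULL)
        return -1;

    for (n = *head, i = 0; n; n = n->next, ++i) {
        slots[i].node = n;
        slots[i].pos = i;
    }

    qsort(slots, count, sizeof(*slots), slot_compare);

    /* Relink the nodes front to back in sorted order. */
    for (i = 0; i + 1 < count; ++i)
        slots[i].node->next = slots[i + 1].node;
    slots[count - 1].node->next = NULL;
    *head = slots[0].node;

    free(slots);
    return 0;
}
```

Calling `sort_node_list(&head)` on a list with core counts 8, 2, 8 leaves the two 8-core hosts first, in their original relative order, followed by the 2-core host.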
________________________________________
From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Yauheni Zelenko [zelenko at cadence.com]
Sent: Wednesday, October 20, 2010 6:12 PM
To: Pavan Balaji; mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Hydra: Sorting node list by number of CPUs
From: Pavan Balaji [balaji at mcs.anl.gov]
Sent: Wednesday, October 20, 2010 6:11 PM
To: mpich-discuss at mcs.anl.gov
Cc: Yauheni Zelenko
Subject: Re: [mpich-discuss] Hydra: Sorting node list by number of CPUs
>> Unfortunately, I don't have time to do this myself in the next few days,
>> so maybe somebody else will be interested in implementing it, so that
>> it can be included in the 1.3 release?
> This is too intrusive to go into the 1.3 release (we don't want to break
> something by rushing a feature in). We are wrapping up stuff to push out
> 1.3 as soon as possible. However, I can give you a patch as soon as 1.3
> is released that'll provide this feature.
> -- Pavan
That would be great! Thank you!
Eugene.
_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss