[mpich2-dev] Hydra 1.4 without HYDU_local_to_global_id(...) function
Pavan Balaji
balaji at mcs.anl.gov
Tue Aug 9 22:38:19 CDT 2011
Hi Cody,
Sorry for the delay in responding. I had to go through the code again to
refresh my memory.
The code was restructured to handle dynamic processes more cleanly. To
understand the code, there are two pieces I'll need to explain:
1. The available cores in the system (also known as the global core map):
this is viewed as a triplet by each proxy process. Suppose you have 4
nodes with 1, 3, 4 and 2 cores respectively. The first proxy views the
system as (0,1,9) cores, meaning 0 cores on the nodes before it, 1 core on
its own node and 9 cores on the nodes after it. Summed up, the triplet
gives all the cores on the nodes that this set of processes will be using.
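
As a rough illustration of the triplet, here is a minimal sketch that computes
it from a list of per-node core counts. The struct core_map type and the
core_map_for() helper are hypothetical names made up for this example; they
are not taken from the Hydra source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration only; not a Hydra structure. */
struct core_map {
    int before; /* cores on nodes listed before this one */
    int local;  /* cores on this node */
    int after;  /* cores on nodes listed after this one */
};

/* Build the (before, local, after) triplet for the node at index `me`. */
static struct core_map core_map_for(const int *cores, size_t nnodes, size_t me)
{
    assert(me < nnodes);

    struct core_map m = { 0, cores[me], 0 };
    for (size_t i = 0; i < nnodes; i++) {
        if (i < me)
            m.before += cores[i];
        else if (i > me)
            m.after += cores[i];
    }
    return m;
}
```

With core counts {1, 3, 4, 2}, index 0 gives (0,1,9) as in the example above,
and index 2 gives (4,4,2).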
2. Filler processes are a little trickier. The idea is that, if my
currently running application uses 13 cores in the above core layout,
it'll use all four nodes with the following core counts:
node1: 1 core
node2: 3 cores
node3: 4 cores
node4: 2 cores
node1: 1 core
node2: 2 cores
(it wraps back to the first node after reaching the end of the list).
Now, if a new set of 4 dynamic processes needs to be spawned, we try to
load balance these processes on the remaining nodes. So, to even out all
nodes, we have 1 core on node2, 4 cores on node3 and 2 cores on node4
left where processes can be "filled in" to make sure that all nodes get
the same number of processes per core.
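
The filler computation can be sketched as below, under the assumption that
processes are placed node by node in passes, as in the list above (1, 3, 4, 2
and then 1 and 2 on the second pass, i.e. 13 processes in total). The
filler_slots() helper is a hypothetical name for this example and is not the
Hydra implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only.
 * Given per-node core counts and the number of processes already placed
 * (node by node, wrapping around), write into filler[i] the number of
 * slots left on node i before the current pass over the nodes is complete.
 */
static void filler_slots(const int *cores, size_t nnodes, int nprocs, int *filler)
{
    int total = 0;
    for (size_t i = 0; i < nnodes; i++)
        total += cores[i];
    assert(total > 0 && nprocs >= 0);

    int partial = nprocs % total; /* processes in the last, incomplete pass */
    int rem = partial;
    for (size_t i = 0; i < nnodes; i++) {
        int used = rem < cores[i] ? rem : cores[i];
        rem -= used;
        /* If the last pass is complete, there is nothing to fill in. */
        filler[i] = (partial == 0) ? 0 : cores[i] - used;
    }
}
```

For core counts {1, 3, 4, 2} and 13 running processes this yields {0, 1, 4, 2},
matching the 1 core on node2, 4 cores on node3 and 2 cores on node4 mentioned
above.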
To answer your question about the rank detection, you can find the
appropriate functionality in the function local_to_global_id() in
pm/pmiserv/pmip_cb.c.
Also, HYD_server_info.global_core_count was removed and replaced with
pg->pg_core_count, which represents the number of cores on the nodes
used by the process group.
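
For reference, the 1.3.2p1 equation quoted in your message below can be written
as a standalone function. This is only a sketch of that older equation, not
the 1.4 local_to_global_id() code in pmip_cb.c; the function name below is
made up, and it assumes start_pid is the number of cores on the nodes before
this proxy.

```c
#include <assert.h>

/*
 * Hypothetical standalone version of the 1.3.2p1 equation quoted in the
 * original message. Assumes start_pid is the number of cores on the nodes
 * that come before this proxy's node.
 */
static int old_local_to_global_id(int local_id, int start_pid,
                                  int core_count, int global_core_count)
{
    assert(core_count > 0);

    /* Which pass over the nodes this local process belongs to. */
    int pass = local_id / core_count;
    /* Position of the process within this node for that pass. */
    int slot = local_id % core_count;

    return (pass * global_core_count) + slot + start_pid;
}
```

With the 1, 3, 4, 2 layout (10 cores in total), the second node has
start_pid 1 and core_count 3, so its local ids 0 through 4 map to ranks
1, 2, 3, 11 and 12.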
-- Pavan
On 08/03/2011 06:27 PM, Cody R. Brown wrote:
> Hello;
>
> I am attempting to get the global id (rank) of the starting process in
> each proxy in the mpiexec.c (hydra) code, similar to how the
> "print_rank_map" function worked in the 1.3.2p1 hydra source. I used
> the function: HYDU_local_to_global_id(process_id,proxy->start_pid,proxy->node.core_count,HYD_server_info.global_core_count)
> to achieve this in the past.
>
> Upgrading the code to Hydra 1.4, I noticed the HYDU_local_to_global_id
> function was removed, and some of the information in the proxy
> structure (such as the proxy->start_pid) was removed (or
> renamed/moved). The equation the function used is below:
> return ((local_id / core_count) * global_core_count) + (local_id % core_count) + start_pid;
>
> I managed to use the following equation to get the rank. However I
> have to calculate the start_pid along the way:
> return ((process_id / proxy->filler_processes) * HYD_server_info.pg_list.pg_core_count) + (process_id % proxy->filler_processes) + start_pid;
>
> My questions are:
> Is the start_pid (which used to be stored in proxy->start_pid) somewhere
> else? Or should I be calculating it as I go through the proxy list
> like I am doing now?
> Is the proxy->filler_processes (hydra 1.4) similar to the
> proxy->node.core_count? The node structure doesn't seem to be
> initialized at the location I am calculating the rank anymore.
> Is the HYD_server_info.pg_list.pg_core_count (1.4) the same as
> HYD_server_info.global_core_count (1.3.2p1) which is no longer there?
>
> Thanks for any help you may have.
>
> --
> Cody R. Brown, M.Sc. Student
> UBC Department of Computer Science
> 201-2366 Main Mall, Vancouver, BC, V6T 1Z4
> Office: ICCS x409 http://www.codybrown.ca/
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji