[mpich2-dev] Hydra 1.4 without HYDU_local_to_global_id(...) function

Cody R. Brown cody at cs.ubc.ca
Tue Aug 9 23:07:00 CDT 2011


Hi Pavan;

That clarifies a lot. How the function works in pmip_cb.c is what I
was looking for.  Thanks again.

--
Cody R. Brown, M.Sc. Student
  UBC Department of Computer Science
  201-2366 Main Mall, Vancouver, BC, V6T 1Z4
  Office: ICCS x409      http://www.codybrown.ca/



On Tue, Aug 9, 2011 at 8:38 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> Hi Cody,
>
> Sorry for the delay in responding. I had to go through the code again to
> refresh my memory.
>
> The code was restructured to handle dynamic processes more cleanly. To
> understand the code, there are two pieces I'll need to explain:
>
> 1. The available cores in the system (also known as the global core map):
> this is viewed as a triplet by each proxy process. Suppose you have 4 nodes
> with 1, 3, 4 and 2 cores respectively; the first proxy process then views
> the system as (0,1,9) cores (meaning 0 cores on nodes before me, 1 core on
> my node and 9 cores on nodes after me). This gives all the cores on the
> nodes that this set of processes will be using.
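>
> To make the triplet concrete, here is a rough sketch of how it could be
> computed from the per-node core counts (my own toy illustration with a
> made-up core_triplet() helper, not the actual Hydra code):
>
>     /* Toy sketch: compute the (before, mine, after) core triplet for
>      * proxy i from the per-node core counts.  Illustration only. */
>     #include <stdio.h>
>
>     static void core_triplet(const int *cores, int nproxies, int i,
>                              int *before, int *mine, int *after)
>     {
>         int j, total = 0;
>
>         for (j = 0; j < nproxies; j++)
>             total += cores[j];
>         *before = 0;
>         for (j = 0; j < i; j++)
>             *before += cores[j];
>         *mine = cores[i];
>         *after = total - *before - *mine;
>     }
>
>     int main(void)
>     {
>         int cores[] = { 1, 3, 4, 2 };   /* the 4 nodes above */
>         int b, m, a;
>
>         core_triplet(cores, 4, 0, &b, &m, &a);
>         printf("(%d,%d,%d)\n", b, m, a);   /* prints (0,1,9) */
>         return 0;
>     }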
>
> 2. Filler processes are a little bit trickier. The idea is that, if my
> currently running application uses 13 cores, then with the above core
> layout it'll use all four nodes with the following core counts:
>
> node1: 1 core
> node2: 3 cores
> node3: 4 cores
> node4: 2 cores
> node1: 1 core
> node2: 2 cores
>
> (it wraps back to the first node after reaching the end of the list).
>
> Now, if a new set of 4 dynamic processes needs to be spawned, we try to load
> balance these processes on the remaining nodes. So, to even out all nodes,
> we have 1 core on node2, 4 cores on node3 and 2 cores on node4 left where
> processes can be "filled in" to make sure that all nodes get the same number
> of processes per core.
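>
> A rough sketch of that "filling in" calculation for the layout above
> (again just a toy illustration of the idea, not the code Hydra actually
> uses):
>
>     /* Toy sketch: how many slots are still free on each node in the
>      * partially filled pass, given 13 already-running processes. */
>     #include <stdio.h>
>
>     int main(void)
>     {
>         int cores[] = { 1, 3, 4, 2 };
>         int nnodes = 4, running = 13, total = 0, used_in_pass, i;
>
>         for (i = 0; i < nnodes; i++)
>             total += cores[i];               /* 10 cores per pass */
>
>         used_in_pass = running % total;      /* 13 % 10 = 3 in the partial pass */
>         for (i = 0; i < nnodes; i++) {
>             int used = (used_in_pass > cores[i]) ? cores[i] : used_in_pass;
>             used_in_pass -= used;
>             printf("node%d: %d free slot(s)\n", i + 1, cores[i] - used);
>         }
>         /* prints 0, 1, 4 and 2 free slots -- exactly where the 4 newly
>          * spawned processes get "filled in" first */
>         return 0;
>     }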
>
> To answer your question about the rank detection, you can find the
> appropriate functionality in the function local_to_global_id() in
> pm/pmiserv/pmip_cb.c.
>
> Also, HYD_server_info.global_core_count was removed and replaced with
> pg->pg_core_count, which represents the number of cores on the nodes used by
> the process group.
>
>  -- Pavan
>
> On 08/03/2011 06:27 PM, Cody R. Brown wrote:
>>
>> Hello;
>>
>> I am attempting to get the global id (rank) of the starting process in
>> each proxy in the mpiexec.c (hydra) code, similar to how the
>> "print_rank_map" function worked in the 1.3.2p1 hydra source. I used
>> the function:
>>     HYDU_local_to_global_id(process_id, proxy->start_pid,
>>                             proxy->node.core_count,
>>                             HYD_server_info.global_core_count)
>> to achieve this in the past.
>>
>> Upgrading the code to Hydra 1.4, I noticed the HYDU_local_to_global_id
>> function was removed, and some of the information in the proxy
>> structure (such as the proxy->start_pid) was removed (or
>> renamed/moved). The equation the function used is below:
>>     return ((local_id / core_count) * global_core_count) +
>>            (local_id % core_count) + start_pid;
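>>
>> (As a quick sanity check with made-up numbers: core_count = 3,
>> global_core_count = 10, start_pid = 1 and local_id = 4 give
>> ((4 / 3) * 10) + (4 % 3) + 1 = 10 + 1 + 1 = 12, i.e. the proxy's fifth
>> local process wraps into the second pass over the nodes.)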
>>
>> I managed to use the following expression to get the rank. However, I
>> have to calculate start_pid myself along the way:
>>     return ((process_id / proxy->filler_processes) * HYD_server_info.pg_list.pg_core_count) +
>>            (process_id % proxy->filler_processes) + start_pid;
>>
>> My questions are:
>>  Is start_pid (which used to be stored in proxy->start_pid) kept
>> somewhere else? Or should I be calculating it as I walk through the
>> proxy list, like I am doing now?
>>  Is proxy->filler_processes (Hydra 1.4) similar to
>> proxy->node.core_count? The node structure no longer seems to be
>> initialized at the point where I am calculating the rank.
>>  Is HYD_server_info.pg_list.pg_core_count (1.4) the same as
>> HYD_server_info.global_core_count (1.3.2p1), which no longer exists?
>>
>> Thanks for any help you may have.
>>
>> --
>> Cody R. Brown, M.Sc. Student
>>   UBC Department of Computer Science
>>   201-2366 Main Mall, Vancouver, BC, V6T 1Z4
>>   Office: ICCS x409      http://www.codybrown.ca/
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
>

