[mpich-discuss] Hydra process clean up

Jain, Rohit Rohit_Jain at mentor.com
Tue Jun 19 16:44:21 CDT 2012


I am using the Linux version of mpich2-1.4.1. Since I moved to Hydra (from mpd), I often see orphaned processes when ctrl-c is pressed.

This is my call sequence:
main_exec -> mpiexec -> proc1 -> subp1
                     -> proc2 -> subp2

When I press ctrl-c, I "don't" see the message 'Ctrl-C caught... cleaning up processes', and often one of the subp* processes is left around.


I tried doing this call sequence too:
mpiexec -> proc1 -> subp1
        -> proc2 -> subp2

I "do" see message 'Ctrl-C caught... cleaning up processes', but one of the subp is still left around.


I can also call my executable in single-core mode:
proc -> subp

With ctrl-c, it does trap SIGINT and passes it down to subp. But with mpiexec involved, I don't see 'proc' or 'subp' trapping any signal.
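
For reference, the single-core case does something roughly like the sketch below (simplified; 'subp', 'child_pid', and the handler are placeholders, not the real code). The same handler also logs whatever signal arrives, which is how I would check what, if anything, gets delivered under mpiexec:

    #define _POSIX_C_SOURCE 200809L
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static pid_t child_pid = -1;            /* pid of subp (placeholder) */

    /* Log the signal (async-signal-safe) and forward it to the sub-process. */
    static void forward_signal(int sig)
    {
        static const char msg[] = "proc: caught a signal, forwarding to subp\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);
        if (child_pid > 0)
            kill(child_pid, sig);
    }

    int main(void)
    {
        /* Trap the signals of interest: SIGINT is what ctrl-c delivers when
           run stand-alone; SIGTERM is added here only as a guess at what a
           launcher might send during cleanup. */
        signal(SIGINT, forward_signal);
        signal(SIGTERM, forward_signal);

        child_pid = fork();
        if (child_pid == 0) {
            execlp("subp", "subp", (char *) NULL);   /* placeholder child */
            _exit(127);
        }
        waitpid(child_pid, NULL, 0);
        return 0;
    }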

Questions:
1- What is mpiexec really doing when it says 'cleaning up processes'?
2- What signal should be trapped in proc (the child processes) for ctrl-c when running through mpiexec? Is there a way to dump signal tracing in the MPI code?
3- Is mpiexec supposed to terminate the whole process tree, including sub-processes?
4- mpd used to clean up the whole process tree. Is this an intentional change in Hydra?

Regards,
Rohit


-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Martin Cuma
Sent: Friday, May 25, 2012 7:11 AM
To: Guillaume Mercier
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Hydra process affinity

Hi Guillaume,

sure, I agree that the meaning may be different, though I would encourage
MPICH2 and OpenMPI to coordinate on matters like this. Furthermore, I would argue that the point of binding to a socket is to then launch multiple threads on that socket from the MPI process bound to it, so spanning multiple cores on that socket is desirable. If you want to bind the process to a single core, use -binding cpu:cores instead.

Perhaps the thinking was to use cpu:sockets to round-robin between the sockets while cpu:cores distributes the processes consecutively?
That's fine, but in that case it would still be useful to have another option that allows processes to bind to multiple cores.

I had thought that -binding cache:l3 would do that, i.e., pick all the cores on the socket, since they share the L3. However, that is not the case:
/uufs/chpc.utah.edu/sys/pkg/mpich2/1.5b1/bin/mpirun -binding cache:l3 -np 2 /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get
0x00000002
0x00000001

It seems to me that you basically don't consider the possibility of a single process running more than one thread when binding is in place. While this is fine for single-threaded processes, it's not good for multi-threaded ones.
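
To make the multi-threaded case concrete, the kind of check I have in mind looks roughly like this (just a sketch, not our actual code): each OpenMP thread reports the core it is scheduled on, so with a binding that spans the socket a 6-thread rank should report several different cores, while with the current cpu:sockets behaviour every thread reports the same core:

    #define _GNU_SOURCE           /* for sched_getcpu() on Linux */
    #include <sched.h>
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* Each thread prints the core it is currently running on. */
        #pragma omp parallel
        {
            printf("thread %d on core %d\n", omp_get_thread_num(), sched_getcpu());
        }
        return 0;
    }

Built with something like 'gcc -fopenmp' (or via mpicc) and launched with OMP_NUM_THREADS=6 under 'mpirun -binding cpu:sockets -np 2', this makes the 100% vs. 600% load difference easy to see.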

Thanks,
MC

On Fri, 25 May 2012, Guillaume Mercier wrote:

>
> Hello,
>
> Pavan will probably confirm this point, but my guess is that the 
> meaning of "binding to sockets"
> differs from one implementation to the other.
> If you take a look at the Hydra documentation, you will see that the 
> behaviour you're experiencing seems to be correct. But I agree that 
> it's not the one you want.
>
> Regards
> Guillaume
>
>
> On 05/25/2012 01:14 AM, Martin Cuma wrote:
>>  Hello,
>>
>>  I am trying to get consistent performance on dual-socket multi-core
>> nodes, which requires process binding to the socket. The code
>> generally runs one process per socket and launches multiple OpenMP
>> threads to fill up the socket's cores.
>>
>>  I find a problem with -binding cpu:sockets in Hydra's mpirun
>> implementation - it binds the process to the first core on the
>> socket, rather than having it access all the socket's cores.
>>
>>  For example, on a dual socket, 6 core CPU (12 cores total) node, I get:
>>  /uufs/chpc.utah.edu/sys/pkg/mpich2/1.5b1/bin/mpirun -binding cpu:sockets -np 2 /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get
>>  0x00000001
>>  0x00000002
>>  /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p --hierarchical socket.core 0x00000001
>>  Socket:0.Core:0
>>  /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p --hierarchical socket.core 0x00000002
>>  Socket:1.Core:0
>>
>>  I am using hwloc to report the binding and then calculating the "physical"
>>  location. Notice that only a single core is reported for each socket,
>> rather than reporting all 6 cores. This is verified by running with
>> 6 OpenMP threads per process and getting only 100% CPU load, rather than 600%.
>>
>>  Now, OpenMPI 1.6.1a1 (just added today) does the affinity right:
>>  /uufs/ember.arches/sys/pkg/openmpi/1.6.1a1i/bin/mpirun --bysocket --bind-to-socket -np 2 /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-bind --get
>>  0x00aaaaaa
>>  0x00555555
>>  /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p --hierarchical socket.core 0x00aaaaaa
>>  Socket:1.Core:0 Socket:1.Core:1 Socket:1.Core:2 Socket:1.Core:8 Socket:1.Core:9 Socket:1.Core:10
>>  /uufs/chpc.utah.edu/sys/pkg/hwloc/1.4.2/bin/hwloc-calc -p --hierarchical socket.core 0x00555555
>>  Socket:0.Core:0 Socket:0.Core:1 Socket:0.Core:2 Socket:0.Core:8 Socket:0.Core:9 Socket:0.Core:10
>>
>>  I have a suspicion that other binding options may be broken as well;
>> for example, I tried running -binding cache:l3, which should
>> have the same effect as cpu:sockets on this machine, and it only ran
>> on one core as well (100% load).
>>
>>  I would appreciate it if someone could comment on this; if this is a
>> bug, I'd be happy to work with the developers to get it fixed.
>> This shows up in both 1.4.1p1 and 1.5b1.
>>
>>  Thanks,
>>  MC
>> 
>> 
>
>

--
Martin Cuma
Center for High Performance Computing
University of Utah
_______________________________________________
mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
To manage subscription options or unsubscribe:
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
