[mpich2-dev] Parent terminates when the spawned child terminates

Pavan Balaji balaji at mcs.anl.gov
Wed Dec 15 10:34:42 CST 2010


The standard leaves it pretty open, saying that an error in one 
application *may* affect the other. What exactly is done depends on the 
MPI implementation.

In the latest version of MPICH2, you can ask MPI to not terminate the 
remaining processes by passing the -disable-auto-cleanup flag to 
mpiexec. Sending data to dead processes will (obviously) fail, but you 
can still communicate with the remaining processes.

The catch is that if you are trying to receive data from a process, and 
that process dies before it even establishes a connection, the process 
waiting to receive will not be told about it. We are working on 
fixing this and other corner cases for the next release, but the current 
release should still be usable for most common cases.

  -- Pavan

On 12/15/2010 10:27 AM, Lisandro Dalcin wrote:
> On 15 December 2010 13:18, Suraj Prabhakaran
> <suraj.prabhakaran at gmail.com>  wrote:
>> Hello,
>>
>> By default, when a spawned child terminates (through exit() or MPI_Abort()
>> and *NOT* through MPI_Finalize()), the parent also terminates with a
>> message
>>
>> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
>>
>> I am not sure what exactly the standard specifies for such a situation but
>> it kind of defeats the purpose of parent-child relationship.
>>
>
> I think the standard is pretty clear about this:
> http://www.mpi-forum.org/docs/mpi22-report/node226.htm#Node226
>
>> I would be glad
>> to know if there is any specific reason why it is implemented this way and
>> why it should stay this way. If there is no specific reason, may I request
>> that a work around for this is implemented?
>>
>
> Did you try registering an atexit() callback [and a SIGABRT
> handler for the case of abort()]?
>
>
>
>
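
For reference, the atexit()/SIGABRT suggestion quoted above could be sketched roughly as follows. This is only an illustration under assumptions: child_cleanup and install_exit_hooks are made-up names, and the cleanup body is a placeholder where a child would presumably call MPI_Finalize() if MPI is still initialized.

```c
#include <signal.h>
#include <stdlib.h>

/* Counts how many times the cleanup body really ran (at most once). */
static volatile sig_atomic_t cleanup_runs = 0;

/* Hypothetical cleanup hook, safe to call more than once.
 * In an MPI child, the placeholder below is where MPI_Finalize() would go. */
static void child_cleanup(void)
{
    if (cleanup_runs > 0)
        return;
    cleanup_runs = 1;
    /* placeholder: shut down MPI cleanly here */
}

/* SIGABRT handler for the abort() case. Caveat: MPI calls are not
 * async-signal-safe, so real cleanup from here is best effort only. */
static void on_sigabrt(int signo)
{
    (void)signo;
    child_cleanup();
    _Exit(EXIT_FAILURE); /* leave without re-running atexit handlers */
}

/* Register both hooks; returns 0 on success, -1 on failure. */
static int install_exit_hooks(void)
{
    if (atexit(child_cleanup) != 0)
        return -1;
    if (signal(SIGABRT, on_sigabrt) == SIG_ERR)
        return -1;
    return 0;
}
```

A child would call install_exit_hooks() once, right after MPI_Init(), so that both a plain exit() and an abort() pass through the cleanup hook.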

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

