[mpich2-dev] Parent terminates when the spawned child terminates
Pavan Balaji
balaji at mcs.anl.gov
Wed Dec 15 10:34:42 CST 2010
The standard leaves it pretty open, saying that an error in one
application *may* affect the other. What exactly is done depends on the
MPI implementation.
In the latest version of MPICH2, you can ask MPI to not terminate the
remaining processes by passing the -disable-auto-cleanup flag to
mpiexec. Sending data to dead processes will (obviously) fail, but you
can still communicate with the remaining processes.
The catch is that if you are trying to receive data from a process, and
that process dies before it even establishes a connection, the process
waiting to receive data is never notified of the failure. We are working on
fixing this and other corner cases for the next release, but the current
release should still be usable for most common cases.
-- Pavan
On 12/15/2010 10:27 AM, Lisandro Dalcin wrote:
> On 15 December 2010 13:18, Suraj Prabhakaran
> <suraj.prabhakaran at gmail.com> wrote:
>> Hello,
>>
>> By default, when a spawned child terminates (through exit() or MPI_Abort()
>> and *NOT* through MPI_Finalize()), the parent also terminates with a
>> message
>>
>> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
>>
>> I am not sure what exactly the standard specifies for such a situation, but
>> it kind of defeats the purpose of a parent-child relationship.
>>
>
> I think the standard is pretty clear about this:
> http://www.mpi-forum.org/docs/mpi22-report/node226.htm#Node226
>
>> I would be glad
>> to know if there is any specific reason why it is implemented this way and
>> why it should stay this way. If there is no specific reason, may I request
>> that a workaround for this be implemented?
>>
>
> Did you try to register an atexit() callback [and a SIGABRT
> handler for the case of abort()]?
>
>
>
>
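For reference, the atexit()/SIGABRT suggestion quoted above could look
roughly like the sketch below. This is only an illustration in plain C:
the names install_exit_hooks(), on_exit_cleanup() and on_sigabrt() are
made up for this example, and the hooks just print a message where a real
child would do its own cleanup. Whether such hooks are enough to keep the
parent alive is not established in this thread and depends on the MPI
implementation.

```c
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical cleanup hook, run on exit() or on return from main().
 * A real child would put its own shutdown logic here. */
static void on_exit_cleanup(void)
{
    static const char msg[] = "child: exit() cleanup hook ran\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n; /* nothing useful to do if the write fails */
}

/* Hypothetical SIGABRT handler. abort() does not run atexit() callbacks,
 * so it needs a separate hook. Only async-signal-safe calls are used. */
static void on_sigabrt(int signo)
{
    static const char msg[] = "child: SIGABRT hook ran\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    (void)signo;
    _exit(EXIT_FAILURE); /* terminate without re-raising the signal */
}

/* Installs both hooks. Returns 0 on success, -1 on failure. */
int install_exit_hooks(void)
{
    if (atexit(on_exit_cleanup) != 0)
        return -1;
    if (signal(SIGABRT, on_sigabrt) == SIG_ERR)
        return -1;
    return 0;
}
```

The child would call install_exit_hooks() once, early in main(), before
any code path that may call exit() or abort().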
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji