[mpich2-dev] Parent terminates when the spawned child terminates

Suraj Prabhakaran suraj.prabhakaran at gmail.com
Wed Dec 15 16:11:55 CST 2010


Thank you all for the replies, and thanks, Pavan, for the hint about the
auto cleanup option. It does what I wanted. To add to the discussion, I
agree with your comments about what a portable program should assume.
The standard states that an error in one program may affect another only
if the two are still connected through an intra- or intercommunicator.
In my case, the parent spawns the child and then disconnects the
communicator. The child, as soon as it has initialized, gets the parent
communicator and disconnects from it as well. The sample code is given
below.

Parent:

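#include <mpi.h>
#include <stdio.h>
#include <unistd.h>   /* sleep() */

int main (int argc, char *argv[])
{
    MPI_Comm child_comm;

    MPI_Init(&argc, &argv);
    /* Return error codes instead of aborting on MPI errors */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    /* Spawn one child; child_comm is the resulting intercommunicator */
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);
    MPI_Comm_set_errhandler(child_comm, MPI_ERRORS_RETURN);
    printf("spawned a child\n");
    /* Collective with the child's MPI_Comm_disconnect */
    MPI_Comm_disconnect(&child_comm);
    printf("Disconnected from the child\n");
    sleep(5000);   /* stay alive long after the child has exited */
    MPI_Finalize();
    return 0;
}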

Child:

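#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */

int main (int argc, char *argv[])
{
    MPI_Comm parent, parent1;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);
    /* On success, disconnect sets the handle to MPI_COMM_NULL */
    MPI_Comm_disconnect(&parent);
    if (parent == MPI_COMM_NULL)
        printf("Child: Disconnected from the parent, Exiting\n\n");

    /* Ask for the parent again after disconnecting */
    MPI_Comm_get_parent(&parent1);

    if (parent1 != MPI_COMM_NULL)
        printf("Child: yes, i got my parent again\n");

    /* Deliberate abrupt exit, without calling MPI_Finalize */
    exit(1);

    MPI_Finalize();   /* never reached */
    return 0;
}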

As you can see, the child's first message is printed but the second
printf is not, which shows that the child can no longer get the parent.
So when an abrupt exit() happens, it shouldn't affect the parent, right
(since the two no longer share a communicator, and without any auto
cleanup option)?
Please correct me if I am wrong.
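
For comparison, here is a minimal sketch of a child that leaves the way a
portable program would have to if it assumes that errors in one
application do affect the other: it checks that it really has a parent,
disconnects, and calls MPI_Finalize before returning instead of calling
exit() directly. This is only an illustration and not something MPICH
requires. The variable "status" and the placeholder for the child's work
are my own assumptions.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent;
    int status = EXIT_SUCCESS;   /* hypothetical exit status of the child */

    MPI_Init(&argc, &argv);

    /* MPI_COMM_NULL means this process was not started by MPI_Comm_spawn */
    MPI_Comm_get_parent(&parent);
    if (parent != MPI_COMM_NULL) {
        /* Collective over the intercommunicator: the parent has to call
           MPI_Comm_disconnect on its side as well */
        MPI_Comm_disconnect(&parent);
        printf("Child: disconnected from the parent\n");
    } else {
        printf("Child: no parent communicator\n");
    }

    /* ... the child's own work would go here, setting status on failure ... */

    /* Finalize before leaving, so the process does not terminate abruptly */
    MPI_Finalize();
    return status;
}
```

With this version the parent above should not depend on whether auto
cleanup is enabled, since the child never terminates without finalizing.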

Thanks,
Suraj

On 12/15/2010 05:47 PM, Pavan Balaji wrote:
>
> On 12/15/2010 10:42 AM, Lisandro Dalcin wrote:
>> On 15 December 2010 13:34, Pavan Balaji<balaji at mcs.anl.gov>  wrote:
>>>
>>> The standard leaves it pretty open, saying that an error in one 
>>> application
>>> *may* affect the other. What exactly is done depends on the MPI
>>> implementation.
>>>
>>
>> Of course, but I understand that a portable program should assume that
>> errors in one app *do* affect the other... and then be very careful
>> with usages of exit(), abort() and friends.
>
> Agreed. At least till we get to MPI-3 :-).
>
>  -- Pavan
>