[mpich-discuss] MPI errors with more than one communicator

Rajeev Thakur thakur at mcs.anl.gov
Fri Jan 9 15:03:00 CST 2009


If you call MPI_Comm_disconnect on the intercommunicator prior to the node
failure, it might work (not sure). In any case, in the coming year, we plan
to make MPICH2 more resilient to such failures.
 
Rajeev


  _____  

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Federico Golfrè
Andreasi
Sent: Thursday, January 08, 2009 7:54 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] MPI errors with more than one communicator


The architecture is made up of 2 MPI programs (one is executed by the other
via MPI_Comm_spawn_multiple), and so there are two different communicators.
I noticed that if an error occurs (e.g. a node crash), only the tasks
related to that communicator (an intra-communicator) are aborted, but the
tasks in the other communicator still wait for messages that will never
arrive.
How can I close all the tasks related to both communicators when an error
occurs in one communicator?

Thank you,
Federico 
