[MPICH2-dev] handling fatal errors

David Gingold david.gingold at sicortex.com
Fri Aug 5 15:51:57 CDT 2005


In an MPICH2 device implementation, what is the right way to handle  
fatal errors that cannot easily be attributed to a calling function?

Possible examples of this:

     - An asynchronous progress thread attempts to allocate memory
       but fails.

     - Resource allocation fails in code that was triggered by a user
       MPI call, but that is not particularly related to that call.

     - A similar failure happens in a place where it would be too
       awkward or costly to include code to pass the error back to
       the user.

I spotted a few examples of this sort of thing in the MPICH2 code:

     MPID_Abort(MPIR_Process.comm_world, MPIR_Err_create_code(...), ...);

but I'm not sure whether doing this is considered acceptable
practice or something to be avoided.
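
To make the question concrete, here is a minimal sketch of the kind of
centralized fatal-error path I have in mind. Every name in it
(dev_fatal, dev_set_fatal_hook, dev_record_hook, dev_malloc_or_die) is
hypothetical and not part of MPICH2. In a real device the installed
hook would presumably be a thin wrapper around the MPID_Abort call
shown above; the default hook here just prints and exits so that the
sketch stands alone.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook type. In a real device, the installed hook would
 * wrap the abort call; it is normally expected not to return. */
typedef void (*dev_fatal_hook_fn)(int exit_code, const char *msg);

/* Default hook for this standalone sketch: report and exit. */
static void dev_default_hook(int exit_code, const char *msg)
{
    fprintf(stderr, "fatal device error: %s\n", msg);
    exit(exit_code);
}

static dev_fatal_hook_fn dev_fatal_hook = dev_default_hook;

/* State written by dev_record_hook, for unit tests only. */
int  dev_last_code = 0;
char dev_last_msg[256] = "";

/* Test hook: records the error instead of terminating the process. */
void dev_record_hook(int exit_code, const char *msg)
{
    dev_last_code = exit_code;
    strncpy(dev_last_msg, msg, sizeof dev_last_msg - 1);
    dev_last_msg[sizeof dev_last_msg - 1] = '\0';
}

/* Install a hook; passing NULL restores the default. */
void dev_set_fatal_hook(dev_fatal_hook_fn fn)
{
    dev_fatal_hook = fn ? fn : dev_default_hook;
}

/* Single entry point for errors that cannot be attributed to a
 * calling MPI function: format the message, then hand it to the hook. */
void dev_fatal(int exit_code, const char *fmt, ...)
{
    char msg[256];
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    vsnprintf(msg, sizeof msg, fmt, ap);
    va_end(ap);
    dev_fatal_hook(exit_code, msg);
}

/* Example caller: an allocation with no sensible way to report
 * failure back to the user. Returns NULL only if the hook returns. */
void *dev_malloc_or_die(size_t nbytes, const char *what)
{
    void *p = malloc(nbytes);
    if (p == NULL && nbytes != 0)
        dev_fatal(1, "out of memory allocating %zu bytes for %s",
                  nbytes, what);
    return p;
}
```

An asynchronous progress thread would then call something like
dev_malloc_or_die(len, "unexpected-message buffer") and never see a
NULL in production. Routing everything through one function at least
keeps the abort policy in a single place, which would make it easy to
change if aborting turns out to be the wrong answer.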

-dg

--
David Gingold
Principal Software Engineer
SiCortex
One Clock Tower Place, Suite 100
Maynard MA 01754
(978) 897-0214 x224




