[mpich-discuss] Unknown error code in 1.1.1p1

Dave Goodell goodell at mcs.anl.gov
Mon Nov 30 13:04:24 CST 2009


On Nov 30, 2009, at 12:48 PM, Scott Atchley wrote:

> Looking at $MPICH/src/mpi/pt2pt/test.c, line 136 calls  
> MPIU_ERR_POP() if mpi_errno is not 0. It should then fall through to
> fn_exit which will return mpi_errno.
>
> Instead, it seems to go to fn_fail which calls  
> MPIR_Err_create_code() and then MPIR_Err_return_comm() which  
> eventually calls MPID_Abort(). The latter prints the MXMPI message  
> and exits.
>
> Am I missing something? Has anyone seen a similar failure before?

The key piece you are missing is that MPIU_ERR_POP() does a "goto
fn_fail;" under the hood.  The last line of the following snippet:

-------8<--------
#undef FUNCNAME
#define FUNCNAME testfunc
#undef FCNAME
#define FCNAME MPIU_QUOTE(FUNCNAME)
if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-------8<--------

becomes:

-------8<--------
if (mpi_errno) {mpi_errno = MPIR_Err_create_code( mpi_errno, 
0,"testfunc", 17, 15, "**fail", 0 ); goto fn_fail ;};
-------8<--------

(those braces should be a do{}while(0), but I'll fix that later)

MPIU_ERR_POP is basically just used to add the current stack frame to  
the error code created by a lower level and then return, with a quick  
trip through the fn_fail stanza to perform any necessary cleanup.   
This is very weakly documented here: http://wiki.mcs.anl.gov/mpich2/index.php/Reporting_And_Returning_Error_Codes
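
To make the control flow concrete, here is a minimal standalone sketch of the
same pattern. It is not the MPICH source: DEMO_ERR_POP, demo_add_frame,
lower_level, cleanup_runs and the DEMO_* constants are hypothetical stand-ins
invented for illustration, and the fn_fail stanza here only bumps a counter
where the real one would do its cleanup and error reporting. The macro uses
the do{}while(0) form mentioned above.

```c
#include <stdio.h>

/* Hypothetical stand-ins for illustration; these are NOT MPICH definitions. */
#define DEMO_SUCCESS   0
#define DEMO_ERR_OTHER 15

/* Counts trips through the fn_fail stanza so the flow can be observed. */
static int cleanup_runs = 0;

/* Stand-in for MPIR_Err_create_code(): it only logs the current frame and
 * passes the code through unchanged. The real function builds a new code. */
static int demo_add_frame(int errcode, const char *fcname, int line)
{
    fprintf(stderr, "error %d passed through %s (line %d)\n",
            errcode, fcname, line);
    return errcode;
}

/* do { } while (0) makes the macro expand to exactly one statement, so it
 * is safe after an unbraced "if" and still takes a trailing semicolon. */
#define DEMO_ERR_POP(err_)                                        \
    do {                                                          \
        (err_) = demo_add_frame((err_), __func__, __LINE__);      \
        goto fn_fail;                                             \
    } while (0)

/* Pretend lower-level call that may report an error. */
static int lower_level(int should_fail)
{
    return should_fail ? DEMO_ERR_OTHER : DEMO_SUCCESS;
}

int testfunc(int should_fail)
{
    int mpi_errno = DEMO_SUCCESS;

    mpi_errno = lower_level(should_fail);
    if (mpi_errno) DEMO_ERR_POP(mpi_errno);   /* jumps to fn_fail on error */

fn_exit:
    return mpi_errno;
fn_fail:
    cleanup_runs++;        /* cleanup / error reporting would go here */
    goto fn_exit;
}
```

Calling testfunc(0) returns straight through fn_exit, while testfunc(1)
detours through fn_fail before returning the error code, which is the jump
the original question was missing.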

An error message like your user is getting usually indicates either a  
programming error inside the library (our fault or possibly yours), or  
memory corruption issues (usually the user's fault).  Have you tried  
valgrind on it yet?

-Dave


