[mpich-discuss] Unknown error code in 1.1.1p1
Dave Goodell
goodell at mcs.anl.gov
Mon Nov 30 13:04:24 CST 2009
On Nov 30, 2009, at 12:48 PM, Scott Atchley wrote:
> Looking at $MPICH/src/mpi/pt2pt/test.c, line 136 calls
> MPIU_ERR_POP() if mpi_errno is not 0. It should then fall through
> to fn_exit which will return mpi_errno.
>
> Instead, it seems to go to fn_fail which calls
> MPIR_Err_create_code() and then MPIR_Err_return_comm() which
> eventually calls MPID_Abort(). The latter prints the MXMPI message
> and exits.
>
> Am I missing something? Has anyone seen a similar failure before?
The key piece you are missing is that MPIU_ERR_POP() does a "goto
fn_fail;" under the hood. The last line of the following snippet:
-------8<--------
#undef FUNCNAME
#define FUNCNAME testfunc
#undef FCNAME
#define FCNAME MPIU_QUOTE(FUNCNAME)
if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-------8<--------
becomes:
-------8<--------
if (mpi_errno) {mpi_errno = MPIR_Err_create_code( mpi_errno,
0,"testfunc", 17, 15, "**fail", 0 ); goto fn_fail ;};
-------8<--------
(those braces should be a do{}while(0), but I'll fix that later)
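To illustrate the brace remark, here is a minimal sketch of why the do{}while(0) form is usually preferred for statement-like macros. SET_BRACES, SET_DO_WHILE and classify are hypothetical names made up for this example; none of them come from MPICH.

```c
#include <assert.h>

/* Brace form, like the expansion shown above. The caller's trailing ';'
 * becomes a separate empty statement, so
 *     if (c) SET_BRACES(x, 1); else ...
 * does not compile: the ';' ends the if before the else is seen. */
#define SET_BRACES(var_, val_) { (var_) = (val_); }

/* do/while(0) form: the expansion plus the caller's ';' is exactly one
 * statement, so it behaves like a function call in every context. */
#define SET_DO_WHILE(var_, val_) do { (var_) = (val_); } while (0)

int classify(int err)
{
    int result;

    /* The brace form is fine as a standalone statement. */
    SET_BRACES(result, 0);

    /* An unbraced if/else only works with the do/while(0) form. */
    if (err)
        SET_DO_WHILE(result, -1);
    else
        SET_DO_WHILE(result, 1);

    assert(result != 0);
    return result;
}
```

With these definitions classify(0) returns 1 and classify(5) returns -1. Swapping SET_BRACES into the if/else is rejected by the compiler.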
MPIU_ERR_POP is basically just used to add the current stack frame to
the error code created by a lower level and then return, with a quick
trip through the fn_fail stanza to perform any necessary cleanup.
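The control flow described above can be sketched roughly as follows. This is only an illustration of the fn_exit/fn_fail idiom, not MPICH code: ERR_POP, lower_level, cleanup_count and the error value 42 are all made up here, and the real MPIU_ERR_POP additionally calls MPIR_Err_create_code() to record the stack frame.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for MPIU_ERR_POP: note the current frame, then
 * jump to the enclosing function's fn_fail label. */
#define ERR_POP(err_)                                              \
    do {                                                           \
        fprintf(stderr, "error %d passed through %s:%d\n",         \
                (err_), __func__, __LINE__);                       \
        goto fn_fail;                                              \
    } while (0)

/* Counts trips through fn_fail so the behavior can be observed. */
static int cleanup_count = 0;

/* Hypothetical lower-level routine that fails on request. */
static int lower_level(int should_fail)
{
    return should_fail ? 42 : 0;
}

int testfunc(int should_fail)
{
    int err = 0;
    char *buf = malloc(16);
    if (buf == NULL)
        return -1;

    err = lower_level(should_fail);
    if (err) ERR_POP(err);   /* never falls through: jumps to fn_fail */

    /* ... success-path work would go here ... */

fn_exit:
    free(buf);               /* common exit, shared by both paths */
    return err;

fn_fail:
    assert(err != 0);        /* only reached with a real error code */
    cleanup_count++;         /* error-only cleanup or reporting */
    goto fn_exit;
}
```

Calling testfunc(0) returns 0 without visiting fn_fail, while testfunc(1) returns 42 after one pass through fn_fail. In the real library the fn_fail stanza is also where MPIR_Err_return_comm() gets invoked, which is why the failure surfaces there rather than at fn_exit.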
This is very weakly documented here: http://wiki.mcs.anl.gov/mpich2/index.php/Reporting_And_Returning_Error_Codes
An error message like the one your user is getting usually indicates
either a programming error inside the library (our fault or possibly
yours) or memory corruption (usually the user's fault). Have you tried
running it under valgrind yet?
-Dave