[mpich-discuss] Unknown error code in 1.1.1p1
Scott Atchley
atchley at myri.com
Mon Nov 30 13:23:24 CST 2009
On Nov 30, 2009, at 2:04 PM, Dave Goodell wrote:
> On Nov 30, 2009, at 12:48 PM, Scott Atchley wrote:
>
>> Looking at $MPICH/src/mpi/pt2pt/test.c, line 136 calls
>> MPIU_ERR_POP() if mpi_errno is not 0. It should then fall through
>> to fn_exit which will return mpi_errno.
>>
>> Instead, it seems to go to fn_fail which calls
>> MPIR_Err_create_code() and then MPIR_Err_return_comm() which
>> eventually calls MPID_Abort(). The latter prints the MXMPI message
>> and exits.
>>
>> Am I missing something? Has anyone seen a similar failure before?
>
> The key piece you are missing is that MPIU_ERR_POP() does a "goto
> fn_fail;" under the hood. The following line:
<snip>
Argh, it was right in front of me in the definition and I did not see
the goto fn_fail. :-)
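
For anyone else who trips over this, here is a minimal sketch of the
control flow. It is not the real MPICH code: ERR_POP is a simplified,
hypothetical stand-in for MPIU_ERR_POP(), and inner_op(), do_test(), and
fail_handler_calls are made-up names used only for illustration. The
point is just that the macro hides a "goto fn_fail;", so the error path
never falls straight through to fn_exit.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for MPIU_ERR_POP(): the macro
 * hides a jump to the enclosing function's fn_fail label. */
#define ERR_POP(err_) do { if (err_) { goto fn_fail; } } while (0)

/* Counts how often the failure path runs (illustration only). */
static int fail_handler_calls = 0;

/* Made-up operation that returns a nonzero error code on request. */
static int inner_op(int should_fail)
{
    return should_fail ? 42 : 0;
}

int do_test(int should_fail)
{
    int mpi_errno = 0;

    mpi_errno = inner_op(should_fail);
    if (mpi_errno) ERR_POP(mpi_errno);  /* jumps to fn_fail, not fn_exit */

  fn_exit:
    return mpi_errno;

  fn_fail:
    /* In the real library, this is roughly where the error code is
     * built and handed to the communicator's error handler. Here we
     * only record that the path was taken. */
    fail_handler_calls++;
    goto fn_exit;
}
```

Calling do_test(1) goes through fn_fail before returning the error code,
while do_test(0) goes directly to fn_exit.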
> An error message like your user is getting usually indicates either
> a programming error inside the library (our fault or possibly
> yours), or memory corruption issues (usually the user's fault).
> Have you tried valgrind on it yet?
Not yet. I am waiting on the dataset so I can try to reproduce here.
Thanks,
Scott