[mpich-discuss] Understanding warning/error message ?
Dave Goodell
goodell at mcs.anl.gov
Mon Nov 24 08:54:48 CST 2008
On Nov 24, 2008, at 8:33 AM, François PELLEGRINI wrote:
> I sometimes have crashes for large number of processes, in
> MPI_Waitall calls. I am tracking them to know whether they
> come from my side (most likely), but I also wonder on some
> messages, in particular such as the ones like :
> "[1] 24 at [0x08e1ece8], mpid_vc.c[62]"
> that I sometimes get.
>
> What do they mean ?
> Can I get more info on their cause ?
The messages mentioning an address and a source location are being
emitted because you configured your mpich2 installation with
--enable-g=all or --enable-g=mem. The messages indicate places where the
implementation has detected memory leaks.
Some terse information about this feature can be found here:
http://wiki.mcs.anl.gov/mpich2/index.php/Support_for_Debugging_Memory_Allocation
Generally speaking, those messages are only useful if you are
actually developing the mpich2 library. Because the implementation
only tracks memory allocated within the mpich2 library, tools like
valgrind are a better way to find leaks in user code.
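If you still want to see which source locations the messages point at most often, a small script can tally them. The sketch below is not part of mpich2. It assumes each message has the layout "[rank] size at [address], file[line]", which is my reading of the sample line quoted above rather than something documented in this thread. The names parse_leak_line and summarize are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout (not confirmed by the mpich2 docs in this thread):
#   "[rank] size at [address], file[line]"
LEAK_RE = re.compile(
    r"\[(?P<rank>\d+)\]\s+(?P<size>\d+)\s+at\s+"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\],\s+"
    r"(?P<file>[^\[\s]+)\[(?P<line>\d+)\]"
)

def parse_leak_line(text):
    """Return the fields of one message as a dict, or None if it does not match."""
    m = LEAK_RE.search(text)
    if m is None:
        return None
    return {
        "rank": int(m["rank"]),
        "size": int(m["size"]),  # assumed to be the block size
        "addr": m["addr"],
        "file": m["file"],
        "line": int(m["line"]),
    }

def summarize(lines):
    """Sum the assumed sizes per (file, line) source location."""
    totals = Counter()
    for text in lines:
        rec = parse_leak_line(text)
        if rec is not None:
            totals[(rec["file"], rec["line"])] += rec["size"]
    return totals

if __name__ == "__main__":
    sample = ["[1] 24 at [0x08e1ece8], mpid_vc.c[62]"]
    for (fname, lineno), total in summarize(sample).most_common():
        print(f"{fname}:{lineno} {total}")
```

Feeding it the captured stderr of a run (one message per line) groups the reports by source location, so a location that shows up on every rank stands out from a one-off.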
Good luck finding your MPI_Waitall crash. If you can distill your
program down to a very small example program that will elicit the
behavior, feel free to send it to us at mpich2-maint@ or mpich2-discuss
at . Also, configuring mpich2 with --enable-error-checking=all
might help catch invalid arguments to MPI functions.
-Dave