[mpich-discuss] Understanding warning/error message ?

Dave Goodell goodell at mcs.anl.gov
Mon Nov 24 08:54:48 CST 2008


On Nov 24, 2008, at 8:33 AM, François PELLEGRINI wrote:

> I sometimes get crashes in MPI_Waitall calls when running with
> large numbers of processes. I am tracking them down to find out
> whether they come from my side (most likely), but I am also
> wondering about some messages that I sometimes get, such as:
> "[1] 24 at [0x08e1ece8], mpid_vc.c[62]"
>
> What do they mean?
> Can I get more information on their cause?

The messages mentioning an address and a source location are being
emitted because you configured your mpich2 installation with
--enable-g=all or --enable-g=mem.  The messages indicate places where
the implementation has detected memory leaks.

Some terse information about this feature can be found here:
http://wiki.mcs.anl.gov/mpich2/index.php/Support_for_Debugging_Memory_Allocation
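To make the message format concrete, here is a minimal sketch of the general technique, assuming a simple wrapper around malloc/free. It is not mpich2's actual code: track_malloc, track_free, track_report and TRACK_MALLOC are hypothetical names. Each allocation records its size and the __FILE__/__LINE__ of the call site, each free removes that record, and whatever is still recorded at shutdown is printed in the same "[id] size at [address], file[line]" shape as the quoted message. In this sketch the leading id is simply whatever the caller passes in (in the real messages it is presumably the process rank, but that is an assumption).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical leak tracker; NOT mpich2's implementation. */
typedef struct track_node {
    void *ptr;               /* address returned to the caller */
    size_t size;             /* requested size in bytes */
    const char *file;        /* source file of the allocation */
    int line;                /* source line of the allocation */
    struct track_node *next;
} track_node;

static track_node *track_head = NULL;  /* list of live allocations */

void *track_malloc(size_t size, const char *file, int line)
{
    assert(file != NULL && line > 0);
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    track_node *n = malloc(sizeof *n);
    if (n == NULL) {
        free(p);
        return NULL;
    }
    n->ptr = p;
    n->size = size;
    n->file = file;
    n->line = line;
    n->next = track_head;
    track_head = n;
    return p;
}

/* Drop the record for p (if any), then release the memory itself. */
void track_free(void *p)
{
    track_node **link = &track_head;
    while (*link != NULL) {
        if ((*link)->ptr == p) {
            track_node *dead = *link;
            *link = dead->next;
            free(dead);
            break;
        }
        link = &(*link)->next;
    }
    free(p);
}

/* Print one line per block that was never freed; return how many. */
int track_report(FILE *out, int id)
{
    int count = 0;
    for (track_node *n = track_head; n != NULL; n = n->next) {
        fprintf(out, "[%d] %lu at [%p], %s[%d]\n", id,
                (unsigned long)n->size, n->ptr, n->file, n->line);
        count++;
    }
    return count;
}

/* Capture the call site automatically. */
#define TRACK_MALLOC(sz) track_malloc((sz), __FILE__, __LINE__)
```

For example, calling TRACK_MALLOC(24) and never freeing the result makes track_report(stderr, 1) print a single line naming the file and line of that TRACK_MALLOC call, which is the kind of pointer into the source that mpid_vc.c[62] gives in your message.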

Generally speaking, those messages are only useful if you are  
actually developing the mpich2 library.  Because the implementation  
only tracks memory allocated within the mpich2 library, tools like  
valgrind are a better way to find leaks in user code.

Good luck finding your MPI_Waitall crash.  If you can distill your
program down to a very small example program that will elicit the
behavior, feel free to send it to us at mpich2-maint@ or
mpich2-discuss at .  Also, configuring mpich2 with
--enable-error-checking=all might help catch invalid arguments to MPI
functions.

-Dave




More information about the mpich-discuss mailing list