[mpich-discuss] -mpe=mpitrace not producing any output, and Valgrind outputs on r3717

François PELLEGRINI francois.pellegrini at labri.fr
Mon Jan 5 04:24:56 CST 2009


Hello and happy new year to all,

Several topics are addressed in this e-mail.

First, I have trouble using the "mpitrace" feature of MPE.
I compile all of my source code with "mpicc -mpe=mpitrace",
and link the objects with "-ltmpe", but absolutely no output
is produced when I run the compiled program. This happens with
both the official 1.0.8 and r3717 packages. What did I do
wrong?


When using the r3717 package compiled with Valgrind support
and running my program on a 32-bit Linux system, I get messages
such as the following when freeing intermediate communicators:

==26273== Invalid read of size 4
==26273==    at 0x8100C25: MPIR_CommL_forget (dbginit.c:317)
==26273==    by 0x80C0FD2: MPIR_Comm_release (commutil.c:1073)
==26273==    by 0x80B96F0: PMPI_Comm_free (comm_free.c:117)
==26273==    by 0x8087673: MPI_Comm_free (trace_mpi_core.c:590)
[...]
==26273==  Address 0x6c16910 is 1,160 bytes inside a block of size 67,740 alloc'd
==26273==    at 0x4022ADE: malloc (vg_replace_malloc.c:207)
==26273==    by 0x8101A80: MPIU_trmalloc (trmem.c:235)
==26273==    by 0x8101F98: MPIU_trcalloc (trmem.c:734)
==26273==    by 0x8102583: MPIU_Handle_obj_alloc_unsafe (handlemem.c:194)
==26273==    by 0x80C1EEC: MPIR_Comm_create (commutil.c:100)
==26273==    by 0x80C244E: MPIR_Comm_commit (commutil.c:300)
==26273==    by 0x80BC794: PMPI_Comm_split (comm_split.c:384)
==26273==    by 0x808725D: MPI_Comm_split (trace_mpi_core.c:718)
[...]

==26271== Invalid read of size 4
==26271==    at 0x8100C25: MPIR_CommL_forget (dbginit.c:317)
==26271==    by 0x80C0FD2: MPIR_Comm_release (commutil.c:1073)
==26271==    by 0x80C0E9C: MPIR_Comm_release (commutil.c:1044)
==26271==    by 0x80B96F0: PMPI_Comm_free (comm_free.c:117)
==26271==    by 0x8087673: MPI_Comm_free (trace_mpi_core.c:590)
[...]
==26271==  Address 0x6bcb128 is 632 bytes inside a block of size 67,740 alloc'd
==26271==    at 0x4022ADE: malloc (vg_replace_malloc.c:207)
==26271==    by 0x8101A80: MPIU_trmalloc (trmem.c:235)
==26271==    by 0x8101F98: MPIU_trcalloc (trmem.c:734)
==26271==    by 0x8102583: MPIU_Handle_obj_alloc_unsafe (handlemem.c:194)
==26271==    by 0x80C1EEC: MPIR_Comm_create (commutil.c:100)
==26271==    by 0x80C33DA: MPIR_Comm_copy (commutil.c:898)
==26271==    by 0x80B916D: PMPI_Comm_dup (comm_dup.c:148)
==26271==    by 0x808774B: MPI_Comm_dup (trace_mpi_core.c:570)
[...]

Also, I get many messages such as these at completion time:

==25572== Invalid read of size 4
==25572==    at 0x8100FF1: MPIU_trdump (trmem.c:581)
==25572==    by 0x80DBA9A: PMPI_Finalize (finalize.c:275)
==25572==    by 0x8085CEA: MPI_Finalize (trace_mpi_core.c:1265)
==25572==    by 0x804A900: main (dgmap.c:380)
==25572==  Address 0x6bea818 is 112 bytes inside a block of size 180 alloc'd
==25572==    at 0x4022ADE: malloc (vg_replace_malloc.c:207)
==25572==    by 0x8101A80: MPIU_trmalloc (trmem.c:235)
==25572==    by 0x8100D2B: MPIR_Sendq_remember (dbginit.c:244)
==25572==    by 0x80E313F: PMPI_Isend (isend.c:128)
==25572==    by 0x8084A8E: MPI_Isend (trace_mpi_core.c:1770)
[...]

==25920== Invalid read of size 1
==25920==    at 0x410E430: _IO_default_xsputn (in /lib/i686/libc-2.8.so)
==25920==    by 0x40E6037: vfprintf (in /lib/i686/libc-2.8.so)
==25920==    by 0x40E702F: (within /lib/i686/libc-2.8.so)
==25920==    by 0x40E2795: vfprintf (in /lib/i686/libc-2.8.so)
==25920==    by 0x40EC27E: fprintf (in /lib/i686/libc-2.8.so)
==25920==    by 0x8101025: MPIU_trdump (trmem.c:578)
==25920==    by 0x80DBA9A: PMPI_Finalize (finalize.c:275)
==25920==    by 0x8085CEA: MPI_Finalize (trace_mpi_core.c:1265)
==25920==    by 0x804A900: main (dgmap.c:380)
==25920==  Address 0x6b50620 is 64 bytes inside a block of size 180 alloc'd
==25920==    at 0x4022ADE: malloc (vg_replace_malloc.c:207)
==25920==    by 0x8101A80: MPIU_trmalloc (trmem.c:235)
==25920==    by 0x8100D2B: MPIR_Sendq_remember (dbginit.c:244)
==25920==    by 0x80E313F: PMPI_Isend (isend.c:128)
==25920==    by 0x8084A8E: MPI_Isend (trace_mpi_core.c:1770)
[...]

==26271== Invalid read of size 4
==26271==    at 0x8100C25: MPIR_CommL_forget (dbginit.c:317)
==26271==    by 0x80C0FD2: MPIR_Comm_release (commutil.c:1073)
==26271==    by 0x80C0E9C: MPIR_Comm_release (commutil.c:1044)
==26271==    by 0x8133689: MPID_Finalize (mpid_finalize.c:103)
==26271==    by 0x80DB950: PMPI_Finalize (finalize.c:205)
==26271==    by 0x8085CEA: MPI_Finalize (trace_mpi_core.c:1265)
==26271==    by 0x804A900: main (dgmap.c:380)
==26271==  Address 0x81b87f8 is not stack'd, malloc'd or (recently) free'd

==26273== Invalid read of size 1
==26273==    at 0x410E43C: _IO_default_xsputn (in /lib/i686/libc-2.8.so)
==26273==    by 0x40E6037: vfprintf (in /lib/i686/libc-2.8.so)
==26273==    by 0x40E702F: (within /lib/i686/libc-2.8.so)
==26273==    by 0x40E2795: vfprintf (in /lib/i686/libc-2.8.so)
==26273==    by 0x40EC27E: fprintf (in /lib/i686/libc-2.8.so)
==26273==    by 0x8101025: MPIU_trdump (trmem.c:578)
==26273==    by 0x80DBA9A: PMPI_Finalize (finalize.c:275)
==26273==    by 0x8085CEA: MPI_Finalize (trace_mpi_core.c:1265)
==26273==    by 0x804A900: main (dgmap.c:380)
==26273==  Address 0x45d6dda is 66 bytes inside a block of size 180 alloc'd
==26273==    at 0x4022ADE: malloc (vg_replace_malloc.c:207)
==26273==    by 0x8101A80: MPIU_trmalloc (trmem.c:235)
==26273==    by 0x8100D2B: MPIR_Sendq_remember (dbginit.c:244)
==26273==    by 0x80E313F: PMPI_Isend (isend.c:128)
==26273==    by 0x8084A8E: MPI_Isend (trace_mpi_core.c:1770)
[...]

I tend to think that all of my communications are matched, but
these error messages puzzle me. By the way, is there a simple way
to have MPICH display the list of unmatched communications when it
releases a communicator?
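
To make the question concrete, here is a minimal sketch of the kind of
bookkeeping it has in mind. It is not MPICH code, and it only counts
non-blocking requests that were posted but never completed, which is a
rough proxy for unmatched communications. All names (track_post,
track_complete, track_release, TRACK_MAX) are hypothetical, and the
communicator is represented by a plain integer standing in for an
MPI_Comm handle. The assumption is that these functions would be called
from profiling wrappers around MPI_Isend/MPI_Irecv, MPI_Wait and
MPI_Comm_free; that wiring is left out.

```c
#include <stdio.h>

/* Hypothetical bookkeeping, not part of MPICH or MPE. */
#define TRACK_MAX 64   /* arbitrary limit on tracked communicators */

typedef struct {
    int comm_id;     /* stand-in for an MPI_Comm handle */
    int posted;      /* non-blocking requests posted so far */
    int completed;   /* requests known to have completed */
    int in_use;      /* slot occupied? */
} TrackEntry;

static TrackEntry track_table[TRACK_MAX];

/* Find the slot for comm_id; optionally create it if absent. */
static TrackEntry *track_find(int comm_id, int create)
{
    int i, free_slot = -1;

    for (i = 0; i < TRACK_MAX; i++) {
        if (track_table[i].in_use && track_table[i].comm_id == comm_id)
            return &track_table[i];
        if (!track_table[i].in_use && free_slot < 0)
            free_slot = i;
    }
    if (!create || free_slot < 0)
        return NULL;
    track_table[free_slot].comm_id = comm_id;
    track_table[free_slot].posted = 0;
    track_table[free_slot].completed = 0;
    track_table[free_slot].in_use = 1;
    return &track_table[free_slot];
}

/* Record a posted request; returns -1 if the table is full. */
int track_post(int comm_id)
{
    TrackEntry *e = track_find(comm_id, 1);
    if (e == NULL)
        return -1;
    e->posted++;
    return 0;
}

/* Record a completed request; returns -1 if nothing was pending. */
int track_complete(int comm_id)
{
    TrackEntry *e = track_find(comm_id, 0);
    if (e == NULL || e->completed >= e->posted)
        return -1;
    e->completed++;
    return 0;
}

/* Report leftovers, forget the communicator, return the pending count. */
int track_release(int comm_id)
{
    TrackEntry *e = track_find(comm_id, 0);
    int pending;

    if (e == NULL)
        return 0;
    pending = e->posted - e->completed;
    if (pending > 0)
        fprintf(stderr, "comm %d: %d request(s) still pending at release\n",
                comm_id, pending);
    e->in_use = 0;
    return pending;
}
```

With such a table, a wrapper around MPI_Comm_free would call
track_release() just before the real call and print one line per
communicator that still has pending requests.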


Finally, still with r3717, Valgrind seems to report many
false positives, such as:

==25570== Conditional jump or move depends on uninitialised value(s)
==25570==    at 0x8134E41: MPID_Irecv (mpid_irecv.c:83)
==25570==    by 0x80B054E: MPIC_Sendrecv (helper_fns.c:117)
==25570==    by 0x8091244: MPIR_Barrier (barrier.c:75)
==25570==    by 0x80912D7: MPIR_Barrier_or_coll_fn (barrier.c:242)
==25570==    by 0x8091D70: PMPI_Barrier (barrier.c:419)
==25570==    by 0x8088614: MPI_Barrier (trace_mpi_core.c:182)
==25570==    by 0x8059304: dgraphLoad (dgraph_io_load.c:88)
==25570==    by 0x804D830: SCOTCH_dgraphLoad (library_dgraph_io_load.c:100)
==25570==    by 0x804A455: main (dgmap.c:285)

==26271== Conditional jump or move depends on uninitialised value(s)
==26271==    at 0x8134E41: MPID_Irecv (mpid_irecv.c:83)
==26271==    by 0x80B054E: MPIC_Sendrecv (helper_fns.c:117)
==26271==    by 0x80AA36D: MPIR_Allgatherv (allgatherv.c:212)
==26271==    by 0x80ABB08: PMPI_Allgatherv (allgatherv.c:1001)
==26271==    by 0x8088A19: MPI_Allgatherv (trace_mpi_core.c:80)
[...]

More intriguing is this one:

==26271== Conditional jump or move depends on uninitialised value(s)
==26271==    at 0x8134E41: MPID_Irecv (mpid_irecv.c:83)
==26271==    by 0x80E17E4: PMPI_Irecv (irecv.c:125)
==26271==    by 0x8084CEE: MPI_Irecv (trace_mpi_core.c:1713)
[...]

Well, that's all for today.   :-)

Thanks,


					f.p.


