[petsc-users] tons of valgrind errors with simply PetscInitialize and PetscFinalize?

Satish Balay balay at mcs.anl.gov
Wed Jan 22 19:36:31 CST 2014


Or use --download-f-blas-lapack.

For debugging [with valgrind], it's best to use the system BLAS or
--download-f-blas-lapack together with --download-mpich [with the default --with-debugging=1].

Also use a separate PETSC_ARCH for this debug build, so that you can
keep a different, higher-performing BLAS, MPI, and other packages for
performance runs.
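A minimal sketch of such a separate debug build follows. It assumes you run it from the top of the PETSc source tree ($PETSC_DIR). The function name build_valgrind_arch and the arch name arch-valgrind are hypothetical. The configure options are the ones named in the message. Newer PETSc releases may spell the BLAS/LAPACK option differently (for example --download-fblaslapack), so check ./configure --help for your version.

```shell
#!/bin/sh
# Sketch: configure and build a separate, valgrind-friendly PETSc arch.
# Assumes the current directory is $PETSC_DIR.
# build_valgrind_arch and arch-valgrind are hypothetical names.
build_valgrind_arch() {
  arch=${1:-arch-valgrind}
  # Reference BLAS/LAPACK and MPICH are downloaded and built by configure;
  # --with-debugging=1 is already the default and is spelled out for clarity.
  ./configure PETSC_ARCH="$arch" \
    --with-debugging=1 \
    --download-mpich \
    --download-f-blas-lapack &&
  make PETSC_DIR="$PWD" PETSC_ARCH="$arch" all
}
```

After it finishes, build the application with PETSC_ARCH=arch-valgrind and run it under valgrind. The optimized arch stays untouched for timing runs.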

Satish

On Wed, 22 Jan 2014, Matthew Knepley wrote:

> On Wed, Jan 22, 2014 at 7:22 PM, David Liu <daveliu at mit.edu> wrote:
> 
> > Okay, I reinstalled PETSc with MPICH, and the list of errors is a lot
> > shorter. It looks like I already have OpenBLAS here too. Is this the best I
> > can get?
> >
> 
> The common solution is to put these in a valgrind suppressions file.
> 
>    Matt
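A minimal sketch of such a suppressions file follows. It covers only the three OpenBLAS start-up reports from the MPICH run in this thread, where gotoblas_affinity_init calls ____strtoul_l_internal. The file name openblas.supp and the entry names are hypothetical. Memcheck:Cond matches "Conditional jump or move depends on uninitialised value(s)", and Memcheck:Value8 matches "Use of uninitialised value of size 8". Each entry matches any stack whose innermost frames are the listed functions.

```shell
#!/bin/sh
# Sketch: write a Memcheck suppressions file for the OpenBLAS start-up noise.
# "openblas.supp" and the entry names are hypothetical; the Memcheck:<kind>
# and fun: lines are what valgrind actually matches against.
cat > openblas.supp <<'EOF'
{
   openblas_affinity_init_cond
   Memcheck:Cond
   fun:____strtoul_l_internal
   fun:gotoblas_affinity_init
}
{
   openblas_affinity_init_value8
   Memcheck:Value8
   fun:____strtoul_l_internal
   fun:gotoblas_affinity_init
}
EOF
```

Use it with `valgrind --suppressions=openblas.supp ./run`. Running valgrind once with --gen-suppressions=all prints a ready-to-paste entry after each error, which is usually easier than writing entries by hand.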
> 
> 
> > ==22666== Memcheck, a memory error detector
> > ==22666== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
> > ==22666== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
> > ==22666== Command: ./run
> > ==22666==
> > ==22666== Conditional jump or move depends on uninitialised value(s)
> > ==22666==    at 0xAE774AF: ____strtoul_l_internal (strtol_l.c:438)
> > ==22666==    by 0x7442EE2: gotoblas_affinity_init (in /usr/lib/openblas-base/libopenblas.so.0)
> > ==22666==    by 0x711811A: gotoblas_init (in /usr/lib/openblas-base/libopenblas.so.0)
> > ==22666==    by 0x400DF7F: call_init (dl-init.c:85)
> > ==22666==    by 0x400E076: _dl_init (dl-init.c:134)
> > ==22666==    by 0x4000B29: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so)
> > ==22666==
> > ==22666== Conditional jump or move depends on uninitialised value(s)
> > ==22666==    at 0xAE77427: ____strtoul_l_internal (strtol_l.c:442)
> > ==22666==    by 0x7442EE2: gotoblas_affinity_init (in /usr/lib/openblas-base/libopenblas.so.0)
> > ==22666==    by 0x711811A: gotoblas_init (in /usr/lib/openblas-base/libopenblas.so.0)
> > ==22666==    by 0x400DF7F: call_init (dl-init.c:85)
> > ==22666==    by 0x400E076: _dl_init (dl-init.c:134)
> > ==22666==    by 0x4000B29: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so)
> > ==22666==
> > ==22666== Use of uninitialised value of size 8
> > ==22666==    at 0xAE77465: ____strtoul_l_internal (strtol_l.c:466)
> > ==22666==    by 0x7442EE2: gotoblas_affinity_init (in /usr/lib/openblas-base/libopenblas.so.0)
> > ==22666==    by 0x711811A: gotoblas_init (in /usr/lib/openblas-base/libopenblas.so.0)
> > ==22666==    by 0x400DF7F: call_init (dl-init.c:85)
> > ==22666==    by 0x400E076: _dl_init (dl-init.c:134)
> > ==22666==    by 0x4000B29: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so)
> > ==22666==
> > ==22666==
> > ==22666== HEAP SUMMARY:
> > ==22666==     in use at exit: 0 bytes in 0 blocks
> > ==22666==   total heap usage: 222 allocs, 222 frees, 117,046 bytes allocated
> > ==22666==
> > ==22666== All heap blocks were freed -- no leaks are possible
> > ==22666==
> > ==22666== For counts of detected and suppressed errors, rerun with: -v
> > ==22666== Use --track-origins=yes to see where uninitialised values come from
> > ==22666== ERROR SUMMARY: 6 errors from 3 contexts (suppressed: 4 from 4)
> >
> >
> > On Wed, Jan 22, 2014 at 6:57 PM, Jed Brown <jed at jedbrown.org> wrote:
> >
> >> David Liu <daveliu at mit.edu> writes:
> >>
> >> > Sure thing. Here's what I get when I run the executable directly (no
> >> > MPI, if I understand correctly).
> >>
> >> No, you are linked to Open MPI and it is very noisy under Valgrind.  Use
> >> MPICH if you want something tight.  I don't know if the gotoblas noise
> >> has been fixed, but that project has evolved into OpenBLAS, which you
> >> may as well use since it is the maintained code base.
> >>
> >> http://www.openblas.net/
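It was not obvious here which MPI the executable was picking up. One quick check is to list the shared libraries the executable resolves. This is a sketch that assumes a Linux system with ldd available. check_linked_libs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: show which MPI and BLAS/LAPACK shared libraries an executable loads.
# check_linked_libs is a hypothetical helper; it defaults to ./run.
check_linked_libs() {
  exe=${1:-./run}
  # ldd prints the resolved shared libraries; keep only MPI/BLAS/LAPACK lines.
  # grep exits 1 when nothing matches, so "|| true" keeps the status at 0.
  ldd "$exe" | grep -Ei 'mpi|blas|lapack' || true
}
```

For example, `check_linked_libs ./run`. Paths under /usr/lib/openmpi, as in the log in this thread, indicate that the binary is linked to Open MPI rather than MPICH.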
> >>
> >> > If I do "mpirun -n 1 valgrind ./run", I get the exact same thing.
> >> >
> >> > ==29900== Memcheck, a memory error detector
> >> > ==29900== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
> >> > ==29900== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
> >> > ==29900== Command: ./run
> >> > ==29900==
> >> > ==29900== Conditional jump or move depends on uninitialised value(s)
> >> > ==29900==    at 0xACE64AF: ____strtoul_l_internal (strtol_l.c:438)
> >> > ==29900==    by 0x71C2EE2: gotoblas_affinity_init (in /usr/lib/openblas-base/libopenblas.so.0)
> >> > ==29900==    by 0x6E9811A: gotoblas_init (in /usr/lib/openblas-base/libopenblas.so.0)
> >> > ==29900==    by 0x400DF7F: call_init (dl-init.c:85)
> >> > ==29900==    by 0x400E076: _dl_init (dl-init.c:134)
> >> > ==29900==    by 0x4000B29: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so)
> >> > ==29900==
> >> > ==29900== Conditional jump or move depends on uninitialised value(s)
> >> > ==29900==    at 0xACE6427: ____strtoul_l_internal (strtol_l.c:442)
> >> > ==29900==    by 0x71C2EE2: gotoblas_affinity_init (in /usr/lib/openblas-base/libopenblas.so.0)
> >> > ==29900==    by 0x6E9811A: gotoblas_init (in /usr/lib/openblas-base/libopenblas.so.0)
> >> > ==29900==    by 0x400DF7F: call_init (dl-init.c:85)
> >> > ==29900==    by 0x400E076: _dl_init (dl-init.c:134)
> >> > ==29900==    by 0x4000B29: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so)
> >> > ==29900==
> >> > ==29900== Use of uninitialised value of size 8
> >> > ==29900==    at 0xACE6465: ____strtoul_l_internal (strtol_l.c:466)
> >> > ==29900==    by 0x71C2EE2: gotoblas_affinity_init (in /usr/lib/openblas-base/libopenblas.so.0)
> >> > ==29900==    by 0x6E9811A: gotoblas_init (in /usr/lib/openblas-base/libopenblas.so.0)
> >> > ==29900==    by 0x400DF7F: call_init (dl-init.c:85)
> >> > ==29900==    by 0x400E076: _dl_init (dl-init.c:134)
> >> > ==29900==    by 0x4000B29: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so)
> >> > ==29900==
> >> > ==29900== Invalid read of size 8
> >> > ==29900==    at 0xAD378CD: _wordcopy_fwd_dest_aligned (wordcopy.c:205)
> >> > ==29900==    by 0xAD3156E: __GI_memmove (memmove.c:76)
> >> > ==29900==    by 0xAD38B7B: argz_insert (argz-insert.c:55)
> >> > ==29900==    by 0xA4405E5: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA4407FF: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA43FFC8: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA440F57: lt_dlforeachfile (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA447FCE: mca_base_component_find (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA448AC1: mca_base_components_open (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA463C44: opal_paffinity_base_open (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA439AD2: opal_init (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA1E6B3E: orte_init (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
> >> > ==29900==  Address 0xc15a998 is 40 bytes inside a block of size 47 alloc'd
> >> > ==29900==    at 0x4C28BED: malloc (vg_replace_malloc.c:263)
> >> > ==29900==    by 0xA43F658: lt__malloc (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA44078E: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA43FFC8: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA440F57: lt_dlforeachfile (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA447FCE: mca_base_component_find (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA448AC1: mca_base_components_open (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA463C44: opal_paffinity_base_open (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA439AD2: opal_init (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA1E6B3E: orte_init (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
> >> > ==29900==    by 0x9F5E373: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==    by 0x9F7F20D: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==
> >> > ==29900== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
> >> > ==29900==    at 0xAD852F9: syscall (syscall.S:39)
> >> > ==29900==    by 0xFD75621: ??? (in /usr/lib/openmpi/lib/openmpi/mca_paffinity_linux.so)
> >> > ==29900==    by 0xFD75A3C: ??? (in /usr/lib/openmpi/lib/openmpi/mca_paffinity_linux.so)
> >> > ==29900==    by 0xFD76599: ??? (in /usr/lib/openmpi/lib/openmpi/mca_paffinity_linux.so)
> >> > ==29900==    by 0xFD754AC: ??? (in /usr/lib/openmpi/lib/openmpi/mca_paffinity_linux.so)
> >> > ==29900==    by 0xA463AEA: opal_paffinity_base_select (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA439B0D: opal_init (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA1E6B3E: orte_init (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
> >> > ==29900==    by 0x9F5E373: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==    by 0x9F7F20D: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==    by 0x4F95F49: PetscInitialize (pinit.c:675)
> >> > ==29900==    by 0x400CC6: main (prog.c:5)
> >> > ==29900==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> >> > ==29900==
> >> > ==29900== Conditional jump or move depends on uninitialised value(s)
> >> > ==29900==    at 0x9F5E578: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==    by 0x9F7F20D: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==    by 0x4F95F49: PetscInitialize (pinit.c:675)
> >> > ==29900==    by 0x400CC6: main (prog.c:5)
> >> > ==29900==
> >> > ==29900== Conditional jump or move depends on uninitialised value(s)
> >> > ==29900==    at 0x9F5E57C: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==    by 0x9F7F20D: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==    by 0x4F95F49: PetscInitialize (pinit.c:675)
> >> > ==29900==    by 0x400CC6: main (prog.c:5)
> >> > ==29900==
> >> > ==29900== Syscall param writev(vector[...]) points to uninitialised byte(s)
> >> > ==29900==    at 0xAD81BE7: writev (writev.c:56)
> >> > ==29900==    by 0x11397E22: ??? (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
> >> > ==29900==    by 0x11398C5C: ??? (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
> >> > ==29900==    by 0x1139C2EB: ??? (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
> >> > ==29900==    by 0x1118E7BD: ??? (in /usr/lib/openmpi/lib/openmpi/mca_rml_oob.so)
> >> > ==29900==    by 0x1118EDB8: ??? (in /usr/lib/openmpi/lib/openmpi/mca_rml_oob.so)
> >> > ==29900==    by 0x10D85AD8: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
> >> > ==29900==    by 0x10D854DE: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
> >> > ==29900==    by 0x9F5EBAE: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==    by 0x9F7F20D: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==    by 0x4F95F49: PetscInitialize (pinit.c:675)
> >> > ==29900==    by 0x400CC6: main (prog.c:5)
> >> > ==29900==  Address 0x1815faf7 is 87 bytes inside a block of size 256 alloc'd
> >> > ==29900==    at 0x4C28CCE: realloc (vg_replace_malloc.c:632)
> >> > ==29900==    by 0xA43ADB7: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0xA43B8CD: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
> >> > ==29900==    by 0x10D85AAC: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
> >> > ==29900==    by 0x10D854DE: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
> >> > ==29900==    by 0x9F5EBAE: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==    by 0x9F7F20D: PMPI_Init_thread (in /usr/lib/openmpi/lib/libmpi.so.0.0.4)
> >> > ==29900==    by 0x4F95F49: PetscInitialize (pinit.c:675)
> >> > ==29900==    by 0x400CC6: main (prog.c:5)
> >> > ==29900==
> >> > ==29900==
> >> > ==29900== HEAP SUMMARY:
> >> > ==29900==     in use at exit: 258,198 bytes in 2,787 blocks
> >> > ==29900==   total heap usage: 11,718 allocs, 8,931 frees, 17,060,981 bytes allocated
> >> > ==29900==
> >> > ==29900== LEAK SUMMARY:
> >> > ==29900==    definitely lost: 5,956 bytes in 55 blocks
> >> > ==29900==    indirectly lost: 3,722 bytes in 22 blocks
> >> > ==29900==      possibly lost: 0 bytes in 0 blocks
> >> > ==29900==    still reachable: 248,520 bytes in 2,710 blocks
> >> > ==29900==         suppressed: 0 bytes in 0 blocks
> >> > ==29900== Rerun with --leak-check=full to see details of leaked memory
> >> > ==29900==
> >> > ==29900== For counts of detected and suppressed errors, rerun with: -v
> >> > ==29900== Use --track-origins=yes to see where uninitialised values come from
> >> > ==29900== ERROR SUMMARY: 619 errors from 8 contexts (suppressed: 4 from 4)
> >> >
> >> >
> >> > On Wed, Jan 22, 2014 at 6:40 PM, Jed Brown <jed at jedbrown.org> wrote:
> >> >
> >> >> David Liu <daveliu at mit.edu> writes:
> >> >>
> >> >> > Hi, I'm running a very simple code, consisting of just PetscInitialize
> >> >> > and PetscFinalize and nothing else.
> >> >> >
> >> >> > I'm running it with valgrind using the command
> >> >> > "valgrind ./run"
> >> >> >
> >> >> > I also tried (as the PETSc homepage suggests)
> >> >> > ${PETSC_DIR}/bin/petscmpiexec -valgrind -n 1 ./run -malloc off
> >> >> >
> >> >> > In both cases, I get tons of error messages like
> >> >> > "Conditional jump or move depends on uninitialised value(s)"
> >> >> > "Use of uninitialised value of size 8"
> >> >> > "Invalid read of size 8"
> >> >> > "Address 0xc15a998 is 40 bytes inside a block of size 47 alloc'd"
> >> >>
> >> >> You have to send the *exact and complete* output, not snippets.  Are you
> >> >> using Open MPI?  Chances are this is in the stack somewhere because we
> >> >> regularly test PETSc itself using valgrind.
> >> >>
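To send the exact and complete output rather than snippets, have valgrind write the whole report to a file. The following is a minimal sketch that assumes a POSIX shell. vg_run, vg_summary, and the log file name are hypothetical. vg_summary extracts only the final ERROR SUMMARY line, which makes it quick to compare runs, such as the Open MPI and MPICH runs in this thread.

```shell
#!/bin/sh
# Sketch: keep the full Memcheck report in a file and extract its summary.
# vg_run and vg_summary are hypothetical helper names.

# vg_run LOGFILE COMMAND [ARGS...]
vg_run() {
  log=$1
  shift
  # --track-origins=yes and --leak-check=full are the options valgrind's own
  # report suggests for more detail; --log-file keeps the full output intact.
  valgrind --log-file="$log" --track-origins=yes --leak-check=full "$@"
}

# vg_summary LOGFILE: print the text after "ERROR SUMMARY: ".
vg_summary() {
  # Memcheck prefixes every report line with ==PID==.
  sed -n 's/^==[0-9]*== ERROR SUMMARY: //p' "$1"
}
```

For example, `vg_run vg.log ./run` followed by `vg_summary vg.log`. You can then attach vg.log to the post as the complete output.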
> >>
> >
> >
> 
> 
> 


