[petsc-dev] valgrind now a point of failure?

Satish Balay balay at mcs.anl.gov
Tue Feb 23 11:47:50 CST 2021


It took a long time to run, but valgrind reported no errors with -mat_block_size 15.

balay at petsc-02:/scratch/balay/petsc/src/mat/tests$ valgrind --tool=memcheck  ./ex238 -mat_block_size 15
==8099== Memcheck, a memory error detector
==8099== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8099== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==8099== Command: ./ex238 -mat_block_size 15
==8099== 
==8099== Warning: set address range perms: large range [0x59e43040, 0xeabb0bc0) (undefined)
==8099== Warning: set address range perms: large range [0x59e43028, 0xeabb0bd8) (noaccess)
==8099== 
==8099== HEAP SUMMARY:
==8099==     in use at exit: 121 bytes in 2 blocks
==8099==   total heap usage: 535 allocs, 533 frees, 2,457,025,714 bytes allocated
==8099== 
==8099== LEAK SUMMARY:
==8099==    definitely lost: 0 bytes in 0 blocks
==8099==    indirectly lost: 0 bytes in 0 blocks
==8099==      possibly lost: 0 bytes in 0 blocks
==8099==    still reachable: 121 bytes in 2 blocks
==8099==         suppressed: 0 bytes in 0 blocks
==8099== Rerun with --leak-check=full to see details of leaked memory
==8099== 
==8099== For counts of detected and suppressed errors, rerun with: -v
==8099== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
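
On the SIGILL from the optimized build quoted below: one rough way to probe the "valgrind older than the CPU" guess is to list the instruction-set extensions the host advertises. The sketch below assumes a Linux x86 host with /proc/cpuinfo and looks only for AVX-512 flags. That choice is an assumption, since the thread does not identify the illegal opcode. avx512_flags is a hypothetical helper name, not a PETSc or valgrind tool.

```shell
# Hypothetical helper: print, one per line, the AVX-512 feature flags found
# in /proc/cpuinfo-style text read from stdin. Only the first "flags" line
# is examined; nothing is printed when it lists no avx512* entries.
avx512_flags() {
  awk '/^flags/ { for (i = 1; i <= NF; i++) if ($i ~ /^avx512/) print $i; exit }' | sort -u
}
```

On the machine in question this would be run as `avx512_flags < /proc/cpuinfo`. Non-empty output on petsc-02 and empty output on es would be consistent with the -march=native guess, though it would not prove it.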


Satish

On Tue, 23 Feb 2021, Barry Smith wrote:

> 
>   Satish,
> 
>     Thanks for running this, but it is block size 15 that is breaking, not 12 :-). It crashes with memory corruption while building the matrix on Solaris, but I am having trouble getting it to cause problems elsewhere.
> 
>   Barry
> 
>   I think it is just code that was not previously properly tested in the nightly builds; the code has been around for a while. Or it could be a bug in my test program.
> 
> 
> 
> 
> 
> > On Feb 22, 2021, at 10:31 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> > 
> > I get the following with a debug build.
> > 
> >>>>>>>>> 
> > balay at petsc-02:/scratch/balay/petsc/src/mat/tests$ make ex238
> > gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g3  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g3    -I/scratch/balay/petsc/include -I/scratch/balay/petsc/arch-linux-c-debug/include     ex238.c  -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib -L/scratch/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lpetsc -llapack -lblas -lpthread -lm -lX11 -lstdc++ -ldl -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex238
> > balay at petsc-02:/scratch/balay/petsc/src/mat/tests$ valgrind --tool=memcheck  ./ex238 -mat_block_size 12
> > ==34355== Memcheck, a memory error detector
> > ==34355== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> > ==34355== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
> > ==34355== Command: ./ex238 -mat_block_size 12
> > ==34355== 
> > ==34355== Warning: set address range perms: large range [0x59e43040, 0xb696a840) (undefined)
> > <<<<<<<<
> > 
> > Hang? It takes a long time. Try a different example.
> > 
> >>>>>>> 
> > balay at petsc-02:/scratch/balay/petsc/src/mat/tests$ make ex237
> > gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g3  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g3    -I/scratch/balay/petsc/include -I/scratch/balay/petsc/arch-linux-c-debug/include     ex237.c  -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib -L/scratch/balay/petsc/arch-linux-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lpetsc -llapack -lblas -lpthread -lm -lX11 -lstdc++ -ldl -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex237
> > balay at petsc-02:/scratch/balay/petsc/src/mat/tests$ valgrind --tool=memcheck -q ./ex237 -f /scratch/balay/petsc/share/petsc/datafiles/matrices/spd-real-int32-float64
> > Benchmarking MatMult: with A seqaij 12x12
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x2
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x4
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x8
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x16
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x32
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x64
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x128
> > balay at petsc-02:/scratch/balay/petsc/src/mat/tests$ 
> > 
> > <<<<<<<<<
> > 
> > So the likely issue is the optimized build with '-march=native' [perhaps this valgrind version is older than the CPU and does not recognize some of its instructions].
> > 
> > OK, try an optimized build on an older CPU, aka es [@gce].
> > 
> >>>>>> 
> > 
> > balay at es:/scratch/balay/petsc/src/mat/tests$ make ex237
> > gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -march=native -O3  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -march=native -O3    -I/scratch/balay/petsc/include -I/scratch/balay/petsc/arch-linux-c-opt/include     ex237.c  -Wl,-rpath,/scratch/balay/petsc/arch-linux-c-opt/lib -L/scratch/balay/petsc/arch-linux-c-opt/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lpetsc -llapack -lblas -lpthread -lm -lX11 -lstdc++ -ldl -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -o ex237
> > balay at es:/scratch/balay/petsc/src/mat/tests$ valgrind --tool=memcheck -q ./ex237 -f /scratch/balay/petsc/share/petsc/datafiles/matrices/spd-real-int32-float64
> > Benchmarking MatMult: with A seqaij 12x12
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x2
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x4
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x8
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x16
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x32
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x64
> > Benchmarking MatProduct AB: with A seqaij 12x12 and B seqdense 12x128
> > balay at es:/scratch/balay/petsc/src/mat/tests$ 
> > 
> > <<<<<<<
> > 
> > Satish
> > 
> > 
> > 
> > On Mon, 22 Feb 2021, Barry Smith wrote:
> > 
> >> 
> >>  I knew they hate Macs, but now Linux? Are there any trustworthy machines to run valgrind on?
> >> 
> >> 
> >> $ petscmpiexec -valgrind -n 1 ./ex238 -mat_block_size 12
> >> ==14144== 
> >> ==14144== Process terminating with default action of signal 4 (SIGILL)
> >> ==14144==  Illegal opcode at address 0x4F808A9
> >> ==14144==    at 0x4F808A9: PetscSetDisplay (in /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14144==    by 0x4F086BD: PetscOptionsCheckInitial_Private (in /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14144==    by 0x4F0D5BC: PetscInitialize (in /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14144==    by 0x108D0E: main (in /scratch/bsmith/petsc/src/mat/tests/ex238)
> >> Illegal instruction (core dumped)
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=) arch-add-baij-12
> >> $ echo $PETSC_OPTIONS
> >> 
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=) arch-add-baij-12
> >> $ hostname 
> >> petsc-02
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=) arch-add-baij-12
> >> $ uname -a
> >> Linux petsc-02 4.15.0-135-generic #139-Ubuntu SMP Mon Jan 18 17:38:24 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=) arch-add-baij-12
> >> $ which valgrind
> >> /usr/bin/valgrind
> >> 
> >> $ make ex237
> >> gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -march=native -O3  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -march=native -O3    -I/scratch/bsmith/petsc/include -I/scratch/bsmith/petsc/arch-add-baij-12/include     ex237.c  -Wl,-rpath,/scratch/bsmith/petsc/arch-add-baij-12/lib -L/scratch/bsmith/petsc/arch-add-baij-12/lib -lpetsc -llapack -lblas -lpthread -lm -lX11 -lquadmath -ldl -o ex237
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=) arch-add-baij-12
> >> $ petscmpiexec -valgrind -n 1 ./ex237
> >> ==14841== 
> >> ==14841== Process terminating with default action of signal 4 (SIGILL)
> >> ==14841==  Illegal opcode at address 0x4F808A9
> >> ==14841==    at 0x4F808A9: PetscSetDisplay (in /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14841==    by 0x4F086BD: PetscOptionsCheckInitial_Private (in /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14841==    by 0x4F0D5BC: PetscInitialize (in /scratch/bsmith/petsc/arch-add-baij-12/lib/libpetsc.so.3.014.4)
> >> ==14841==    by 0x109DE0: main (in /scratch/bsmith/petsc/src/mat/tests/ex237)
> >> Illegal instruction (core dumped)
> >> /scratch/bsmith/petsc/src/mat/tests (barry/2021-02-12/add-baij-12=) arch-add-baij-12
> >> 
> >> 
> > 
> 


