[petsc-users] Valgrind unhandled instruction

Satish Balay balay at mcs.anl.gov
Wed May 21 17:27:53 CDT 2014


It looks like valgrind-3.7 doesn't recognize all of the instructions that
"-O3 -march=native" generates on this machine.

In any case, code run under valgrind should generally be compiled with '-g'
so that its reports include file names and line numbers.

I see a similar issue with valgrind-3.7, but the error goes away with
valgrind-3.9 (compiled from source); see the session below.
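
To check which valgrind is being picked up before rerunning, something
like the following rough sketch may help. The function name vg_at_least
is made up for illustration, and the 3.9.0 threshold is only taken from
the two runs in this message, not from a documented cutoff for when
these instructions became supported. It also assumes a sort that
understands -V (GNU coreutils does).

```shell
#!/bin/sh
# vg_at_least VERSION_STRING MINIMUM   (hypothetical helper)
# VERSION_STRING is what `valgrind --version` prints, e.g. "valgrind-3.7.0".
# Returns 0 if that version is >= MINIMUM, 1 otherwise.
vg_at_least() {
    have=${1#valgrind-}     # strip the leading "valgrind-" prefix
    want=$2
    # sort -V orders version strings numerically by component; if MINIMUM
    # comes out first (or is equal), the installed version is new enough.
    lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Example use (needs valgrind on PATH; 3.9.0 is an assumed threshold):
#   if vg_at_least "$(valgrind --version)" 3.9.0; then
#       echo "valgrind looks new enough"
#   else
#       echo "valgrind may not know the -march=native instructions"
#   fi
```

If the check fails, the options are a newer valgrind or a build without
"-march=native".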

Satish

-----------

balay at es^/scratch/balay/petsc/src/ksp/ksp/examples/tutorials(master) $ valgrind --version
valgrind-3.7.0
balay at es^/scratch/balay/petsc/src/ksp/ksp/examples/tutorials(master) $ valgrind --tool=memcheck -q  ./ex56 -ne 9 -alpha 1.e-3 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -two_solves -ksp_monitor_short -use_mat_nearnullspace
vex amd64->IR: unhandled instruction bytes: 0xC5 0xFB 0x2A 0xC2 0xBA 0x1 0x0 0x0
==19041== valgrind: Unrecognised instruction at address 0x4ef760e.
==19041==    at 0x4EF760E: PetscSetDisplay (in /scratch/balay/petsc/arch-test/lib/libpetsc.so.3.04.4)
==19041==    by 0x4F4BB1D: PetscOptionsCheckInitial_Private (in /scratch/balay/petsc/arch-test/lib/libpetsc.so.3.04.4)
==19041==    by 0x4F51996: PetscInitialize (in /scratch/balay/petsc/arch-test/lib/libpetsc.so.3.04.4)
==19041==    by 0x401D15: main (in /scratch/balay/petsc/src/ksp/ksp/examples/tutorials/ex56)
==19041== Your program just tried to execute an instruction that Valgrind
==19041== did not recognise.  There are two possible reasons for this.
==19041== 1. Your program has a bug and erroneously jumped to a non-code
==19041==    location.  If you are running Memcheck and you just saw a
==19041==    warning about a bad jump, it's probably your program's fault.
==19041== 2. The instruction is legitimate but Valgrind doesn't handle it,
==19041==    i.e. it's Valgrind's fault.  If you think this is the case or
==19041==    you are not sure, please let us know and we'll try to fix it.
==19041== Either way, Valgrind will now raise a SIGILL signal which will
==19041== probably kill your program.
==19041== 
==19041== Process terminating with default action of signal 4 (SIGILL)
==19041==  Illegal opcode at address 0x4EF760E
==19041==    at 0x4EF760E: PetscSetDisplay (in /scratch/balay/petsc/arch-test/lib/libpetsc.so.3.04.4)
==19041==    by 0x4F4BB1D: PetscOptionsCheckInitial_Private (in /scratch/balay/petsc/arch-test/lib/libpetsc.so.3.04.4)
==19041==    by 0x4F51996: PetscInitialize (in /scratch/balay/petsc/arch-test/lib/libpetsc.so.3.04.4)
==19041==    by 0x401D15: main (in /scratch/balay/petsc/src/ksp/ksp/examples/tutorials/ex56)
Illegal instruction (core dumped)
balay at es^/scratch/balay/petsc/src/ksp/ksp/examples/tutorials(master) $ /scratch/balay/valgrind-3.9.0/vg-in-place --version
valgrind-3.9.0
balay at es^/scratch/balay/petsc/src/ksp/ksp/examples/tutorials(master) $ /scratch/balay/valgrind-3.9.0/vg-in-place --tool=memcheck -q  ./ex56 -ne 9 -alpha 1.e-3 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -two_solves -ksp_monitor_short -use_mat_nearnullspace
  0 KSP Residual norm 740.547 
  1 KSP Residual norm 104.004 
  2 KSP Residual norm 79.1334 
  3 KSP Residual norm 50.0497 
  4 KSP Residual norm 4.40859 
  5 KSP Residual norm 1.56451 
  6 KSP Residual norm 0.601773 
  7 KSP Residual norm 0.225864 
  8 KSP Residual norm 0.0122203 
  9 KSP Residual norm 0.00290625 
  0 KSP Residual norm 0.00740547 
  1 KSP Residual norm 0.00104004 
  2 KSP Residual norm 0.000791334 
  3 KSP Residual norm 0.000500497 
  4 KSP Residual norm 4.40859e-05 
  5 KSP Residual norm 1.56451e-05 
  6 KSP Residual norm 6.01773e-06 
  7 KSP Residual norm 2.25864e-06 
  8 KSP Residual norm 1.22203e-07 
  9 KSP Residual norm 2.90625e-08 
  0 KSP Residual norm 7.40547e-08 
  1 KSP Residual norm 1.04004e-08 
  2 KSP Residual norm 7.91334e-09 
  3 KSP Residual norm 5.00497e-09 
  4 KSP Residual norm 4.409e-10 
  5 KSP Residual norm 1.565e-10 
  6 KSP Residual norm 6.018e-11 
  7 KSP Residual norm 2.259e-11 
  8 KSP Residual norm < 1.e-11
  9 KSP Residual norm < 1.e-11
[0]main |b-Ax|/|b|=6.068344e-05, |b|=5.391826e+00, emax=9.964453e-01
balay at es^/scratch/balay/petsc/src/ksp/ksp/examples/tutorials(master) $ 


On Wed, 21 May 2014, Tabrez Ali wrote:

> Hello
> 
> With petsc-dev I get the following error with my own code and also with ex56
> as shown below. Both run fine otherwise. This is with Valgrind 3.7 (in Debian
> stable).
> 
> Is this a PETSc or Valgrind issue?
> 
> T
> 
> stali at i5:~/petsc-dev/src/ksp/ksp/examples/tutorials$ valgrind ./ex56 -ne 9
> -alpha 1.e-3 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1
> -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -two_solves
> -ksp_monitor_short -use_mat_nearnullspace
> ==16123== Memcheck, a memory error detector
> ==16123== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
> ==16123== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
> ==16123== Command: ./ex56 -ne 9 -alpha 1.e-3 -pc_type gamg -pc_gamg_type agg
> -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10
> -pc_gamg_reuse_interpolation true -two_solves -ksp_monitor_short
> -use_mat_nearnullspace
> ==16123==
> vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x38 0x39
> ==16123== valgrind: Unrecognised instruction at address 0x4228928.
> ==16123==    at 0x4228928: ISCreateGeneral_Private (in
> /home/stali/petsc-dev/arch-linux2-c-debug/lib/libpetsc.so.3.04.4)
> ==16123==    by 0x4228D54: ISGeneralSetIndices_General (in
> /home/stali/petsc-dev/arch-linux2-c-debug/lib/libpetsc.so.3.04.4)
> ==16123==    by 0x4229504: ISGeneralSetIndices (in
> /home/stali/petsc-dev/arch-linux2-c-debug/lib/libpetsc.so.3.04.4)
> ==16123==    by 0x422976F: ISCreateGeneral (in
> /home/stali/petsc-dev/arch-linux2-c-debug/lib/libpetsc.so.3.04.4)
> ==16123==    by 0x4A94CAA: PCGAMGCoarsen_AGG (in
> /home/stali/petsc-dev/arch-linux2-c-debug/lib/libpetsc.so.3.04.4)
> ==16123==    by 0x4A84FD6: PCSetUp_GAMG (in
> /home/stali/petsc-dev/arch-linux2-c-debug/lib/libpetsc.so.3.04.4)
> ==16123==    by 0x49E8163: PCSetUp (in
> /home/stali/petsc-dev/arch-linux2-c-debug/lib/libpetsc.so.3.04.4)
> ==16123==    by 0x4AE6023: KSPSetUp (in
> /home/stali/petsc-dev/arch-linux2-c-debug/lib/libpetsc.so.3.04.4)
> ==16123==    by 0x804C7D6: main (in
> /home/stali/petsc-dev/src/ksp/ksp/examples/tutorials/ex56)
> ==16123== Your program just tried to execute an instruction that Valgrind
> ==16123== did not recognise.  There are two possible reasons for this.
> ==16123== 1. Your program has a bug and erroneously jumped to a non-code
> ==16123==    location.  If you are running Memcheck and you just saw a
> ==16123==    warning about a bad jump, it's probably your program's fault.
> ==16123== 2. The instruction is legitimate but Valgrind doesn't handle it,
> ==16123==    i.e. it's Valgrind's fault.  If you think this is the case or
> ==16123==    you are not sure, please let us know and we'll try to fix it.
> ==16123== Either way, Valgrind will now raise a SIGILL signal which will
> ==16123== probably kill your program.
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 4 Illegal instruction: Likely due to
> memory corruption
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSC ERROR: or
> try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory
> corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> [0]PETSC ERROR:       is given.
> [0]PETSC ERROR: [0] ISCreateGeneral_Private line 575
> /home/stali/petsc-dev/src/vec/is/is/impls/general/general.c
> [0]PETSC ERROR: [0] ISGeneralSetIndices_General line 674
> /home/stali/petsc-dev/src/vec/is/is/impls/general/general.c
> [0]PETSC ERROR: [0] ISGeneralSetIndices line 662
> /home/stali/petsc-dev/src/vec/is/is/impls/general/general.c
> [0]PETSC ERROR: [0] ISCreateGeneral line 631
> /home/stali/petsc-dev/src/vec/is/is/impls/general/general.c
> [0]PETSC ERROR: [0] PCGAMGCoarsen_AGG line 976
> /home/stali/petsc-dev/src/ksp/pc/impls/gamg/agg.c
> [0]PETSC ERROR: [0] PCSetUp_GAMG line 487
> /home/stali/petsc-dev/src/ksp/pc/impls/gamg/gamg.c
> [0]PETSC ERROR: [0] KSPSetUp line 219
> /home/stali/petsc-dev/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Signal received
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.4.4-4344-ge0d8a6f  GIT
> Date: 2014-05-21 16:02:44 -0500
> [0]PETSC ERROR: ./ex56 on a arch-linux2-c-debug named i5 by stali Wed May 21
> 16:41:07 2014
> [0]PETSC ERROR: Configure options --with-fc=gfortran --with-cc=gcc
> --download-mpich --with-metis=1 --download-metis=1 --COPTFLAGS="-O3
> -march=native" --FOPTFLAGS="-O3 -march=native" --with-shared-libraries
> --with-debugging=1
> [0]PETSC ERROR: #1 User provided function() line 0 in  unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
> [unset]: aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
> ==16123==
> ==16123== HEAP SUMMARY:
> ==16123==     in use at exit: 4,627,684 bytes in 1,188 blocks
> ==16123==   total heap usage: 1,649 allocs, 461 frees, 6,073,192 bytes
> allocated
> ==16123==
> ==16123== LEAK SUMMARY:
> ==16123==    definitely lost: 0 bytes in 0 blocks
> ==16123==    indirectly lost: 0 bytes in 0 blocks
> ==16123==      possibly lost: 0 bytes in 0 blocks
> ==16123==    still reachable: 4,627,684 bytes in 1,188 blocks
> ==16123==         suppressed: 0 bytes in 0 blocks
> ==16123== Rerun with --leak-check=full to see details of leaked memory
> ==16123==
> ==16123== For counts of detected and suppressed errors, rerun with: -v
> ==16123== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 61 from 8)
> 
> 


