[petsc-users] If valgrind says no memory problem.

Barry Smith bsmith at mcs.anl.gov
Wed Apr 21 07:55:31 CDT 2010

    You need to use a debugger to see where it is crashing. Run on
your local machine (laptop/workstation) with the option
-start_in_debugger, type cont into the debuggers when they come up, and
type where when it crashes.
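Here is a minimal sketch of that session. It assumes gdb is the debugger PETSc launches, and it reuses the executable and options file from the local run quoted below. PETSc typically opens one debugger per MPI process, each in its own xterm, so the cont step is repeated in every window:

```
$ mpiexec -np 4 ./twqt2ff.exe -options_file option_all_twqt2ff -start_in_debugger
# in each debugger window that comes up:
(gdb) cont
# in the window of the process that crashes, once it stops on the SEGV:
(gdb) where
```

The backtrace printed by where should name the function and source line of the bad memory access. The amdahl log could not provide this, because it only reports "User provided function() line 0 in unknown directory unknown file".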


On Apr 21, 2010, at 5:47 AM, (Rebecca) Xuefei YUAN wrote:

> Dear Aron,
> Thanks for your reply.
> It runs fine on my machine with the same parameters and np.
> Here is the output from the two runs:
> 1) Running on my local machine:
> rebecca at YuanWork:~/linux/code/twoway/twoway_new/valgrind$ mpiexec -np 4 ./twqt2ff.exe -options_file option_all_twqt2ff
> **************************************************
> number of processors = 4
> viscosity = 1.0000000000000000e-03
> resistivity = 1.0000000000000000e-03
> skin depth = 1.0000000000000000e+00
> hyper resistivity = 1.6384000000000001e-05
> hyper viscosity = 6.5536000000000011e-02
> problem size: 101 by 101
> dx = 1.2673267326732673e-01
> dy = 6.4000000000000001e-02
> dt = 5.0000000000000003e-02
> adaptive time step size (1:yes;0:no) = 0
> **************************************************
>  0 SNES Function norm 1.558736678272e-02
> Linear solve converged due to CONVERGED_RTOL iterations 2
>  1 SNES Function norm 3.340317612139e-03
> Linear solve converged due to CONVERGED_RTOL iterations 3
>  2 SNES Function norm 3.147655751158e-04
> Linear solve converged due to CONVERGED_RTOL iterations 5
>  3 SNES Function norm 5.447758329758e-06
> Linear solve converged due to CONVERGED_RTOL iterations 9
>  4 SNES Function norm 6.186506196319e-09
> Linear solve converged due to CONVERGED_RTOL iterations 16
>  5 SNES Function norm 7.316295670455e-13
> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE
> **************************************************
> time step = 1
> current time step size= 5.0000000000000003e-02
> time = 5.0000000000000003e-02
> number of nonlinear iterations = 5
> number of linear iterations = 35
> function norm = 7.3162956704552350e-13
> **************************************************
> total number of time steps = 1
> total number of nonlinear iterations = 5
> total number of linear iterations = 35
> 2) Here is what I get on amdahl:
> **************************************************
> number of processors = 4
> viscosity = 1.0000000000000000e-02
> resistivity = 5.0000000000000001e-03
> skin depth = 0.0000000000000000e+00
> hyper resistivity = 8.1920000000000002e-05
> hyper viscosity = 6.5535999999999997e-02
> problem size: 101 by 101
> dx = 1.2673267326732673e-01
> dy = 6.4000000000000001e-02
> dt = 5.0000000000000003e-02
> adaptive time step size (1:yes;0:no) = 0
> **************************************************
>  0 SNES Function norm 1.121373952980e-02
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> srun: error: task 0: Exited with exit code 59
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
> [0]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> [0]PETSC ERROR:       is given.
> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR: Signal received!
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul  6 11:33:34 CDT 2009
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> ------------------------------------------------------------------------
> [0]PETSC ERROR: /tmp/lustre/home/xy2102/April2110/die0/./twqt2ff.exe on a linux-c-g named sci-m0n0.scsystem by xy2102 Wed Apr 21 05:30:10 2010
> [0]PETSC ERROR: Libraries linked from /home/xy2102/soft/petsc-3.0.0-p7/linux-c-gnu-debug/lib
> [0]PETSC ERROR: Configure run at Mon Jul 20 13:56:37 2009
> [0]PETSC ERROR: Configure options --with-cc=mpicc --with-fc=mpif77 --with-mpiexec=srun --with-debugging=1 --with-fortran-kernels=generic --with-shared=0 --CFLAGS=-G0 --FFLAGS=-G0
> ------------------------------------------------------------------------
> [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
> In: PMI_Abort(59, application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0)
> srun: error: task 2-3: Killed
> srun: error: task 1: Killed
> The makefiles are:
> 1) locally:
> ##### Rebecca local dev ###################
> PETSC_ARCH=linux-gnu-c-debug
> PETSC_DIR=/home/rebecca/soft/petsc-dev
> include ${PETSC_DIR}/conf/variables
> include ${PETSC_DIR}/conf/rules
> #######################################
> twqt2ff:    twqt2ff.o chkopts
> 	-${CLINKER} -g -O0 -o twqt2ff.exe twqt2ff.o  ${PETSC_SNES_LIB}
> 2) amdahl:
> ##### Amdahl 3.0 ##################
> PETSC_ARCH=linux-c-gnu-debug
> PETSC_DIR=/home/xy2102/soft/petsc-3.0.0-p7
> include ${PETSC_DIR}/conf/base
> #######################################
> twqt2ff:    twqt2ff.o chkopts
> 	-${CLINKER} -o twqt2ff.exe twqt2ff.o  ${PETSC_SNES_LIB}
> Could it be due to the different PETSc versions and make options?
> Thanks very much!
> Rebecca
> Quoting Aron Ahmadia <aron.ahmadia at kaust.edu.sa>:
>> A SEGV is definitely a memory access problem; as PETSc suggests, it is
>> likely to be a memory access out of range.
>> I don't recommend trying to debug this problem on amdahl. Can you
>> reproduce the problem just by running with multiple processes on your
>> workstation?
>> Warm Regards,
>> Aron
>> On Wed, Apr 21, 2010 at 12:34 PM, (Rebecca) Xuefei YUAN <xy2102 at columbia.edu> wrote:
>>> Dear all,
>>> I checked the code with valgrind, and there is no memory problem, but
>>> when running in parallel, I get a message like this:
>>> ------------------------------------------------------------------------
>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> srun: error: task 0: Exited with exit code 59
>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
>>> [0]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
>>> [0]PETSC ERROR: likely location of problem given in stack below
>>> [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
>>> [0]PETSC ERROR:       is given.
>>> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
>>> [0]PETSC ERROR: Signal received!
>>> ------------------------------------------------------------------------
>>> [0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul  6 11:33:34 CDT 2009
>>> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [0]PETSC ERROR: See docs/index.html for manual pages.
>>> ------------------------------------------------------------------------
>>> [0]PETSC ERROR: /tmp/lustre/home/xy2102/April2110/die0/./twqt2ff.exe on a linux-c-g named sci-m0n0.scsystem by xy2102 Wed Apr 21 05:30:10 2010
>>> [0]PETSC ERROR: Libraries linked from /home/xy2102/soft/petsc-3.0.0-p7/linux-c-gnu-debug/lib
>>> [0]PETSC ERROR: Configure run at Mon Jul 20 13:56:37 2009
>>> [0]PETSC ERROR: Configure options --with-cc=mpicc --with-fc=mpif77 --with-mpiexec=srun --with-debugging=1 --with-fortran-kernels=generic --with-shared=0 --CFLAGS=-G0 --FFLAGS=-G0
>>> ------------------------------------------------------------------------
>>> [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>>> In: PMI_Abort(59, application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0)
>>> srun: error: task 2-3: Killed
>>> srun: error: task 1: Killed
>>> What is wrong?
>>> Cheers,
>>> Rebecca
