[petsc-users] If valgrind says no memory problem.
Barry Smith
bsmith at mcs.anl.gov
Wed Apr 21 07:55:31 CDT 2010
You need to use a debugger to see where it is crashing. Run on
your local machine (laptop or workstation) with the option
-start_in_debugger. Type cont in the debuggers when they come up, and
type where when it crashes.
Barry
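
A minimal sketch of the suggestion above, assuming the executable name,
options file and process count from the quoted local run (./twqt2ff.exe,
option_all_twqt2ff, 4 processes). The helper name debug_cmd is hypothetical;
it only prints the command line so it can be checked before it is run. PETSc
typically opens one debugger window per process, so a working X display on
the workstation is assumed here.

```shell
#!/bin/sh
# Hypothetical helper (not part of PETSc): print the mpiexec command line
# with PETSc's -start_in_debugger option appended.
debug_cmd() {
    np="$1"     # number of MPI processes
    exe="$2"    # path to the PETSc executable
    opts="$3"   # PETSc options file
    printf 'mpiexec -np %s %s -options_file %s -start_in_debugger\n' \
        "$np" "$exe" "$opts"
}

# Print the command for the run quoted in this thread.
debug_cmd 4 ./twqt2ff.exe option_all_twqt2ff

# After launching the printed command, in each debugger that comes up:
#   cont     (continue the run until the crash)
#   where    (print the stack trace once the SEGV is caught)
```

The -on_error_attach_debugger option named in the PETSc error message is an
alternative that attaches the debugger only once an error is detected.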
On Apr 21, 2010, at 5:47 AM, (Rebecca) Xuefei YUAN wrote:
> Dear Aron,
>
> Thanks for your reply.
>
> It runs fine on my machine with the same parameters and np.
>
> Here are the output files for the two:
>
> 1) running on my local machine:
> rebecca at YuanWork:~/linux/code/twoway/twoway_new/valgrind$ mpiexec -np 4 ./twqt2ff.exe -options_file option_all_twqt2ff
> **************************************************
> number of processors = 4
> viscosity = 1.0000000000000000e-03
> resistivity = 1.0000000000000000e-03
> skin depth = 1.0000000000000000e+00
> hyper resistivity = 1.6384000000000001e-05
> hyper viscosity = 6.5536000000000011e-02
> problem size: 101 by 101
> dx = 1.2673267326732673e-01
> dy = 6.4000000000000001e-02
> dt = 5.0000000000000003e-02
> adaptive time step size (1:yes;0:no) = 0
> **************************************************
> 0 SNES Function norm 1.558736678272e-02
> Linear solve converged due to CONVERGED_RTOL iterations 2
> 1 SNES Function norm 3.340317612139e-03
> Linear solve converged due to CONVERGED_RTOL iterations 3
> 2 SNES Function norm 3.147655751158e-04
> Linear solve converged due to CONVERGED_RTOL iterations 5
> 3 SNES Function norm 5.447758329758e-06
> Linear solve converged due to CONVERGED_RTOL iterations 9
> 4 SNES Function norm 6.186506196319e-09
> Linear solve converged due to CONVERGED_RTOL iterations 16
> 5 SNES Function norm 7.316295670455e-13
> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE
> **************************************************
> time step = 1
> current time step size= 5.0000000000000003e-02
> time = 5.0000000000000003e-02
> number of nonlinear iterations = 5
> number of linear iterations = 35
> function norm = 7.3162956704552350e-13
> **************************************************
> total number of time steps = 1
> total number of nonlinear iterations = 5
> total number of linear iterations = 35
>
> 2) here is what I get from amdahl:
> **************************************************
> number of processors = 4
> viscosity = 1.0000000000000000e-02
> resistivity = 5.0000000000000001e-03
> skin depth = 0.0000000000000000e+00
> hyper resistivity = 8.1920000000000002e-05
> hyper viscosity = 6.5535999999999997e-02
> problem size: 101 by 101
> dx = 1.2673267326732673e-01
> dy = 6.4000000000000001e-02
> dt = 5.0000000000000003e-02
> adaptive time step size (1:yes;0:no) = 0
> **************************************************
> 0 SNES Function norm 1.121373952980e-02
>
> STOPPED AND THE ERROR MESSAGE CAME OUT AS:
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> srun: error: task 0: Exited with exit code 59
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
> [0]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR: INSTEAD the line number of the start of the function
> [0]PETSC ERROR: is given.
> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR: Signal received!
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul 6 11:33:34 CDT 2009
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: /tmp/lustre/home/xy2102/April2110/die0/./twqt2ff.exe on a linux-c-g named sci-m0n0.scsystem by xy2102 Wed Apr 21 05:30:10 2010
> [0]PETSC ERROR: Libraries linked from /home/xy2102/soft/petsc-3.0.0-p7/linux-c-gnu-debug/lib
> [0]PETSC ERROR: Configure run at Mon Jul 20 13:56:37 2009
> [0]PETSC ERROR: Configure options --with-cc=mpicc --with-fc=mpif77 --with-mpiexec=srun --with-debugging=1 --with-fortran-kernels=generic --with-shared=0 --CFLAGS=-G0 --FFLAGS=-G0
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
> In: PMI_Abort(59, application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0)
> srun: error: task 2-3: Killed
> srun: error: task 1: Killed
>
>
> The makefiles are:
>
> 1) locally:
>
> ##### Rebecca local dev ###################
> PETSC_ARCH=linux-gnu-c-debug
> PETSC_DIR=/home/rebecca/soft/petsc-dev
> include ${PETSC_DIR}/conf/variables
> include ${PETSC_DIR}/conf/rules
> #######################################
>
> twqt2ff: twqt2ff.o chkopts
> -${CLINKER} -g -O0 -o twqt2ff.exe twqt2ff.o ${PETSC_SNES_LIB}
>
> 2) amdahl:
> ##### Amdahl 3.0 ##################
> PETSC_ARCH=linux-c-gnu-debug
> PETSC_DIR=/home/xy2102/soft/petsc-3.0.0-p7
> include ${PETSC_DIR}/conf/base
> #######################################
>
> twqt2ff: twqt2ff.o chkopts
> -${CLINKER} -o twqt2ff.exe twqt2ff.o ${PETSC_SNES_LIB}
>
>
> Could it be due to the different PETSc versions and make options?
>
> Thanks very much!
>
> Rebecca
>
>
>
>
> Quoting Aron Ahmadia <aron.ahmadia at kaust.edu.sa>:
>
>> A SEGV is definitely a memory access problem; as PETSc suggests, it is
>> likely to be a memory access out of range.
>>
>> I don't recommend trying to debug this problem on amdahl. Can you
>> reproduce the problem just by running with multiple processes on your
>> workstation?
>>
>> Warm Regards,
>> Aron
>>
>> On Wed, Apr 21, 2010 at 12:34 PM, (Rebecca) Xuefei YUAN <xy2102 at columbia.edu> wrote:
>>
>>> Dear all,
>>>
>>> I checked the code with valgrind, and it reports no memory problem,
>>> but when running in parallel, I get a message like
>>>
>>> [0]PETSC ERROR: ------------------------------------------------------------------------
>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> srun: error: task 0: Exited with exit code 59
>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
>>> [0]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
>>> [0]PETSC ERROR: likely location of problem given in stack below
>>> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>> [0]PETSC ERROR: INSTEAD the line number of the start of the function
>>> [0]PETSC ERROR: is given.
>>> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
>>> [0]PETSC ERROR: Signal received!
>>> [0]PETSC ERROR: ------------------------------------------------------------------------
>>> [0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul 6 11:33:34 CDT 2009
>>> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [0]PETSC ERROR: See docs/index.html for manual pages.
>>> [0]PETSC ERROR: ------------------------------------------------------------------------
>>> [0]PETSC ERROR: /tmp/lustre/home/xy2102/April2110/die0/./twqt2ff.exe on a linux-c-g named sci-m0n0.scsystem by xy2102 Wed Apr 21 05:30:10 2010
>>> [0]PETSC ERROR: Libraries linked from /home/xy2102/soft/petsc-3.0.0-p7/linux-c-gnu-debug/lib
>>> [0]PETSC ERROR: Configure run at Mon Jul 20 13:56:37 2009
>>> [0]PETSC ERROR: Configure options --with-cc=mpicc --with-fc=mpif77 --with-mpiexec=srun --with-debugging=1 --with-fortran-kernels=generic --with-shared=0 --CFLAGS=-G0 --FFLAGS=-G0
>>> [0]PETSC ERROR: ------------------------------------------------------------------------
>>> [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>>> In: PMI_Abort(59, application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0)
>>> srun: error: task 2-3: Killed
>>> srun: error: task 1: Killed
>>>
>>> What is wrong?
>>>
>>> Cheers,
>>>
>>> Rebecca
>>>
>>> --
>>> (Rebecca) Xuefei YUAN
>>> Department of Applied Physics and Applied Mathematics
>>> Columbia University
>>> Tel:917-399-8032
>>> www.columbia.edu/~xy2102
>>>
>>>
>>
>
>
>
> --
> (Rebecca) Xuefei YUAN
> Department of Applied Physics and Applied Mathematics
> Columbia University
> Tel:917-399-8032
> www.columbia.edu/~xy2102
>