[petsc-users] If valgrind says no memory problem.

(Rebecca) Xuefei YUAN xy2102 at columbia.edu
Wed Apr 21 05:47:25 CDT 2010


Dear Aron,

Thanks for your reply.

It runs fine on my machine with the same parameters and np.

Here is the output from the two machines:

1) Running on my local machine:
rebecca@YuanWork:~/linux/code/twoway/twoway_new/valgrind$ mpiexec -np 4 ./twqt2ff.exe -options_file option_all_twqt2ff
**************************************************
number of processors = 4
viscosity = 1.0000000000000000e-03
resistivity = 1.0000000000000000e-03
skin depth = 1.0000000000000000e+00
hyper resistivity = 1.6384000000000001e-05
hyper viscosity = 6.5536000000000011e-02
problem size: 101 by 101
dx = 1.2673267326732673e-01
dy = 6.4000000000000001e-02
dt = 5.0000000000000003e-02
adaptive time step size (1:yes;0:no) = 0
**************************************************
   0 SNES Function norm 1.558736678272e-02
Linear solve converged due to CONVERGED_RTOL iterations 2
   1 SNES Function norm 3.340317612139e-03
Linear solve converged due to CONVERGED_RTOL iterations 3
   2 SNES Function norm 3.147655751158e-04
Linear solve converged due to CONVERGED_RTOL iterations 5
   3 SNES Function norm 5.447758329758e-06
Linear solve converged due to CONVERGED_RTOL iterations 9
   4 SNES Function norm 6.186506196319e-09
Linear solve converged due to CONVERGED_RTOL iterations 16
   5 SNES Function norm 7.316295670455e-13
Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE
**************************************************
time step = 1
current time step size= 5.0000000000000003e-02
time = 5.0000000000000003e-02
number of nonlinear iterations = 5
number of linear iterations = 35
function norm = 7.3162956704552350e-13
**************************************************
total number of time steps = 1
total number of nonlinear iterations = 5
total number of linear iterations = 35

2) Here is what I get on amdahl:
**************************************************
number of processors = 4
viscosity = 1.0000000000000000e-02
resistivity = 5.0000000000000001e-03
skin depth = 0.0000000000000000e+00
hyper resistivity = 8.1920000000000002e-05
hyper viscosity = 6.5535999999999997e-02
problem size: 101 by 101
dx = 1.2673267326732673e-01
dy = 6.4000000000000001e-02
dt = 5.0000000000000003e-02
adaptive time step size (1:yes;0:no) = 0
**************************************************
   0 SNES Function norm 1.121373952980e-02

The run stopped there with the following error message:
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
srun: error: task 0: Exited with exit code 59
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
[0]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR:       INSTEAD the line number of the start of the function
[0]PETSC ERROR:       is given.
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Signal received!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul  6 11:33:34 CDT 2009
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: /tmp/lustre/home/xy2102/April2110/die0/./twqt2ff.exe on a linux-c-g named sci-m0n0.scsystem by xy2102 Wed Apr 21 05:30:10 2010
[0]PETSC ERROR: Libraries linked from /home/xy2102/soft/petsc-3.0.0-p7/linux-c-gnu-debug/lib
[0]PETSC ERROR: Configure run at Mon Jul 20 13:56:37 2009
[0]PETSC ERROR: Configure options --with-cc=mpicc --with-fc=mpif77 --with-mpiexec=srun --with-debugging=1 --with-fortran-kernels=generic --with-shared=0 --CFLAGS=-G0 --FFLAGS=-G0
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
In: PMI_Abort(59, application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0)
srun: error: task 2-3: Killed
srun: error: task 1: Killed


The makefiles are:

1) locally:

##### Rebecca local dev ###################
PETSC_ARCH=linux-gnu-c-debug
PETSC_DIR=/home/rebecca/soft/petsc-dev
include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules
#######################################

twqt2ff:    twqt2ff.o chkopts
	-${CLINKER} -g -O0 -o twqt2ff.exe twqt2ff.o  ${PETSC_SNES_LIB}

2) amdahl:
##### Amdahl 3.0 ##################
PETSC_ARCH=linux-c-gnu-debug
PETSC_DIR=/home/xy2102/soft/petsc-3.0.0-p7
include ${PETSC_DIR}/conf/base
#######################################

twqt2ff:    twqt2ff.o chkopts
	-${CLINKER} -o twqt2ff.exe twqt2ff.o  ${PETSC_SNES_LIB}


Could the crash be caused by the different PETSc versions and make options?
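
Before comparing PETSc versions, it may be worth checking that the two runs really used the same parameters, since the banners printed above are not identical (for example, viscosity and skin depth differ). A minimal sketch of such a check, assuming each banner line has the form `name = value`; the helper names `parse_banner` and `diff_banners` and the tolerance are illustrative only and are not part of PETSc or of twqt2ff:

```python
import math


def parse_banner(text):
    """Collect 'name = value' lines from a run banner into a dict of strings."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip separator rows and lines without '='
            params[name.strip()] = value.strip()
    return params


def same_value(a, b, rel_tol=1e-12):
    """Compare two printed values, ignoring round-off in the last digits."""
    if a is None or b is None:
        return a == b
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        return a == b  # non-numeric values are compared as text


def diff_banners(first, second):
    """Return {name: (first_value, second_value)} for parameters that differ."""
    pa, pb = parse_banner(first), parse_banner(second)
    return {
        name: (pa.get(name), pb.get(name))
        for name in sorted(set(pa) | set(pb))
        if not same_value(pa.get(name), pb.get(name))
    }


# Banner values copied from the two runs shown above.
LOCAL = """\
number of processors = 4
viscosity = 1.0000000000000000e-03
resistivity = 1.0000000000000000e-03
skin depth = 1.0000000000000000e+00
hyper resistivity = 1.6384000000000001e-05
hyper viscosity = 6.5536000000000011e-02
problem size: 101 by 101
dx = 1.2673267326732673e-01
dy = 6.4000000000000001e-02
dt = 5.0000000000000003e-02
adaptive time step size (1:yes;0:no) = 0
"""

AMDAHL = """\
number of processors = 4
viscosity = 1.0000000000000000e-02
resistivity = 5.0000000000000001e-03
skin depth = 0.0000000000000000e+00
hyper resistivity = 8.1920000000000002e-05
hyper viscosity = 6.5535999999999997e-02
problem size: 101 by 101
dx = 1.2673267326732673e-01
dy = 6.4000000000000001e-02
dt = 5.0000000000000003e-02
adaptive time step size (1:yes;0:no) = 0
"""

DIFF = diff_banners(LOCAL, AMDAHL)
for name, (local_value, amdahl_value) in DIFF.items():
    print(f"{name}: local={local_value} amdahl={amdahl_value}")
```

If parameters show up as different here, the options file may not have been read the same way on both machines, which would be a separate question from the PETSc version.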

Thanks very much!

Rebecca




Quoting Aron Ahmadia <aron.ahmadia at kaust.edu.sa>:

> A SEGV is definitely a memory access problem; as PETSc suggests, it is
> likely to be a memory access out of range.
>
> I don't recommend trying to debug this problem on amdahl. Can you reproduce
> the problem just by running with multiple processes on your workstation?
>
> Warm Regards,
> Aron
>
> On Wed, Apr 21, 2010 at 12:34 PM, (Rebecca) Xuefei YUAN <xy2102 at columbia.edu> wrote:
>
>> Dear all,
>>
>> I checked the code with valgrind, and it reports no memory problem, but when
>> running in parallel, I get a message like
>>
>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> srun: error: task 0: Exited with exit code 59
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
>> [0]PETSC ERROR: or try http://valgrind.org on linux or man libgmalloc on Apple to find memory corruption errors
>> [0]PETSC ERROR: likely location of problem given in stack below
>> [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [0]PETSC ERROR:       is given.
>> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
>> [0]PETSC ERROR: Signal received!
>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 7, Mon Jul  6 11:33:34 CDT 2009
>> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [0]PETSC ERROR: See docs/index.html for manual pages.
>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]PETSC ERROR: /tmp/lustre/home/xy2102/April2110/die0/./twqt2ff.exe on a linux-c-g named sci-m0n0.scsystem by xy2102 Wed Apr 21 05:30:10 2010
>> [0]PETSC ERROR: Libraries linked from /home/xy2102/soft/petsc-3.0.0-p7/linux-c-gnu-debug/lib
>> [0]PETSC ERROR: Configure run at Mon Jul 20 13:56:37 2009
>> [0]PETSC ERROR: Configure options --with-cc=mpicc --with-fc=mpif77 --with-mpiexec=srun --with-debugging=1 --with-fortran-kernels=generic --with-shared=0 --CFLAGS=-G0 --FFLAGS=-G0
>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>> In: PMI_Abort(59, application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0)
>> srun: error: task 2-3: Killed
>> srun: error: task 1: Killed
>>
>> What is wrong?
>>
>> Cheers,
>>
>> Rebecca
>>
>> --
>> (Rebecca) Xuefei YUAN
>> Department of Applied Physics and Applied Mathematics
>> Columbia University
>> Tel:917-399-8032
>> www.columbia.edu/~xy2102
>>
>>
>



-- 
(Rebecca) Xuefei YUAN
Department of Applied Physics and Applied Mathematics
Columbia University
Tel:917-399-8032
www.columbia.edu/~xy2102


