[petsc-users] Solves with valgrind, not without
Arturo Fountain
art.fountain at gmail.com
Tue Oct 12 14:28:11 CDT 2010
Indeed, using a debugging version of PETSc and a debugger (gdb) got me to my
error.
Thank you very much.
Art
On Tue, Oct 12, 2010 at 12:36 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> You'll need to build a debug version of the libraries and run in the
> debugger; there you will need to have it catch floating point exceptions
> so it stops where the problem appears (how to do that is debugger dependent).
> Based on the difference in behavior I am guessing the problem is memory
> corruption.
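>
>    To catch the exception in the debugger with gdb, for example, you could run
>
>    mpiexec.uni -n 1 ./MONO -indir InputAniso -outdir OutputAniso -ksp_type
> gmres -pc_type jacobi -start_in_debugger
>
> and then at the gdb prompt type
>
>    handle SIGFPE stop print
>    continue
>
> so gdb stops at the instruction that raises the exception; bt then shows the
> offending stack. (A sketch only; other debuggers have their own equivalents.)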
> On my Mac I run valgrind with the options -q --tool=memcheck --dsymutil=yes
> --num-callers=20
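>
> For your runs that would look something like
>
>    mpiexec.uni -n 1 valgrind -q --tool=memcheck --dsymutil=yes
> --num-callers=20 ./MONO -indir InputAniso -outdir OutputAniso -ksp_type gmres
> -pc_type jacobi -ksp_rtol 1e-8 -ksp_initial_guess_nonzero true
>
> (--dsymutil=yes only has an effect on the Mac.)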
>
> Barry
>
> On Oct 12, 2010, at 1:12 PM, Arturo Fountain wrote:
>
> > I am using petsc-3.1-p5 to solve a system of equations over multiple
> timesteps. The LHS matrix is the same at each timestep although the RHS
> changes. The strange thing is that the system solves when run under valgrind
> but not without it.
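> >
> > Concretely, my time loop follows the usual pattern of setting the operators
> once and re-solving with a new right hand side each step. A minimal sketch
> against the petsc-3.1 API (FormMyRHS is a stand-in for my actual RHS assembly;
> PetscInitialize and the assembly of Mat A and Vecs b, x happen earlier, and
> error checking is omitted):
> >
> >   KSP      ksp;
> >   PetscInt step;
> >   KSPCreate(PETSC_COMM_WORLD, &ksp);
> >   KSPSetOperators(ksp, A, A, SAME_PRECONDITIONER); /* LHS never changes */
> >   KSPSetFromOptions(ksp);                     /* picks up -ksp_type etc. */
> >   for (step = 0; step < nsteps; step++) {
> >     FormMyRHS(b, step);   /* hypothetical helper: refill b for this step */
> >     KSPSolve(ksp, b, x);  /* preconditioner is built once and reused */
> >   }
> >   KSPDestroy(ksp);        /* petsc-3.1 signature; later releases take &ksp */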
> >
> > This is the case with every solver I have tried and on multiple machines.
> I am most interested in cg/cgne although I have used gmres in the past when
> the condition of the matrix was questionable (it is no longer
> questionable). In either case the system will converge for one and only one
> time step when calling:
> >
> > mpiexec.uni -n 1 ./MONO -indir InputAniso -outdir OutputAniso -ksp_type
> gmres -pc_type jacobi -info -ksp_rtol 1e-8 -ksp_initial_guess_nonzero true
> >
> > however, calling the same program with valgrind:
> >
> > G_SLICE=always-malloc G_DEBUG=gc-friendly mpiexec.uni -n 1 valgrind -v
> --leak-check=full --show-reachable=yes --track-origins=yes ./MONO -indir
> InputAniso -outdir OutputAniso -ksp_type gmres -pc_type jacobi -info
> -ksp_rtol 1e-8 -ksp_initial_guess_nonzero true
> >
> > it will solve many timesteps (at least 5). The error I receive (again,
> only without valgrind) is:
> >
> > [0]PETSC ERROR:
> ------------------------------------------------------------------------
> > [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point
> Exception,probably divide by zero
> > [0]PETSC ERROR: Try option -start_in_debugger or
> -on_error_attach_debugger
> > [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
> > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
> > to find memory corruption errors
> > [0]PETSC ERROR: likely location of problem given in stack below
> > [0]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689
> -2080374783
> > [0] PetscCommDuplicate(): returning tag 2147483571
> > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> > [0]PETSC ERROR: INSTEAD the line number of the start of the
> function
> > [0]PETSC ERROR: is given.
> > [0]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> > [0]PETSC ERROR: Signal received!
> > [0]PETSC ERROR:
> ------------------------------------------------------------------------
> > [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54
> CDT 2010
> > [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > [0]PETSC ERROR: See docs/index.html for manual pages.
> > [0]PETSC ERROR:
> ------------------------------------------------------------------------
> > [0]PETSC ERROR: ./MONO on a linux-gnu named duality by xerxez Tue Oct 12
> 10:26:18 2010
> > [0]PETSC ERROR: Libraries linked from
> /home/xerxez/lib/petsc-3.1-p5/linux-gnu-c-debug/lib
> > [0]PETSC ERROR: Configure run at Sat Oct 9 19:15:29 2010
> > [0]PETSC ERROR: Configure options --download-f-blas-lapack=1
> --download-mpich=1 --download-blacs=1 --download-hypre=1
> > [0]PETSC ERROR:
> ------------------------------------------------------------------------
> > [0]PETSC ERROR: User provided function() line 0 in unknown directory
> unknown file
> > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
> > [unset]: aborting job:
> > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
> >
> > I have read up on
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
> > and don't see anything wrong or anything different in the calls. After
> receiving this error I tried using valgrind, but the error simply goes away
> every time I run under valgrind.
> >
> > Has anyone seen such a problem before?
>
>