[petsc-users] Solves with valgrind, not without

Tue Oct 12 13:12:54 CDT 2010

I am using petsc-3.1-p5 to solve a system of equations at multiple
timesteps. The LHS matrix is the same at each timestep although the RHS
changes. Strange thing is, the system solves when using valgrind but not
without.

This is the case with every solver I have tried and on multiple machines. I
am most interested in cg/cgne, although I have used gmres in the past when
the condition of the matrix was questionable (it is no longer
questionable). In either case the system will converge for one and only one
time step when calling:

mpiexec.uni -n 1 ./MONO -indir InputAniso -outdir OutputAniso -ksp_type
gmres -pc_type jacobi -info -ksp_rtol 1e-8 -ksp_initial_guess_nonzero true

However, when I call the same program with valgrind:

G_SLICE=always-malloc G_DEBUG=gc-friendly mpiexec.uni -n 1 valgrind -v
--leak-check=full --show-reachable=yes --track-origins=yes ./MONO -indir
InputAniso -outdir OutputAniso -ksp_type gmres -pc_type jacobi -info
-ksp_rtol 1e-8 -ksp_initial_guess_nonzero true

it will solve many timesteps (at least 5). The error I receive (again, only
without valgrind) is:

[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 8 FPE: Floating Point
Exception,probably divide by zero
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
[0]PETSC ERROR: or try
http://valgrind.org on GNU/linux and Apple Mac OS X to find memory
corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: ---------------------  Stack Frames
------------------------------------
[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689
-2080374783
[0] PetscCommDuplicate():   returning tag 2147483571
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR:       INSTEAD the line number of the start of the function
[0]PETSC ERROR:       is given.
[0]PETSC ERROR: --------------------- Error Message
------------------------------------
[0]PETSC ERROR: Signal received!
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54
CDT 2010
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: ./MONO on a linux-gnu named duality by xerxez Tue Oct 12
10:26:18 2010
[0]PETSC ERROR: Libraries linked from
/home/xerxez/lib/petsc-3.1-p5/linux-gnu-c-debug/lib
[0]PETSC ERROR: Configure run at Sat Oct  9 19:15:29 2010
[0]PETSC ERROR: Configure options --download-f-blas-lapack=1
--download-mpich=1 --download-blacs=1 --download-hypre=1
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: User provided function() line 0 in unknown directory unknown
file
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0[unset]:
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0

I have read up on
http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal
and don't see anything wrong or anything different in the calls. After
receiving this error I tried using valgrind, but the error simply goes away
each and every time I call it with valgrind.
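
Since the signal is reported as a floating point exception, one check that
could help narrow this down is scanning the arrays that feed each solve
(for example the RHS values at every timestep) for NaN or Inf entries.
The following is only a minimal sketch in plain C, not code from MONO or
from PETSc; the function name count_nonfinite and the idea of calling it
on a raw double array before the solve are assumptions made purely for
illustration.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of PETSc or MONO):
 * count the entries of v[0..n-1] that are NaN or +/-Inf. */
static size_t count_nonfinite(const double *v, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!isfinite(v[i])) {
            ++bad;  /* NaN and infinities both fail isfinite() */
        }
    }
    return bad;
}
```

A nonzero count at a given timestep would suggest the bad values are
produced before the solver is entered, rather than inside it.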

Has anyone seen such a problem before?