<p>Try running it in a debugger with -fp_trap.</p>
<p>Hmm, I don't remember whether the -fp_trap patch made it into 3.1; if it didn't, you might have to enable trapping manually. See "man feenableexcept" on systems with glibc, or _MM_SET_EXCEPTION_MASK on other x86/x64 systems.</p>
<p>Jed</p>
<p><blockquote type="cite">On Oct 12, 2010 8:12 PM, "Arturo Fountain" &lt;<a href="mailto:art.fountain@gmail.com" target="_blank">art.fountain@gmail.com</a>&gt; wrote:<br><br>I am using petsc-3.1-p5 to solve a system of equations at multiple timesteps. The LHS matrix is the same at each timestep although the RHS changes. The strange thing is that the system solves when run under valgrind but not without it.<br>
<br>
This is the case with every solver I have tried and on multiple machines. I am most interested in cg/cgne, although I have used gmres in the past when the condition of the matrix was questionable (it is no longer questionable). In either case the system will converge for one and only one timestep when calling:<br>
<br>mpiexec.uni -n 1 ./MONO -indir InputAniso -outdir OutputAniso -ksp_type gmres -pc_type jacobi -info -ksp_rtol 1e-8 -ksp_initial_guess_nonzero true<br><br>however, calling the same program with valgrind:<br><br>G_SLICE=always-malloc G_DEBUG=gc-friendly mpiexec.uni -n 1 valgrind -v --leak-check=full --show-reachable=yes --track-origins=yes ./MONO -indir InputAniso -outdir OutputAniso -ksp_type gmres -pc_type jacobi -info -ksp_rtol 1e-8 -ksp_initial_guess_nonzero true<br>
<br>it will solve many timesteps (at least 5). The error I receive (again, only without valgrind) is:<br><br>[0]PETSC ERROR: ------------------------------------------------------------------------<br>[0]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero<br>
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger<br>[0]PETSC ERROR: or see <a href="http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal" target="_blank">http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal</a><br>[0]PETSC ERROR: or try <a href="http://valgrind.org" target="_blank">http://valgrind.org</a> on GNU/linux and Apple Mac OS X to find memory corruption errors<br>
[0]PETSC ERROR: likely location of problem given in stack below<br>[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------<br>[0] PetscCommDuplicate(): Using internal PETSc communicator 1140850689 -2080374783<br>
[0] PetscCommDuplicate(): returning tag 2147483571<br>[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,<br>[0]PETSC ERROR: INSTEAD the line number of the start of the function<br>[0]PETSC ERROR: is given.<br>
[0]PETSC ERROR: --------------------- Error Message ------------------------------------<br>[0]PETSC ERROR: Signal received!<br>[0]PETSC ERROR: ------------------------------------------------------------------------<br>
[0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010<br>
[0]PETSC ERROR: See docs/changes/index.html for recent updates.<br>[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.<br>[0]PETSC ERROR: See docs/index.html for manual pages.<br>[0]PETSC ERROR: ------------------------------------------------------------------------<br>
[0]PETSC ERROR: ./MONO on a linux-gnu named duality by xerxez Tue Oct 12 10:26:18 2010<br>[0]PETSC ERROR: Libraries linked from /home/xerxez/lib/petsc-3.1-p5/linux-gnu-c-debug/lib<br>[0]PETSC ERROR: Configure run at Sat Oct 9 19:15:29 2010<br>
[0]PETSC ERROR: Configure options --download-f-blas-lapack=1 --download-mpich=1 --download-blacs=1 --download-hypre=1<br>[0]PETSC ERROR: ------------------------------------------------------------------------<br>[0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file<br>
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0[unset]: aborting job:<br>application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0<br><br>I have read up on <a href="http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal" target="_blank">http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html#Signal</a> and don't see anything wrong or anything different in the calls. After receiving this error I tried using valgrind but the error simply goes away each and every time I call it with valgrind.<br>
<br>Has anyone seen such a problem before?<br>
</blockquote></p>