[petsc-users] Floating point exception

Barry Smith bsmith at mcs.anl.gov
Fri Apr 24 13:12:23 CDT 2015


> On Apr 24, 2015, at 1:05 PM, Danyang Su <danyang.su at gmail.com> wrote:
> 
> Hi All,
> 
> One of my cases crashes with a floating point exception when using 4 processors, as shown below. If I run the same case with 1 processor, it works fine. I have tested the code with around 100 cases on up to 768 processors, and all other cases work fine. Could this kind of error be caused by a NaN in the Jacobian matrix, the RHS, or the preconditioner?

   Yes, it is almost certainly one of these.

   First run the bad case with -fp_trap; if all goes well you'll see the function where the FPE is generated. Then also run with -start_in_debugger and type cont in all four debugger windows. When the FPE happens, the debugger should stop and show exactly where it occurred.
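
   To illustrate why trapping at the point of origin helps more than the late check seen in the trace (VecValidValues only reports the bad entry after PCApply has finished), here is a minimal pure-Python sketch. It is not PETSc code: the names run, checked, first_nonfinite and NonFiniteError and the two-stage computation are hypothetical, chosen only to show an Inf created in one stage surfacing as a NaN in a later one.

```python
import math


class NonFiniteError(ArithmeticError):
    """Raised when a stage produces a NaN or Inf entry (hypothetical name)."""


def first_nonfinite(values):
    """Return the index of the first NaN/Inf entry, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


def checked(stage, values):
    """Return values unchanged, or raise naming the stage and offending index."""
    i = first_nonfinite(values)
    if i is not None:
        raise NonFiniteError(f"{stage}: entry {i} is {values[i]!r}")
    return values


def run(coeffs, rhs, trap):
    """Hypothetical two-stage computation; `trap` toggles per-stage checks."""
    check = checked if trap else (lambda stage, values: values)
    # Stage 1: float multiplication overflows silently to inf in Python.
    scaled = check("scale", [c * 1e200 for c in coeffs])
    # Stage 2: inf - inf yields nan, far from where the overflow happened.
    diff = check("residual", [r - s for r, s in zip(rhs, scaled)])
    # Final check, analogous to validating only the end result.
    return checked("final", diff)


if __name__ == "__main__":
    for trap in (False, True):
        try:
            run([1.0, 1e200], [0.0, float("inf")], trap)
        except NonFiniteError as err:
            print(f"trap={trap}: {err}")
```

   Without the per-stage checks the error is only reported at the final stage, as a NaN; with them it is reported at the stage that first produced the Inf. That is the kind of information -fp_trap and the debugger are meant to give for the real code.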

  Barry

> I could check all the entries of the Jacobian matrix to see whether the values are valid, but this does not seem like a good idea because it takes a long time to reach this point. If I restart the simulation from a specified time (e.g., 7.685 in this case), the error does not occur.
> 
> Could you please give me any suggestions on debugging this case?
> 
> Thanks and Regards,
> 
> Danyang
> 
> 
> timestep:    2730 time: 7.665E+00 years   delt: 1.000E-02 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
> timestep:    2731 time: 7.675E+00 years   delt: 1.000E-02 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
> timestep:    2732 time: 7.685E+00 years   delt: 1.000E-02 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
> timestep:    2733 time: 7.695E+00 years   delt: 1.000E-02 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
> timestep:    2734 time: 7.705E+00 years   delt: 1.000E-02 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
> Reduce time step for reactive transport
> timestep:    2734 time: 7.700E+00 years   delt: 5.000E-03 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
> Reduce time step for reactive transport
> timestep:    2734 time: 7.697E+00 years   delt: 2.500E-03 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: Floating point exception
> [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [2]PETSC ERROR: Floating point exception
> [2]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3
> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [2]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
> [2]PETSC ERROR: [1]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3
> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
> [1]PETSC ERROR: ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by dsu Thu Apr 23 15:38:52 2015
> [1]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-hypre --download-superlu_dist --download-metis --download-parmetis --download-scalapack
> [1]PETSC ERROR: #1 VecValidValues() line 34 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
> ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by dsu Thu Apr 23 15:38:52 2015
> [2]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-hypre --download-superlu_dist --download-metis --download-parmetis --download-scalapack
> [2]PETSC ERROR: #1 VecValidValues() line 34 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
> [2]PETSC ERROR: [1]PETSC ERROR: #2 PCApply() line 442 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
> [1]PETSC ERROR: #2 PCApply() line 442 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
> [2]PETSC ERROR: #3 KSP_PCApply() line 230 in /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
> #3 KSP_PCApply() line 230 in /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
> [1]PETSC ERROR: #4 KSPInitialResidual() line 63 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
> [2]PETSC ERROR: #4 KSPInitialResidual() line 63 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
> [1]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
> [2]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
> [2]PETSC ERROR: #6 KSPSolve() line 459 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
> [1]PETSC ERROR: #6 KSPSolve() line 459 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
> ^C[mpiexec at nwmop] Sending Ctrl-C to processes as requested
> [mpiexec at nwmop] Press Ctrl-C again to force abort


