[petsc-users] Floating point exception

Danyang Su danyang.su at gmail.com
Sat Apr 25 13:13:05 CDT 2015


Hi Barry,

With the -fp_trap and -start_in_debugger options, the code crashed with 
the error shown below.

Frame #21 in the backtrace (0x41C49A in 
__solver_dd_MOD_solver_dd_snes_solve_react at solver_ddmethod.F90:2850) 
is the line "call KSPSolve(ksp_react,b_react,x_react,ierr)".

I ran this case with 4 processors, and the preconditioner type is HYPRE. 
Does this mean something is wrong in the matrix attached to ksp_react or 
in the RHS b_react?
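
One way to narrow this down without a debugger would be to scan the local 
entries of b_react (and the assembled matrix values) for NaN or Inf just 
before the KSPSolve call. The following is only a minimal sketch in plain C, 
not PETSc code: count_nonfinite is a hypothetical helper, and it assumes the 
local values have already been obtained as a plain double array (for example 
via VecGetArray).

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of NaN or Inf entries in
   v[0..n-1]. If at least one is found and first_bad is not NULL, the
   index of the first offending entry is stored in *first_bad. */
size_t count_nonfinite(const double *v, size_t n, size_t *first_bad)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!isfinite(v[i])) {          /* true for NaN, +Inf and -Inf */
            if (bad == 0 && first_bad != NULL) {
                *first_bad = i;
            }
            ++bad;
        }
    }
    return bad;
}
```

Calling this on each rank and printing the rank, the count, and the first bad 
index would show whether the bad values are already present in the RHS before 
HYPRE is applied.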

Thanks,

Danyang


  timestep:    1846 time: 3.392E+00 years   delt: 1.000E-02 years iter:  
1 max.sia: 0.000E+00 tol.sia: 0.000E+00
  Reduce time step for reactive transport
  timestep:    1846 time: 3.387E+00 years   delt: 5.000E-03 years iter:  
1 max.sia: 0.000E+00 tol.sia: 0.000E+00
  Reduce time step for reactive transport
  timestep:    1846 time: 3.385E+00 years   delt: 2.500E-03 years iter:  
1 max.sia: 0.000E+00 tol.sia: 0.000E+00
[0]PETSC ERROR: *** unknown floating point error occurred ***
[0]PETSC ERROR: The specific exception can be determined by running in a 
debugger.  When the
[0]PETSC ERROR: debugger traps the signal, the exception can be found 
with fetestexcept(0x3d)
[0]PETSC ERROR: where the result is a bitwise OR of the following flags:
[0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 
FE_UNDERFLOW=0x10 FE_INEXACT=0x20
[0]PETSC ERROR: Try option -start_in_debugger
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: ---------------------  Stack Frames 
------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR:       INSTEAD the line number of the start of the function
[0]PETSC ERROR:       is given.
[1]PETSC ERROR: *** unknown floating point error occurred ***
[1]PETSC ERROR: The specific exception can be determined by running in a 
debugger.  When the
[1]PETSC ERROR: debugger traps the signal, the exception can be found 
with fetestexcept(0x3d)
[1]PETSC ERROR: where the result is a bitwise OR of the following flags:
[1]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 
FE_UNDERFLOW=0x10 FE_INEXACT=0x20
[1]PETSC ERROR: Try option -start_in_debugger
[1]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: ---------------------  Stack Frames 
------------------------------------
[1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[1]PETSC ERROR:       INSTEAD the line number of the start of the function
[1]PETSC ERROR:       is given.
[1]PETSC ERROR: [1] PetscDefaultFPTrap line 379 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
[1]PETSC ERROR: [1] Hypre solve line 174 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[1]PETSC ERROR: [1] PCApply_HYPRE line 161 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[1]PETSC ERROR: [2]PETSC ERROR: *** unknown floating point error 
occurred ***
[2]PETSC ERROR: The specific exception can be determined by running in a 
debugger.  When the
[2]PETSC ERROR: debugger traps the signal, the exception can be found 
with fetestexcept(0x3d)
[2]PETSC ERROR: where the result is a bitwise OR of the following flags:
[2]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 
FE_UNDERFLOW=0x10 FE_INEXACT=0x20
[2]PETSC ERROR: Try option -start_in_debugger
[2]PETSC ERROR: likely location of problem given in stack below
[2]PETSC ERROR: ---------------------  Stack Frames 
------------------------------------
[2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[2]PETSC ERROR:       INSTEAD the line number of the start of the function
[2]PETSC ERROR:       is given.
[2]PETSC ERROR: [2] PetscDefaultFPTrap line 379 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
[2]PETSC ERROR: [2] Hypre solve line 174 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[2]PETSC ERROR: [2] PCApply_HYPRE line 161 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[2]PETSC ERROR: [2] KSP_PCApply line 228 
/home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
[2]PETSC ERROR: [2] KSPInitialResidual line 44 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
[0]PETSC ERROR: [0] PetscDefaultFPTrap line 379 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
[0]PETSC ERROR: [0] Hypre solve line 174 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[0]PETSC ERROR: [0] PCApply_HYPRE line 161 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[0]PETSC ERROR: [0] KSP_PCApply line 228 
/home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
[0]PETSC ERROR: [0] KSPInitialResidual line 44 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
[0]PETSC ERROR: [0] KSPSolve_GMRES line 224 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
[1] KSP_PCApply line 228 
/home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
[1]PETSC ERROR: [1] KSPInitialResidual line 44 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
[1]PETSC ERROR: [1] KSPSolve_GMRES line 224 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
[2]PETSC ERROR: [2] KSPSolve_GMRES line 224 
/home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
[2]PETSC ERROR: [0]PETSC ERROR: User provided function() line 0 in 
Unknown file trapped floating point error
User provided function() line 0 in Unknown file trapped floating point error
[1]PETSC ERROR: User provided function() line 0 in Unknown file trapped 
floating point error

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x7FDC76F307D7
#0  0x7FA04C1207D7
#1  0x7FA04C120DDE
#1  0x7FDC76F30DDE
#2  0x7FA04B41ED3F
#3  0x7FA04B41ECC9
#2  0x7FDC7622ED3F
#4  0x7FA04B4220D7
#0  0x7F622A92F7D7
#3  0x7FDC7622ECC9
#5  0x7FA04C6BADCB
#1  0x7F622A92FDDE
#4  0x7FDC762320D7
#6  0x7FA04C6B5825
#2  0x7F6229C2DD3F
#7  0x7FA04C6BC17F
#5  0x7FDC774CADCB
#8  0x7FA04B41ED3F
#3  0x7F6229C2DCC9
#6  0x7FDC774C5825
#4  0x7F6229C310D7
#9  0x7FA04D9EF449
#7  0x7FDC774CC17F
#10  0x7FA04D9EF055
#5  0x7F622AEC9DCB
#8  0x7FDC7622ED3F
#11  0x7FA04D99D2DD
#6  0x7F622AEC4825
#9  0x7FDC787FF449
#12  0x7FA04D984ACD
#7  0x7F622AECB17F
#10  0x7FDC787FF055
#13  0x7FA04D973E63
#8  0x7F6229C2DD3F
#11  0x7FDC787AD2DD
#14  0x7FA04D27E8E3
#9  0x7F622C1FE449
#12  0x7FDC78794ACD
#15  0x7FA04D2BEB04
#10  0x7F622C1FE055
#13  0x7FDC78783E63
#16  0x7FA04D3CABFA
#11  0x7F622C1AC2DD
#17  0x7FA04D3CB927
#14  0x7FDC7808E8E3
#12  0x7F622C193ACD
#18  0x7FA04D361DE8
#15  0x7FDC780CEB04
#13  0x7F622C182E63
#16  0x7FDC781DABFA
#19  0x7FA04D3A0E1D
#20  0x7FA04D3DC121
#14  0x7F622BA8D8E3
#15  0x7F622BACDB04
#17  0x7FDC781DB927
#18  0x7FDC78171DE8
#16  0x7F622BBD9BFA
#19  0x7FDC781B0E1D
#17  0x7F622BBDA927
#20  0x7FDC781EC121
#18  0x7F622BB70DE8
#19  0x7F622BBAFE1D
#20  0x7F622BBEB121
#21  0x41C49A in __solver_dd_MOD_solver_dd_snes_solve_react at 
solver_ddmethod.F90:2850
#21  0x41C49A in __solver_dd_MOD_solver_dd_snes_solve_react at 
solver_ddmethod.F90:2850
#21  0x41C49A in __solver_dd_MOD_solver_dd_snes_solve_react at 
solver_ddmethod.F90:2850
#22  0x6A25A5 in reactran_ at reactran.F90:954
#22  0x6A25A5 in reactran_ at reactran.F90:954
#22  0x6A25A5 in reactran_ at reactran.F90:954
#23  0x574836 in timeloop_ at timeloop.F90:1194
#23  0x574836 in timeloop_ at timeloop.F90:1194
#23  0x574836 in timeloop_ at timeloop.F90:1194
#24  0x5ABFD7 in driver_pc at driver_pc.F90:599
#24  0x5ABFD7 in driver_pc at driver_pc.F90:599
#24  0x5ABFD7 in driver_pc at driver_pc.F90:599
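
The PETSc message above says the specific exception can be found with 
fetestexcept(0x3d), whose result is a bitwise OR of the FE_* flags. As a 
small illustration of how such a mask could be decoded, here is a sketch in 
standard C using <fenv.h>. The function name describe_fp_flags is 
hypothetical, and the numeric values of the FE_* macros are platform 
dependent (the ones printed in the log are those of this machine).

```c
#include <fenv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the names of the exception flags that are
   set in `flags` into buf, separated by single spaces, and returns how
   many of the five standard flags were set. Output is truncated if buf
   is too small. */
int describe_fp_flags(int flags, char *buf, size_t len)
{
    static const struct { int bit; const char *name; } table[] = {
        { FE_INVALID,   "FE_INVALID"   },
        { FE_DIVBYZERO, "FE_DIVBYZERO" },
        { FE_OVERFLOW,  "FE_OVERFLOW"  },
        { FE_UNDERFLOW, "FE_UNDERFLOW" },
        { FE_INEXACT,   "FE_INEXACT"   },
    };
    int found = 0;

    if (len == 0) {
        return 0;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (flags & table[i].bit) {
            size_t used = strlen(buf);   /* always < len */
            snprintf(buf + used, len - used, "%s%s",
                     found ? " " : "", table[i].name);
            ++found;
        }
    }
    return found;
}
```

In a running program this would be used as 
describe_fp_flags(fetestexcept(FE_ALL_EXCEPT), buf, sizeof buf) right after 
the trap. FE_INVALID would point to a NaN-producing operation such as 0/0, 
while FE_DIVBYZERO or FE_OVERFLOW would point to an Inf.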

On 15-04-24 11:12 AM, Barry Smith wrote:
>> On Apr 24, 2015, at 1:05 PM, Danyang Su <danyang.su at gmail.com> wrote:
>>
>> Hi All,
>>
>> One of my cases crashes with a floating point exception when using 4 processors, as shown below. If I run the same case with 1 processor, it works fine. I have tested the code with around 100 cases on up to 768 processors, and all the other cases work fine. I wonder whether this kind of error is caused by a NaN in the Jacobian matrix, the RHS, or the preconditioner?
>     Yes, almost for sure it is one of these places.
>
>     First run the bad case with -fp_trap; if all goes well you'll see the function where the FPE is generated. Then also run with -start_in_debugger and
> type cont in all four debugger windows. When the FPE happens, the debugger should stop and show exactly where it occurs.
>
>    Barry
>
>> I could check all the entries of the Jacobian matrix to see whether the values are valid, but this does not seem like a good idea because it takes a long time to reach this point. If I restart the simulation from a specified time (e.g., 7.685 in this case), the error does not occur.
>>
>> Could you please give me any suggestions for debugging this case?
>>
>> Thanks and Regards,
>>
>> Danyang
>>
>>
>> timestep:    2730 time: 7.665E+00 years   delt: 1.000E-02 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> timestep:    2731 time: 7.675E+00 years   delt: 1.000E-02 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> timestep:    2732 time: 7.685E+00 years   delt: 1.000E-02 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> timestep:    2733 time: 7.695E+00 years   delt: 1.000E-02 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> timestep:    2734 time: 7.705E+00 years   delt: 1.000E-02 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> Reduce time step for reactive transport
>> timestep:    2734 time: 7.700E+00 years   delt: 5.000E-03 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> Reduce time step for reactive transport
>> timestep:    2734 time: 7.697E+00 years   delt: 2.500E-03 years iter:  1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [1]PETSC ERROR: Floating point exception
>> [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [2]PETSC ERROR: Floating point exception
>> [2]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3
>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [2]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>> [2]PETSC ERROR: [1]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3
>> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [1]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>> [1]PETSC ERROR: ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by dsu Thu Apr 23 15:38:52 2015
>> [1]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-hypre --download-superlu_dist --download-metis --download-parmetis --download-scalapack
>> [1]PETSC ERROR: #1 VecValidValues() line 34 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
>> ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by dsu Thu Apr 23 15:38:52 2015
>> [2]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-hypre --download-superlu_dist --download-metis --download-parmetis --download-scalapack
>> [2]PETSC ERROR: #1 VecValidValues() line 34 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
>> [2]PETSC ERROR: [1]PETSC ERROR: #2 PCApply() line 442 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
>> [1]PETSC ERROR: #2 PCApply() line 442 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
>> [2]PETSC ERROR: #3 KSP_PCApply() line 230 in /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
>> #3 KSP_PCApply() line 230 in /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
>> [1]PETSC ERROR: #4 KSPInitialResidual() line 63 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
>> [2]PETSC ERROR: #4 KSPInitialResidual() line 63 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
>> [1]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
>> [2]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
>> [2]PETSC ERROR: #6 KSPSolve() line 459 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
>> [1]PETSC ERROR: #6 KSPSolve() line 459 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
>> ^C[mpiexec at nwmop] Sending Ctrl-C to processes as requested
>> [mpiexec at nwmop] Press Ctrl-C again to force abort


