[petsc-users] Floating point exception
Danyang Su
danyang.su at gmail.com
Sat Apr 25 13:13:05 CDT 2015
Hi Barry,
With the -fp_trap and -start_in_debugger options, the code crashed with the
error shown below.
Frame #21 (0x41C49A in __solver_dd_MOD_solver_dd_snes_solve_react at
solver_ddmethod.F90:2850) is the line
"call KSPSolve(ksp_react,b_react,x_react,ierr)".
I ran this case on 4 processors, and the preconditioner type is HYPRE.
Does this mean something is wrong in the matrix of ksp_react or in the RHS b_react?
Thanks,
Danyang
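
To make the question above concrete, here is a minimal sketch of the kind of check that would show whether the matrix or the RHS already contains a NaN or Inf on a given rank before the solve is called. It is plain Python rather than PETSc code: the function name find_nonfinite and the sample values are hypothetical and stand in for the locally owned matrix and RHS entries.

```python
import math

def find_nonfinite(values):
    """Return (index, value) pairs for entries that are NaN or +/-Inf."""
    return [(i, v) for i, v in enumerate(values) if not math.isfinite(v)]

# Hypothetical local data for one rank (made up for illustration):
# the stored matrix values and the locally owned RHS entries.
matrix_values = [4.0, -1.0, float("nan"), 2.5]
rhs_values = [1.0, float("inf"), 0.0]

bad_matrix = find_nonfinite(matrix_values)
bad_rhs = find_nonfinite(rhs_values)

print("non-finite matrix entries:", bad_matrix)
print("non-finite RHS entries:", bad_rhs)
```

If both lists come back empty on every rank, the inputs to the solve are finite and the exception is more likely raised inside the preconditioner itself.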
timestep: 1846 time: 3.392E+00 years delt: 1.000E-02 years iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
Reduce time step for reactive transport
timestep: 1846 time: 3.387E+00 years delt: 5.000E-03 years iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
Reduce time step for reactive transport
timestep: 1846 time: 3.385E+00 years delt: 2.500E-03 years iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
[0]PETSC ERROR: *** unknown floating point error occurred ***
[0]PETSC ERROR: The specific exception can be determined by running in a debugger. When the
[0]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3d)
[0]PETSC ERROR: where the result is a bitwise OR of the following flags:
[0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
[0]PETSC ERROR: Try option -start_in_debugger
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: INSTEAD the line number of the start of the function
[0]PETSC ERROR: is given.
[1]PETSC ERROR: *** unknown floating point error occurred ***
[1]PETSC ERROR: The specific exception can be determined by running in a debugger. When the
[1]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3d)
[1]PETSC ERROR: where the result is a bitwise OR of the following flags:
[1]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
[1]PETSC ERROR: Try option -start_in_debugger
[1]PETSC ERROR: likely location of problem given in stack below
[1]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[1]PETSC ERROR: INSTEAD the line number of the start of the function
[1]PETSC ERROR: is given.
[1]PETSC ERROR: [1] PetscDefaultFPTrap line 379 /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
[1]PETSC ERROR: [1] Hypre solve line 174 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[1]PETSC ERROR: [1] PCApply_HYPRE line 161 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[1]PETSC ERROR: [2]PETSC ERROR: *** unknown floating point error occurred ***
[2]PETSC ERROR: The specific exception can be determined by running in a debugger. When the
[2]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3d)
[2]PETSC ERROR: where the result is a bitwise OR of the following flags:
[2]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
[2]PETSC ERROR: Try option -start_in_debugger
[2]PETSC ERROR: likely location of problem given in stack below
[2]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[2]PETSC ERROR: INSTEAD the line number of the start of the function
[2]PETSC ERROR: is given.
[2]PETSC ERROR: [2] PetscDefaultFPTrap line 379 /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
[2]PETSC ERROR: [2] Hypre solve line 174 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[2]PETSC ERROR: [2] PCApply_HYPRE line 161 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[2]PETSC ERROR: [2] KSP_PCApply line 228 /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
[2]PETSC ERROR: [2] KSPInitialResidual line 44 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
[0]PETSC ERROR: [0] PetscDefaultFPTrap line 379 /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
[0]PETSC ERROR: [0] Hypre solve line 174 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[0]PETSC ERROR: [0] PCApply_HYPRE line 161 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/impls/hypre/hypre.c
[0]PETSC ERROR: [0] KSP_PCApply line 228 /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
[0]PETSC ERROR: [0] KSPInitialResidual line 44 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
[0]PETSC ERROR: [0] KSPSolve_GMRES line 224 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
[1] KSP_PCApply line 228 /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
[1]PETSC ERROR: [1] KSPInitialResidual line 44 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
[1]PETSC ERROR: [1] KSPSolve_GMRES line 224 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
[2]PETSC ERROR: [2] KSPSolve_GMRES line 224 /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
[2]PETSC ERROR: [0]PETSC ERROR: User provided function() line 0 in Unknown file trapped floating point error
User provided function() line 0 in Unknown file trapped floating point error
[1]PETSC ERROR: User provided function() line 0 in Unknown file trapped floating point error
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x7FDC76F307D7
#0 0x7FA04C1207D7
#1 0x7FA04C120DDE
#1 0x7FDC76F30DDE
#2 0x7FA04B41ED3F
#3 0x7FA04B41ECC9
#2 0x7FDC7622ED3F
#4 0x7FA04B4220D7
#0 0x7F622A92F7D7
#3 0x7FDC7622ECC9
#5 0x7FA04C6BADCB
#1 0x7F622A92FDDE
#4 0x7FDC762320D7
#6 0x7FA04C6B5825
#2 0x7F6229C2DD3F
#7 0x7FA04C6BC17F
#5 0x7FDC774CADCB
#8 0x7FA04B41ED3F
#3 0x7F6229C2DCC9
#6 0x7FDC774C5825
#4 0x7F6229C310D7
#9 0x7FA04D9EF449
#7 0x7FDC774CC17F
#10 0x7FA04D9EF055
#5 0x7F622AEC9DCB
#8 0x7FDC7622ED3F
#11 0x7FA04D99D2DD
#6 0x7F622AEC4825
#9 0x7FDC787FF449
#12 0x7FA04D984ACD
#7 0x7F622AECB17F
#10 0x7FDC787FF055
#13 0x7FA04D973E63
#8 0x7F6229C2DD3F
#11 0x7FDC787AD2DD
#14 0x7FA04D27E8E3
#9 0x7F622C1FE449
#12 0x7FDC78794ACD
#15 0x7FA04D2BEB04
#10 0x7F622C1FE055
#13 0x7FDC78783E63
#16 0x7FA04D3CABFA
#11 0x7F622C1AC2DD
#17 0x7FA04D3CB927
#14 0x7FDC7808E8E3
#12 0x7F622C193ACD
#18 0x7FA04D361DE8
#15 0x7FDC780CEB04
#13 0x7F622C182E63
#16 0x7FDC781DABFA
#19 0x7FA04D3A0E1D
#20 0x7FA04D3DC121
#14 0x7F622BA8D8E3
#15 0x7F622BACDB04
#17 0x7FDC781DB927
#18 0x7FDC78171DE8
#16 0x7F622BBD9BFA
#19 0x7FDC781B0E1D
#17 0x7F622BBDA927
#20 0x7FDC781EC121
#18 0x7F622BB70DE8
#19 0x7F622BBAFE1D
#20 0x7F622BBEB121
#21 0x41C49A in __solver_dd_MOD_solver_dd_snes_solve_react at solver_ddmethod.F90:2850
#21 0x41C49A in __solver_dd_MOD_solver_dd_snes_solve_react at solver_ddmethod.F90:2850
#21 0x41C49A in __solver_dd_MOD_solver_dd_snes_solve_react at solver_ddmethod.F90:2850
#22 0x6A25A5 in reactran_ at reactran.F90:954
#22 0x6A25A5 in reactran_ at reactran.F90:954
#22 0x6A25A5 in reactran_ at reactran.F90:954
#23 0x574836 in timeloop_ at timeloop.F90:1194
#23 0x574836 in timeloop_ at timeloop.F90:1194
#23 0x574836 in timeloop_ at timeloop.F90:1194
#24 0x5ABFD7 in driver_pc at driver_pc.F90:599
#24 0x5ABFD7 in driver_pc at driver_pc.F90:599
#24 0x5ABFD7 in driver_pc at driver_pc.F90:599
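
The PETSc message above says the value returned by fetestexcept(0x3d) is a bitwise OR of the listed flags. As a small illustration of how to read such a value in the debugger, the sketch below decodes it using exactly the flag values printed in that message. Those values are platform-specific, so the table is an assumption that only holds where the message prints the same numbers, and decode_fe_flags is a hypothetical helper name.

```python
# Flag values copied from the PETSc message above. They are platform-specific,
# so this table is an assumption tied to the system that produced that message.
FE_FLAGS = {
    "FE_INVALID": 0x01,
    "FE_DIVBYZERO": 0x04,
    "FE_OVERFLOW": 0x08,
    "FE_UNDERFLOW": 0x10,
    "FE_INEXACT": 0x20,
}

def decode_fe_flags(result):
    """Return the names of all flags whose bit is set in result."""
    return [name for name, bit in FE_FLAGS.items() if result & bit]

# The mask passed to fetestexcept in the message is the OR of all five flags.
mask = 0
for bit in FE_FLAGS.values():
    mask |= bit

print(hex(mask))
print(decode_fe_flags(0x05))
```

A result with the FE_INVALID bit set points to an invalid operation such as 0/0, while FE_DIVBYZERO or FE_OVERFLOW would point to a finite value divided by zero or a result too large to represent.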
On 15-04-24 11:12 AM, Barry Smith wrote:
>> On Apr 24, 2015, at 1:05 PM, Danyang Su <danyang.su at gmail.com> wrote:
>>
>> Hi All,
>>
>> One of my cases crashes with a floating point exception when using 4 processors, as shown below. If I run the same case with 1 processor, it works fine. I have tested the code with around 100 cases on up to 768 processors, and all the other cases work fine. Could this kind of error be caused by a NaN in the Jacobian matrix, the RHS, or the preconditioner?
> Yes, almost for sure it is one of these places.
>
> First run the bad case with -fp_trap; if all goes well you'll see the function where the FPE is generated. Then also run with -start_in_debugger and
> type cont in all four debugger windows. When the FPE happens, the debugger should stop and show exactly where it occurred.
>
> Barry
>
>> I could check every entry of the Jacobian matrix to see whether the values are valid, but that does not seem like a good idea because it takes a long time to reach this point. If I restart the simulation from a specified time (e.g., 7.685 in this case), the error does not occur.
>>
>> Could you please give me any suggestions for debugging this case?
>>
>> Thanks and Regards,
>>
>> Danyang
>>
>>
>> timestep: 2730 time: 7.665E+00 years delt: 1.000E-02 years iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> timestep: 2731 time: 7.675E+00 years delt: 1.000E-02 years iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> timestep: 2732 time: 7.685E+00 years delt: 1.000E-02 years iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> timestep: 2733 time: 7.695E+00 years delt: 1.000E-02 years iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> timestep: 2734 time: 7.705E+00 years delt: 1.000E-02 years iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> Reduce time step for reactive transport
>> timestep: 2734 time: 7.700E+00 years delt: 5.000E-03 years iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> Reduce time step for reactive transport
>> timestep: 2734 time: 7.697E+00 years delt: 2.500E-03 years iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [1]PETSC ERROR: Floating point exception
>> [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [2]PETSC ERROR: Floating point exception
>> [2]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3
>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [2]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>> [2]PETSC ERROR: [1]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3
>> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [1]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>> [1]PETSC ERROR: ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by dsu Thu Apr 23 15:38:52 2015
>> [1]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-hypre --download-superlu_dist --download-metis --download-parmetis --download-scalapack
>> [1]PETSC ERROR: #1 VecValidValues() line 34 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
>> ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by dsu Thu Apr 23 15:38:52 2015
>> [2]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-hypre --download-superlu_dist --download-metis --download-parmetis --download-scalapack
>> [2]PETSC ERROR: #1 VecValidValues() line 34 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
>> [2]PETSC ERROR: [1]PETSC ERROR: #2 PCApply() line 442 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
>> [1]PETSC ERROR: #2 PCApply() line 442 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
>> [2]PETSC ERROR: #3 KSP_PCApply() line 230 in /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
>> #3 KSP_PCApply() line 230 in /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
>> [1]PETSC ERROR: #4 KSPInitialResidual() line 63 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
>> [2]PETSC ERROR: #4 KSPInitialResidual() line 63 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
>> [1]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
>> [2]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
>> [2]PETSC ERROR: #6 KSPSolve() line 459 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
>> [1]PETSC ERROR: #6 KSPSolve() line 459 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
>> ^C[mpiexec at nwmop] Sending Ctrl-C to processes as requested
>> [mpiexec at nwmop] Press Ctrl-C again to force abort