[petsc-users] Floating point exception
Danyang Su
danyang.su at gmail.com
Sat Apr 25 01:05:13 CDT 2015
Hi Barry and Satish,
How can I get rid of an unknown floating point error that occurs when a
very small value is multiplied? For example, cinfrt_dg(i1) and
diff(ic,idim) are 1.0250235986806329E-008 and 8.6178408169776945E-317
respectively, and the statement

cinfrt = cinfrt_dg(i1) * diff(ic,idim)

gives the following error when run with "-fp_trap -start_in_debugger":
Backtrace for this error:
*** unknown floating point error occurred ***
[2]PETSC ERROR: The specific exception can be determined by running in a debugger. When the
[2]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3d)
[2]PETSC ERROR: cinfrt_dg(i1),diff(ic,idim) 1.0250235986806329E-008 8.6178408169776945E-317
where the result is a bitwise OR of the following flags:
[2]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
[2]PETSC ERROR: Try option -start_in_debugger
[2]PETSC ERROR: likely location of problem given in stack below
Thanks,
Danyang
On 15-04-24 01:54 PM, Danyang Su wrote:
> On 15-04-24 01:23 PM, Satish Balay wrote:
>> c 4 1.0976214263087059E-067
>>
>> I don't think this number can be stored in a real*4.
>>
>> Satish
> Thanks, Satish. It is caused by this number.
>>
>> On Fri, 24 Apr 2015, Danyang Su wrote:
>>
>>>
>>> On 15-04-24 11:12 AM, Barry Smith wrote:
>>>>> On Apr 24, 2015, at 1:05 PM, Danyang Su <danyang.su at gmail.com> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> One of my cases crashes with a floating point exception when using 4
>>>>> processors, as shown below, but it works fine if I run it with 1
>>>>> processor. I have tested the code with around 100 cases on up to 768
>>>>> processors, and all the other cases work fine. Could this kind of
>>>>> error be caused by a NaN in the Jacobian matrix, the RHS, or the
>>>>> preconditioner?
>>>> Yes, almost for sure it is one of these places.
>>>>
>>>> First run the bad case with -fp_trap; if all goes well, you'll see the
>>>> function where the FPE is generated. Then run it also with
>>>> -start_in_debugger and type cont in all four debugger windows. When the
>>>> FPE happens, the debugger should stop, showing exactly where the FPE
>>>> happens.
>>>>
>>>> Barry
>>> Hi Barry,
>>>
>>> When I run with -fp_trap -start_in_debugger, I get the following error:
>>>
>>> [0]PETSC ERROR: *** unknown floating point error occurred ***
>>> [0]PETSC ERROR: The specific exception can be determined by running in a debugger. When the
>>> [0]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3d)
>>> [0]PETSC ERROR: where the result is a bitwise OR of the following flags:
>>> [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
>>> [0]PETSC ERROR: Try option -start_in_debugger
>>> [0]PETSC ERROR: likely location of problem given in stack below
>>> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>> [0]PETSC ERROR: INSTEAD the line number of the start of the function
>>> [0]PETSC ERROR: is given.
>>> [0]PETSC ERROR: [0] PetscDefaultFPTrap line 379 /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
>>> [0]PETSC ERROR: User provided function() line 0 in Unknown file trapped floating point error
>>>
>>> Program received signal SIGABRT: Process abort signal.
>>>
>>> Backtrace for this error:
>>> #0 0x7F4FEAB1C7D7
>>> #1 0x7F4FEAB1CDDE
>>> #2 0x7F4FE9E1AD3F
>>> #3 0x7F4FE9E1ACC9
>>> #4 0x7F4FE9E1E0D7
>>> #5 0x7F4FEB0B6DCB
>>> #6 0x7F4FEB0B1825
>>> #7 0x7F4FEB0B817F
>>> #8 0x7F4FE9E1AD3F
>>> #9 0x6972C8 in tprfrtlc_ at tprfrtlc.F90:2393 (discriminator 3)
>>> #10 0x4C6C87 in gcreact_ at gcreact.F90:678
>>> #11 0x707E19 in initicrt_ at initicrt.F90:589
>>> #12 0x4F42D0 in initprob_ at initprob.F90:430
>>> #13 0x5AAF72 in driver_pc at driver_pc.F90:438
>>>
>>> I checked the code at tprfrtlc.F90:2393,
>>>
>>> realbuffer_gb(1:nvars) = (/time,(c(ic),ic=1,nc-1), &
>>> (cx(ix),ix=1,nxout)/)
>>>
>>> All the values (time, c, cx) look reasonable, as shown below. The only
>>> possibility I can see is that realbuffer_gb is declared as real*4 when
>>> single-precision output is used, while time, c and cx are declared as
>>> real*8. However, I have a lot of similar real*8 to real*4 conversions
>>> for output elsewhere, and that code does not return an error.
>>>
>>> time 0.0000000000000000
>>> c 1 9.9999999999999995E-008
>>> c 2 3.1555251077549618E-003
>>> c 3 7.1657814842179362E-008
>>> c 4 1.0976214263087059E-067
>>> c 5 5.2879822292305797E-004
>>> c 6 9.9999999999999964E-005
>>> c 7 6.4055731968811337E-005
>>> c 8 3.4607572892578404E-020
>>> cx 1 3.4376650636008101E-005
>>> cx 2 7.3989678854017763E-012
>>> cx 3 9.5317170613607207E-012
>>> cx 4 2.2344525794718353E-015
>>> cx 5 3.0624685689695889E-008
>>> cx 6 1.0046157902783967E-007
>>> cx 7 1.5320169154914984E-004
>>> cx 8 8.6930292776346176E-014
>>> cx 9 3.5944267559348721E-005
>>> cx 10 3.0072645866951157E-018
>>> cx 11 2.3592486321095017E-013
>>>
>>> Thanks,
>>>
>>> Danyang
>>>
>>>>> I could check every entry of the Jacobian matrix to see whether it is
>>>>> valid, but that does not seem like a good approach, since it takes a
>>>>> long time to reach this point. If I restart the simulation from a
>>>>> specified time (e.g., 7.685 in this case), the error does not occur.
>>>>>
>>>>> Could you please give me some suggestions for debugging this case?
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Danyang
>>>>>
>>>>>
>>>>> timestep: 2730 time: 7.665E+00 years delt: 1.000E-02 years iter: 1
>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>> timestep: 2731 time: 7.675E+00 years delt: 1.000E-02 years iter: 1
>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>> timestep: 2732 time: 7.685E+00 years delt: 1.000E-02 years iter: 1
>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>> timestep: 2733 time: 7.695E+00 years delt: 1.000E-02 years iter: 1
>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>> timestep: 2734 time: 7.705E+00 years delt: 1.000E-02 years iter: 1
>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>> Reduce time step for reactive transport
>>>>> timestep: 2734 time: 7.700E+00 years delt: 5.000E-03 years iter: 1
>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>> Reduce time step for reactive transport
>>>>> timestep: 2734 time: 7.697E+00 years delt: 2.500E-03 years iter: 1
>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>> [1]PETSC ERROR: Floating point exception
>>>>> [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>> [2]PETSC ERROR: Floating point exception
>>>>> [2]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3
>>>>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>> [2]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>>>>> [2]PETSC ERROR: [1]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3
>>>>> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>> [1]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>>>>> [1]PETSC ERROR: ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by dsu Thu Apr 23 15:38:52 2015
>>>>> [1]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-hypre --download-superlu_dist --download-metis --download-parmetis --download-scalapack
>>>>> [1]PETSC ERROR: #1 VecValidValues() line 34 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
>>>>> ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by dsu Thu Apr 23 15:38:52 2015
>>>>> [2]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-hypre --download-superlu_dist --download-metis --download-parmetis --download-scalapack
>>>>> [2]PETSC ERROR: #1 VecValidValues() line 34 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
>>>>> [2]PETSC ERROR: [1]PETSC ERROR: #2 PCApply() line 442 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
>>>>> [1]PETSC ERROR: #2 PCApply() line 442 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
>>>>> [2]PETSC ERROR: #3 KSP_PCApply() line 230 in /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
>>>>> #3 KSP_PCApply() line 230 in /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
>>>>> [1]PETSC ERROR: #4 KSPInitialResidual() line 63 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
>>>>> [2]PETSC ERROR: #4 KSPInitialResidual() line 63 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
>>>>> [1]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
>>>>> [2]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
>>>>> [2]PETSC ERROR: #6 KSPSolve() line 459 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
>>>>> [1]PETSC ERROR: #6 KSPSolve() line 459 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
>>>>> ^C[mpiexec at nwmop] Sending Ctrl-C to processes as requested
>>>>> [mpiexec at nwmop] Press Ctrl-C again to force abort
>>>
>>>
>