[petsc-users] Floating point exception
Danyang Su
danyang.su at gmail.com
Sat Apr 25 20:31:18 CDT 2015
On 15-04-25 06:26 PM, Matthew Knepley wrote:
> On Sat, Apr 25, 2015 at 8:23 PM, Danyang Su <danyang.su at gmail.com> wrote:
>
>
>
> On 15-04-25 06:03 PM, Barry Smith wrote:
>
> If this is what you got in your last run:
>
> at ../../gas_advection/velocity_g.F90:1344
> 1344   cinfrt = cinfrt_dg(i1) * diff(ic,idim) !diff is a very small value, e.g., 1.0d-316
>
> then it is still catching floating point underflow, which
> we do not want. This means either the change I suggested you
> make in the fp.c code didn't work or it actually uses a
> different floating point trap than that one. BTW: absurd
> numbers like 1.0d-316 are often a symptom of uninitialized
> data; could the problem be that diff is not filled
> correctly for all the ic, idim you are using?
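One quick way to test the uninitialized-data hypothesis is to scan diff for subnormal entries. These are nonzero values smaller in magnitude than DBL_MIN (about 2.2e-308), which is the range 1.0d-316 falls into. The C sketch below is a minimal illustration that assumes the values can be viewed as a contiguous array of doubles. count_subnormal is a hypothetical helper and is not part of PETSc or the application. In the Fortran code itself, the ieee_arithmetic intrinsic module provides a similar classification.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical diagnostic helper (not PETSc, not the application):
 * count entries of v[0..n-1] that are subnormal, i.e. nonzero but
 * smaller in magnitude than DBL_MIN. The index of the first such entry
 * is stored in *first, or -1 if none is found. */
size_t count_subnormal(const double *v, size_t n, long *first)
{
    assert(first != NULL);
    assert(v != NULL || n == 0);
    size_t count = 0;
    *first = -1;
    for (size_t i = 0; i < n; ++i) {
        if (fpclassify(v[i]) == FP_SUBNORMAL) {
            if (count == 0)
                *first = (long)i;   /* remember where the first one sits */
            ++count;
        }
    }
    return count;
}
```

A nonzero count does not by itself prove a bug, because subnormals can arise legitimately. Finding them right after diff is filled, however, would point toward uninitialized or mis-indexed entries.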
>
> This going round and round is very frustrating and a waste
> of time. You need to be more proactive yourself and explore
> the code and poke around to figure out how to solve the problem.
>
> Please email
> $PETSC_DIR/$PETSC_ARCH/include/petscvariables.h so I can see
> what FP trap is being used on your machine.
>
> Barry
>
> Do you mean $PETSC_DIR/$PETSC_ARCH/conf/petscvariables? Otherwise
> I cannot find this file.
>
>
> It's include/petscconf.h
>
> Do I need to reconfigure PETSc after changing the code you mentioned?
>
>
> No, but you need to rebuild.
Yes, I have run 'make gnumake'.
>
> Matt
>
> Danyang
>
>
>
>
> On Apr 25, 2015, at 2:24 PM, Danyang Su <danyang.su at gmail.com> wrote:
>
>
>
> On 15-04-25 11:55 AM, Barry Smith wrote:
>
> On Apr 25, 2015, at 1:51 PM, Danyang Su <danyang.su at gmail.com> wrote:
>
>
>
> On 15-04-25 11:32 AM, Barry Smith wrote:
>
> I told you this yesterday.
>
> It is probably stopping here on a harmless underflow. You need to edit
> the PETSc code to not worry about underflow.
>
> Edit the file
> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
> and locate
>
> #elif defined PETSC_HAVE_XMMINTRIN_H
> _MM_SET_EXCEPTION_MASK(_MM_MASK_INEXACT);
> #else
>
> change it to
>
> #elif defined PETSC_HAVE_XMMINTRIN_H
> _MM_SET_EXCEPTION_MASK(_MM_MASK_INEXACT |
> _MM_MASK_UNDERFLOW);
> #else
>
> Then run 'make gnumake' in the PETSc directory to compile the new
> version. Now link and run the program again with -fp_trap and see where
> it gets stuck this time.
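Assume an x86-64 build in which fp.c takes the PETSC_HAVE_XMMINTRIN_H branch. In that case, a set bit in the value passed to _MM_SET_EXCEPTION_MASK means the exception is ignored, and every cleared bit traps (SIGFPE). The sketch below uses a hypothetical helper, sse_trapped, to compute which exceptions stay trapped before and after the suggested edit.

It also shows that the edited mask still leaves _MM_MASK_DENORM, the denormal-operand exception, unmasked. On x86, a multiply with a subnormal operand such as diff here can raise that exception, so it may be worth ruling out. The thread itself does not confirm that this is the cause.

```c
#include <assert.h>
#include <xmmintrin.h>  /* x86/x86-64 only: provides the _MM_MASK_* constants */

/* Given a value passed to _MM_SET_EXCEPTION_MASK(), return the MXCSR
 * exception bits left UNmasked, i.e. the exceptions that will trap.
 * In MXCSR a set mask bit means "ignore this exception". */
unsigned int sse_trapped(unsigned int mask)
{
    assert((mask & ~(unsigned int)_MM_MASK_MASK) == 0u);  /* mask bits only */
    return ~mask & (unsigned int)_MM_MASK_MASK;
}

/* The mask found in petsc-3.5.2 fp.c, and the edited one suggested above. */
const unsigned int original_mask = _MM_MASK_INEXACT;
const unsigned int patched_mask  = _MM_MASK_INEXACT | _MM_MASK_UNDERFLOW;
```

If the denormal-operand exception does turn out to be the trigger, the analogous change (untested here) would be to add _MM_MASK_DENORM to the same mask.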
>
> Did you do this?
>
> Barry
>
> Yes, I did change the code in fp.c and ran 'make gnumake' in the PETSc
> directory. I just double-checked, ran 'make gnumake' again, and got the
> following output this time.
>
>
> dsu at nwmop:~/Soft/PETSc/petsc-3.5.2$ make gnumake
> Building PETSc using GNU Make with 10 build threads
> ==========================================
> make[1]: Entering directory `/home/dsu/Soft/PETSc/petsc-3.5.2'
> make[1]: Nothing to be done for `all'.
> make[1]: Leaving directory `/home/dsu/Soft/PETSc/petsc-3.5.2'
> =========================================
>
>
> Then I recompiled the code, ran it with -fp_trap, and still got the
> following error:
>
> Backtrace for this error:
> Note: The EXACT line numbers in the stack are not available,
> [2]PETSC ERROR: INSTEAD the line number of the start of the function
> [2]PETSC ERROR: is given.
> [2]PETSC ERROR: [2] PetscDefaultFPTrap line 379 /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
> INSTEAD the line number of the start of the function
> [3]PETSC ERROR: is given.
> [3]PETSC ERROR: [3] PetscDefaultFPTrap line 379 /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
> [2]PETSC ERROR: User provided function() line 0 in Unknown file trapped floating point error
> [3]PETSC ERROR: User provided function() line 0 in Unknown file trapped floating point error
>
>
> This is different than what you sent a few minutes ago, where it
> crashed in hypre.
>
> Anyway, you need to use the -start_in_debugger business I sent in the
> previous email to see the exact place where the problem occurs.
>
> Here is the information shown in the gdb window:
>
> Program received signal SIGFPE, Arithmetic exception.
> 0x00000000006c2bef in velocity_g (l_sufx=1, suffix=..., nmax=12, njamxc=34,
>     cinfradx=..., radial_coordx=.FALSE., _suffix=3)
>     at ../../gas_advection/velocity_g.F90:1344
> 1344   cinfrt = cinfrt_dg(i1) * diff(ic,idim) !diff is a very small value, e.g., 1.0d-316
> (gdb)
>
> After typing cont at the gdb prompt, I got the following error
> information:
>
> [1]PETSC ERROR: *** unknown floating point error occurred ***
> [1]PETSC ERROR: The specific exception can be determined by running in a debugger. When the
> [1]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3d)
> [1]PETSC ERROR: where the result is a bitwise OR of the following flags:
> [1]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
> [1]PETSC ERROR: Try option -start_in_debugger
> [1]PETSC ERROR: likely location of problem given in stack below
> [1]PETSC ERROR: --------------------- Stack Frames ------------------------------------
> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [1]PETSC ERROR: INSTEAD the line number of the start of the function
> [1]PETSC ERROR: is given.
> [1]PETSC ERROR: [1] PetscDefaultFPTrap line 379 /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
> [1]PETSC ERROR: User provided function() line 0 in Unknown file trapped floating point error
> [0]PETSC ERROR: *** unknown floating point error occurred ***
> [0]PETSC ERROR: The specific exception can be determined by running in a debugger. When the
> [0]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3d)
> [0]PETSC ERROR: where the result is a bitwise OR of the following flags:
> [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
> [0]PETSC ERROR: Try option -start_in_debugger
> [0]PETSC ERROR: likely location of problem given in stack below
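When gdb stops on the signal, the integer returned by `call fetestexcept(0x3d)` is a bitwise OR of the flags listed in the message. The small decoder below is sketched in C with the standard <fenv.h> macros; fe_flag_names is a hypothetical name. It turns that integer into flag names and reports any leftover bits.

The numeric FE_* values are platform specific; the ones printed above match x86 with glibc. Note also that 0x3d only queries these five flags. An exception outside that set (on x86, for instance, the denormal-operand flag) would presumably leave the result at 0 even though a trap fired.

```c
#include <assert.h>
#include <fenv.h>
#include <stdio.h>
#include <string.h>

/* Append s to out as a space-separated word, never writing past len bytes. */
static void append_name(char *out, size_t len, const char *s)
{
    size_t used = strlen(out);
    if (used + 1 >= len)
        return;                                   /* buffer already full */
    snprintf(out + used, len - used, "%s%s", used ? " " : "", s);
}

/* Hypothetical decoder: turn a fetestexcept() result into flag names.
 * Bits that match no standard FE_* macro are reported as unknown(0x..). */
void fe_flag_names(int flags, char *out, size_t len)
{
    static const struct { int bit; const char *name; } known[] = {
        { FE_INVALID,   "FE_INVALID"   },
        { FE_DIVBYZERO, "FE_DIVBYZERO" },
        { FE_OVERFLOW,  "FE_OVERFLOW"  },
        { FE_UNDERFLOW, "FE_UNDERFLOW" },
        { FE_INEXACT,   "FE_INEXACT"   },
    };
    int rest = flags;
    assert(out != NULL && len > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof known / sizeof known[0]; ++i) {
        if (flags & known[i].bit) {
            append_name(out, len, known[i].name);
            rest &= ~known[i].bit;                /* clear the decoded bit */
        }
    }
    if (rest != 0) {
        char tmp[32];
        snprintf(tmp, sizeof tmp, "unknown(0x%x)", (unsigned)rest);
        append_name(out, len, tmp);
    }
}
```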
>
> Thanks,
>
> Danyang
>
> Thanks,
>
> Danyang
>
> On Apr 25, 2015, at 1:05 AM, Danyang Su <danyang.su at gmail.com> wrote:
>
> Hi Barry and Satish,
>
> How can I get rid of the unknown floating point error that occurs when
> a very small value is multiplied?
>
> For example, cinfrt_dg(i1) and diff(ic,idim) are
> 1.0250235986806329E-008 and 8.6178408169776945E-317, respectively, in
>
> cinfrt = cinfrt_dg(i1) * diff(ic,idim)
>
> I get the following error when running with "-fp_trap -start_in_debugger":
>
> Backtrace for this error:
> *** unknown floating point error occurred ***
> [2]PETSC ERROR: The specific exception can be determined by running in a debugger. When the
> [2]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3d)
> [2]PETSC ERROR:
> cinfrt_dg(i1),diff(ic,idim)  1.0250235986806329E-008  8.6178408169776945E-317
> where the result is a bitwise OR of the following flags:
> [2]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
> [2]PETSC ERROR: Try option -start_in_debugger
> [2]PETSC ERROR: likely location of problem given in stack below
>
> Thanks,
>
> Danyang
>
> On 15-04-24 01:54 PM, Danyang Su wrote:
>
> On 15-04-24 01:23 PM, Satish Balay wrote:
>
> c 4  1.0976214263087059E-067
>
> I don't think this number can be
> stored in a real*4.
>
> Satish
>
> Thanks, Satish. It is caused by this
> number.
>
> On Fri, 24 Apr 2015, Danyang Su wrote:
>
>
> On 15-04-24 11:12 AM, Barry Smith wrote:
>
> On Apr 24, 2015, at 1:05 PM, Danyang Su <danyang.su at gmail.com> wrote:
>
> Hi All,
>
> One of my cases crashes because of a floating point exception when
> using 4 processors, as shown below, but if I run this case with 1
> processor, it works fine. I have tested the code with around 100 cases
> on up to 768 processors, and all other cases work fine. Could this kind
> of error be caused by a NaN in the Jacobian matrix, the RHS, or the
> preconditioner?
>
> Yes, almost for sure it is one of these places.
>
> First run the bad case with -fp_trap; if all goes well, you'll see the
> function where the FPE is generated. Then also run with
> -start_in_debugger and type cont in all four debugger windows. When the
> FPE happens, the debugger should stop, showing exactly where the FPE
> happens.
>
> Barry
>
> Hi Barry,
>
> When I run with -fp_trap -start_in_debugger, I get the following error:
>
> [0]PETSC ERROR: *** unknown floating point error occurred ***
> [0]PETSC ERROR: The specific exception can be determined by running in a debugger. When the
> [0]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3d)
> [0]PETSC ERROR: where the result is a bitwise OR of the following flags:
> [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
> [0]PETSC ERROR: Try option -start_in_debugger
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR: INSTEAD the line number of the start of the function
> [0]PETSC ERROR: is given.
> [0]PETSC ERROR: [0] PetscDefaultFPTrap line 379 /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
> [0]PETSC ERROR: User provided function() line 0 in Unknown file trapped floating point error
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0 0x7F4FEAB1C7D7
> #1 0x7F4FEAB1CDDE
> #2 0x7F4FE9E1AD3F
> #3 0x7F4FE9E1ACC9
> #4 0x7F4FE9E1E0D7
> #5 0x7F4FEB0B6DCB
> #6 0x7F4FEB0B1825
> #7 0x7F4FEB0B817F
> #8 0x7F4FE9E1AD3F
> #9 0x6972C8 in tprfrtlc_ at tprfrtlc.F90:2393 (discriminator 3)
> #10 0x4C6C87 in gcreact_ at gcreact.F90:678
> #11 0x707E19 in initicrt_ at initicrt.F90:589
> #12 0x4F42D0 in initprob_ at initprob.F90:430
> #13 0x5AAF72 in driver_pc at driver_pc.F90:438
>
> I checked the code at tprfrtlc.F90:2393:
>
> realbuffer_gb(1:nvars) = (/time,(c(ic),ic=1,nc-1), &
>                            (cx(ix),ix=1,nxout)/)
>
> All the values (time, c, cx) look reasonable, as shown below. The only
> possibility I can see is that realbuffer_gb is declared as real*4 when
> single-precision output is used, while time, c, and cx are declared as
> real*8. I have many similar real*8 to real*4 conversions for output
> elsewhere, and that code does not return an error.
>
> time  0.0000000000000000
> c 1   9.9999999999999995E-008
> c 2   3.1555251077549618E-003
> c 3   7.1657814842179362E-008
> c 4   1.0976214263087059E-067
> c 5   5.2879822292305797E-004
> c 6   9.9999999999999964E-005
> c 7   6.4055731968811337E-005
> c 8   3.4607572892578404E-020
> cx 1  3.4376650636008101E-005
> cx 2  7.3989678854017763E-012
> cx 3  9.5317170613607207E-012
> cx 4  2.2344525794718353E-015
> cx 5  3.0624685689695889E-008
> cx 6  1.0046157902783967E-007
> cx 7  1.5320169154914984E-004
> cx 8  8.6930292776346176E-014
> cx 9  3.5944267559348721E-005
> cx 10 3.0072645866951157E-018
> cx 11 2.3592486321095017E-013
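Satish's point can be made concrete. The smallest positive real*4 (IEEE single) value is about 1.4e-45, and the smallest normal one is FLT_MIN, about 1.2e-38. Converting c 4 = 1.0976214263087059E-067 therefore underflows to zero and raises the underflow exception, which is presumably what -fp_trap caught at tprfrtlc.F90:2393.

One option is to flush such tiny values to zero explicitly before narrowing, and to clamp values outside the single-precision range. This is sketched here in C as a hypothetical helper, to_single; the same logic would go before the assignment to realbuffer_gb in the Fortran code.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: narrow a real*8 value to real*4 for output without
 * triggering underflow or overflow in the conversion itself.
 *  - |x| < FLT_MIN (including zero) is flushed to 0.0f explicitly;
 *  - |x| > FLT_MAX is clamped to +/-FLT_MAX;
 *  - NaN fails every comparison and is passed through unchanged. */
float to_single(double x)
{
    double ax = (x < 0.0) ? -x : x;
    if (ax < (double)FLT_MIN)
        return 0.0f;
    if (x > (double)FLT_MAX)
        return FLT_MAX;
    if (x < -(double)FLT_MAX)
        return -FLT_MAX;
    return (float)x;           /* in range: ordinary rounding only */
}
```

Flushing below FLT_MIN, rather than below the smallest subnormal, is a design choice. It also keeps single-precision subnormals out of the output files.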
>
> Thanks,
>
> Danyang
>
>
> I could check all the entries of the Jacobian matrix to see whether the
> values are valid, but this does not seem like a good idea, as it takes a
> long time to reach this point. If I restart the simulation from a
> specified time (e.g., 7.685 in this case), the error does not occur.
>
> Could you please give me some suggestions on debugging this case?
>
> Thanks and Regards,
>
> Danyang
>
>
> timestep: 2730  time: 7.665E+00 years  delt: 1.000E-02 years  iter: 1
> timestep: max.sia: 0.000E+00  tol.sia: 0.000E+00
> timestep: 2731  time: 7.675E+00 years  delt: 1.000E-02 years  iter: 1
> timestep: max.sia: 0.000E+00  tol.sia: 0.000E+00
> timestep: 2732  time: 7.685E+00 years  delt: 1.000E-02 years  iter: 1
> timestep: max.sia: 0.000E+00  tol.sia: 0.000E+00
> timestep: 2733  time: 7.695E+00 years  delt: 1.000E-02 years  iter: 1
> timestep: max.sia: 0.000E+00  tol.sia: 0.000E+00
> timestep: 2734  time: 7.705E+00 years  delt: 1.000E-02 years  iter: 1
> timestep: max.sia: 0.000E+00  tol.sia: 0.000E+00
> Reduce time step for reactive transport
> timestep: 2734  time: 7.700E+00 years  delt: 5.000E-03 years  iter: 1
> timestep: max.sia: 0.000E+00  tol.sia: 0.000E+00
> Reduce time step for reactive transport
> timestep: 2734  time: 7.697E+00 years  delt: 2.500E-03 years  iter: 1
> timestep: max.sia: 0.000E+00  tol.sia: 0.000E+00
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: Floating point exception
> [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [2]PETSC ERROR: Floating point exception
> [2]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3
> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [2]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
> [2]PETSC ERROR:
> [1]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite at end of function: Parameter number 3
> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
> [1]PETSC ERROR: ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by dsu Thu Apr 23 15:38:52 2015
> [1]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-hypre --download-superlu_dist --download-metis --download-parmetis --download-scalapack
> [1]PETSC ERROR: #1 VecValidValues() line 34 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
> ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by dsu Thu Apr 23 15:38:52 2015
> [2]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-mumps --download-hypre --download-superlu_dist --download-metis --download-parmetis --download-scalapack
> [2]PETSC ERROR: #1 VecValidValues() line 34 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
> [2]PETSC ERROR:
> [1]PETSC ERROR: #2 PCApply() line 442 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
> [1]PETSC ERROR: #2 PCApply() line 442 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
> [2]PETSC ERROR: #3 KSP_PCApply() line 230 in /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
> #3 KSP_PCApply() line 230 in /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
> [1]PETSC ERROR: #4 KSPInitialResidual() line 63 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
> [2]PETSC ERROR: #4 KSPInitialResidual() line 63 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
> [1]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
> [2]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
> [2]PETSC ERROR: #6 KSPSolve() line 459 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
> [1]PETSC ERROR: #6 KSPSolve() line 459 in /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
> ^C[mpiexec at nwmop] Sending Ctrl-C to processes as requested
> [mpiexec at nwmop] Press Ctrl-C again to force abort
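The PETSc message reports a Vec entry that is NaN or infinite at the end of PCApply (parameter 3). To narrow down where such values first appear, one approach is to check the inputs to the solve for non-finite entries before calling it, for example the local RHS values and the Jacobian values as they are assembled.

Below is a minimal C sketch that assumes the values are available as a contiguous array of doubles. first_nonfinite is a hypothetical helper; it is not PETSc's VecValidValues implementation.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the first entry of v[0..n-1]
 * that is NaN or +/-Inf, or -1 if every entry is finite. */
long first_nonfinite(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (!isfinite(v[i]))
            return (long)i;
    return -1;
}
```

Running such a check only near the failing time step (around 7.69 years here) would avoid the cost of checking every entry on every step.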
>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petscconf.h
Type: text/x-chdr
Size: 16776 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20150425/ebaec750/attachment-0001.h>