[petsc-users] Floating point exception

Satish Balay balay at mcs.anl.gov
Fri Apr 24 15:23:18 CDT 2015


 c           4   1.0976214263087059E-067

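I don't think this number can be stored in a real*4. It is far below the
smallest positive real*4 value (about 1.2E-38 normalized, about 1.4E-45
denormal), so the real*8 to real*4 conversion underflows.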

Satish

On Fri, 24 Apr 2015, Danyang Su wrote:

> 
> 
> On 15-04-24 11:12 AM, Barry Smith wrote:
> > > On Apr 24, 2015, at 1:05 PM, Danyang Su <danyang.su at gmail.com> wrote:
> > >
> > > Hi All,
> > >
> > > One of my cases crashes with a floating point exception when run on 4
> > > processors, as shown below. If I run the same case on 1 processor, it
> > > works fine. I have tested the code with around 100 cases on up to 768
> > > processors, and all other cases work fine. Could this kind of error be
> > > caused by a NaN in the Jacobian matrix, RHS, or preconditioner?
> >     Yes, almost for sure it is one of these places.
> >
> >     First run the bad case with -fp_trap; if all goes well you'll see the
> > function where the FPE is generated. Then also run with -start_in_debugger
> > and type cont in all four debugger windows. When the FPE happens, the
> > debugger should stop and show exactly where it occurred.
> >
> >    Barry
> Hi Barry,
> 
> When I run with -fp_trap -start_in_debugger, I get the following error:
> 
> [0]PETSC ERROR: *** unknown floating point error occurred ***
> [0]PETSC ERROR: The specific exception can be determined by running in a
> debugger.  When the
> [0]PETSC ERROR: debugger traps the signal, the exception can be found with
> fetestexcept(0x3d)
> [0]PETSC ERROR: where the result is a bitwise OR of the following flags:
> [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8
> FE_UNDERFLOW=0x10 FE_INEXACT=0x20
> [0]PETSC ERROR: Try option -start_in_debugger
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> [0]PETSC ERROR:       is given.
> [0]PETSC ERROR: [0] PetscDefaultFPTrap line 379
> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
> [0]PETSC ERROR: User provided function() line 0 in Unknown file trapped
> floating point error
> 
> Program received signal SIGABRT: Process abort signal.
> 
> Backtrace for this error:
> #0  0x7F4FEAB1C7D7
> #1  0x7F4FEAB1CDDE
> #2  0x7F4FE9E1AD3F
> #3  0x7F4FE9E1ACC9
> #4  0x7F4FE9E1E0D7
> #5  0x7F4FEB0B6DCB
> #6  0x7F4FEB0B1825
> #7  0x7F4FEB0B817F
> #8  0x7F4FE9E1AD3F
> #9  0x6972C8 in tprfrtlc_ at tprfrtlc.F90:2393 (discriminator 3)
> #10  0x4C6C87 in gcreact_ at gcreact.F90:678
> #11  0x707E19 in initicrt_ at initicrt.F90:589
> #12  0x4F42D0 in initprob_ at initprob.F90:430
> #13  0x5AAF72 in driver_pc at driver_pc.F90:438
> 
> I checked the code at tprfrtlc.F90:2393:
> 
>         realbuffer_gb(1:nvars) = (/time,(c(ic),ic=1,nc-1),     &
>                                    (cx(ix),ix=1,nxout)/)
> 
> All the values (time, c, cx) are reasonable, as shown below. The only
> possibility I can see is that realbuffer_gb is declared as real*4 when using
> single precision output, while time, c, and cx are declared as real*8. I have
> many similar conversions from real*8 to real*4 for output, and the other code
> does not return an error.
> 
>  time   0.0000000000000000
>  c           1   9.9999999999999995E-008
>  c           2   3.1555251077549618E-003
>  c           3   7.1657814842179362E-008
>  c           4   1.0976214263087059E-067
>  c           5   5.2879822292305797E-004
>  c           6   9.9999999999999964E-005
>  c           7   6.4055731968811337E-005
>  c           8   3.4607572892578404E-020
>  cx           1   3.4376650636008101E-005
>  cx           2   7.3989678854017763E-012
>  cx           3   9.5317170613607207E-012
>  cx           4   2.2344525794718353E-015
>  cx           5   3.0624685689695889E-008
>  cx           6   1.0046157902783967E-007
>  cx           7   1.5320169154914984E-004
>  cx           8   8.6930292776346176E-014
>  cx           9   3.5944267559348721E-005
>  cx          10   3.0072645866951157E-018
>  cx          11   2.3592486321095017E-013
> 
> Thanks,
> 
> Danyang
> 
> >
> > > I can check all the entries of the Jacobian matrix to see if the values
> > > are valid, but this does not seem like a good idea because it takes a
> > > long time to reach this point. If I restart the simulation from a
> > > specified time (e.g., 7.685 in this case), the error does not occur.
> > >
> > > Could you please give me any suggestions for debugging this case?
> > >
> > > Thanks and Regards,
> > >
> > > Danyang
> > >
> > >
> > > timestep:    2730 time: 7.665E+00 years   delt: 1.000E-02 years iter:  1
> > > timestep:    max.sia: 0.000E+00 tol.sia: 0.000E+00
> > > timestep:    2731 time: 7.675E+00 years   delt: 1.000E-02 years iter:  1
> > > timestep:    max.sia: 0.000E+00 tol.sia: 0.000E+00
> > > timestep:    2732 time: 7.685E+00 years   delt: 1.000E-02 years iter:  1
> > > timestep:    max.sia: 0.000E+00 tol.sia: 0.000E+00
> > > timestep:    2733 time: 7.695E+00 years   delt: 1.000E-02 years iter:  1
> > > timestep:    max.sia: 0.000E+00 tol.sia: 0.000E+00
> > > timestep:    2734 time: 7.705E+00 years   delt: 1.000E-02 years iter:  1
> > > timestep:    max.sia: 0.000E+00 tol.sia: 0.000E+00
> > > Reduce time step for reactive transport
> > > timestep:    2734 time: 7.700E+00 years   delt: 5.000E-03 years iter:  1
> > > timestep:    max.sia: 0.000E+00 tol.sia: 0.000E+00
> > > Reduce time step for reactive transport
> > > timestep:    2734 time: 7.697E+00 years   delt: 2.500E-03 years iter:  1
> > > timestep:    max.sia: 0.000E+00 tol.sia: 0.000E+00
> > > [1]PETSC ERROR: --------------------- Error Message
> > > --------------------------------------------------------------
> > > [1]PETSC ERROR: Floating point exception
> > > [2]PETSC ERROR: --------------------- Error Message
> > > --------------------------------------------------------------
> > > [2]PETSC ERROR: Floating point exception
> > > [2]PETSC ERROR: Vec entry at local location 0 is not-a-number or infinite
> > > at end of function: Parameter number 3
> > > [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> > > for trouble shooting.
> > > [2]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
> > > [2]PETSC ERROR: [1]PETSC ERROR: Vec entry at local location 0 is
> > > not-a-number or infinite at end of function: Parameter number 3
> > > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> > > for trouble shooting.
> > > [1]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
> > > [1]PETSC ERROR: ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by
> > > dsu Thu Apr 23 15:38:52 2015
> > > [1]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc
> > > --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich
> > > --download-mumps --download-hypre --download-superlu_dist --download-metis
> > > --download-parmetis --download-scalapack
> > > [1]PETSC ERROR: #1 VecValidValues() line 34 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
> > > ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by dsu Thu Apr 23
> > > 15:38:52 2015
> > > [2]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg --with-cc=gcc
> > > --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich
> > > --download-mumps --download-hypre --download-superlu_dist --download-metis
> > > --download-parmetis --download-scalapack
> > > [2]PETSC ERROR: #1 VecValidValues() line 34 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
> > > [2]PETSC ERROR: [1]PETSC ERROR: #2 PCApply() line 442 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
> > > [1]PETSC ERROR: #2 PCApply() line 442 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
> > > [2]PETSC ERROR: #3 KSP_PCApply() line 230 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
> > > #3 KSP_PCApply() line 230 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
> > > [1]PETSC ERROR: #4 KSPInitialResidual() line 63 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
> > > [2]PETSC ERROR: #4 KSPInitialResidual() line 63 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
> > > [1]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
> > > [2]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
> > > [2]PETSC ERROR: #6 KSPSolve() line 459 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
> > > [1]PETSC ERROR: #6 KSPSolve() line 459 in
> > > /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
> > > ^C[mpiexec at nwmop] Sending Ctrl-C to processes as requested
> > > [mpiexec at nwmop] Press Ctrl-C again to force abort
> 
> 
> 