[petsc-users] Floating point exception in hypre BoomerAMG
Matthew Knepley
knepley at gmail.com
Wed Apr 29 14:39:19 CDT 2015
On Thu, Apr 30, 2015 at 5:35 AM, Danyang Su <danyang.su at gmail.com> wrote:
> On 15-04-29 12:19 PM, Barry Smith wrote:
>
>> Ok, your code seems fine in terms of logic. But the vector x_vec_gbl
>> i.e. b_react should not be in a read only state. Are you somewhere calling
>> a VecGetArray() and not calling the VecRestoreArray()?
>>
>> Barry
>>
> I just triple-checked VecGetArrayF90(). Every VecGetArrayF90()
> comes with a VecRestoreArrayF90(). No VecGetArray() is used in my code.
>
You can check the lock status of the Vec with VecLockGet():
http://www.mcs.anl.gov/petsc/petsc-master/docs/manualpages/Vec/VecLockGet.html
Put calls to it in your code until you can see who is locking the Vec before the
call.
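
As an illustration only (not code from this thread), here is a minimal C sketch of such a check placed just before the copy into the global vector; the helper name and its placement are assumptions, and a Fortran binding, if present in your build, would be used analogously:

  #include <petscdm.h>
  #include <petscvec.h>

  /* Hypothetical helper: report the lock state of the global Vec and then do the
     local-to-global copy. A nonzero state means some earlier call (for example a
     solver holding the Vec as a read-only input) has locked it. */
  static PetscErrorCode copy_local_to_global_checked(DM da, Vec xloc, Vec xgbl)
  {
    PetscErrorCode ierr;
    PetscInt       state;

    PetscFunctionBegin;
    ierr = VecLockGet(xgbl, &state);CHKERRQ(ierr);
    if (state) {
      ierr = PetscPrintf(PETSC_COMM_SELF, "global Vec lock state = %D\n", state);CHKERRQ(ierr);
    }
    ierr = DMLocalToGlobalBegin(da, xloc, INSERT_VALUES, xgbl);CHKERRQ(ierr);
    ierr = DMLocalToGlobalEnd(da, xloc, INSERT_VALUES, xgbl);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }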
Thanks,
Matt
> Danyang
>
>>
>>
>>
>> On Apr 29, 2015, at 1:52 PM, Danyang Su <danyang.su at gmail.com> wrote:
>>>
>>> On 15-04-29 11:30 AM, Barry Smith wrote:
>>>
>>>> On Apr 29, 2015, at 12:15 PM, Danyang Su <danyang.su at gmail.com>
>>>>> wrote:
>>>>>
>>>>> On 15-04-28 06:50 PM, Barry Smith wrote:
>>>>>
>>>>> We started enforcing more checks on writing to vectors that you
>>>>>> should not write to. Where are you calling DMLocalToGlobalBegin() ? It
>>>>>> looks like you are trying to copy values into a global array that you
>>>>>> should not because it is a read only input to a function, for example it is
>>>>>> the right hand side of a linear system or the input into a
>>>>>> SNESFormFunction(). So check the output to that call.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>
>>>>> This is used when the local RHS vec is calculated and copied to the global RHS.
>>>>>
>>>>> Huhh, why are you copying anything into the RHS? Is this before
>>>> you call the linear system solve? Send the code that calls the
>>>> DMLocalToGlobalBegin()
>>>>
>>> Yes. This is called before the linear system solver. At present, we use the
>>> PETSc KSP solver.
>>> The code was first developed as a fully sequential version and then ported
>>> to the parallel version, so each subdomain has its own storage for the
>>> Jacobian matrix and RHS. We do not use the global RHS vector directly to set
>>> the RHS at the moment; all the RHS values of a subdomain are copied to the
>>> global RHS.
>>>
>>>   !> This is the function where RHS values of subdomain are copied to the global RHS values.
>>>   subroutine compute_function(rank,da,x_array_loc,x_vec_loc, &
>>>                               x_vec_gbl,nngl,row_idx_l2pg,col_idx_l2pg, &
>>>                               b_non_interlaced)
>>>
>>>   implicit none
>>> #include <finclude/petscsys.h>
>>> #include <finclude/petscis.h>
>>> #include <finclude/petscvec.h>
>>> #include <finclude/petscvec.h90>
>>> #include <finclude/petscmat.h>
>>> #include <finclude/petscpc.h>
>>> #include <finclude/petscksp.h>
>>> #include <finclude/petscdmda.h>
>>> #include <finclude/petscdmda.h90>
>>>
>>>   PetscInt :: rank
>>>   DM :: da
>>>   PetscReal, allocatable :: x_array_loc(:)
>>>   Vec :: x_vec_loc
>>>   Vec :: x_vec_gbl
>>>   PetscInt :: nngl
>>>   PetscInt, allocatable :: row_idx_l2pg(:)
>>>   PetscInt, allocatable :: col_idx_l2pg(:)
>>>   PetscBool :: b_non_interlaced
>>>
>>>   PetscInt :: info_debug
>>>   PetscInt :: i, j
>>>   PetscErrorCode :: ierr
>>>   PetscScalar, pointer :: vecpointer(:)
>>>
>>>   !Zero entries
>>>   call VecZeroEntries(x_vec_loc, ierr)
>>>
>>>   !Get a pointer to vector data when you need access to the array
>>>   call VecGetArrayF90(x_vec_loc, vecpointer, ierr)
>>>
>>>   !Compute the function over the locally owned part of the grid
>>>   if(b_non_interlaced) then
>>>     j = nngl/2
>>>     do i = 1, j
>>>       vecpointer(2*i-1) = x_array_loc(i)
>>>       vecpointer(2*i) = x_array_loc(i+j)
>>>     end do
>>>   else
>>>     do i = 1, nngl
>>>       vecpointer(i) = x_array_loc(i)
>>>     end do
>>>   end if
>>>
>>>   !Restore the vector when you no longer need access to the array
>>>   call VecRestoreArrayF90(x_vec_loc,vecpointer,ierr)
>>>
>>>   !Insert values into global vector
>>>   call DMLocalToGlobalBegin(da,x_vec_loc,INSERT_VALUES,x_vec_gbl,ierr)
>>>   !By placing code between these two statements, computations can be
>>>   !done while messages are in transition.
>>>   call DMLocalToGlobalEnd(da,x_vec_loc,INSERT_VALUES,x_vec_gbl,ierr)
>>>
>>>   return
>>>   end subroutine
>>>
>>>
>>>
>>>   !> This is the function where reactive transport equations are solved.
>>>   subroutine solver_dd_snes_solve_react(ilog,idetail,a_in,b_in, &
>>>              x_inout,ia_in,ja_in,nngl_in,itsolv, &
>>>              over_flow,rnorm,row_idx_l2pg,col_idx_l2pg, &
>>>              b_non_interlaced)
>>>
>>>   use gen, only : rank, node_idx_l2lg, ittot_rt, &
>>>                   b_output_matrix, b_enable_output
>>>   use solver_snes_function, only : form_initial_guess, &
>>>                                    compute_function, &
>>>                                    compute_jacobian
>>>   use petsc_mpi_common, only : petsc_mpi_finalize
>>>
>>>   implicit none
>>> #include <finclude/petscsys.h>
>>> #include <finclude/petscvec.h>
>>> #include <finclude/petscvec.h90>
>>> #include <finclude/petscdmda.h>
>>>
>>>   PetscInt :: ilog
>>>   PetscInt :: idetail
>>>   PetscInt :: nngl_in
>>>   PetscReal, allocatable :: a_in(:)
>>>   PetscReal, allocatable :: b_in(:)
>>>   PetscReal, allocatable :: x_inout(:)
>>>   PetscInt, allocatable :: ia_in(:)
>>>   PetscInt, allocatable :: ja_in(:)
>>>   PetscInt, allocatable :: row_idx_l2pg(:)
>>>   PetscInt, allocatable :: col_idx_l2pg(:)
>>>   PetscInt :: itsolv
>>>   PetscBool :: over_flow
>>>   PetscBool :: b_non_interlaced
>>>   PetscReal :: rnorm
>>>   PetscViewer :: viewer
>>>   PetscScalar, pointer :: vecpointer(:)
>>>   PetscErrorCode :: ierr
>>>   PetscInt :: info_debug
>>>   character(72) :: strinum
>>>
>>>   info_debug = 0
>>>   if(b_output_matrix .and. b_enable_output) then
>>>     write(strinum, *) ittot_rt
>>>     strinum = "_"//trim(adjustl(strinum))
>>>   end if
>>>
>>>   !Form initial guess, only assemble the local owned part, without ghost nodes
>>>   call form_initial_guess(rank,dmda_react%da,x_inout,x_react_loc, &
>>>        x_react,nngl_in,row_idx_l2pg,col_idx_l2pg, &
>>>        b_non_interlaced)
>>>
>>>   !Compute function, only assemble the local part, without ghost nodes
>>>   call compute_function(rank,dmda_react%da,b_in,b_react_loc, &
>>>        b_react,nngl_in,row_idx_l2pg,col_idx_l2pg, &
>>>        b_non_interlaced)
>>>
>>>   !Compute jacobian matrix, only assemble the local part, without ghost nodes
>>>   call compute_jacobian(rank,dmda_react%da, &
>>>        a_react,a_in,ia_in,ja_in,nngl_in, &
>>>        row_idx_l2pg,col_idx_l2pg, &
>>>        b_non_interlaced)
>>>
>>>   !Solve a x = b, where a is the Jacobian matrix.
>>>   !  - First, set the KSP linear operators. Here the matrix that
>>>   !    defines the linear system also serves as the preconditioning matrix.
>>>   !  - Then solve the Newton system.
>>> #ifdef PETSC_V3_5_X
>>>   call KSPSetOperators(ksp_react,a_react,a_react,ierr)
>>> #else
>>>   call KSPSetOperators(ksp_react,a_react,a_react, &
>>>        SAME_NONZERO_PATTERN,ierr)
>>> #endif
>>>   call KSPSetDM(ksp_react,dmda_react%da,ierr)
>>>   call KSPSetDMActive(ksp_react,PETSC_FALSE,ierr)
>>>
>>>   call KSPSolve(ksp_react,b_react,x_react,ierr)
>>>
>>>   !Get residual norm, by default, preconditioned residual norm is calculated.
>>>   if (b_mykspconverged_react .and. &
>>>       .not. b_use_petsc_default_react) then
>>>     rnorm = rnorm_react
>>>   else
>>>     call KSPGetResidualNorm(ksp_react, rnorm, ierr)
>>>   end if
>>>
>>>   !Scatter ghost points to local vector, using the 2-step process
>>>   !DMGlobalToLocalBegin(), DMGlobalToLocalEnd().
>>>   !By placing code between these two statements, computations can be
>>>   !done while messages are in transition.
>>>   call DMGlobalToLocalBegin(dmda_react%da,x_react,INSERT_VALUES, &
>>>        x_react_loc,ierr)
>>>   call DMGlobalToLocalEnd(dmda_react%da,x_react,INSERT_VALUES, &
>>>        x_react_loc,ierr)
>>>
>>>   call VecGetArrayF90(x_react_loc,vecpointer,ierr)
>>>   x_inout = vecpointer
>>>   call VecRestoreArrayF90(x_react_loc,vecpointer,ierr)
>>>
>>>   return
>>>   end subroutine
>>>
>>> Barry
>>>>
>>>>
>>>> It is strange that this error does not occur at the first timestep.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Danyang
>>>>>
>>>>> On Apr 28, 2015, at 7:30 PM, Danyang Su <danyang.su at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Barry,
>>>>>>>
>>>>>>> There seems to be another bug (not quite sure) in PETSc-dev, as shown
>>>>>>> below. The case I used is similar to the one I mentioned recently. I have
>>>>>>> no problem running this case using PETSc 3.5.2, but it gives the
>>>>>>> following error using PETSc-dev, even with only one processor.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Danyang
>>>>>>>
>>>>>>> timestep: 2048 time: 4.615E+00 years delt: 1.460E-20 years
>>>>>>> iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>> [0]PETSC ERROR: --------------------- Error Message
>>>>>>> --------------------------------------------------------------
>>>>>>> [0]PETSC ERROR: Object is in wrong state
>>>>>>> [0]PETSC ERROR: Vec is locked read only, argument # 1
>>>>>>> [0]PETSC ERROR: See
>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html
>>>>>>> for trouble shooting.
>>>>>>> [0]PETSC ERROR: Petsc Development GIT revision:
>>>>>>> v3.5.3-2796-g23b6c49 GIT Date: 2015-04-27 11:32:44 -0500
>>>>>>> [0]PETSC ERROR: ../min3p_thcm on a linux-gnu-dbg named nwmop by dsu
>>>>>>> Tue Apr 28 15:56:41 2015
>>>>>>> [0]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg
>>>>>>> --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack
>>>>>>> --download-mpich --download-mumps --download-hypre --download-superlu_dist
>>>>>>> --download-metis --download-parmetis --download-scalapack
>>>>>>> [0]PETSC ERROR: #349 VecGetArray() line 1646 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/vec/vec/interface/rvector.c
>>>>>>> [0]PETSC ERROR: #350 VecGetArrayPair() line 434 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/include/petscvec.h
>>>>>>> [0]PETSC ERROR: #351 VecScatterBegin_SSToSS() line 649 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/vec/vec/utils/vscat.c
>>>>>>> [0]PETSC ERROR: #352 VecScatterBegin() line 1694 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/vec/vec/utils/vscat.c
>>>>>>> [0]PETSC ERROR: #353 DMLocalToGlobalBegin_DA() line 56 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/dm/impls/da/dagtol.c
>>>>>>> [0]PETSC ERROR: #354 DMLocalToGlobalBegin() line 1944 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/dm/interface/dm.c
>>>>>>> Reduce time step for reactive transport
>>>>>>> no further time step reduction possible
>>>>>>> Optimal time increment computed by MIN3P 2.6644769214899893E-018
>>>>>>> Minimum time increment specified by user 3.6499999999999998E-018
>>>>>>> Please, reduce the MINIMUM TIME INCREMENT
>>>>>>> stop signal in time step reduction failed
>>>>>>>
>>>>>>> On 15-04-28 11:15 AM, Barry Smith wrote:
>>>>>>>
>>>>>>> I am forwarding this on to the hypre support email list;
>>>>>>>> hopefully they can have some advice on how to proceed.
>>>>>>>>
>>>>>>>> hypre folks, this is with version hypre-2.10.0b (the previous
>>>>>>>> version had the same behavior). We are getting a divide by zero in gselim()
>>>>>>>> line 4363 (this happens in a time dependent problem after many successful
>>>>>>>> solves with time dependent matrix). I looked at the code, below, and note
>>>>>>>> that there are some checks for the diagonal of A not being zero but not for
>>>>>>>> the line that causes the divide by zero; is this perhaps an oversight in
>>>>>>>> the hypre code? Any advice appreciated.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Barry
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> HYPRE_Int gselim(A,x,n)
>>>>>>>> HYPRE_Real *A;
>>>>>>>> HYPRE_Real *x;
>>>>>>>> HYPRE_Int n;
>>>>>>>> {
>>>>>>>>    HYPRE_Int  err_flag = 0;
>>>>>>>>    HYPRE_Int  j,k,m;
>>>>>>>>    HYPRE_Real factor;
>>>>>>>>
>>>>>>>>    if (n==1)                           /* A is 1x1 */
>>>>>>>>    {
>>>>>>>>       if (A[0] != 0.0)
>>>>>>>>       {
>>>>>>>>          x[0] = x[0]/A[0];
>>>>>>>>          return(err_flag);
>>>>>>>>       }
>>>>>>>>       else
>>>>>>>>       {
>>>>>>>>          err_flag = 1;
>>>>>>>>          return(err_flag);
>>>>>>>>       }
>>>>>>>>    }
>>>>>>>>    else                                /* A is nxn. Forward elimination */
>>>>>>>>    {
>>>>>>>>       for (k = 0; k < n-1; k++)
>>>>>>>>       {
>>>>>>>>          if (A[k*n+k] != 0.0)
>>>>>>>>          {
>>>>>>>>             for (j = k+1; j < n; j++)
>>>>>>>>             {
>>>>>>>>                if (A[j*n+k] != 0.0)
>>>>>>>>                {
>>>>>>>>                   factor = A[j*n+k]/A[k*n+k];
>>>>>>>>                   for (m = k+1; m < n; m++)
>>>>>>>>                   {
>>>>>>>>                      A[j*n+m] -= factor * A[k*n+m];
>>>>>>>>                   }
>>>>>>>>                   /* Elimination step for rhs */
>>>>>>>>                   x[j] -= factor * x[k];
>>>>>>>>                }
>>>>>>>>             }
>>>>>>>>          }
>>>>>>>>       }
>>>>>>>>       /* Back Substitution */
>>>>>>>>       for (k = n-1; k > 0; --k)
>>>>>>>>       {
>>>>>>>>          x[k] /= A[k*n+k];
>>>>>>>>          for (j = 0; j < k; j++)
>>>>>>>>          {
>>>>>>>>             if (A[j*n+k] != 0.0)
>>>>>>>>             {
>>>>>>>>                x[j] -= x[k] * A[j*n+k];
>>>>>>>>             }
>>>>>>>>          }
>>>>>>>>       }
>>>>>>>>       x[0] /= A[0];
>>>>>>>>       return(err_flag);
>>>>>>>>    }
>>>>>>>> }
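>>>>>>>>
>>>>>>>> As a hedged, standalone sketch (not the hypre source), a back substitution
>>>>>>>> with the missing zero-pivot guard on the line that divides by A[k*n+k] might
>>>>>>>> look like the following; the function and test values are illustrative only:
>>>>>>>>
>>>>>>>>    #include <stdio.h>
>>>>>>>>
>>>>>>>>    /* Back substitution for an upper-triangular n x n system stored
>>>>>>>>       row-major in A; returns 1 on a zero pivot instead of dividing. */
>>>>>>>>    static int back_substitute_guarded(double *A, double *x, int n)
>>>>>>>>    {
>>>>>>>>       int j, k;
>>>>>>>>       for (k = n-1; k >= 0; --k)
>>>>>>>>       {
>>>>>>>>          if (A[k*n+k] == 0.0) return 1;   /* guard for the unchecked division above */
>>>>>>>>          x[k] /= A[k*n+k];
>>>>>>>>          for (j = 0; j < k; j++) x[j] -= x[k] * A[j*n+k];
>>>>>>>>       }
>>>>>>>>       return 0;
>>>>>>>>    }
>>>>>>>>
>>>>>>>>    int main(void)
>>>>>>>>    {
>>>>>>>>       double A[4] = {2.0, 1.0, 0.0, 4.0};   /* [[2 1],[0 4]] */
>>>>>>>>       double x[2] = {5.0, 8.0};             /* overwritten with the solution */
>>>>>>>>       if (back_substitute_guarded(A, x, 2)) printf("zero pivot\n");
>>>>>>>>       else printf("x = (%g, %g)\n", x[0], x[1]);
>>>>>>>>       return 0;
>>>>>>>>    }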
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 28, 2015, at 12:55 PM, Danyang Su <danyang.su at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Barry,
>>>>>>>>>
>>>>>>>>> The development version of PETSc does not solve my
>>>>>>>>> problem. It still crashes with the same error.
>>>>>>>>>
>>>>>>>>> As Matthew mentioned, I checked the matrix and none of the diagonal
>>>>>>>>> entries is zero. But the diagonal values for different rows range from
>>>>>>>>> -1.0d-18 to 1.0d33 and the matrix is not strictly diagonally dominant. When
>>>>>>>>> using 1 processor, the matrix is also not strictly diagonally dominant, but
>>>>>>>>> the diagonal values have a much smaller range and the code runs without
>>>>>>>>> error. Maybe there is something wrong in the matrix for this case and I
>>>>>>>>> need to check the matrix first.
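>>>>>>>>>
>>>>>>>>> As a hedged sketch of one way to do that check from C (the PETSc calls are
>>>>>>>>> standard, but the helper name is hypothetical and a Fortran version would be
>>>>>>>>> analogous): pull the diagonal into a work vector and report its extreme
>>>>>>>>> entries, which would expose the -1.0d-18 to 1.0d33 spread directly.
>>>>>>>>>
>>>>>>>>>    #include <petscmat.h>
>>>>>>>>>
>>>>>>>>>    static PetscErrorCode report_diagonal_range(Mat A)
>>>>>>>>>    {
>>>>>>>>>      PetscErrorCode ierr;
>>>>>>>>>      Vec            d;
>>>>>>>>>      PetscReal      dmin, dmax;
>>>>>>>>>      PetscInt       imin, imax;
>>>>>>>>>
>>>>>>>>>      PetscFunctionBegin;
>>>>>>>>>      ierr = MatCreateVecs(A, NULL, &d);CHKERRQ(ierr);   /* MatGetVecs() in older releases */
>>>>>>>>>      ierr = MatGetDiagonal(A, d);CHKERRQ(ierr);
>>>>>>>>>      ierr = VecMin(d, &imin, &dmin);CHKERRQ(ierr);
>>>>>>>>>      ierr = VecMax(d, &imax, &dmax);CHKERRQ(ierr);
>>>>>>>>>      ierr = PetscPrintf(PETSC_COMM_WORLD, "diag min %g at row %D, max %g at row %D\n",
>>>>>>>>>                         (double)dmin, imin, (double)dmax, imax);CHKERRQ(ierr);
>>>>>>>>>      ierr = VecDestroy(&d);CHKERRQ(ierr);
>>>>>>>>>      PetscFunctionReturn(0);
>>>>>>>>>    }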
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Danyang
>>>>>>>>>
>>>>>>>>> Program received signal SIGFPE, Arithmetic exception.
>>>>>>>>> 0x00007f668c1299d5 in gselim (A=0x2fe3330, x=0x2210680, n=9)
>>>>>>>>> at par_relax.c:4363
>>>>>>>>> 4363 x[k] /= A[k*n+k];
>>>>>>>>> (gdb)
>>>>>>>>>
>>>>>>>>> [1]PETSC ERROR: *** unknown floating point error occurred ***
>>>>>>>>> [1]PETSC ERROR: The specific exception can be determined by
>>>>>>>>> running in a debugger. When the
>>>>>>>>> [1]PETSC ERROR: debugger traps the signal, the exception can be
>>>>>>>>> found with fetestexcept(0x3d)
>>>>>>>>> [1]PETSC ERROR: where the result is a bitwise OR of the following
>>>>>>>>> flags:
>>>>>>>>> [1]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8
>>>>>>>>> FE_UNDERFLOW=0x10 FE_INEXACT=0x20
>>>>>>>>> [1]PETSC ERROR: Try option -start_in_debugger
>>>>>>>>> [1]PETSC ERROR: likely location of problem given in stack below
>>>>>>>>> [1]PETSC ERROR: --------------------- Stack Frames
>>>>>>>>> ------------------------------------
>>>>>>>>> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not
>>>>>>>>> available,
>>>>>>>>> [1]PETSC ERROR: INSTEAD the line number of the start of the
>>>>>>>>> function
>>>>>>>>> [1]PETSC ERROR: is given.
>>>>>>>>> [1]PETSC ERROR: [1] PetscDefaultFPTrap line 379
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/sys/error/fp.c
>>>>>>>>> [1]PETSC ERROR: [1] Hypre solve line 216
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/pc/impls/hypre/hypre.c
>>>>>>>>> [1]PETSC ERROR: [1] PCApply_HYPRE line 203
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/pc/impls/hypre/hypre.c
>>>>>>>>> [1]PETSC ERROR: [1] PCApply line 430
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/pc/interface/precon.c
>>>>>>>>> [1]PETSC ERROR: [1] KSP_PCApply line 234
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/include/petsc/private/kspimpl.h
>>>>>>>>> [1]PETSC ERROR: [1] KSPInitialResidual line 44
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/ksp/interface/itres.c
>>>>>>>>> [1]PETSC ERROR: [1] KSPSolve_GMRES line 224
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/ksp/impls/gmres/gmres.c
>>>>>>>>> [1]PETSC ERROR: [1] KSPSolve line 506
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/ksp/interface/itfunc.c
>>>>>>>>> [1]PETSC ERROR: User provided function() line 0 in Unknown file
>>>>>>>>> trapped floating point error
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 15-04-27 11:47 AM, Barry Smith wrote:
>>>>>>>>>
>>>>>>>>> Danyang,
>>>>>>>>>>
>>>>>>>>>> I'm glad you were finally able to get to the problematic
>>>>>>>>>> point; sorry it took so long (I want to strangle whoever turned on catching
>>>>>>>>>> of underflows in PETSc).
>>>>>>>>>>
>>>>>>>>>> The hypre folks have done a great deal of work on their
>>>>>>>>>> relaxation code since the PETSc 3.5.3 release. There is a good chance they
>>>>>>>>>> have fixed the divide by zero problem you are hitting here. You will need
>>>>>>>>>> to upgrade to the development version of PETSc (that uses the latest
>>>>>>>>>> version of hypre), here are the instructions on how to obtain it
>>>>>>>>>> http://www.mcs.anl.gov/petsc/developers/index.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please let us know if this resolves the problem with hypre
>>>>>>>>>> failing.
>>>>>>>>>>
>>>>>>>>>> Barry
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Apr 27, 2015, at 11:44 AM, Danyang Su <danyang.su at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Barry,
>>>>>>>>>>>
>>>>>>>>>>> I got the following arithmetic exception after the previous bug
>>>>>>>>>>> was fixed.
>>>>>>>>>>>
>>>>>>>>>>> Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
>>>>>>>>>>> 0x00007f3b23b98f20 in __nanosleep_nocancel ()
>>>>>>>>>>> at ../sysdeps/unix/syscall-template.S:81
>>>>>>>>>>> 81 ../sysdeps/unix/syscall-template.S: No such file or
>>>>>>>>>>> directory.
>>>>>>>>>>> (gdb) cont
>>>>>>>>>>> Continuing.
>>>>>>>>>>>
>>>>>>>>>>> Program received signal SIGFPE, Arithmetic exception.
>>>>>>>>>>> 0x00007f3b260df449 in gselim (A=0x1e4c580, x=0x1d30c40, n=9)
>>>>>>>>>>> at par_relax.c:3442
>>>>>>>>>>> 3442 x[k] /= A[k*n+k];
>>>>>>>>>>> (gdb)
>>>>>>>>>>>
>>>>>>>>>>> I tried both PETSc 3.5.2 and 3.5.3 and they return the same
>>>>>>>>>>> error as shown above. For 3.5.3, I edited the fp.c file and then ran
>>>>>>>>>>> configure and make.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Danyang
>>>>>>>>>>>
>>>>>>>>>>> On 15-04-25 07:34 PM, Danyang Su wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> The "floating point underflow" is caused by a small value
>>>>>>>>>>>> divided by a very large value. The result is forced to zero, after which no
>>>>>>>>>>>> underflow problem is reported. I just reran this bad case to see if
>>>>>>>>>>>> it still gets stuck later. This will take a while.
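>>>>>>>>>>>>
>>>>>>>>>>>> For what it's worth, a small standalone C sketch of the mechanism behind the
>>>>>>>>>>>> fetestexcept(0x3d) message (illustrative only; the exact flags raised can vary
>>>>>>>>>>>> with the platform and compiler options):
>>>>>>>>>>>>
>>>>>>>>>>>>    #include <fenv.h>
>>>>>>>>>>>>    #include <stdio.h>
>>>>>>>>>>>>
>>>>>>>>>>>>    int main(void)
>>>>>>>>>>>>    {
>>>>>>>>>>>>      volatile double tiny = 1e-300, big = 1e300, r;
>>>>>>>>>>>>
>>>>>>>>>>>>      feclearexcept(FE_ALL_EXCEPT);
>>>>>>>>>>>>      r = tiny / big;                     /* below the normal range: flushed toward 0 */
>>>>>>>>>>>>      if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW raised, r = %g\n", r);
>>>>>>>>>>>>      if (fetestexcept(FE_INEXACT))   printf("FE_INEXACT raised\n");
>>>>>>>>>>>>      return 0;
>>>>>>>>>>>>    }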
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for all your kind replies,
>>>>>>>>>>>>
>>>>>>>>>>>> Danyang
>>>>>>>>>>>>
>>>>>>>>>>>> On 15-04-25 07:02 PM, Barry Smith wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, you do have
>>>>>>>>>>>>>
>>>>>>>>>>>>> #ifndef PETSC_HAVE_XMMINTRIN_H
>>>>>>>>>>>>> #define PETSC_HAVE_XMMINTRIN_H 1
>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>
>>>>>>>>>>>>> so the change you made should cause it to stop trapping
>>>>>>>>>>>>> underflow exceptions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now in one email you reported a FPE within hypre, then I
>>>>>>>>>>>>> asked you to run with -start_in_debugger to determine where it happened
>>>>>>>>>>>>> exactly and then you reported the FPE happened in user code (what seemed
>>>>>>>>>>>>> to be an underflow issue). Why is this? Can you not run it where it
>>>>>>>>>>>>> generated the FPE in hypre using the -start_in_debugger option?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps you have multiple PETSC_ARCH or multiple PETSc
>>>>>>>>>>>>> installs to explain why you reported two different places where the
>>>>>>>>>>>>> exception occurred.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 25, 2015, at 8:31 PM, Danyang Su <danyang.su at gmail.com
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 15-04-25 06:26 PM, Matthew Knepley wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Apr 25, 2015 at 8:23 PM, Danyang Su <
>>>>>>>>>>>>>>> danyang.su at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-25 06:03 PM, Barry Smith wrote:
>>>>>>>>>>>>>>> If this is what you got in your last run
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> at ../../gas_advection/velocity_g.F90:1344
>>>>>>>>>>>>>>> 1344    cinfrt = cinfrt_dg(i1) * diff(ic,idim)    !diff is a very small value, e.g., 1.0d-316
>>>>>>>>>>>>>>> then it is still catching floating point underflow,
>>>>>>>>>>>>>>> which we do not want. This means either the change I suggested you make in
>>>>>>>>>>>>>>> the fp.c code didn't work or it actually uses a different floating point
>>>>>>>>>>>>>>> trap than that one. BTW: absurd numbers like 1.0d-316 are often a symptom
>>>>>>>>>>>>>>> of uninitialized data; could that be a problem that diff is not filled
>>>>>>>>>>>>>>> correctly for all the ic, idim you are using?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This going round and round is very frustrating and a
>>>>>>>>>>>>>>> waste of time. You need to be more proactive yourself and explore the code
>>>>>>>>>>>>>>> and poke around to figure out how to solve the problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please email
>>>>>>>>>>>>>>> $PETSC_DIR/$PETSC_ARCH/include/petscvariables.h so I can see what FP trap
>>>>>>>>>>>>>>> is being used on your machine.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>> Do you mean $PETSC_DIR/$PETSC_ARCH/conf/petscvariables?
>>>>>>>>>>>>>>> Otherwise I cannot find this file.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It's include/petscconf.h
>>>>>>>>>>>>>>> Do I need to reconfigure PETSc after changing the code you
>>>>>>>>>>>>>>> mentioned?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No, but you need to rebuild.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, I have done 'gnumake'.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 25, 2015, at 2:24 PM, Danyang Su
>>>>>>>>>>>>>>> <danyang.su at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-25 11:55 AM, Barry Smith wrote:
>>>>>>>>>>>>>>> On Apr 25, 2015, at 1:51 PM, Danyang Su
>>>>>>>>>>>>>>> <danyang.su at gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-25 11:32 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I told you this yesterday.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is probably stopping here on a harmless underflow.
>>>>>>>>>>>>>>> You need to edit the PETSc code to not worry about underflow.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Edit the file
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c and locate
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> #elif defined PETSC_HAVE_XMMINTRIN_H
>>>>>>>>>>>>>>> _MM_SET_EXCEPTION_MASK(_MM_MASK_INEXACT);
>>>>>>>>>>>>>>> #else
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> change it to
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> #elif defined PETSC_HAVE_XMMINTRIN_H
>>>>>>>>>>>>>>> _MM_SET_EXCEPTION_MASK(_MM_MASK_INEXACT |
>>>>>>>>>>>>>>> _MM_MASK_UNDERFLOW);
>>>>>>>>>>>>>>> #else
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then run make gnumake in the PETSc directory to compile
>>>>>>>>>>>>>>> the new version. Now link and run the program again with -fp_trap and see
>>>>>>>>>>>>>>> where it gets stuck this time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Did you do this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, I did change the code in fp.c and ran 'make gnumake' in
>>>>>>>>>>>>>>> the PETSc directory. I just double-checked, ran make gnumake again,
>>>>>>>>>>>>>>> and got the following information this time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> dsu at nwmop:~/Soft/PETSc/petsc-3.5.2$
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> make gnumake
>>>>>>>>>>>>>>> Building PETSc using GNU Make with 10 build threads
>>>>>>>>>>>>>>> ==========================================
>>>>>>>>>>>>>>> make[1]: Entering directory
>>>>>>>>>>>>>>> `/home/dsu/Soft/PETSc/petsc-3.5.2'
>>>>>>>>>>>>>>> make[1]: Nothing to be done for `all'.
>>>>>>>>>>>>>>> make[1]: Leaving directory `/home/dsu/Soft/PETSc/petsc-3.5.2'
>>>>>>>>>>>>>>> =========================================
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then I recompiled the code, ran with -fp_trap, and still got
>>>>>>>>>>>>>>> the following error:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Backtrace for this error:
>>>>>>>>>>>>>>> Note: The EXACT line numbers in the stack are not available,
>>>>>>>>>>>>>>> [2]PETSC ERROR: INSTEAD the line number of the start
>>>>>>>>>>>>>>> of the function
>>>>>>>>>>>>>>> [2]PETSC ERROR: is given.
>>>>>>>>>>>>>>> [2]PETSC ERROR: [2] PetscDefaultFPTrap line 379
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
>>>>>>>>>>>>>>> INSTEAD the line number of the start of the function
>>>>>>>>>>>>>>> [3]PETSC ERROR: is given.
>>>>>>>>>>>>>>> [3]PETSC ERROR: [3] PetscDefaultFPTrap line 379
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: User provided function() line 0 in Unknown
>>>>>>>>>>>>>>> file trapped floating point error
>>>>>>>>>>>>>>> [3]PETSC ERROR: User provided function() line 0 in Unknown
>>>>>>>>>>>>>>> file trapped floating point error
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is different then what you sent a few minutes ago
>>>>>>>>>>>>>>> where it crashed in hypre.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Anyways you need to use the -start_in_debugger business
>>>>>>>>>>>>>>> I sent in the previous email to see the exact place the problem occurs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the information shown on the gdb screen:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Program received signal SIGFPE, Arithmetic exception.
>>>>>>>>>>>>>>> 0x00000000006c2bef in velocity_g (l_sufx=1, suffix=...,
>>>>>>>>>>>>>>> nmax=12, njamxc=34,
>>>>>>>>>>>>>>> cinfradx=..., radial_coordx=.FALSE., _suffix=3)
>>>>>>>>>>>>>>> at ../../gas_advection/velocity_g.F90:1344
>>>>>>>>>>>>>>> 1344    cinfrt = cinfrt_dg(i1) * diff(ic,idim)    !diff is a very small value, e.g., 1.0d-316
>>>>>>>>>>>>>>> (gdb)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> After typing cont on the gdb screen, I got the error information
>>>>>>>>>>>>>>> below:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]PETSC ERROR: *** unknown floating point error occurred ***
>>>>>>>>>>>>>>> [1]PETSC ERROR: The specific exception can be determined by
>>>>>>>>>>>>>>> running in a debugger. When the
>>>>>>>>>>>>>>> [1]PETSC ERROR: debugger traps the signal, the exception can
>>>>>>>>>>>>>>> be found with fetestexcept(0x3d)
>>>>>>>>>>>>>>> [1]PETSC ERROR: where the result is a bitwise OR of the
>>>>>>>>>>>>>>> following flags:
>>>>>>>>>>>>>>> [1]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4
>>>>>>>>>>>>>>> FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
>>>>>>>>>>>>>>> [1]PETSC ERROR: Try option -start_in_debugger
>>>>>>>>>>>>>>> [1]PETSC ERROR: likely location of problem given in stack
>>>>>>>>>>>>>>> below
>>>>>>>>>>>>>>> [1]PETSC ERROR: --------------------- Stack Frames
>>>>>>>>>>>>>>> ------------------------------------
>>>>>>>>>>>>>>> [1]PETSC ERROR: Note: The EXACT line numbers in the stack
>>>>>>>>>>>>>>> are not available,
>>>>>>>>>>>>>>> [1]PETSC ERROR: INSTEAD the line number of the start
>>>>>>>>>>>>>>> of the function
>>>>>>>>>>>>>>> [1]PETSC ERROR: is given.
>>>>>>>>>>>>>>> [1]PETSC ERROR: [1] PetscDefaultFPTrap line 379
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
>>>>>>>>>>>>>>> [1]PETSC ERROR: User provided function() line 0 in Unknown
>>>>>>>>>>>>>>> file trapped floating point error
>>>>>>>>>>>>>>> [0]PETSC ERROR: *** unknown floating point error occurred ***
>>>>>>>>>>>>>>> [0]PETSC ERROR: The specific exception can be determined by
>>>>>>>>>>>>>>> running in a debugger. When the
>>>>>>>>>>>>>>> [0]PETSC ERROR: debugger traps the signal, the exception can
>>>>>>>>>>>>>>> be found with fetestexcept(0x3d)
>>>>>>>>>>>>>>> [0]PETSC ERROR: where the result is a bitwise OR of the
>>>>>>>>>>>>>>> following flags:
>>>>>>>>>>>>>>> [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4
>>>>>>>>>>>>>>> FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
>>>>>>>>>>>>>>> [0]PETSC ERROR: Try option -start_in_debugger
>>>>>>>>>>>>>>> [0]PETSC ERROR: likely location of problem given in stack
>>>>>>>>>>>>>>> below
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 25, 2015, at 1:05 AM, Danyang Su
>>>>>>>>>>>>>>> <danyang.su at gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Barry and Satish,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How can I get rid of the unknown floating point error when a
>>>>>>>>>>>>>>> very small value is multiplied?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> e.g.,
>>>>>>>>>>>>>>> cinfrt_dg(i1) and diff(ic,idim) are 1.0250235986806329E-008
>>>>>>>>>>>>>>> and 8.6178408169776945E-317, respectively,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> cinfrt = cinfrt_dg(i1) * diff(ic,idim)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I get the following error when run with "-fp_trap
>>>>>>>>>>>>>>> -start_in_debugger".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Backtrace for this error:
>>>>>>>>>>>>>>> *** unknown floating point error occurred ***
>>>>>>>>>>>>>>> [2]PETSC ERROR: The specific exception can be determined by
>>>>>>>>>>>>>>> running in a debugger. When the
>>>>>>>>>>>>>>> [2]PETSC ERROR: debugger traps the signal, the exception can
>>>>>>>>>>>>>>> be found with fetestexcept(0x3d)
>>>>>>>>>>>>>>> [2]PETSC ERROR: cinfrt_dg(i1),diff(ic,idim)
>>>>>>>>>>>>>>> 1.0250235986806329E-008 8.6178408169776945E-317
>>>>>>>>>>>>>>> where the result is a bitwise OR of the following flags:
>>>>>>>>>>>>>>> [2]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4
>>>>>>>>>>>>>>> FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
>>>>>>>>>>>>>>> [2]PETSC ERROR: Try option -start_in_debugger
>>>>>>>>>>>>>>> [2]PETSC ERROR: likely location of problem given in stack
>>>>>>>>>>>>>>> below
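>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One possible workaround for the question above, sketched in standalone C and
>>>>>>>>>>>>>>> offered only as an illustration (the helper is hypothetical, not from this
>>>>>>>>>>>>>>> thread, and whether flushing such tiny values to zero is acceptable depends
>>>>>>>>>>>>>>> on the physics): treat operands already below the normal range as zero, so
>>>>>>>>>>>>>>> this particular product comes out as an exact 0.0 rather than a trapped underflow.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    #include <float.h>
>>>>>>>>>>>>>>>    #include <math.h>
>>>>>>>>>>>>>>>    #include <stdio.h>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    /* Hypothetical helper: flush denormal operands to zero before multiplying. */
>>>>>>>>>>>>>>>    static double safe_mul(double a, double b)
>>>>>>>>>>>>>>>    {
>>>>>>>>>>>>>>>      if (fabs(a) < DBL_MIN) a = 0.0;
>>>>>>>>>>>>>>>      if (fabs(b) < DBL_MIN) b = 0.0;
>>>>>>>>>>>>>>>      return a * b;
>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    int main(void)
>>>>>>>>>>>>>>>    {
>>>>>>>>>>>>>>>      double cinfrt = safe_mul(1.0250235986806329e-8, 8.6178408169776945e-317);
>>>>>>>>>>>>>>>      printf("cinfrt = %g\n", cinfrt);   /* prints 0 */
>>>>>>>>>>>>>>>      return 0;
>>>>>>>>>>>>>>>    }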
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-24 01:54 PM, Danyang Su wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-24 01:23 PM, Satish Balay wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> c 4 1.0976214263087059E-067
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't think this number can be stored in a real*4.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Satish
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks, Satish. It is caused by this number.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, 24 Apr 2015, Danyang Su wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-24 11:12 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 24, 2015, at 1:05 PM, Danyang Su
>>>>>>>>>>>>>>> <danyang.su at gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One of my cases crashes because of a floating point exception
>>>>>>>>>>>>>>> when using 4 processors, as shown below. But if I run this case with
>>>>>>>>>>>>>>> 1 processor, it works fine. I have tested the code with around 100 cases
>>>>>>>>>>>>>>> on up to 768 processors; all other cases work fine. I just wonder if this
>>>>>>>>>>>>>>> kind of error is caused by a NaN in the Jacobian matrix, the RHS, or the
>>>>>>>>>>>>>>> preconditioner?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, almost for sure it is one of these places.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> First run the bad case with -fp_trap; if all goes well you'll see the
>>>>>>>>>>>>>>> function where the FPE is generated. Then run also with -start_in_debugger
>>>>>>>>>>>>>>> and type cont in all four debugger windows. When the FPE happens,
>>>>>>>>>>>>>>> the debugger should stop, showing exactly where the FPE happens.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Barry,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When run with -fp_trap -start_in_debugger, I got the following
>>>>>>>>>>>>>>> error:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [0]PETSC ERROR: *** unknown floating point error occurred ***
>>>>>>>>>>>>>>> [0]PETSC ERROR: The specific exception can be determined by
>>>>>>>>>>>>>>> running in a
>>>>>>>>>>>>>>> debugger. When the
>>>>>>>>>>>>>>> [0]PETSC ERROR: debugger traps the signal, the exception can
>>>>>>>>>>>>>>> be found with
>>>>>>>>>>>>>>> fetestexcept(0x3d)
>>>>>>>>>>>>>>> [0]PETSC ERROR: where the result is a bitwise OR of the
>>>>>>>>>>>>>>> following flags:
>>>>>>>>>>>>>>> [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4
>>>>>>>>>>>>>>> FE_OVERFLOW=0x8
>>>>>>>>>>>>>>> FE_UNDERFLOW=0x10 FE_INEXACT=0x20
>>>>>>>>>>>>>>> [0]PETSC ERROR: Try option -start_in_debugger
>>>>>>>>>>>>>>> [0]PETSC ERROR: likely location of problem given in stack
>>>>>>>>>>>>>>> below
>>>>>>>>>>>>>>> [0]PETSC ERROR: --------------------- Stack Frames
>>>>>>>>>>>>>>> ------------------------------------
>>>>>>>>>>>>>>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack
>>>>>>>>>>>>>>> are not available,
>>>>>>>>>>>>>>> [0]PETSC ERROR: INSTEAD the line number of the start
>>>>>>>>>>>>>>> of the function
>>>>>>>>>>>>>>> [0]PETSC ERROR: is given.
>>>>>>>>>>>>>>> [0]PETSC ERROR: [0] PetscDefaultFPTrap line 379
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
>>>>>>>>>>>>>>> [0]PETSC ERROR: User provided function() line 0 in Unknown
>>>>>>>>>>>>>>> file trapped
>>>>>>>>>>>>>>> floating point error
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Program received signal SIGABRT: Process abort signal.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Backtrace for this error:
>>>>>>>>>>>>>>> #0 0x7F4FEAB1C7D7
>>>>>>>>>>>>>>> #1 0x7F4FEAB1CDDE
>>>>>>>>>>>>>>> #2 0x7F4FE9E1AD3F
>>>>>>>>>>>>>>> #3 0x7F4FE9E1ACC9
>>>>>>>>>>>>>>> #4 0x7F4FE9E1E0D7
>>>>>>>>>>>>>>> #5 0x7F4FEB0B6DCB
>>>>>>>>>>>>>>> #6 0x7F4FEB0B1825
>>>>>>>>>>>>>>> #7 0x7F4FEB0B817F
>>>>>>>>>>>>>>> #8 0x7F4FE9E1AD3F
>>>>>>>>>>>>>>> #9 0x6972C8 in tprfrtlc_ at tprfrtlc.F90:2393
>>>>>>>>>>>>>>> (discriminator 3)
>>>>>>>>>>>>>>> #10 0x4C6C87 in gcreact_ at gcreact.F90:678
>>>>>>>>>>>>>>> #11 0x707E19 in initicrt_ at initicrt.F90:589
>>>>>>>>>>>>>>> #12 0x4F42D0 in initprob_ at initprob.F90:430
>>>>>>>>>>>>>>> #13 0x5AAF72 in driver_pc at driver_pc.F90:438
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I checked the code at tprfrtlc.F90:2393,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     realbuffer_gb(1:nvars) = (/time,(c(ic),ic=1,nc-1), &
>>>>>>>>>>>>>>>                               (cx(ix),ix=1,nxout)/)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All the values (time, c, cx) are reasonable, as shown below. The only
>>>>>>>>>>>>>>> possibility is that realbuffer_gb is declared as real*4 when using
>>>>>>>>>>>>>>> single-precision output, while time, c, and cx are declared as real*8. I
>>>>>>>>>>>>>>> have a lot of similar data conversions from real*8 to real*4 for output;
>>>>>>>>>>>>>>> the other code does not return an error.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> time 0.0000000000000000
>>>>>>>>>>>>>>> c 1 9.9999999999999995E-008
>>>>>>>>>>>>>>> c 2 3.1555251077549618E-003
>>>>>>>>>>>>>>> c 3 7.1657814842179362E-008
>>>>>>>>>>>>>>> c 4 1.0976214263087059E-067
>>>>>>>>>>>>>>> c 5 5.2879822292305797E-004
>>>>>>>>>>>>>>> c 6 9.9999999999999964E-005
>>>>>>>>>>>>>>> c 7 6.4055731968811337E-005
>>>>>>>>>>>>>>> c 8 3.4607572892578404E-020
>>>>>>>>>>>>>>> cx 1 3.4376650636008101E-005
>>>>>>>>>>>>>>> cx 2 7.3989678854017763E-012
>>>>>>>>>>>>>>> cx 3 9.5317170613607207E-012
>>>>>>>>>>>>>>> cx 4 2.2344525794718353E-015
>>>>>>>>>>>>>>> cx 5 3.0624685689695889E-008
>>>>>>>>>>>>>>> cx 6 1.0046157902783967E-007
>>>>>>>>>>>>>>> cx 7 1.5320169154914984E-004
>>>>>>>>>>>>>>> cx 8 8.6930292776346176E-014
>>>>>>>>>>>>>>> cx 9 3.5944267559348721E-005
>>>>>>>>>>>>>>> cx 10 3.0072645866951157E-018
>>>>>>>>>>>>>>> cx 11 2.3592486321095017E-013
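>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As a hedged illustration of Satish's point about c 4, a standalone C sketch
>>>>>>>>>>>>>>> (the analogous Fortran real*8 to real*4 assignment is assumed to behave the
>>>>>>>>>>>>>>> same way): the value fits in a double but is far below FLT_MIN, so narrowing
>>>>>>>>>>>>>>> it underflows, which is exactly the kind of event -fp_trap catches.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    #include <fenv.h>
>>>>>>>>>>>>>>>    #include <float.h>
>>>>>>>>>>>>>>>    #include <stdio.h>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    int main(void)
>>>>>>>>>>>>>>>    {
>>>>>>>>>>>>>>>      double c4 = 1.0976214263087059e-67;   /* the "c 4" value listed above */
>>>>>>>>>>>>>>>      float  f;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      feclearexcept(FE_ALL_EXCEPT);
>>>>>>>>>>>>>>>      f = (float)c4;                        /* narrowing, like real*8 -> real*4 */
>>>>>>>>>>>>>>>      printf("double %g -> float %g (FLT_MIN = %g), underflow = %d\n",
>>>>>>>>>>>>>>>             c4, (double)f, (double)FLT_MIN, fetestexcept(FE_UNDERFLOW) != 0);
>>>>>>>>>>>>>>>      return 0;
>>>>>>>>>>>>>>>    }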
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can check all the entries of the Jacobian matrix to see if the
>>>>>>>>>>>>>>> values are valid, but this does not seem like a good idea, as it takes
>>>>>>>>>>>>>>> a long time to reach this point. If I restart the simulation from a
>>>>>>>>>>>>>>> specified time (e.g., 7.685 in this case), then the error does not occur.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Would you please give me any suggestions on debugging this
>>>>>>>>>>>>>>> case?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> timestep: 2730 time: 7.665E+00 years delt: 1.000E-02
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> timestep: 2731 time: 7.675E+00 years delt: 1.000E-02
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> timestep: 2732 time: 7.685E+00 years delt: 1.000E-02
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> timestep: 2733 time: 7.695E+00 years delt: 1.000E-02
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> timestep: 2734 time: 7.705E+00 years delt: 1.000E-02
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> Reduce time step for reactive transport
>>>>>>>>>>>>>>> timestep: 2734 time: 7.700E+00 years delt: 5.000E-03
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> Reduce time step for reactive transport
>>>>>>>>>>>>>>> timestep: 2734 time: 7.697E+00 years delt: 2.500E-03
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> [1]PETSC ERROR: --------------------- Error Message
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>>>>>>> [1]PETSC ERROR: Floating point exception
>>>>>>>>>>>>>>> [2]PETSC ERROR: --------------------- Error Message
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>>>>>>> [2]PETSC ERROR: Floating point exception
>>>>>>>>>>>>>>> [2]PETSC ERROR: Vec entry at local location 0 is
>>>>>>>>>>>>>>> not-a-number or infinite
>>>>>>>>>>>>>>> at end of function: Parameter number 3
>>>>>>>>>>>>>>> [2]PETSC ERROR: See
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> for trouble shooting.
>>>>>>>>>>>>>>> [2]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>>>>>>>>>>>>>>> [2]PETSC ERROR: [1]PETSC ERROR: Vec entry at local location
>>>>>>>>>>>>>>> 0 is
>>>>>>>>>>>>>>> not-a-number or infinite at end of function: Parameter
>>>>>>>>>>>>>>> number 3
>>>>>>>>>>>>>>> [1]PETSC ERROR: See
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> for trouble shooting.
>>>>>>>>>>>>>>> [1]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>>>>>>>>>>>>>>> [1]PETSC ERROR: ../min3p_thcm_petsc_dbg on a linux-gnu-dbg
>>>>>>>>>>>>>>> named nwmop by
>>>>>>>>>>>>>>> dsu Thu Apr 23 15:38:52 2015
>>>>>>>>>>>>>>> [1]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg
>>>>>>>>>>>>>>> --with-cc=gcc
>>>>>>>>>>>>>>> --with-cxx=g++ --with-fc=gfortran --download-fblaslapack
>>>>>>>>>>>>>>> --download-mpich
>>>>>>>>>>>>>>> --download-mumps --download-hypre --download-superlu_dist
>>>>>>>>>>>>>>> --download-metis
>>>>>>>>>>>>>>> --download-parmetis --download-scalapack
>>>>>>>>>>>>>>> [1]PETSC ERROR: #1 VecValidValues() line 34 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
>>>>>>>>>>>>>>> ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by
>>>>>>>>>>>>>>> dsu Thu Apr 23
>>>>>>>>>>>>>>> 15:38:52 2015
>>>>>>>>>>>>>>> [2]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg
>>>>>>>>>>>>>>> --with-cc=gcc
>>>>>>>>>>>>>>> --with-cxx=g++ --with-fc=gfortran --download-fblaslapack
>>>>>>>>>>>>>>> --download-mpich
>>>>>>>>>>>>>>> --download-mumps --download-hypre --download-superlu_dist
>>>>>>>>>>>>>>> --download-metis
>>>>>>>>>>>>>>> --download-parmetis --download-scalapack
>>>>>>>>>>>>>>> [2]PETSC ERROR: #1 VecValidValues() line 34 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: [1]PETSC ERROR: #2 PCApply() line 442 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
>>>>>>>>>>>>>>> [1]PETSC ERROR: #2 PCApply() line 442 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: #3 KSP_PCApply() line 230 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
>>>>>>>>>>>>>>> #3 KSP_PCApply() line 230 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
>>>>>>>>>>>>>>> [1]PETSC ERROR: #4 KSPInitialResidual() line 63 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: #4 KSPInitialResidual() line 63 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
>>>>>>>>>>>>>>> [1]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: #6 KSPSolve() line 459 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
>>>>>>>>>>>>>>> [1]PETSC ERROR: #6 KSPSolve() line 459 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
>>>>>>>>>>>>>>> ^C[mpiexec at nwmop] Sending Ctrl-C to processes as requested
>>>>>>>>>>>>>>> [mpiexec at nwmop] Press Ctrl-C again to force abort
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> What most experimenters take for granted before they begin
>>>>>>>>>>>>>>> their experiments is infinitely more interesting than any results to which
>>>>>>>>>>>>>>> their experiments lead.
>>>>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <petscconf.h>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener