[petsc-users] Floating point exception in hypre BoomerAMG
Matthew Knepley
knepley at gmail.com
Wed Apr 29 14:39:19 CDT 2015
On Thu, Apr 30, 2015 at 5:35 AM, Danyang Su <danyang.su at gmail.com> wrote:
> On 15-04-29 12:19 PM, Barry Smith wrote:
>
>> Ok, your code seems fine in terms of logic. But the vector x_vec_gbl
>> i.e. b_react should not be in a read only state. Are you somewhere calling
>> a VecGetArray() and not calling the VecRestoreArray()?
>>
>> Barry
>>
> I just triple-checked VecGetArrayF90(). Every VecGetArrayF90()
> comes with a VecRestoreArrayF90(). No VecGetArray() is used in my code.
>
You can check the lock status of the Vec with VecLockGet():
http://www.mcs.anl.gov/petsc/petsc-master/docs/manualpages/Vec/VecLockGet.html
Put calls to it in your code until you can see who is locking the Vec before the
call.
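
As an illustration only (not code from this thread), here is a minimal C sketch of such a check placed just before the copy into the global vector; the helper name and its placement are assumptions, and a Fortran binding, if present in your build, would be used analogously:

  #include <petscdm.h>
  #include <petscvec.h>

  /* Hypothetical helper: report the lock state of the global Vec and then do the
     local-to-global copy. A nonzero state means some earlier call (for example a
     solver holding the Vec as a read-only input) has locked it. */
  static PetscErrorCode copy_local_to_global_checked(DM da, Vec xloc, Vec xgbl)
  {
    PetscErrorCode ierr;
    PetscInt       state;

    PetscFunctionBegin;
    ierr = VecLockGet(xgbl, &state);CHKERRQ(ierr);
    if (state) {
      ierr = PetscPrintf(PETSC_COMM_SELF, "global Vec lock state = %D\n", state);CHKERRQ(ierr);
    }
    ierr = DMLocalToGlobalBegin(da, xloc, INSERT_VALUES, xgbl);CHKERRQ(ierr);
    ierr = DMLocalToGlobalEnd(da, xloc, INSERT_VALUES, xgbl);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }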
Thanks,
Matt
> Danyang
>
>>
>>
>>
>> On Apr 29, 2015, at 1:52 PM, Danyang Su <danyang.su at gmail.com> wrote:
>>>
>>> On 15-04-29 11:30 AM, Barry Smith wrote:
>>>
>>>> On Apr 29, 2015, at 12:15 PM, Danyang Su <danyang.su at gmail.com>
>>>>> wrote:
>>>>>
>>>>> On 15-04-28 06:50 PM, Barry Smith wrote:
>>>>>
>>>>> We started enforcing more checks on writing to vectors that you
>>>>>> should not write to. Where are you calling DMLocalToGlobalBegin() ? It
>>>>>> looks like you are trying to copy values into a global array that you
>>>>>> should not because it is a read only input to a function, for example it is
>>>>>> the right hand side of a linear system or the input into a
>>>>>> SNESFormFunction(). So check the output to that call.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>
>>>>> This is used when the local RHS vec is calculated and copied to the global RHS.
>>>>>
>>>>> Huhh, why are you copying anything into the RHS? Is this before
>>>> you call the linear system solve? Send the code that calls the
>>>> DMLocalToGlobalBegin()
>>>>
>>> Yes. This is called before the linear system solver. At present, we use the
>>> PETSc KSP solver.
>>> The code was first developed as a fully sequential version and then ported
>>> to the parallel version, so each subdomain has its own storage for the
>>> Jacobian matrix and RHS. We do not use the global RHS vector directly to set
>>> the RHS at the moment; all the RHS values of a subdomain are copied to the
>>> global RHS.
>>>
>>>   !> This is the function where RHS values of subdomain are copied to the global RHS values.
>>>   subroutine compute_function(rank,da,x_array_loc,x_vec_loc, &
>>>                               x_vec_gbl,nngl,row_idx_l2pg,col_idx_l2pg, &
>>>                               b_non_interlaced)
>>>
>>>   implicit none
>>> #include <finclude/petscsys.h>
>>> #include <finclude/petscis.h>
>>> #include <finclude/petscvec.h>
>>> #include <finclude/petscvec.h90>
>>> #include <finclude/petscmat.h>
>>> #include <finclude/petscpc.h>
>>> #include <finclude/petscksp.h>
>>> #include <finclude/petscdmda.h>
>>> #include <finclude/petscdmda.h90>
>>>
>>>   PetscInt :: rank
>>>   DM :: da
>>>   PetscReal, allocatable :: x_array_loc(:)
>>>   Vec :: x_vec_loc
>>>   Vec :: x_vec_gbl
>>>   PetscInt :: nngl
>>>   PetscInt, allocatable :: row_idx_l2pg(:)
>>>   PetscInt, allocatable :: col_idx_l2pg(:)
>>>   PetscBool :: b_non_interlaced
>>>
>>>   PetscInt :: info_debug
>>>   PetscInt :: i, j
>>>   PetscErrorCode :: ierr
>>>   PetscScalar, pointer :: vecpointer(:)
>>>
>>>   !Zero entries
>>>   call VecZeroEntries(x_vec_loc, ierr)
>>>
>>>   !Get a pointer to vector data when you need access to the array
>>>   call VecGetArrayF90(x_vec_loc, vecpointer, ierr)
>>>
>>>   !Compute the function over the locally owned part of the grid
>>>   if(b_non_interlaced) then
>>>     j = nngl/2
>>>     do i = 1, j
>>>       vecpointer(2*i-1) = x_array_loc(i)
>>>       vecpointer(2*i) = x_array_loc(i+j)
>>>     end do
>>>   else
>>>     do i = 1, nngl
>>>       vecpointer(i) = x_array_loc(i)
>>>     end do
>>>   end if
>>>
>>>   !Restore the vector when you no longer need access to the array
>>>   call VecRestoreArrayF90(x_vec_loc,vecpointer,ierr)
>>>
>>>   !Insert values into global vector
>>>   call DMLocalToGlobalBegin(da,x_vec_loc,INSERT_VALUES,x_vec_gbl,ierr)
>>>   !By placing code between these two statements, computations can be
>>>   !done while messages are in transition.
>>>   call DMLocalToGlobalEnd(da,x_vec_loc,INSERT_VALUES,x_vec_gbl,ierr)
>>>
>>>   return
>>>   end subroutine
>>>
>>>
>>>
>>>   !> This is the function where reactive transport equations are solved.
>>>   subroutine solver_dd_snes_solve_react(ilog,idetail,a_in,b_in, &
>>>              x_inout,ia_in,ja_in,nngl_in,itsolv, &
>>>              over_flow,rnorm,row_idx_l2pg,col_idx_l2pg, &
>>>              b_non_interlaced)
>>>
>>>   use gen, only : rank, node_idx_l2lg, ittot_rt, &
>>>                   b_output_matrix, b_enable_output
>>>   use solver_snes_function, only : form_initial_guess, &
>>>                                    compute_function, &
>>>                                    compute_jacobian
>>>   use petsc_mpi_common, only : petsc_mpi_finalize
>>>
>>>   implicit none
>>> #include <finclude/petscsys.h>
>>> #include <finclude/petscvec.h>
>>> #include <finclude/petscvec.h90>
>>> #include <finclude/petscdmda.h>
>>>
>>>   PetscInt :: ilog
>>>   PetscInt :: idetail
>>>   PetscInt :: nngl_in
>>>   PetscReal, allocatable :: a_in(:)
>>>   PetscReal, allocatable :: b_in(:)
>>>   PetscReal, allocatable :: x_inout(:)
>>>   PetscInt, allocatable :: ia_in(:)
>>>   PetscInt, allocatable :: ja_in(:)
>>>   PetscInt, allocatable :: row_idx_l2pg(:)
>>>   PetscInt, allocatable :: col_idx_l2pg(:)
>>>   PetscInt :: itsolv
>>>   PetscBool :: over_flow
>>>   PetscBool :: b_non_interlaced
>>>   PetscReal :: rnorm
>>>   PetscViewer :: viewer
>>>   PetscScalar, pointer :: vecpointer(:)
>>>   PetscErrorCode :: ierr
>>>   PetscInt :: info_debug
>>>   character(72) :: strinum
>>>
>>>   info_debug = 0
>>>   if(b_output_matrix .and. b_enable_output) then
>>>     write(strinum, *) ittot_rt
>>>     strinum = "_"//trim(adjustl(strinum))
>>>   end if
>>>
>>>   !Form initial guess, only assemble the local owned part, without ghost nodes
>>>   call form_initial_guess(rank,dmda_react%da,x_inout,x_react_loc, &
>>>        x_react,nngl_in,row_idx_l2pg,col_idx_l2pg, &
>>>        b_non_interlaced)
>>>
>>>   !Compute function, only assemble the local part, without ghost nodes
>>>   call compute_function(rank,dmda_react%da,b_in,b_react_loc, &
>>>        b_react,nngl_in,row_idx_l2pg,col_idx_l2pg, &
>>>        b_non_interlaced)
>>>
>>>   !Compute jacobian matrix, only assemble the local part, without ghost nodes
>>>   call compute_jacobian(rank,dmda_react%da, &
>>>        a_react,a_in,ia_in,ja_in,nngl_in, &
>>>        row_idx_l2pg,col_idx_l2pg, &
>>>        b_non_interlaced)
>>>
>>>   !Solve a x = b, where a is the Jacobian matrix.
>>>   !  - First, set the KSP linear operators. Here the matrix that
>>>   !    defines the linear system also serves as the preconditioning matrix.
>>>   !  - Then solve the Newton system.
>>> #ifdef PETSC_V3_5_X
>>>   call KSPSetOperators(ksp_react,a_react,a_react,ierr)
>>> #else
>>>   call KSPSetOperators(ksp_react,a_react,a_react, &
>>>        SAME_NONZERO_PATTERN,ierr)
>>> #endif
>>>   call KSPSetDM(ksp_react,dmda_react%da,ierr)
>>>   call KSPSetDMActive(ksp_react,PETSC_FALSE,ierr)
>>>
>>>   call KSPSolve(ksp_react,b_react,x_react,ierr)
>>>
>>>   !Get residual norm, by default, preconditioned residual norm is calculated.
>>>   if (b_mykspconverged_react .and. &
>>>       .not. b_use_petsc_default_react) then
>>>     rnorm = rnorm_react
>>>   else
>>>     call KSPGetResidualNorm(ksp_react, rnorm, ierr)
>>>   end if
>>>
>>>   !Scatter ghost points to local vector, using the 2-step process
>>>   !DMGlobalToLocalBegin(), DMGlobalToLocalEnd().
>>>   !By placing code between these two statements, computations can be
>>>   !done while messages are in transition.
>>>   call DMGlobalToLocalBegin(dmda_react%da,x_react,INSERT_VALUES, &
>>>        x_react_loc,ierr)
>>>   call DMGlobalToLocalEnd(dmda_react%da,x_react,INSERT_VALUES, &
>>>        x_react_loc,ierr)
>>>
>>>   call VecGetArrayF90(x_react_loc,vecpointer,ierr)
>>>   x_inout = vecpointer
>>>   call VecRestoreArrayF90(x_react_loc,vecpointer,ierr)
>>>
>>>   return
>>>   end subroutine
>>>
>>> Barry
>>>>
>>>>
>>>> It is strange that this error does not occur at the first timestep.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Danyang
>>>>>
>>>>> On Apr 28, 2015, at 7:30 PM, Danyang Su <danyang.su at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Barry,
>>>>>>>
>>>>>>> There seems to be another bug (not quite sure) in PETSc-dev, as shown
>>>>>>> below. The case I used is similar to the one I mentioned recently. I have
>>>>>>> no problem running this case using PETSc 3.5.2, but it gives the
>>>>>>> following error using PETSc-dev, even with only one processor.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Danyang
>>>>>>>
>>>>>>> timestep: 2048 time: 4.615E+00 years delt: 1.460E-20 years
>>>>>>> iter: 1 max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>> [0]PETSC ERROR: --------------------- Error Message
>>>>>>> --------------------------------------------------------------
>>>>>>> [0]PETSC ERROR: Object is in wrong state
>>>>>>> [0]PETSC ERROR: Vec is locked read only, argument # 1
>>>>>>> [0]PETSC ERROR: See
>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html
>>>>>>> for trouble shooting.
>>>>>>> [0]PETSC ERROR: Petsc Development GIT revision:
>>>>>>> v3.5.3-2796-g23b6c49 GIT Date: 2015-04-27 11:32:44 -0500
>>>>>>> [0]PETSC ERROR: ../min3p_thcm on a linux-gnu-dbg named nwmop by dsu
>>>>>>> Tue Apr 28 15:56:41 2015
>>>>>>> [0]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg
>>>>>>> --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack
>>>>>>> --download-mpich --download-mumps --download-hypre --download-superlu_dist
>>>>>>> --download-metis --download-parmetis --download-scalapack
>>>>>>> [0]PETSC ERROR: #349 VecGetArray() line 1646 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/vec/vec/interface/rvector.c
>>>>>>> [0]PETSC ERROR: #350 VecGetArrayPair() line 434 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/include/petscvec.h
>>>>>>> [0]PETSC ERROR: #351 VecScatterBegin_SSToSS() line 649 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/vec/vec/utils/vscat.c
>>>>>>> [0]PETSC ERROR: #352 VecScatterBegin() line 1694 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/vec/vec/utils/vscat.c
>>>>>>> [0]PETSC ERROR: #353 DMLocalToGlobalBegin_DA() line 56 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/dm/impls/da/dagtol.c
>>>>>>> [0]PETSC ERROR: #354 DMLocalToGlobalBegin() line 1944 in
>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/dm/interface/dm.c
>>>>>>> Reduce time step for reactive transport
>>>>>>> no further time step reduction possible
>>>>>>> Optimal time increment computed by MIN3P 2.6644769214899893E-018
>>>>>>> Minimum time increment specified by user 3.6499999999999998E-018
>>>>>>> Please, reduce the MINIMUM TIME INCREMENT
>>>>>>> stop signal in time step reduction failed
>>>>>>>
>>>>>>> On 15-04-28 11:15 AM, Barry Smith wrote:
>>>>>>>
>>>>>>> I am forwarding this on to the hypre support email list;
>>>>>>>> hopefully they can have some advice on how to proceed.
>>>>>>>>
>>>>>>>> hypre folks, this is with version hypre-2.10.0b (the previous
>>>>>>>> version had the same behavior). We are getting a divide by zero in gselim()
>>>>>>>> line 4363 (this happens in a time dependent problem after many successful
>>>>>>>> solves with time dependent matrix). I looked at the code, below, and note
>>>>>>>> that there are some checks for the diagonal of A not being zero but not for
>>>>>>>> the line that causes the divide by zero; is this perhaps an oversight in
>>>>>>>> the hypre code? Any advice appreciated.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Barry
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> HYPRE_Int gselim(A,x,n)
>>>>>>>> HYPRE_Real *A;
>>>>>>>> HYPRE_Real *x;
>>>>>>>> HYPRE_Int n;
>>>>>>>> {
>>>>>>>>    HYPRE_Int  err_flag = 0;
>>>>>>>>    HYPRE_Int  j,k,m;
>>>>>>>>    HYPRE_Real factor;
>>>>>>>>
>>>>>>>>    if (n==1)                           /* A is 1x1 */
>>>>>>>>    {
>>>>>>>>       if (A[0] != 0.0)
>>>>>>>>       {
>>>>>>>>          x[0] = x[0]/A[0];
>>>>>>>>          return(err_flag);
>>>>>>>>       }
>>>>>>>>       else
>>>>>>>>       {
>>>>>>>>          err_flag = 1;
>>>>>>>>          return(err_flag);
>>>>>>>>       }
>>>>>>>>    }
>>>>>>>>    else                                /* A is nxn. Forward elimination */
>>>>>>>>    {
>>>>>>>>       for (k = 0; k < n-1; k++)
>>>>>>>>       {
>>>>>>>>          if (A[k*n+k] != 0.0)
>>>>>>>>          {
>>>>>>>>             for (j = k+1; j < n; j++)
>>>>>>>>             {
>>>>>>>>                if (A[j*n+k] != 0.0)
>>>>>>>>                {
>>>>>>>>                   factor = A[j*n+k]/A[k*n+k];
>>>>>>>>                   for (m = k+1; m < n; m++)
>>>>>>>>                   {
>>>>>>>>                      A[j*n+m] -= factor * A[k*n+m];
>>>>>>>>                   }
>>>>>>>>                   /* Elimination step for rhs */
>>>>>>>>                   x[j] -= factor * x[k];
>>>>>>>>                }
>>>>>>>>             }
>>>>>>>>          }
>>>>>>>>       }
>>>>>>>>       /* Back Substitution */
>>>>>>>>       for (k = n-1; k > 0; --k)
>>>>>>>>       {
>>>>>>>>          x[k] /= A[k*n+k];
>>>>>>>>          for (j = 0; j < k; j++)
>>>>>>>>          {
>>>>>>>>             if (A[j*n+k] != 0.0)
>>>>>>>>             {
>>>>>>>>                x[j] -= x[k] * A[j*n+k];
>>>>>>>>             }
>>>>>>>>          }
>>>>>>>>       }
>>>>>>>>       x[0] /= A[0];
>>>>>>>>       return(err_flag);
>>>>>>>>    }
>>>>>>>> }
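>>>>>>>>
>>>>>>>> As a hedged, standalone sketch (not the hypre source), a back substitution
>>>>>>>> with the missing zero-pivot guard on the line that divides by A[k*n+k] might
>>>>>>>> look like the following; the function and test values are illustrative only:
>>>>>>>>
>>>>>>>>    #include <stdio.h>
>>>>>>>>
>>>>>>>>    /* Back substitution for an upper-triangular n x n system stored
>>>>>>>>       row-major in A; returns 1 on a zero pivot instead of dividing. */
>>>>>>>>    static int back_substitute_guarded(double *A, double *x, int n)
>>>>>>>>    {
>>>>>>>>       int j, k;
>>>>>>>>       for (k = n-1; k >= 0; --k)
>>>>>>>>       {
>>>>>>>>          if (A[k*n+k] == 0.0) return 1;   /* guard for the unchecked division above */
>>>>>>>>          x[k] /= A[k*n+k];
>>>>>>>>          for (j = 0; j < k; j++) x[j] -= x[k] * A[j*n+k];
>>>>>>>>       }
>>>>>>>>       return 0;
>>>>>>>>    }
>>>>>>>>
>>>>>>>>    int main(void)
>>>>>>>>    {
>>>>>>>>       double A[4] = {2.0, 1.0, 0.0, 4.0};   /* [[2 1],[0 4]] */
>>>>>>>>       double x[2] = {5.0, 8.0};             /* overwritten with the solution */
>>>>>>>>       if (back_substitute_guarded(A, x, 2)) printf("zero pivot\n");
>>>>>>>>       else printf("x = (%g, %g)\n", x[0], x[1]);
>>>>>>>>       return 0;
>>>>>>>>    }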
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 28, 2015, at 12:55 PM, Danyang Su <danyang.su at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Barry,
>>>>>>>>>
>>>>>>>>> The development version of PETSc does not solve my
>>>>>>>>> problem. It still crashes with the same error.
>>>>>>>>>
>>>>>>>>> As Matthew mentioned, I checked the matrix and none of the diagonal
>>>>>>>>> entries is zero. But the diagonal values for different rows range from
>>>>>>>>> -1.0d-18 to 1.0d33 and the matrix is not strictly diagonally dominant. When
>>>>>>>>> using 1 processor, the matrix is also not strictly diagonally dominant, but
>>>>>>>>> the diagonal values have a much smaller range and the code runs without
>>>>>>>>> error. Maybe there is something wrong in the matrix for this case and I
>>>>>>>>> need to check the matrix first.
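>>>>>>>>>
>>>>>>>>> As a hedged sketch of one way to do that check from C (the PETSc calls are
>>>>>>>>> standard, but the helper name is hypothetical and a Fortran version would be
>>>>>>>>> analogous): pull the diagonal into a work vector and report its extreme
>>>>>>>>> entries, which would expose the -1.0d-18 to 1.0d33 spread directly.
>>>>>>>>>
>>>>>>>>>    #include <petscmat.h>
>>>>>>>>>
>>>>>>>>>    static PetscErrorCode report_diagonal_range(Mat A)
>>>>>>>>>    {
>>>>>>>>>      PetscErrorCode ierr;
>>>>>>>>>      Vec            d;
>>>>>>>>>      PetscReal      dmin, dmax;
>>>>>>>>>      PetscInt       imin, imax;
>>>>>>>>>
>>>>>>>>>      PetscFunctionBegin;
>>>>>>>>>      ierr = MatCreateVecs(A, NULL, &d);CHKERRQ(ierr);   /* MatGetVecs() in older releases */
>>>>>>>>>      ierr = MatGetDiagonal(A, d);CHKERRQ(ierr);
>>>>>>>>>      ierr = VecMin(d, &imin, &dmin);CHKERRQ(ierr);
>>>>>>>>>      ierr = VecMax(d, &imax, &dmax);CHKERRQ(ierr);
>>>>>>>>>      ierr = PetscPrintf(PETSC_COMM_WORLD, "diag min %g at row %D, max %g at row %D\n",
>>>>>>>>>                         (double)dmin, imin, (double)dmax, imax);CHKERRQ(ierr);
>>>>>>>>>      ierr = VecDestroy(&d);CHKERRQ(ierr);
>>>>>>>>>      PetscFunctionReturn(0);
>>>>>>>>>    }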
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Danyang
>>>>>>>>>
>>>>>>>>> Program received signal SIGFPE, Arithmetic exception.
>>>>>>>>> 0x00007f668c1299d5 in gselim (A=0x2fe3330, x=0x2210680, n=9)
>>>>>>>>> at par_relax.c:4363
>>>>>>>>> 4363 x[k] /= A[k*n+k];
>>>>>>>>> (gdb)
>>>>>>>>>
>>>>>>>>> [1]PETSC ERROR: *** unknown floating point error occurred ***
>>>>>>>>> [1]PETSC ERROR: The specific exception can be determined by
>>>>>>>>> running in a debugger. When the
>>>>>>>>> [1]PETSC ERROR: debugger traps the signal, the exception can be
>>>>>>>>> found with fetestexcept(0x3d)
>>>>>>>>> [1]PETSC ERROR: where the result is a bitwise OR of the following
>>>>>>>>> flags:
>>>>>>>>> [1]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8
>>>>>>>>> FE_UNDERFLOW=0x10 FE_INEXACT=0x20
>>>>>>>>> [1]PETSC ERROR: Try option -start_in_debugger
>>>>>>>>> [1]PETSC ERROR: likely location of problem given in stack below
>>>>>>>>> [1]PETSC ERROR: --------------------- Stack Frames
>>>>>>>>> ------------------------------------
>>>>>>>>> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not
>>>>>>>>> available,
>>>>>>>>> [1]PETSC ERROR: INSTEAD the line number of the start of the
>>>>>>>>> function
>>>>>>>>> [1]PETSC ERROR: is given.
>>>>>>>>> [1]PETSC ERROR: [1] PetscDefaultFPTrap line 379
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/sys/error/fp.c
>>>>>>>>> [1]PETSC ERROR: [1] Hypre solve line 216
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/pc/impls/hypre/hypre.c
>>>>>>>>> [1]PETSC ERROR: [1] PCApply_HYPRE line 203
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/pc/impls/hypre/hypre.c
>>>>>>>>> [1]PETSC ERROR: [1] PCApply line 430
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/pc/interface/precon.c
>>>>>>>>> [1]PETSC ERROR: [1] KSP_PCApply line 234
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/include/petsc/private/kspimpl.h
>>>>>>>>> [1]PETSC ERROR: [1] KSPInitialResidual line 44
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/ksp/interface/itres.c
>>>>>>>>> [1]PETSC ERROR: [1] KSPSolve_GMRES line 224
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/ksp/impls/gmres/gmres.c
>>>>>>>>> [1]PETSC ERROR: [1] KSPSolve line 506
>>>>>>>>> /home/dsu/Soft/PETSc/petsc-dev/src/ksp/ksp/interface/itfunc.c
>>>>>>>>> [1]PETSC ERROR: User provided function() line 0 in Unknown file
>>>>>>>>> trapped floating point error
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 15-04-27 11:47 AM, Barry Smith wrote:
>>>>>>>>>
>>>>>>>>> Danyang,
>>>>>>>>>>
>>>>>>>>>> I'm glad you were finally able to get to the problematic
>>>>>>>>>> point; sorry it took so long (I want to strangle whoever turned on catching
>>>>>>>>>> of underflows in PETSc).
>>>>>>>>>>
>>>>>>>>>> The hypre folks have done a great deal of work on their
>>>>>>>>>> relaxation code since the PETSc 3.5.3 release. There is a good chance they
>>>>>>>>>> have fixed the divide by zero problem you are hitting here. You will need
>>>>>>>>>> to upgrade to the development version of PETSc (that uses the latest
>>>>>>>>>> version of hypre), here are the instructions on how to obtain it
>>>>>>>>>> http://www.mcs.anl.gov/petsc/developers/index.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please let us know if this resolves the problem with hypre
>>>>>>>>>> failing.
>>>>>>>>>>
>>>>>>>>>> Barry
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Apr 27, 2015, at 11:44 AM, Danyang Su <danyang.su at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Barry,
>>>>>>>>>>>
>>>>>>>>>>> I got the following arithmetic exception after the previous bug
>>>>>>>>>>> was fixed.
>>>>>>>>>>>
>>>>>>>>>>> Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
>>>>>>>>>>> 0x00007f3b23b98f20 in __nanosleep_nocancel ()
>>>>>>>>>>> at ../sysdeps/unix/syscall-template.S:81
>>>>>>>>>>> 81 ../sysdeps/unix/syscall-template.S: No such file or
>>>>>>>>>>> directory.
>>>>>>>>>>> (gdb) cont
>>>>>>>>>>> Continuing.
>>>>>>>>>>>
>>>>>>>>>>> Program received signal SIGFPE, Arithmetic exception.
>>>>>>>>>>> 0x00007f3b260df449 in gselim (A=0x1e4c580, x=0x1d30c40, n=9)
>>>>>>>>>>> at par_relax.c:3442
>>>>>>>>>>> 3442 x[k] /= A[k*n+k];
>>>>>>>>>>> (gdb)
>>>>>>>>>>>
>>>>>>>>>>> I tried both PETSc 3.5.2 and 3.5.3 and they return the same
>>>>>>>>>>> error as shown above. For 3.5.3, I edited the fp.c file and then ran
>>>>>>>>>>> configure and make.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Danyang
>>>>>>>>>>>
>>>>>>>>>>> On 15-04-25 07:34 PM, Danyang Su wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> The "floating point underflow" is caused by a small value
>>>>>>>>>>>> divided by a very large value. The result is forced to zero, after which no
>>>>>>>>>>>> underflow problem is reported. I just reran this bad case to see if
>>>>>>>>>>>> it still gets stuck later. This will take a while.
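>>>>>>>>>>>>
>>>>>>>>>>>> For what it's worth, a small standalone C sketch of the mechanism behind the
>>>>>>>>>>>> fetestexcept(0x3d) message (illustrative only; the exact flags raised can vary
>>>>>>>>>>>> with the platform and compiler options):
>>>>>>>>>>>>
>>>>>>>>>>>>    #include <fenv.h>
>>>>>>>>>>>>    #include <stdio.h>
>>>>>>>>>>>>
>>>>>>>>>>>>    int main(void)
>>>>>>>>>>>>    {
>>>>>>>>>>>>      volatile double tiny = 1e-300, big = 1e300, r;
>>>>>>>>>>>>
>>>>>>>>>>>>      feclearexcept(FE_ALL_EXCEPT);
>>>>>>>>>>>>      r = tiny / big;                     /* below the normal range: flushed toward 0 */
>>>>>>>>>>>>      if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW raised, r = %g\n", r);
>>>>>>>>>>>>      if (fetestexcept(FE_INEXACT))   printf("FE_INEXACT raised\n");
>>>>>>>>>>>>      return 0;
>>>>>>>>>>>>    }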
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for all your kind replies,
>>>>>>>>>>>>
>>>>>>>>>>>> Danyang
>>>>>>>>>>>>
>>>>>>>>>>>> On 15-04-25 07:02 PM, Barry Smith wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, you do have
>>>>>>>>>>>>>
>>>>>>>>>>>>> #ifndef PETSC_HAVE_XMMINTRIN_H
>>>>>>>>>>>>> #define PETSC_HAVE_XMMINTRIN_H 1
>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>
>>>>>>>>>>>>> so the change you made should cause it to stop trapping
>>>>>>>>>>>>> underflow exceptions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now in one email you reported a FPE within hypre, then I
>>>>>>>>>>>>> asked you to run with -start_in_debugger to determine where it happened
>>>>>>>>>>>>> exactly and then you reported the FPE happened in user code (what seemed
>>>>>>>>>>>>> to be an underflow issue). Why is this? Can you not run it where it
>>>>>>>>>>>>> generated the FPE in hypre using the -start_in_debugger option?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps you have multiple PETSC_ARCH or multiple PETSc
>>>>>>>>>>>>> installs to explain why you reported two different places where the
>>>>>>>>>>>>> exception occurred.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 25, 2015, at 8:31 PM, Danyang Su <danyang.su at gmail.com
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 15-04-25 06:26 PM, Matthew Knepley wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Apr 25, 2015 at 8:23 PM, Danyang Su <
>>>>>>>>>>>>>>> danyang.su at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-25 06:03 PM, Barry Smith wrote:
>>>>>>>>>>>>>>> If this is what you got in your last run
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> at ../../gas_advection/velocity_g.F90:1344
>>>>>>>>>>>>>>> 1344    cinfrt = cinfrt_dg(i1) * diff(ic,idim)    !diff is a very small value, e.g., 1.0d-316
>>>>>>>>>>>>>>> then it is still catching floating point underflow,
>>>>>>>>>>>>>>> which we do not want. This means either the change I suggested you make in
>>>>>>>>>>>>>>> the fp.c code didn't work or it actually uses a different floating point
>>>>>>>>>>>>>>> trap than that one. BTW: absurd numbers like 1.0d-316 are often a symptom
>>>>>>>>>>>>>>> of uninitialized data; could that be a problem that diff is not filled
>>>>>>>>>>>>>>> correctly for all the ic, idim you are using?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This going round and round is very frustrating and a
>>>>>>>>>>>>>>> waste of time. You need to be more proactive yourself and explore the code
>>>>>>>>>>>>>>> and poke around to figure out how to solve the problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please email
>>>>>>>>>>>>>>> $PETSC_DIR/$PETSC_ARCH/include/petscvariables.h so I can see what FP trap
>>>>>>>>>>>>>>> is being used on your machine.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>> Do you mean $PETSC_DIR/$PETSC_ARCH/conf/petscvariables?
>>>>>>>>>>>>>>> Otherwise I cannot find this file.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It's include/petscconf.h
>>>>>>>>>>>>>>> Do I need to reconfigure PETSc after changing the code you
>>>>>>>>>>>>>>> mentioned?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No, but you need to rebuild.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, I have done 'gnumake'.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 25, 2015, at 2:24 PM, Danyang Su
>>>>>>>>>>>>>>> <danyang.su at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-25 11:55 AM, Barry Smith wrote:
>>>>>>>>>>>>>>> On Apr 25, 2015, at 1:51 PM, Danyang Su
>>>>>>>>>>>>>>> <danyang.su at gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-25 11:32 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I told you this yesterday.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is probably stopping here on a harmless underflow.
>>>>>>>>>>>>>>> You need to edit the PETSc code to not worry about underflow.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Edit the file
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c and locate
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> #elif defined PETSC_HAVE_XMMINTRIN_H
>>>>>>>>>>>>>>> _MM_SET_EXCEPTION_MASK(_MM_MASK_INEXACT);
>>>>>>>>>>>>>>> #else
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> change it to
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> #elif defined PETSC_HAVE_XMMINTRIN_H
>>>>>>>>>>>>>>> _MM_SET_EXCEPTION_MASK(_MM_MASK_INEXACT |
>>>>>>>>>>>>>>> _MM_MASK_UNDERFLOW);
>>>>>>>>>>>>>>> #else
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then run make gnumake in the PETSc directory to compile
>>>>>>>>>>>>>>> the new version. Now link and run the program again with -fp_trap and see
>>>>>>>>>>>>>>> where it gets stuck this time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Did you do this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, I did change the code in fp.c and ran 'make gnumake' in
>>>>>>>>>>>>>>> the PETSc directory. I just double-checked, ran make gnumake again,
>>>>>>>>>>>>>>> and got the following information this time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> dsu at nwmop:~/Soft/PETSc/petsc-3.5.2$
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> make gnumake
>>>>>>>>>>>>>>> Building PETSc using GNU Make with 10 build threads
>>>>>>>>>>>>>>> ==========================================
>>>>>>>>>>>>>>> make[1]: Entering directory
>>>>>>>>>>>>>>> `/home/dsu/Soft/PETSc/petsc-3.5.2'
>>>>>>>>>>>>>>> make[1]: Nothing to be done for `all'.
>>>>>>>>>>>>>>> make[1]: Leaving directory `/home/dsu/Soft/PETSc/petsc-3.5.2'
>>>>>>>>>>>>>>> =========================================
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then I recompiled the code, ran with -fp_trap, and still got
>>>>>>>>>>>>>>> the following error:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Backtrace for this error:
>>>>>>>>>>>>>>> Note: The EXACT line numbers in the stack are not available,
>>>>>>>>>>>>>>> [2]PETSC ERROR: INSTEAD the line number of the start
>>>>>>>>>>>>>>> of the function
>>>>>>>>>>>>>>> [2]PETSC ERROR: is given.
>>>>>>>>>>>>>>> [2]PETSC ERROR: [2] PetscDefaultFPTrap line 379
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
>>>>>>>>>>>>>>> INSTEAD the line number of the start of the function
>>>>>>>>>>>>>>> [3]PETSC ERROR: is given.
>>>>>>>>>>>>>>> [3]PETSC ERROR: [3] PetscDefaultFPTrap line 379
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: User provided function() line 0 in Unknown
>>>>>>>>>>>>>>> file trapped floating point error
>>>>>>>>>>>>>>> [3]PETSC ERROR: User provided function() line 0 in Unknown
>>>>>>>>>>>>>>> file trapped floating point error
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is different then what you sent a few minutes ago
>>>>>>>>>>>>>>> where it crashed in hypre.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Anyways you need to use the -start_in_debugger business
>>>>>>>>>>>>>>> I sent in the previous email to see the exact place the problem occurs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the information shown on the gdb screen:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Program received signal SIGFPE, Arithmetic exception.
>>>>>>>>>>>>>>> 0x00000000006c2bef in velocity_g (l_sufx=1, suffix=...,
>>>>>>>>>>>>>>> nmax=12, njamxc=34,
>>>>>>>>>>>>>>> cinfradx=..., radial_coordx=.FALSE., _suffix=3)
>>>>>>>>>>>>>>> at ../../gas_advection/velocity_g.F90:1344
>>>>>>>>>>>>>>> 1344    cinfrt = cinfrt_dg(i1) * diff(ic,idim)    !diff is a very small value, e.g., 1.0d-316
>>>>>>>>>>>>>>> (gdb)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> After typing cont on the gdb screen, I got the error information
>>>>>>>>>>>>>>> below:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]PETSC ERROR: *** unknown floating point error occurred ***
>>>>>>>>>>>>>>> [1]PETSC ERROR: The specific exception can be determined by
>>>>>>>>>>>>>>> running in a debugger. When the
>>>>>>>>>>>>>>> [1]PETSC ERROR: debugger traps the signal, the exception can
>>>>>>>>>>>>>>> be found with fetestexcept(0x3d)
>>>>>>>>>>>>>>> [1]PETSC ERROR: where the result is a bitwise OR of the
>>>>>>>>>>>>>>> following flags:
>>>>>>>>>>>>>>> [1]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4
>>>>>>>>>>>>>>> FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
>>>>>>>>>>>>>>> [1]PETSC ERROR: Try option -start_in_debugger
>>>>>>>>>>>>>>> [1]PETSC ERROR: likely location of problem given in stack
>>>>>>>>>>>>>>> below
>>>>>>>>>>>>>>> [1]PETSC ERROR: --------------------- Stack Frames
>>>>>>>>>>>>>>> ------------------------------------
>>>>>>>>>>>>>>> [1]PETSC ERROR: Note: The EXACT line numbers in the stack
>>>>>>>>>>>>>>> are not available,
>>>>>>>>>>>>>>> [1]PETSC ERROR: INSTEAD the line number of the start
>>>>>>>>>>>>>>> of the function
>>>>>>>>>>>>>>> [1]PETSC ERROR: is given.
>>>>>>>>>>>>>>> [1]PETSC ERROR: [1] PetscDefaultFPTrap line 379
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
>>>>>>>>>>>>>>> [1]PETSC ERROR: User provided function() line 0 in Unknown
>>>>>>>>>>>>>>> file trapped floating point error
>>>>>>>>>>>>>>> [0]PETSC ERROR: *** unknown floating point error occurred ***
>>>>>>>>>>>>>>> [0]PETSC ERROR: The specific exception can be determined by
>>>>>>>>>>>>>>> running in a debugger. When the
>>>>>>>>>>>>>>> [0]PETSC ERROR: debugger traps the signal, the exception can
>>>>>>>>>>>>>>> be found with fetestexcept(0x3d)
>>>>>>>>>>>>>>> [0]PETSC ERROR: where the result is a bitwise OR of the
>>>>>>>>>>>>>>> following flags:
>>>>>>>>>>>>>>> [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4
>>>>>>>>>>>>>>> FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
>>>>>>>>>>>>>>> [0]PETSC ERROR: Try option -start_in_debugger
>>>>>>>>>>>>>>> [0]PETSC ERROR: likely location of problem given in stack
>>>>>>>>>>>>>>> below
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 25, 2015, at 1:05 AM, Danyang Su
>>>>>>>>>>>>>>> <danyang.su at gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Barry and Satish,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How can I get rid of the unknown floating point error when a
>>>>>>>>>>>>>>> very small value is multiplied?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> e.g.,
>>>>>>>>>>>>>>> cinfrt_dg(i1) and diff(ic,idim) are 1.0250235986806329E-008
>>>>>>>>>>>>>>> and 8.6178408169776945E-317, respectively,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> cinfrt = cinfrt_dg(i1) * diff(ic,idim)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I get the following error when run with "-fp_trap
>>>>>>>>>>>>>>> -start_in_debugger".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Backtrace for this error:
>>>>>>>>>>>>>>> *** unknown floating point error occurred ***
>>>>>>>>>>>>>>> [2]PETSC ERROR: The specific exception can be determined by
>>>>>>>>>>>>>>> running in a debugger. When the
>>>>>>>>>>>>>>> [2]PETSC ERROR: debugger traps the signal, the exception can
>>>>>>>>>>>>>>> be found with fetestexcept(0x3d)
>>>>>>>>>>>>>>> [2]PETSC ERROR: cinfrt_dg(i1),diff(ic,idim)
>>>>>>>>>>>>>>> 1.0250235986806329E-008 8.6178408169776945E-317
>>>>>>>>>>>>>>> where the result is a bitwise OR of the following flags:
>>>>>>>>>>>>>>> [2]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4
>>>>>>>>>>>>>>> FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20
>>>>>>>>>>>>>>> [2]PETSC ERROR: Try option -start_in_debugger
>>>>>>>>>>>>>>> [2]PETSC ERROR: likely location of problem given in stack
>>>>>>>>>>>>>>> below
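>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One possible workaround for the question above, sketched in standalone C and
>>>>>>>>>>>>>>> offered only as an illustration (the helper is hypothetical, not from this
>>>>>>>>>>>>>>> thread, and whether flushing such tiny values to zero is acceptable depends
>>>>>>>>>>>>>>> on the physics): treat operands already below the normal range as zero, so
>>>>>>>>>>>>>>> this particular product comes out as an exact 0.0 rather than a trapped underflow.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    #include <float.h>
>>>>>>>>>>>>>>>    #include <math.h>
>>>>>>>>>>>>>>>    #include <stdio.h>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    /* Hypothetical helper: flush denormal operands to zero before multiplying. */
>>>>>>>>>>>>>>>    static double safe_mul(double a, double b)
>>>>>>>>>>>>>>>    {
>>>>>>>>>>>>>>>      if (fabs(a) < DBL_MIN) a = 0.0;
>>>>>>>>>>>>>>>      if (fabs(b) < DBL_MIN) b = 0.0;
>>>>>>>>>>>>>>>      return a * b;
>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    int main(void)
>>>>>>>>>>>>>>>    {
>>>>>>>>>>>>>>>      double cinfrt = safe_mul(1.0250235986806329e-8, 8.6178408169776945e-317);
>>>>>>>>>>>>>>>      printf("cinfrt = %g\n", cinfrt);   /* prints 0 */
>>>>>>>>>>>>>>>      return 0;
>>>>>>>>>>>>>>>    }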
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-24 01:54 PM, Danyang Su wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-24 01:23 PM, Satish Balay wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> c 4 1.0976214263087059E-067
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't think this number can be stored in a real*4.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Satish
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks, Satish. It is caused by this number.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, 24 Apr 2015, Danyang Su wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15-04-24 11:12 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 24, 2015, at 1:05 PM, Danyang Su
>>>>>>>>>>>>>>> <danyang.su at gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One of my cases crashes because of a floating point exception
>>>>>>>>>>>>>>> when using 4 processors, as shown below. But if I run this case with
>>>>>>>>>>>>>>> 1 processor, it works fine. I have tested the code with around 100 cases
>>>>>>>>>>>>>>> on up to 768 processors; all other cases work fine. I just wonder if this
>>>>>>>>>>>>>>> kind of error is caused by a NaN in the Jacobian matrix, the RHS, or the
>>>>>>>>>>>>>>> preconditioner?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, almost for sure it is one of these places.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> First run the bad case with -fp_trap; if all goes well you'll see the
>>>>>>>>>>>>>>> function where the FPE is generated. Then run also with -start_in_debugger
>>>>>>>>>>>>>>> and type cont in all four debugger windows. When the FPE happens,
>>>>>>>>>>>>>>> the debugger should stop, showing exactly where the FPE happens.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Barry,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When run with -fp_trap -start_in_debugger, I got the following
>>>>>>>>>>>>>>> error:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [0]PETSC ERROR: *** unknown floating point error occurred ***
>>>>>>>>>>>>>>> [0]PETSC ERROR: The specific exception can be determined by
>>>>>>>>>>>>>>> running in a
>>>>>>>>>>>>>>> debugger. When the
>>>>>>>>>>>>>>> [0]PETSC ERROR: debugger traps the signal, the exception can
>>>>>>>>>>>>>>> be found with
>>>>>>>>>>>>>>> fetestexcept(0x3d)
>>>>>>>>>>>>>>> [0]PETSC ERROR: where the result is a bitwise OR of the
>>>>>>>>>>>>>>> following flags:
>>>>>>>>>>>>>>> [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4
>>>>>>>>>>>>>>> FE_OVERFLOW=0x8
>>>>>>>>>>>>>>> FE_UNDERFLOW=0x10 FE_INEXACT=0x20
>>>>>>>>>>>>>>> [0]PETSC ERROR: Try option -start_in_debugger
>>>>>>>>>>>>>>> [0]PETSC ERROR: likely location of problem given in stack
>>>>>>>>>>>>>>> below
>>>>>>>>>>>>>>> [0]PETSC ERROR: --------------------- Stack Frames
>>>>>>>>>>>>>>> ------------------------------------
>>>>>>>>>>>>>>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack
>>>>>>>>>>>>>>> are not available,
>>>>>>>>>>>>>>> [0]PETSC ERROR: INSTEAD the line number of the start
>>>>>>>>>>>>>>> of the function
>>>>>>>>>>>>>>> [0]PETSC ERROR: is given.
>>>>>>>>>>>>>>> [0]PETSC ERROR: [0] PetscDefaultFPTrap line 379
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/sys/error/fp.c
>>>>>>>>>>>>>>> [0]PETSC ERROR: User provided function() line 0 in Unknown
>>>>>>>>>>>>>>> file trapped
>>>>>>>>>>>>>>> floating point error
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Program received signal SIGABRT: Process abort signal.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Backtrace for this error:
>>>>>>>>>>>>>>> #0 0x7F4FEAB1C7D7
>>>>>>>>>>>>>>> #1 0x7F4FEAB1CDDE
>>>>>>>>>>>>>>> #2 0x7F4FE9E1AD3F
>>>>>>>>>>>>>>> #3 0x7F4FE9E1ACC9
>>>>>>>>>>>>>>> #4 0x7F4FE9E1E0D7
>>>>>>>>>>>>>>> #5 0x7F4FEB0B6DCB
>>>>>>>>>>>>>>> #6 0x7F4FEB0B1825
>>>>>>>>>>>>>>> #7 0x7F4FEB0B817F
>>>>>>>>>>>>>>> #8 0x7F4FE9E1AD3F
>>>>>>>>>>>>>>> #9 0x6972C8 in tprfrtlc_ at tprfrtlc.F90:2393
>>>>>>>>>>>>>>> (discriminator 3)
>>>>>>>>>>>>>>> #10 0x4C6C87 in gcreact_ at gcreact.F90:678
>>>>>>>>>>>>>>> #11 0x707E19 in initicrt_ at initicrt.F90:589
>>>>>>>>>>>>>>> #12 0x4F42D0 in initprob_ at initprob.F90:430
>>>>>>>>>>>>>>> #13 0x5AAF72 in driver_pc at driver_pc.F90:438
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I checked the code at tprfrtlc.F90:2393,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     realbuffer_gb(1:nvars) = (/time,(c(ic),ic=1,nc-1), &
>>>>>>>>>>>>>>>                               (cx(ix),ix=1,nxout)/)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All the values (time, c, cx) are reasonable, as shown below. The only
>>>>>>>>>>>>>>> possibility is that realbuffer_gb is declared as real*4 when using
>>>>>>>>>>>>>>> single-precision output, while time, c, and cx are declared as real*8. I
>>>>>>>>>>>>>>> have a lot of similar data conversions from real*8 to real*4 for output;
>>>>>>>>>>>>>>> the other code does not return an error.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> time 0.0000000000000000
>>>>>>>>>>>>>>> c 1 9.9999999999999995E-008
>>>>>>>>>>>>>>> c 2 3.1555251077549618E-003
>>>>>>>>>>>>>>> c 3 7.1657814842179362E-008
>>>>>>>>>>>>>>> c 4 1.0976214263087059E-067
>>>>>>>>>>>>>>> c 5 5.2879822292305797E-004
>>>>>>>>>>>>>>> c 6 9.9999999999999964E-005
>>>>>>>>>>>>>>> c 7 6.4055731968811337E-005
>>>>>>>>>>>>>>> c 8 3.4607572892578404E-020
>>>>>>>>>>>>>>> cx 1 3.4376650636008101E-005
>>>>>>>>>>>>>>> cx 2 7.3989678854017763E-012
>>>>>>>>>>>>>>> cx 3 9.5317170613607207E-012
>>>>>>>>>>>>>>> cx 4 2.2344525794718353E-015
>>>>>>>>>>>>>>> cx 5 3.0624685689695889E-008
>>>>>>>>>>>>>>> cx 6 1.0046157902783967E-007
>>>>>>>>>>>>>>> cx 7 1.5320169154914984E-004
>>>>>>>>>>>>>>> cx 8 8.6930292776346176E-014
>>>>>>>>>>>>>>> cx 9 3.5944267559348721E-005
>>>>>>>>>>>>>>> cx 10 3.0072645866951157E-018
>>>>>>>>>>>>>>> cx 11 2.3592486321095017E-013
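>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As a hedged illustration of Satish's point about c 4, a standalone C sketch
>>>>>>>>>>>>>>> (the analogous Fortran real*8 to real*4 assignment is assumed to behave the
>>>>>>>>>>>>>>> same way): the value fits in a double but is far below FLT_MIN, so narrowing
>>>>>>>>>>>>>>> it underflows, which is exactly the kind of event -fp_trap catches.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    #include <fenv.h>
>>>>>>>>>>>>>>>    #include <float.h>
>>>>>>>>>>>>>>>    #include <stdio.h>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    int main(void)
>>>>>>>>>>>>>>>    {
>>>>>>>>>>>>>>>      double c4 = 1.0976214263087059e-67;   /* the "c 4" value listed above */
>>>>>>>>>>>>>>>      float  f;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      feclearexcept(FE_ALL_EXCEPT);
>>>>>>>>>>>>>>>      f = (float)c4;                        /* narrowing, like real*8 -> real*4 */
>>>>>>>>>>>>>>>      printf("double %g -> float %g (FLT_MIN = %g), underflow = %d\n",
>>>>>>>>>>>>>>>             c4, (double)f, (double)FLT_MIN, fetestexcept(FE_UNDERFLOW) != 0);
>>>>>>>>>>>>>>>      return 0;
>>>>>>>>>>>>>>>    }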
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can check all the entries of the Jacobian matrix to see if the
>>>>>>>>>>>>>>> values are valid, but this does not seem like a good idea, as it takes
>>>>>>>>>>>>>>> a long time to reach this point. If I restart the simulation from a
>>>>>>>>>>>>>>> specified time (e.g., 7.685 in this case), then the error does not occur.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Would you please give me any suggestions on debugging this
>>>>>>>>>>>>>>> case?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Danyang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> timestep: 2730 time: 7.665E+00 years delt: 1.000E-02
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> timestep: 2731 time: 7.675E+00 years delt: 1.000E-02
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> timestep: 2732 time: 7.685E+00 years delt: 1.000E-02
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> timestep: 2733 time: 7.695E+00 years delt: 1.000E-02
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> timestep: 2734 time: 7.705E+00 years delt: 1.000E-02
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> Reduce time step for reactive transport
>>>>>>>>>>>>>>> timestep: 2734 time: 7.700E+00 years delt: 5.000E-03
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> Reduce time step for reactive transport
>>>>>>>>>>>>>>> timestep: 2734 time: 7.697E+00 years delt: 2.500E-03
>>>>>>>>>>>>>>> years iter: 1
>>>>>>>>>>>>>>> timestep: max.sia: 0.000E+00 tol.sia: 0.000E+00
>>>>>>>>>>>>>>> [1]PETSC ERROR: --------------------- Error Message
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>>>>>>> [1]PETSC ERROR: Floating point exception
>>>>>>>>>>>>>>> [2]PETSC ERROR: --------------------- Error Message
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>>>>>>> [2]PETSC ERROR: Floating point exception
>>>>>>>>>>>>>>> [2]PETSC ERROR: Vec entry at local location 0 is
>>>>>>>>>>>>>>> not-a-number or infinite
>>>>>>>>>>>>>>> at end of function: Parameter number 3
>>>>>>>>>>>>>>> [2]PETSC ERROR: See
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> for trouble shooting.
>>>>>>>>>>>>>>> [2]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>>>>>>>>>>>>>>> [2]PETSC ERROR: [1]PETSC ERROR: Vec entry at local location
>>>>>>>>>>>>>>> 0 is
>>>>>>>>>>>>>>> not-a-number or infinite at end of function: Parameter
>>>>>>>>>>>>>>> number 3
>>>>>>>>>>>>>>> [1]PETSC ERROR: See
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> for trouble shooting.
>>>>>>>>>>>>>>> [1]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014
>>>>>>>>>>>>>>> [1]PETSC ERROR: ../min3p_thcm_petsc_dbg on a linux-gnu-dbg
>>>>>>>>>>>>>>> named nwmop by
>>>>>>>>>>>>>>> dsu Thu Apr 23 15:38:52 2015
>>>>>>>>>>>>>>> [1]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg
>>>>>>>>>>>>>>> --with-cc=gcc
>>>>>>>>>>>>>>> --with-cxx=g++ --with-fc=gfortran --download-fblaslapack
>>>>>>>>>>>>>>> --download-mpich
>>>>>>>>>>>>>>> --download-mumps --download-hypre --download-superlu_dist
>>>>>>>>>>>>>>> --download-metis
>>>>>>>>>>>>>>> --download-parmetis --download-scalapack
>>>>>>>>>>>>>>> [1]PETSC ERROR: #1 VecValidValues() line 34 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
>>>>>>>>>>>>>>> ../min3p_thcm_petsc_dbg on a linux-gnu-dbg named nwmop by
>>>>>>>>>>>>>>> dsu Thu Apr 23
>>>>>>>>>>>>>>> 15:38:52 2015
>>>>>>>>>>>>>>> [2]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu-dbg
>>>>>>>>>>>>>>> --with-cc=gcc
>>>>>>>>>>>>>>> --with-cxx=g++ --with-fc=gfortran --download-fblaslapack
>>>>>>>>>>>>>>> --download-mpich
>>>>>>>>>>>>>>> --download-mumps --download-hypre --download-superlu_dist
>>>>>>>>>>>>>>> --download-metis
>>>>>>>>>>>>>>> --download-parmetis --download-scalapack
>>>>>>>>>>>>>>> [2]PETSC ERROR: #1 VecValidValues() line 34 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/vec/vec/interface/rvector.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: [1]PETSC ERROR: #2 PCApply() line 442 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
>>>>>>>>>>>>>>> [1]PETSC ERROR: #2 PCApply() line 442 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/pc/interface/precon.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: #3 KSP_PCApply() line 230 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
>>>>>>>>>>>>>>> #3 KSP_PCApply() line 230 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/include/petsc-private/kspimpl.h
>>>>>>>>>>>>>>> [1]PETSC ERROR: #4 KSPInitialResidual() line 63 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: #4 KSPInitialResidual() line 63 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itres.c
>>>>>>>>>>>>>>> [1]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: #5 KSPSolve_GMRES() line 234 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/impls/gmres/gmres.c
>>>>>>>>>>>>>>> [2]PETSC ERROR: #6 KSPSolve() line 459 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
>>>>>>>>>>>>>>> [1]PETSC ERROR: #6 KSPSolve() line 459 in
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/dsu/Soft/PETSc/petsc-3.5.2/src/ksp/ksp/interface/itfunc.c
>>>>>>>>>>>>>>> ^C[mpiexec at nwmop] Sending Ctrl-C to processes as requested
>>>>>>>>>>>>>>> [mpiexec at nwmop] Press Ctrl-C again to force abort
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> What most experimenters take for granted before they begin
>>>>>>>>>>>>>>> their experiments is infinitely more interesting than any results to which
>>>>>>>>>>>>>>> their experiments lead.
>>>>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <petscconf.h>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener