[petsc-users] divide by zero coming in hypre

Barry Smith bsmith at mcs.anl.gov
Tue Apr 28 15:11:59 CDT 2015



  Trying again, hyper-support rejected my first email do to something in the subject line.


 I am forwarding this on to the hypre support email list; hopefully they =
can have some advice on how to proceed.=20

   hypre folks, this is with version hypre-2.10.0b (the previous version ha=
d the same behavior). We are getting a divide by zero in gselim() line 4363=
 (this happens in a time dependent problem after many successful solves wit=
h time dependent matrix).  I looked at the code, below, and note that there=
 are some checks for the diagonal of A not being zero but not for the line =
that causes the divide by zero; is this perhaps an oversight in the hypre c=
ode?  Any advice appreciated.

   Thanks

   Barry




HYPRE_Int gselim(A,x,n)
HYPRE_Real *A;
HYPRE_Real *x;
HYPRE_Int n;
{
   HYPRE_Int    err_flag =3D 0;
   HYPRE_Int    j,k,m;
   HYPRE_Real factor;
  =20
   if (n=3D=3D1)                           /* A is 1x1 */ =20
   {
      if (A[0] !=3D 0.0)
      {
         x[0] =3D x[0]/A[0];
         return(err_flag);
      }
      else
      {
         err_flag =3D 1;
         return(err_flag);
      }
   }
   else                               /* A is nxn.  Forward elimination */=
=20
   {
      for (k =3D 0; k < n-1; k++)
      {
          if (A[k*n+k] !=3D 0.0)
          {         =20
             for (j =3D k+1; j < n; j++)
             {
                 if (A[j*n+k] !=3D 0.0)
                 {
                    factor =3D A[j*n+k]/A[k*n+k];
                    for (m =3D k+1; m < n; m++)
                    {
                        A[j*n+m]  -=3D factor * A[k*n+m];
                    }
                                     /* Elimination step for rhs */=20
                    x[j] -=3D factor * x[k];             =20
                 }
             }
          }
       }
                                    /* Back Substitution  */
       for (k =3D n-1; k > 0; --k)
       {
           x[k] /=3D A[k*n+k];
           for (j =3D 0; j < k; j++)
           {
               if (A[j*n+k] !=3D 0.0)
               {
                  x[j] -=3D x[k] * A[j*n+k];
               }
           }
       }
       x[0] /=3D A[0];
       return(err_flag);
    }
}


> On Apr 28, 2015, at 12:55 PM, Danyang Su <danyang.su at gmail.com> wrote:
>=20
> Hi Barry,
>=20
> The development version of PETSc does not help to solve my problem. It st=
ill crashed due to the same error information.
>=20
> As Matthew mentioned, I checked the matrix and non of the diagonal entry =
is zero. But the diagonal values for different rows range from -1.0d-18 to =
1.0d33 and the matrix is not strict diagonal dominant. When using 1 process=
or, the matrix is also not strict diagonal dominant, but the diagonal value=
s have a much smaller range and the codes can run without error. Maybe ther=
e is something wrong in the matrix for this case and I need to check the ma=
trix first.
>=20
> Thanks,
>=20
> Danyang
>=20
> Program received signal SIGFPE, Arithmetic exception.
> 0x00007f668c1299d5 in gselim (A=3D0x2fe3330, x=3D0x2210680, n=3D9)
>    at par_relax.c:4363
> 4363               x[k] /=3D A[k*n+k];
> (gdb)
>=20
> [1]PETSC ERROR: *** unknown floating point error occurred ***
> [1]PETSC ERROR: The specific exception can be determined by running in a =
debugger.  When the
> [1]PETSC ERROR: debugger traps the signal, the exception can be found wit=
h fetestexcept(0x3d)
> [1]PETSC ERROR: where the result is a bitwise OR of the following flags:
> [1]PETSC ERROR: FE_INVALID=3D0x1 FE_DIVBYZERO=3D0x4 FE_OVERFLOW=3D0x8 FE_=
UNDERFLOW=3D0x10 FE_INEXACT=3D0x20
> [1]PETSC ERROR: Try option -start_in_debugger
> [1]PETSC ERROR: likely location of problem given in stack below
> [1]PETSC ERROR: ---------------------  Stack Frames ---------------------=
---------------
> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not availab=
le,
> [1]PETSC ERROR:       INSTEAD the line number of the start of the functio=
n
> [1]PETSC ERROR:       is given.
> [1]PETSC ERROR: [1] PetscDefaultFPTrap line 379 /home/dsu/Soft/PETSc/pets=
c-dev/src/sys/error/fp.c
> [1]PETSC ERROR: [1] Hypre solve line 216 /home/dsu/Soft/PETSc/petsc-dev/s=
rc/ksp/pc/impls/hypre/hypre.c
> [1]PETSC ERROR: [1] PCApply_HYPRE line 203 /home/dsu/Soft/PETSc/petsc-dev=
/src/ksp/pc/impls/hypre/hypre.c
> [1]PETSC ERROR: [1] PCApply line 430 /home/dsu/Soft/PETSc/petsc-dev/src/k=
sp/pc/interface/precon.c
> [1]PETSC ERROR: [1] KSP_PCApply line 234 /home/dsu/Soft/PETSc/petsc-dev/i=
nclude/petsc/private/kspimpl.h
> [1]PETSC ERROR: [1] KSPInitialResidual line 44 /home/dsu/Soft/PETSc/petsc=
-dev/src/ksp/ksp/interface/itres.c
> [1]PETSC ERROR: [1] KSPSolve_GMRES line 224 /home/dsu/Soft/PETSc/petsc-de=
v/src/ksp/ksp/impls/gmres/gmres.c
> [1]PETSC ERROR: [1] KSPSolve line 506 /home/dsu/Soft/PETSc/petsc-dev/src/=
ksp/ksp/interface/itfunc.c
> [1]PETSC ERROR: User provided function() line 0 in Unknown file trapped f=
loating point error
>=20
>=20
>=20
> On 15-04-27 11:47 AM, Barry Smith wrote:
>>    Danyang,
>>=20
>>     I'm glad you were finally able to get to the problematic point; sorr=
y it took so long (I want to strangle whoever turned on catching of underfl=
ows in PETSc).
>>=20
>>     The hypre folks have done a great deal of work on their relaxation c=
ode since the PETSc 3.5.3 release. There is a good chance they have fixed t=
he divide by zero problem you are hitting here. You will need to upgrade to=
 the development version of PETSc (that uses the latest version of hypre), =
here are the instructions on how to obtain it http://www.mcs.anl.gov/petsc/=
developers/index.html
>>=20
>>    Please let us know if this resolves the problem with hypre failing.
>>=20
>>   Barry
>>=20
>>=20
>>=20
>>> On Apr 27, 2015, at 11:44 AM, Danyang Su <danyang.su at gmail.com> wrote:
>>>=20
>>> Hi Barry,
>>>=20
>>> I got the following arithemetic exception after the previous bug is fix=
ed.
>>>=20
>>> Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
>>> 0x00007f3b23b98f20 in __nanosleep_nocancel ()
>>>    at ../sysdeps/unix/syscall-template.S:81
>>> 81      ../sysdeps/unix/syscall-template.S: No such file or directory.
>>> (gdb) cont
>>> Continuing.
>>>=20
>>> Program received signal SIGFPE, Arithmetic exception.
>>> 0x00007f3b260df449 in gselim (A=3D0x1e4c580, x=3D0x1d30c40, n=3D9)
>>>    at par_relax.c:3442
>>> 3442               x[k] /=3D A[k*n+k];
>>> (gdb)
>>>=20
>>> I tried both PETSc 3.5.2 and 3.5.3 and they return the same error as sh=
own above. For 3.5.3, i edited fp.c file and then configure and make.
>>>=20
>>> Thanks,
>>>=20


More information about the petsc-users mailing list