[petsc-users] Debugging suggestions: GAMG

Sanjay Govindjee s_g at berkeley.edu
Sat Jun 13 14:34:00 CDT 2020


Machine details:
      Fedora Core 30:  5.6.13-100.fc30.x86_64
      gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
      GNU Fortran (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)

LAPACK/BLAS: whatever came with the machine; they are in /usr/lib64.
I did not compile them myself.

I'll try two things: (1) rebuild with a different BLAS/LAPACK and (2) set
a breakpoint in ieee_handler() to see when and where it is getting called.
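
A complementary check to the breakpoint would be to print the trap state
itself at a few points in the run. Below is a minimal sketch, not PETSc
code. It assumes an x86-64 build (as on this machine) where double-precision
arithmetic goes through SSE, so the MXCSR register decides whether a divide
by zero raises SIGFPE. The helper names are hypothetical; the _MM_* calls
come from the compiler's <xmmintrin.h>.

```c
/* Sketch only: x86-64 specific, inspects the SSE control register (MXCSR). */
#include <assert.h>
#include <stdio.h>
#include <xmmintrin.h>

/* Hypothetical helper: returns 1 if a divide by zero would currently raise
 * SIGFPE, 0 if the exception is masked (the division quietly yields inf). */
static int divbyzero_traps_enabled(void)
{
  /* A set mask bit means the exception is masked, i.e. NOT trapped. */
  return (_MM_GET_EXCEPTION_MASK() & _MM_MASK_DIV_ZERO) ? 0 : 1;
}

/* Hypothetical helper: report the state at a labelled point in the program. */
static void report_fp_trap_state(const char *where)
{
  printf("%s: divide-by-zero %s\n", where,
         divbyzero_traps_enabled() ? "TRAPS (SIGFPE)" : "is masked");
}

/* Hypothetical helper: repeat an ieeeck_-style probe with the trap masked,
 * then restore the caller's mask. Returns 1 if 1/0 quietly produced +inf
 * and set the divide-by-zero status flag. */
static int quiet_divide_by_zero(void)
{
  unsigned int saved = _MM_GET_EXCEPTION_MASK();
  volatile double zero = 0.0, one = 1.0, result;  /* volatile: no constant folding */
  int ok;

  _MM_SET_EXCEPTION_MASK(saved | _MM_MASK_DIV_ZERO);  /* mask the trap */
  result = one / zero;                                /* sets the flag, no signal */
  ok = (result > 1.0e300) &&
       (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_DIV_ZERO) != 0;

  /* Clear the sticky flag, then put the caller's mask back. */
  _MM_SET_EXCEPTION_STATE(_MM_GET_EXCEPTION_STATE() & ~_MM_EXCEPT_DIV_ZERO);
  _MM_SET_EXCEPTION_MASK(saved);
  assert(_MM_GET_EXCEPTION_MASK() == saved);
  return ok;
}
```

If divbyzero_traps_enabled() returned 1 immediately before the LAPACK call,
that would suggest something unmasked the trap after it was turned off.
Calling these helpers from the Fortran side would need a small C wrapper,
which is not shown here.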

Also just for completeness here are the rest of the error messages from 
the run:

    Thread 1 "feap" received signal SIGFPE, Arithmetic exception.
    0x00007f0fe77e5be1 in ieeeck_ () from /lib64/liblapack.so.3


    [0]PETSC ERROR: ------------------------------------------------------------------------
    [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
    [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
    [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
    [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
    [0]PETSC ERROR: likely location of problem given in stack below
    [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
    [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
    [0]PETSC ERROR:       INSTEAD the line number of the start of the function
    [0]PETSC ERROR:       is given.
    [0]PETSC ERROR: [0] LAPACKgesvd line 32 /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c
    [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues_GMRES line 14 /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c
    [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues line 57 /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c
    [0]PETSC ERROR: [0] PCGAMGOptProlongator_AGG line 1107 /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/agg.c
    [0]PETSC ERROR: User provided function() line 0 in  unknown file (null)

    Program received signal SIGABRT: Process abort signal.


On 6/13/20 9:04 AM, Barry Smith wrote:
>
>    The LAPACK routine ieeeck_ intentionally does a divide by zero to
> check if the system can handle it without generating an exception. It
> doesn't have anything to do with the particular matrix data passed to
> LAPACK.
>
>     In KSPComputeExtremeSingularValues_GMRES() we have the code structure
>
>   ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr);
> #if !defined(PETSC_USE_COMPLEX)
>   PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,&lierr));
> #else
>   PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,realpart+N,&lierr));
> #endif
>   if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack routine %d",(int)lierr);
>   ierr = PetscFPTrapPop();CHKERRQ(ierr);
>
>    So PETSc tries to turn off trapping of floating point exceptions 
> before calling the LAPACK routines that eventually lead to the exception.
>
> PetscErrorCode PetscFPTrapPush(PetscFPTrap trap)
> {
>   PetscErrorCode         ierr;
>   struct PetscFPTrapLink *link;
>
>   PetscFunctionBegin;
>   ierr           = PetscNew(&link);CHKERRQ(ierr);
>   link->trapmode = _trapmode;
>   link->next     = _trapstack;
>   _trapstack     = link;
>   if (trap != _trapmode) {ierr = PetscSetFPTrap(trap);CHKERRQ(ierr);}
>   PetscFunctionReturn(0);
> }
>
> PetscErrorCode PetscSetFPTrap(PetscFPTrap flag)
> {
>   char *out;
>
>   PetscFunctionBegin;
>   /* Clear accumulated exceptions.  Used to suppress meaningless messages from f77 programs */
>   (void) ieee_flags("clear","exception","all",&out);
>   if (flag == PETSC_FP_TRAP_ON) {
>     /*
>       To trap more fp exceptions, including underflow, change the line below to
>       if (ieee_handler("set","all",PetscDefaultFPTrap)) {
>     */
>     if (ieee_handler("set","common",PetscDefaultFPTrap)) (*PetscErrorPrintf)("Can't set floatingpoint handler\n");
>   } else if (ieee_handler("clear","common",PetscDefaultFPTrap)) (*PetscErrorPrintf)("Can't clear floatingpoint handler\n");
>
>   _trapmode = flag;
>   PetscFunctionReturn(0);
> }
>
>   So either the ieee_handler clear is not working for your system or
> some other code, AFTER PETSc calls ieee_handler, sets the ieee_handler
> to trap divide by zero.
>
>   A git grep -i ieee_handler shows that the reference BLAS/LAPACK and
> OpenBLAS never seem to call the ieee_handler.
>
>   We need to know what lapack/blas you are using and how they were 
> compiled.
>
>   Some Fortran compilers/linkers set nonstandard exception handlers,
> but since PETSc clears them I don't know how they could get set again.
>
>   You could try in gdb to put a breakpoint in ieee_handler and find
> all the places it gets called; maybe this will lead to the location of
> the cause.
>
>   Barry
>
>
>> On Jun 13, 2020, at 1:30 AM, Sanjay Govindjee <s_g at berkeley.edu> wrote:
>>
>> I have an FEA problem that I am trying to solve with GAMG.  The problem
>> solves just fine with direct solvers (mumps, superlu) and iterative
>> solvers (gmres, ml, hypre-boomer), etc.
>>
>> However, with GAMG I am getting a divide by zero that I am having
>> trouble tracking down.  Below is the gdb stack trace and the source
>> lines going up the stack.
>>
>> When I run in valgrind the problem runs fine (and gets the correct
>> answer).  Valgrind reports nothing of note (just lots of indirectly
>> lost blocks related to PMP_INIT).
>>
>> I'm only running on one processor.
>>
>> Any suggestions on where to start to trace the problem?
>>
>> -sanjay
>>
>>     #0  0x00007fb262dc5be1 in ieeeck_ () from /lib64/liblapack.so.3
>>     #1  0x00007fb262dc5332 in ilaenv_ () from /lib64/liblapack.so.3
>>     #2  0x00007fb262dbbcef in dlasq2_ () from /lib64/liblapack.so.3
>>     #3  0x00007fb262dbb78c in dlasq1_ () from /lib64/liblapack.so.3
>>     #4  0x00007fb262da1e2e in dbdsqr_ () from /lib64/liblapack.so.3
>>     #5  0x00007fb262960110 in dgesvd_ () from /lib64/liblapack.so.3
>>     #6  0x00007fb264e74b66 in KSPComputeExtremeSingularValues_GMRES
>>     (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c:32
>>     #7  0x00007fb264dfe69a in KSPComputeExtremeSingularValues
>>     (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:64
>>     #8  0x00007fb264b44a1f in PCGAMGOptProlongator_AGG (pc=0x12f3d30,
>>     Amat=0x11a2630, a_P=0x7ffc5010ebe0) at
>>     /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/agg.c:1145
>>     #9  0x00007fb264b248a1 in PCSetUp_GAMG (pc=0x12f3d30) at
>>     /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/gamg.c:557
>>     #10 0x00007fb264d8535b in PCSetUp (pc=0x12f3d30) at
>>     /home/sg/petsc-3.13.2/src/ksp/pc/interface/precon.c:898
>>     #11 0x00007fb264e01a93 in KSPSetUp (ksp=0x128dd80) at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:376
>>     #12 0x00007fb264e057af in KSPSolve_Private (ksp=0x128dd80,
>>     b=0x1259f30, x=0x125d910) at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:633
>>     #13 0x00007fb264e086b9 in KSPSolve (ksp=0x128dd80, b=0x1259f30,
>>     x=0x125d910) at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:853
>>     #14 0x00007fb264e46216 in kspsolve_ (ksp=0x832670
>>     <__pfeapc_MOD_kspsol>, b=0x832698 <__pfeapc_MOD_rhs>, x=0x8326a0
>>     <__pfeapc_MOD_sol>, __ierr=0x7ffc5010f358)
>>         at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/interface/ftn-auto/itfuncf.c:266
>>     #15 0x000000000043298d in usolve (flags=..., b=...) at usolve.F:313
>>     #16 0x000000000044afba in psolve (stype=-3, b=..., fp=...,
>>     factor=.TRUE., solve=.TRUE., cfr=.FALSE., prnt=.TRUE.) at
>>     psolve.f:212
>>     #17 0x00000000006b7393 in pmacr1 (lct=..., ct=..., j=3,
>>     _lct=_lct at entry=15) at pmacr1.f:578
>>     #18 0x00000000005c247b in pmacr (initf=.FALSE.) at pmacr.f:578
>>     #19 0x000000000044ff20 in pcontr () at pcontr.f:1307
>>     #20 0x0000000000404d9b in feap () at feap86.f:162
>>     #21 main (argc=<optimized out>, argv=<optimized out>) at feap86.f:168
>>     #22 0x00007fb261aaef43 in __libc_start_main () from /lib64/libc.so.6
>>     #23 0x0000000000404dde in _start ()
>>
>>     (gdb) list
>>     1       <built-in>: No such file or directory.
>>     (gdb) up
>>     #1  0x00007fb262dc5332 in ilaenv_ () from /lib64/liblapack.so.3
>>     (gdb) up
>>     #2  0x00007fb262dbbcef in dlasq2_ () from /lib64/liblapack.so.3
>>     (gdb) up
>>     #3  0x00007fb262dbb78c in dlasq1_ () from /lib64/liblapack.so.3
>>     (gdb) up
>>     #4  0x00007fb262da1e2e in dbdsqr_ () from /lib64/liblapack.so.3
>>     (gdb) up
>>     #5  0x00007fb262960110 in dgesvd_ () from /lib64/liblapack.so.3
>>     (gdb) up
>>     #6  0x00007fb264e74b66 in KSPComputeExtremeSingularValues_GMRES
>>     (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c:32
>>     32
>>     PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,&lierr));
>>     (gdb) up
>>     #7  0x00007fb264dfe69a in KSPComputeExtremeSingularValues
>>     (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:64
>>     64          ierr =
>>     (*ksp->ops->computeextremesingularvalues)(ksp,emax,emin);CHKERRQ(ierr);
>>     (gdb) up
>>     #8  0x00007fb264b44a1f in PCGAMGOptProlongator_AGG (pc=0x12f3d30,
>>     Amat=0x11a2630, a_P=0x7ffc5010ebe0) at
>>     /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/agg.c:1145
>>     1145          ierr = KSPComputeExtremeSingularValues(eksp, &emax,
>>     &emin);CHKERRQ(ierr);
>>     (gdb) up
>>     #9  0x00007fb264b248a1 in PCSetUp_GAMG (pc=0x12f3d30) at
>>     /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/gamg.c:557
>>     557               ierr = pc_gamg->ops->optprolongator(pc,
>>     Aarr[level], &Prol11);CHKERRQ(ierr);
>>     (gdb) up
>>     #10 0x00007fb264d8535b in PCSetUp (pc=0x12f3d30) at
>>     /home/sg/petsc-3.13.2/src/ksp/pc/interface/precon.c:898
>>     898         ierr = (*pc->ops->setup)(pc);CHKERRQ(ierr);
>>     (gdb) up
>>     #11 0x00007fb264e01a93 in KSPSetUp (ksp=0x128dd80) at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:376
>>     376       ierr = PCSetUp(ksp->pc);CHKERRQ(ierr);
>>     (gdb) up
>>     #12 0x00007fb264e057af in KSPSolve_Private (ksp=0x128dd80,
>>     b=0x1259f30, x=0x125d910) at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:633
>>     633       ierr = KSPSetUp(ksp);CHKERRQ(ierr);
>>     (gdb) up
>>     #13 0x00007fb264e086b9 in KSPSolve (ksp=0x128dd80, b=0x1259f30,
>>     x=0x125d910) at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:853
>>     853       ierr = KSPSolve_Private(ksp,b,x);CHKERRQ(ierr);
>>     (gdb) up
>>     #14 0x00007fb264e46216 in kspsolve_ (ksp=0x832670
>>     <__pfeapc_MOD_kspsol>, b=0x832698 <__pfeapc_MOD_rhs>, x=0x8326a0
>>     <__pfeapc_MOD_sol>, __ierr=0x7ffc5010f358)
>>         at
>>     /home/sg/petsc-3.13.2/src/ksp/ksp/interface/ftn-auto/itfuncf.c:266
>>     266     *__ierr = KSPSolve(
>>     (gdb) up
>>     #15 0x000000000043298d in usolve (flags=..., b=...) at usolve.F:313
>>     313               call KSPSolve         (kspsol, rhs, sol, ierr)
>>
>>
>>
>
