[petsc-users] Debugging suggestions: GAMG
Sanjay Govindjee
s_g at berkeley.edu
Sat Jun 13 14:34:00 CDT 2020
Machine details:
Fedora Core 30: 5.6.13-100.fc30.x86_64
gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
GNU Fortran (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
LAPACK/BLAS: whatever came with the machine, located in /usr/lib64.
I did not compile them myself.
I'll try two things: (1) rebuild with a different BLAS/LAPACK and (2) set
a breakpoint in ieee_handler() to see when and where it is getting called.
Also just for completeness here are the rest of the error messages from
the run:
Thread 1 "feap" received signal SIGFPE, Arithmetic exception.
0x00007f0fe77e5be1 in ieeeck_ () from /lib64/liblapack.so.3
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: INSTEAD the line number of the start of the function
[0]PETSC ERROR: is given.
[0]PETSC ERROR: [0] LAPACKgesvd line 32 /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c
[0]PETSC ERROR: [0] KSPComputeExtremeSingularValues_GMRES line 14 /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c
[0]PETSC ERROR: [0] KSPComputeExtremeSingularValues line 57 /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: [0] PCGAMGOptProlongator_AGG line 1107 /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/agg.c
[0]PETSC ERROR: User provided function() line 0 in unknown file (null)
Program received signal SIGABRT: Process abort signal.
On 6/13/20 9:04 AM, Barry Smith wrote:
>
> The LAPACK routine ieeeck_ intentionally does a divide by zero to
> check if the system can handle it without generating an exception. It
> doesn't have anything to do
> with the particular matrix data passed to LAPACK.
>
> In KSPComputeExtremeSingularValues_GMRES() we have the code structure
>
> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr);
> #if !defined(PETSC_USE_COMPLEX)
> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,&lierr));
> #else
> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,realpart+N,&lierr));
> #endif
> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack routine %d",(int)lierr);
> ierr = PetscFPTrapPop();CHKERRQ(ierr);
>
> So PETSc tries to turn off trapping of floating point exceptions
> before calling the LAPACK routines that eventually lead to the exception.
>
> PetscErrorCode PetscFPTrapPush(PetscFPTrap trap)
> {
> PetscErrorCode ierr;
> struct PetscFPTrapLink *link;
>
> PetscFunctionBegin;
> ierr = PetscNew(&link);CHKERRQ(ierr);
> link->trapmode = _trapmode;
> link->next = _trapstack;
> _trapstack = link;
> if (trap != _trapmode) {ierr = PetscSetFPTrap(trap);CHKERRQ(ierr);}
> PetscFunctionReturn(0);
> }
>
> PetscErrorCode PetscSetFPTrap(PetscFPTrap flag)
> {
> char *out;
>
> PetscFunctionBegin;
> /* Clear accumulated exceptions. Used to suppress meaningless
> messages from f77 programs */
> (void) ieee_flags("clear","exception","all",&out);
> if (flag == PETSC_FP_TRAP_ON) {
> /*
> To trap more fp exceptions, including underflow, change the line
> below to
> if (ieee_handler("set","all",PetscDefaultFPTrap)) {
> */
> if (ieee_handler("set","common",PetscDefaultFPTrap))
> (*PetscErrorPrintf)("Can't set floatingpoint handler\n");
> } else if (ieee_handler("clear","common",PetscDefaultFPTrap))
> (*PetscErrorPrintf)("Can't clear floatingpoint handler\n");
>
> _trapmode = flag;
> PetscFunctionReturn(0);
> }
>
> So either the ieee_handler clear is not working on your system, or
> some other code, AFTER PETSc calls ieee_handler, sets the ieee_handler
> to trap divide by zero.
>
> A git grep -i ieee_handler shows that the reference BLAS/LAPACK and
> OpenBLAS never seem to call the ieee_handler.
>
> We need to know what lapack/blas you are using and how they were
> compiled.
>
> Some Fortran compilers/linkers set nonstandard exception handlers,
> but since PETSc clears them I don't know how they could get set again.
>
> You could try putting a breakpoint in ieee_handler in gdb to find
> all the places it gets called; maybe this will lead to the location of
> the cause.
>
> Barry
>
>
>> On Jun 13, 2020, at 1:30 AM, Sanjay Govindjee <s_g at berkeley.edu> wrote:
>>
>> I have a FEA problem that I am trying to solve with GAMG. The
>> problem solves
>> just fine with direct solvers (mumps, superlu) and iterative solvers
>> (gmres, ml, hypre-boomer) etc.
>>
>> However with GAMG I am getting a divide by zero that I am having
>> trouble tracking down. Below
>> is the gdb stack trace and the source lines going up the stack.
>>
>> When I run in valgrind the problem runs fine (and gets the correct
>> answer).
>> Valgrind reports nothing of note (just lots of indirectly lost
>> blocks related to PMP_INIT).
>>
>> I'm only running on one processor.
>>
>> Any suggestions on where to start to trace the problem?
>>
>> -sanjay
>>
>> #0 0x00007fb262dc5be1 in ieeeck_ () from /lib64/liblapack.so.3
>> #1 0x00007fb262dc5332 in ilaenv_ () from /lib64/liblapack.so.3
>> #2 0x00007fb262dbbcef in dlasq2_ () from /lib64/liblapack.so.3
>> #3 0x00007fb262dbb78c in dlasq1_ () from /lib64/liblapack.so.3
>> #4 0x00007fb262da1e2e in dbdsqr_ () from /lib64/liblapack.so.3
>> #5 0x00007fb262960110 in dgesvd_ () from /lib64/liblapack.so.3
>> #6 0x00007fb264e74b66 in KSPComputeExtremeSingularValues_GMRES
>> (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c:32
>> #7 0x00007fb264dfe69a in KSPComputeExtremeSingularValues
>> (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:64
>> #8 0x00007fb264b44a1f in PCGAMGOptProlongator_AGG (pc=0x12f3d30,
>> Amat=0x11a2630, a_P=0x7ffc5010ebe0) at
>> /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/agg.c:1145
>> #9 0x00007fb264b248a1 in PCSetUp_GAMG (pc=0x12f3d30) at
>> /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/gamg.c:557
>> #10 0x00007fb264d8535b in PCSetUp (pc=0x12f3d30) at
>> /home/sg/petsc-3.13.2/src/ksp/pc/interface/precon.c:898
>> #11 0x00007fb264e01a93 in KSPSetUp (ksp=0x128dd80) at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:376
>> #12 0x00007fb264e057af in KSPSolve_Private (ksp=0x128dd80,
>> b=0x1259f30, x=0x125d910) at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:633
>> #13 0x00007fb264e086b9 in KSPSolve (ksp=0x128dd80, b=0x1259f30,
>> x=0x125d910) at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:853
>> #14 0x00007fb264e46216 in kspsolve_ (ksp=0x832670
>> <__pfeapc_MOD_kspsol>, b=0x832698 <__pfeapc_MOD_rhs>, x=0x8326a0
>> <__pfeapc_MOD_sol>, __ierr=0x7ffc5010f358)
>> at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/interface/ftn-auto/itfuncf.c:266
>> #15 0x000000000043298d in usolve (flags=..., b=...) at usolve.F:313
>> #16 0x000000000044afba in psolve (stype=-3, b=..., fp=...,
>> factor=.TRUE., solve=.TRUE., cfr=.FALSE., prnt=.TRUE.) at
>> psolve.f:212
>> #17 0x00000000006b7393 in pmacr1 (lct=..., ct=..., j=3,
>> _lct=_lct at entry=15) at pmacr1.f:578
>> #18 0x00000000005c247b in pmacr (initf=.FALSE.) at pmacr.f:578
>> #19 0x000000000044ff20 in pcontr () at pcontr.f:1307
>> #20 0x0000000000404d9b in feap () at feap86.f:162
>> #21 main (argc=<optimized out>, argv=<optimized out>) at feap86.f:168
>> #22 0x00007fb261aaef43 in __libc_start_main () from /lib64/libc.so.6
>> #23 0x0000000000404dde in _start ()
>>
>> (gdb) list
>> 1 <built-in>: No such file or directory.
>> (gdb) up
>> #1 0x00007fb262dc5332 in ilaenv_ () from /lib64/liblapack.so.3
>> (gdb) up
>> #2 0x00007fb262dbbcef in dlasq2_ () from /lib64/liblapack.so.3
>> (gdb) up
>> #3 0x00007fb262dbb78c in dlasq1_ () from /lib64/liblapack.so.3
>> (gdb) up
>> #4 0x00007fb262da1e2e in dbdsqr_ () from /lib64/liblapack.so.3
>> (gdb) up
>> #5 0x00007fb262960110 in dgesvd_ () from /lib64/liblapack.so.3
>> (gdb) up
>> #6 0x00007fb264e74b66 in KSPComputeExtremeSingularValues_GMRES
>> (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c:32
>> 32
>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,&lierr));
>> (gdb) up
>> #7 0x00007fb264dfe69a in KSPComputeExtremeSingularValues
>> (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:64
>> 64 ierr =
>> (*ksp->ops->computeextremesingularvalues)(ksp,emax,emin);CHKERRQ(ierr);
>> (gdb) up
>> #8 0x00007fb264b44a1f in PCGAMGOptProlongator_AGG (pc=0x12f3d30,
>> Amat=0x11a2630, a_P=0x7ffc5010ebe0) at
>> /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/agg.c:1145
>> 1145 ierr = KSPComputeExtremeSingularValues(eksp, &emax,
>> &emin);CHKERRQ(ierr);
>> (gdb) up
>> #9 0x00007fb264b248a1 in PCSetUp_GAMG (pc=0x12f3d30) at
>> /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/gamg.c:557
>> 557 ierr = pc_gamg->ops->optprolongator(pc,
>> Aarr[level], &Prol11);CHKERRQ(ierr);
>> (gdb) up
>> #10 0x00007fb264d8535b in PCSetUp (pc=0x12f3d30) at
>> /home/sg/petsc-3.13.2/src/ksp/pc/interface/precon.c:898
>> 898 ierr = (*pc->ops->setup)(pc);CHKERRQ(ierr);
>> (gdb) up
>> #11 0x00007fb264e01a93 in KSPSetUp (ksp=0x128dd80) at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:376
>> 376 ierr = PCSetUp(ksp->pc);CHKERRQ(ierr);
>> (gdb) up
>> #12 0x00007fb264e057af in KSPSolve_Private (ksp=0x128dd80,
>> b=0x1259f30, x=0x125d910) at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:633
>> 633 ierr = KSPSetUp(ksp);CHKERRQ(ierr);
>> (gdb) up
>> #13 0x00007fb264e086b9 in KSPSolve (ksp=0x128dd80, b=0x1259f30,
>> x=0x125d910) at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:853
>> 853 ierr = KSPSolve_Private(ksp,b,x);CHKERRQ(ierr);
>> (gdb) up
>> #14 0x00007fb264e46216 in kspsolve_ (ksp=0x832670
>> <__pfeapc_MOD_kspsol>, b=0x832698 <__pfeapc_MOD_rhs>, x=0x8326a0
>> <__pfeapc_MOD_sol>, __ierr=0x7ffc5010f358)
>> at
>> /home/sg/petsc-3.13.2/src/ksp/ksp/interface/ftn-auto/itfuncf.c:266
>> 266 *__ierr = KSPSolve(
>> (gdb) up
>> #15 0x000000000043298d in usolve (flags=..., b=...) at usolve.F:313
>> 313 call KSPSolve (kspsol, rhs, sol, ierr)
>>
>>
>>
>