[petsc-users] segfault after recent scientific linux upgrade

Klaij, Christiaan C.Klaij at marin.nl
Thu Dec 7 09:15:00 CST 2017


Satish,

As a first try, I've kept petsc-3.7.5 and only replaced superlu
by the new xsdk-0.2.0-rc1 version. Unfortunately, this doesn't
fix the problem, see the backtrace below.

Fande,

Perhaps the problem is related to petsc, not superlu?

What really puzzles me is that everything was working fine with
petsc-3.7.5 and superlu_dist 5.1.3; it only broke after we
updated Scientific Linux 7. So this bug (in petsc or in superlu)
was already there but somehow not triggered before the SL7
update?

Chris

(gdb) bt
#0  0x00002b38995fa30c in mc64wd_dist (n=0x3da6230, ne=0x2, ip=0x1,
    irn=0x3d424e0, a=0x3d82220, iperm=0x1000, num=0x7ffc505dd294,
    jperm=0x3d7a220, out=0x3d7e220, pr=0x3d82220, q=0x3d86220, l=0x3d8a220,
    u=0x3d8e230, d__=0x3d96230)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/superlu_dist-xsdk-0.2.0-rc1/SRC/mc64ad_dist.c:2322
#1  0x00002b38995f5f7b in mc64ad_dist (job=0x3da6230, n=0x2, ne=0x1,
    ip=0x3d424e0, irn=0x3d82220, a=0x1000, num=0x7ffc505dd2b0,
    cperm=0x3d8e230, liw=0x3d1acd0, iw=0x3d560f0, ldw=0x3d424e0, dw=0x3d0e530,
    icntl=0x3d7a220, info=0x2b3899615546 <dldperm_dist+614>)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/superlu_dist-xsdk-0.2.0-rc1/SRC/mc64ad_dist.c:596
#2  0x00002b3899615546 in dldperm_dist (job=0, n=0, nnz=0, colptr=0x3d424e0,
    adjncy=0x3d82220, nzval=0x1000, perm=0x4f00, u=0x1000, v=0x3d0e001)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/superlu_dist-xsdk-0.2.0-rc1/SRC/dldperm_dist.c:141
#3  0x00002b389960d286 in pdgssvx_ABglobal (options=0x3da6230, A=0x2,
    ScalePermstruct=0x1, B=0x3d424e0, ldb=64496160, nrhs=4096, grid=0x3d009f0,
    LUstruct=0x3d0df00, berr=0x1000,
    stat=0x2b389851da7d <MatLUFactorNumeric_SuperLU_DIST+2349>, info=0x3d0df18)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/superlu_dist-xsdk-0.2.0-rc1/SRC/pdgssvx_ABglobal.c:716
#4  0x00002b389851da7d in MatLUFactorNumeric_SuperLU_DIST (F=0x3da6230, A=0x2,
    info=0x1)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:419
#5  0x00002b389852ca1a in MatLUFactorNumeric (fact=0x3da6230, mat=0x2,
    info=0x1)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/mat/interface/matrix.c:2996
#6  0x00002b38988856c7 in PCSetUp_LU (pc=0x3da6230)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/factor/lu/lu.c:172
#7  0x00002b38987d4084 in PCSetUp (pc=0x3da6230)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/interface/precon.c:968
#8  0x00002b389891068d in KSPSetUp (ksp=0x3da6230)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:390
#9  0x00002b389890c7be in KSPSolve (ksp=0x3da6230, b=0x2, x=0x2d18d90)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:599
#10 0x00002b3898925142 in kspsolve_ (ksp=0x3da6230, b=0x2, x=0x1,
    __ierr=0x3d424e0)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/ftn-auto/itfuncf.c:261
#11 0x0000000000bccf71 in petsc_solvers::petsc_solvers_solve (
    regname='massTransport', rhs_c=..., phi_c=..., tol=0.01, maxiter=500,
    res0=-9.2559631349317831e+61, usediter=0, .tmp.REGNAME.len_V$1790=13)
    at petsc_solvers.F90:580
#12 0x0000000000c2c9c5 in mass_momentum::mass_momentum_pressureprediction ()
    at mass_momentum.F90:989
#13 0x0000000000c0ffc1 in mass_momentum::mass_momentum_core ()
    at mass_momentum.F90:626
#14 0x0000000000c26a2c in mass_momentum::mass_momentum_systempcapply (
    aa_system=54952496, xx_system=47570896, rr_system=47572416, ierr=0)
    at mass_momentum.F90:919
#15 0x00002b3898891763 in ourshellapply (pc=0x3468230, x=0x2d5dfd0,
    y=0x2d5e5c0)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/shell/ftn-custom/zshellpcf.c:41
#16 0x00002b389888e9be in PCApply_Shell (pc=0x3da6230, x=0x2, y=0x1)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/shell/shellpc.c:124
#17 0x00002b38987d8800 in PCApply (pc=0x3da6230, x=0x2, y=0x1)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/interface/precon.c:482
#18 0x00002b389890c92a in KSPSolve (ksp=0x3da6230, b=0x2, x=0x2d5e5c0)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:631
#19 0x00002b3898925142 in kspsolve_ (ksp=0x3da6230, b=0x2, x=0x1,
    __ierr=0x3d424e0)
    at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/ftn-auto/itfuncf.c:261
#20 0x0000000000c1b0ea in mass_momentum::mass_momentum_krylov ()
    at mass_momentum.F90:777
#21 0x0000000000c0d242 in mass_momentum::mass_momentum_simple ()
    at mass_momentum.F90:548
#22 0x0000000000c0841f in mass_momentum::mass_momentum_solve ()
    at mass_momentum.F90:465
#23 0x000000000041b5ec in refresco () at refresco.F90:259
#24 0x000000000041999e in main ()
#25 0x00002b38a067fc05 in __libc_start_main () from /lib64/libc.so.6
#26 0x00000000004198a3 in _start ()
(gdb)
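
A backtrace like the one above can also be produced non-interactively, which keeps gdb's pager prompts out of the output. This is only a minimal sketch: the executable and core file names are taken from the original report quoted below and will differ for a new crash, and backtrace.txt is merely an illustrative output name.

```shell
EXE=./refresco      # executable, as in the original report
CORE=./core.9810    # core file name from the original report; yours will differ
OUT=backtrace.txt   # illustrative output file name (an assumption)

if command -v gdb >/dev/null 2>&1 && [ -r "$EXE" ] && [ -r "$CORE" ]; then
    # -batch makes gdb exit after running the -ex commands; pagination is
    # switched off explicitly so no "Type <return>" prompts are written.
    gdb -batch -ex 'set pagination off' -ex 'bt' "$EXE" "$CORE" > "$OUT" 2>&1
else
    echo "gdb, $EXE or $CORE not available; nothing to do"
fi
```

The resulting file can be pasted into a mail as-is, with no paths split across lines.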



dr. ir. Christiaan Klaij  | Senior Researcher | Research & Development
MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl


________________________________________
From: Klaij, Christiaan
Sent: Thursday, December 07, 2017 12:02 PM
To: petsc-users
Cc: Fande Kong
Subject: Re: [petsc-users] segfault after recent scientific linux upgrade

Thanks Satish, I will give it a shot and let you know.

Chris
________________________________________
From: Satish Balay <balay at mcs.anl.gov>
Sent: Wednesday, December 06, 2017 6:05 PM
To: Klaij, Christiaan
Cc: Fande Kong; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] segfault after recent scientific linux upgrade

petsc 3.7 - and 3.8 both default to superlu_dist snapshot:

    self.gitcommit         = 'xsdk-0.2.0-rc1'

If using petsc-3.7 - you can use the latest maint-3.7 [i.e. 3.7.7+]
[3.7.7 is the latest bugfix update to 3.7 - so there should be no reason to stick to 3.7.5]

But if you really want to stick to 3.7.5 you can use:

--download-superlu_dist=1 --download-superlu_dist-commit=xsdk-0.2.0-rc1
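
Spelled out as a command, this might look like the sketch below. It assumes it is run from the top of an unpacked petsc-3.7.5 source tree; only the two superlu_dist options come from this message, and whatever other options the existing build uses (compilers, MPI, and so on) would have to be added as well.

```shell
# Only these two options are taken from this thread; append the options
# of your existing build to OPTS (not shown here, they are site-specific).
OPTS="--download-superlu_dist=1 --download-superlu_dist-commit=xsdk-0.2.0-rc1"

if [ -x ./configure ]; then
    # OPTS is left unquoted on purpose so that it splits into two arguments.
    ./configure $OPTS
else
    echo "no ./configure in this directory; would run: ./configure $OPTS"
fi
```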

Satish

On Wed, 6 Dec 2017, Klaij, Christiaan wrote:

> Fande,
>
> Thanks, that's good to know. Upgrading to 3.8.x is definitely my
> long-term plan, but is there anything I can do short-term to fix
> the problem while keeping 3.7.5?
>
> Chris
>
> dr. ir. Christiaan Klaij | Senior Researcher | Research & Development
> MARIN | T +31 317 49 33 44 | C.Klaij at marin.nl | www.marin.nl
>
>
> ________________________________
> From: Fande Kong <fdkong.jd at gmail.com>
> Sent: Tuesday, December 05, 2017 4:30 PM
> To: Klaij, Christiaan
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] segfault after recent scientific linux upgrade
>
> I would suggest using PETSc-3.8.x; the bug should then go away. It is a known bug related to the reuse of the factorization pattern.
>
>
> Fande,
>
> On Tue, Dec 5, 2017 at 8:07 AM, Klaij, Christiaan <C.Klaij at marin.nl> wrote:
> I'm running production software with petsc-3.7.5 and, among
> others, superlu_dist 5.1.3 on scientific linux 7.4.
>
> After a recent update of SL7.4, notably of the kernel and glibc,
> we found that superlu is somehow broken. Below is a backtrace of a
> serial example. Is this a known issue? Could you please advise on
> how to proceed (preferably while keeping 3.7.5 for now)?
>
> Thanks,
> Chris
>
> $ gdb ./refresco ./core.9810
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /home/cklaij/ReFRESCO/Dev/trunk/Suites/testSuite/FlatPlate_laminar/calcs/Grid64x64/refresco...done.
> [New LWP 9810]
> Missing separate debuginfo for /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/licensing-1.55.0/sll/lib64/libssl.so.10
> Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/68/6a25d0a83d002183c835fa5694a8110c78d3bc.debug
> Missing separate debuginfo for /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/licensing-1.55.0/sll/lib64/libcrypto.so.10
> Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/68/d2958189303f421b1082abc33fd87338826c65.debug
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `./refresco'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00002ba501c132bc in mc64wd_dist (n=0x5213270, ne=0x2, ip=0x1,
>     irn=0x51af520, a=0x51ef260, iperm=0x1000, num=0x7ffc545b2d94,
>     jperm=0x51e7260, out=0x51eb260, pr=0x51ef260, q=0x51f3260, l=0x51f7260,
>     u=0x51fb270, d__=0x5203270)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/SuperLU_DIST_5.1.3/SRC/mc64ad_dist.c:2322
> 2322    if (iperm[i__] != 0 || iperm[i0] == 0) {
> Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 libstdc++-4.8.5-16.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 numactl-libs-2.0.9-6.el7_2.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0  0x00002ba501c132bc in mc64wd_dist (n=0x5213270, ne=0x2, ip=0x1,
>     irn=0x51af520, a=0x51ef260, iperm=0x1000, num=0x7ffc545b2d94,
>     jperm=0x51e7260, out=0x51eb260, pr=0x51ef260, q=0x51f3260, l=0x51f7260,
>     u=0x51fb270, d__=0x5203270)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/SuperLU_DIST_5.1.3/SRC/mc64ad_dist.c:2322
> #1  0x00002ba501c0ef2b in mc64ad_dist (job=0x5213270, n=0x2, ne=0x1,
>     ip=0x51af520, irn=0x51ef260, a=0x1000, num=0x7ffc545b2db0,
>     cperm=0x51fb270, liw=0x5187d10, iw=0x51c3130, ldw=0x51af520, dw=0x517b570,
>     icntl=0x51e7260, info=0x2ba501c2e556 <dldperm_dist+614>)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/SuperLU_DIST_5.1.3/SRC/mc64ad_dist.c:596
> #2  0x00002ba501c2e556 in dldperm_dist (job=0, n=0, nnz=0, colptr=0x51af520,
>     adjncy=0x51ef260, nzval=0x1000, perm=0x4f00, u=0x1000, v=0x517b001)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/SuperLU_DIST_5.1.3/SRC/dldperm_dist.c:141
> #3  0x00002ba501c26296 in pdgssvx_ABglobal (options=0x5213270, A=0x2,
>     ScalePermstruct=0x1, B=0x51af520, ldb=85914208, nrhs=4096, grid=0x516da30,
>     LUstruct=0x517af40, berr=0x1000,
>     stat=0x2ba500b36a7d <MatLUFactorNumeric_SuperLU_DIST+2349>, info=0x517af58)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/SuperLU_DIST_5.1.3/SRC/pdgssvx_ABglobal.c:716
> #4  0x00002ba500b36a7d in MatLUFactorNumeric_SuperLU_DIST (F=0x5213270, A=0x2,
>     info=0x1)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:419
> #5  0x00002ba500b45a1a in MatLUFactorNumeric (fact=0x5213270, mat=0x2,
>     info=0x1)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/mat/interface/matrix.c:2996
> #6  0x00002ba500e9e6c7 in PCSetUp_LU (pc=0x5213270)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/factor/lu/lu.c:172
> #7  0x00002ba500ded084 in PCSetUp (pc=0x5213270)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/interface/precon.c:968
> #8  0x00002ba500f2968d in KSPSetUp (ksp=0x5213270)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:390
> #9  0x00002ba500f257be in KSPSolve (ksp=0x5213270, b=0x2, x=0x4193510)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:599
> #10 0x00002ba500f3e142 in kspsolve_ (ksp=0x5213270, b=0x2, x=0x1,
>     __ierr=0x51af520)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/ftn-auto/itfuncf.c:261
> #11 0x0000000000bccf71 in petsc_solvers::petsc_solvers_solve (
>     regname='massTransport', rhs_c=..., phi_c=..., tol=0.01, maxiter=500,
>     res0=-9.2559631349317831e+61, usediter=0, .tmp.REGNAME.len_V$1790=13)
>     at petsc_solvers.F90:580
> #12 0x0000000000c2c9c5 in mass_momentum::mass_momentum_pressureprediction ()
>     at mass_momentum.F90:989
> #13 0x0000000000c0ffc1 in mass_momentum::mass_momentum_core ()
>     at mass_momentum.F90:626
> #14 0x0000000000c26a2c in mass_momentum::mass_momentum_systempcapply (
>     aa_system=76390912, xx_system=68983024, rr_system=68984544, ierr=0)
>     at mass_momentum.F90:919
> #15 0x00002ba500eaa763 in ourshellapply (pc=0x48da200, x=0x41c98f0,
>     y=0x41c9ee0)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/shell/ftn-custom/zshellpcf.c:41
> #16 0x00002ba500ea79be in PCApply_Shell (pc=0x5213270, x=0x2, y=0x1)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/shell/shellpc.c:124
> #17 0x00002ba500df1800 in PCApply (pc=0x5213270, x=0x2, y=0x1)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/interface/precon.c:482
> #18 0x00002ba500f2592a in KSPSolve (ksp=0x5213270, b=0x2, x=0x41c9ee0)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:631
> #19 0x00002ba500f3e142 in kspsolve_ (ksp=0x5213270, b=0x2, x=0x1,
>     __ierr=0x51af520)
>     at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/ftn-auto/itfuncf.c:261
> #20 0x0000000000c1b0ea in mass_momentum::mass_momentum_krylov ()
>     at mass_momentum.F90:777
> #21 0x0000000000c0d242 in mass_momentum::mass_momentum_simple ()
>     at mass_momentum.F90:548
> #22 0x0000000000c0841f in mass_momentum::mass_momentum_solve ()
>     at mass_momentum.F90:465
> #23 0x000000000041b5ec in refresco () at refresco.F90:259
> #24 0x000000000041999e in main ()
> #25 0x00002ba508c98c05 in __libc_start_main () from /lib64/libc.so.6
> #26 0x00000000004198a3 in _start ()
> (gdb)
>
>
> dr. ir. Christiaan Klaij  | Senior Researcher | Research & Development
> MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl
>
>
>
>
>
>

