[petsc-users] segfault after recent scientific linux upgrade
Klaij, Christiaan
C.Klaij at marin.nl
Fri Dec 8 01:55:40 CST 2017
Almost valgrind clean. We use Intel MPI, so we need a handful of suppressions.
Chris
dr. ir. Christiaan Klaij | Senior Researcher | Research & Development
MARIN | T +31 317 49 33 44 | mailto:C.Klaij at marin.nl | http://www.marin.nl
MARIN news: http://www.marin.nl/web/News/News-items/GROW-partners-innovate-together-in-offshore-wind-industry.htm
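For readers unfamiliar with the workflow, a valgrind run with a suppression file might look like the sketch below. The file name intelmpi.supp, the suppression entry name, and the obj: pattern are assumptions for illustration only; the suppressions actually used are not given in this thread. The executable name ./refresco is taken from the gdb session quoted further down.

```shell
#!/bin/sh
# Hypothetical name for the suppression list (not from the thread).
SUPP_FILE="intelmpi.supp"

# One illustrative suppression entry: ignore "conditional jump depends on
# uninitialised value" reports whose call stack passes through an MPI
# shared library. Entry name and obj: pattern are assumptions.
cat > "$SUPP_FILE" <<'EOF'
{
   hypothetical_intelmpi_cond
   Memcheck:Cond
   ...
   obj:*/libmpi*.so*
}
EOF

# Memcheck run for a serial case; %p in the log name expands to the process id.
VALGRIND_CMD="valgrind --tool=memcheck -q --num-callers=20 --track-origins=yes --suppressions=$SUPP_FILE --log-file=valgrind.log.%p ./refresco"

# Only launch valgrind when both it and the executable are present;
# otherwise just show the command that would be used.
if command -v valgrind >/dev/null 2>&1 && [ -x ./refresco ]; then
    $VALGRIND_CMD
else
    echo "would run: $VALGRIND_CMD"
fi
```

Any report that remains in valgrind.log.* after the MPI noise is suppressed, in particular one pointing into mc64ad_dist.c, would be the interesting part.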
________________________________________
From: Satish Balay <balay at mcs.anl.gov>
Sent: Thursday, December 07, 2017 6:07 PM
To: Klaij, Christiaan
Cc: petsc-users
Subject: Re: [petsc-users] segfault after recent scientific linux upgrade
Could you check if your code is valgrind clean?
Satish
On Thu, 7 Dec 2017, Klaij, Christiaan wrote:
> Satish,
>
> As a first try, I've kept petsc-3.7.5 and only replaced superlu
> by the new xsdk-0.2.0-rc1 version. Unfortunately, this doesn't
> fix the problem, see the backtrace below.
>
> Fande,
>
> Perhaps the problem is related to petsc, not superlu?
>
> What really puzzles me is that everything was working fine with
> petsc-3.7.5 and superlu_dist_5.1.3; it only broke after we
> updated Scientific Linux 7. So this bug (in petsc or in superlu)
> was already there but somehow not triggered before the SL7
> update?
>
> Chris
>
> (gdb) bt
> #0 0x00002b38995fa30c in mc64wd_dist (n=0x3da6230, ne=0x2, ip=0x1,
> irn=0x3d424e0, a=0x3d82220, iperm=0x1000, num=0x7ffc505dd294,
> jperm=0x3d7a220, out=0x3d7e220, pr=0x3d82220, q=0x3d86220, l=0x3d8a220,
> u=0x3d8e230, d__=0x3d96230)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/superlu_dist-xsdk-0.2.0-rc1/SRC/mc64ad_dist.c:2322
> #1 0x00002b38995f5f7b in mc64ad_dist (job=0x3da6230, n=0x2, ne=0x1,
> ip=0x3d424e0, irn=0x3d82220, a=0x1000, num=0x7ffc505dd2b0,
> cperm=0x3d8e230, liw=0x3d1acd0, iw=0x3d560f0, ldw=0x3d424e0, dw=0x3d0e530,
> icntl=0x3d7a220, info=0x2b3899615546 <dldperm_dist+614>)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/superlu_dist-xsdk-0.2.0-rc1/SRC/mc64ad_dist.c:596
> #2 0x00002b3899615546 in dldperm_dist (job=0, n=0, nnz=0, colptr=0x3d424e0,
> adjncy=0x3d82220, nzval=0x1000, perm=0x4f00, u=0x1000, v=0x3d0e001)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/superlu_dist-xsdk-0.2.0-rc1/SRC/dldperm_dist.c:141
> #3 0x00002b389960d286 in pdgssvx_ABglobal (options=0x3da6230, A=0x2,
> ScalePermstruct=0x1, B=0x3d424e0, ldb=64496160, nrhs=4096, grid=0x3d009f0,
> LUstruct=0x3d0df00, berr=0x1000,
> stat=0x2b389851da7d <MatLUFactorNumeric_SuperLU_DIST+2349>, info=0x3d0df18)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/superlu_dist-xsdk-0.2.0-rc1/SRC/pdgssvx_ABglobal.c:716
> #4 0x00002b389851da7d in MatLUFactorNumeric_SuperLU_DIST (F=0x3da6230, A=0x2,
> info=0x1)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:419
> #5 0x00002b389852ca1a in MatLUFactorNumeric (fact=0x3da6230, mat=0x2,
> info=0x1)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/mat/interface/matrix.c:2996
> #6 0x00002b38988856c7 in PCSetUp_LU (pc=0x3da6230)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/factor/lu/lu.c:172
> #7 0x00002b38987d4084 in PCSetUp (pc=0x3da6230)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/interface/precon.c:968
> #8 0x00002b389891068d in KSPSetUp (ksp=0x3da6230)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:390
> #9 0x00002b389890c7be in KSPSolve (ksp=0x3da6230, b=0x2, x=0x2d18d90)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:599
> #10 0x00002b3898925142 in kspsolve_ (ksp=0x3da6230, b=0x2, x=0x1,
> __ierr=0x3d424e0)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/ftn-auto/itfuncf.c:261
> #11 0x0000000000bccf71 in petsc_solvers::petsc_solvers_solve (
> regname='massTransport', rhs_c=..., phi_c=..., tol=0.01, maxiter=500,
> res0=-9.2559631349317831e+61, usediter=0, .tmp.REGNAME.len_V$1790=13)
> at petsc_solvers.F90:580
> #12 0x0000000000c2c9c5 in mass_momentum::mass_momentum_pressureprediction ()
> at mass_momentum.F90:989
> #13 0x0000000000c0ffc1 in mass_momentum::mass_momentum_core ()
> at mass_momentum.F90:626
> #14 0x0000000000c26a2c in mass_momentum::mass_momentum_systempcapply (
> aa_system=54952496, xx_system=47570896, rr_system=47572416, ierr=0)
> at mass_momentum.F90:919
> #15 0x00002b3898891763 in ourshellapply (pc=0x3468230, x=0x2d5dfd0,
> y=0x2d5e5c0)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/shell/ftn-custom/zshellpcf.c:41
> #16 0x00002b389888e9be in PCApply_Shell (pc=0x3da6230, x=0x2, y=0x1)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/shell/shellpc.c:124
> #17 0x00002b38987d8800 in PCApply (pc=0x3da6230, x=0x2, y=0x1)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/interface/precon.c:482
> #18 0x00002b389890c92a in KSPSolve (ksp=0x3da6230, b=0x2, x=0x2d5e5c0)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:631
> #19 0x00002b3898925142 in kspsolve_ (ksp=0x3da6230, b=0x2, x=0x1,
> __ierr=0x3d424e0)
> at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/ftn-auto/itfuncf.c:261
> #20 0x0000000000c1b0ea in mass_momentum::mass_momentum_krylov ()
> at mass_momentum.F90:777
> #21 0x0000000000c0d242 in mass_momentum::mass_momentum_simple ()
> at mass_momentum.F90:548
> #22 0x0000000000c0841f in mass_momentum::mass_momentum_solve ()
> at mass_momentum.F90:465
> #23 0x000000000041b5ec in refresco () at refresco.F90:259
> #24 0x000000000041999e in main ()
> #25 0x00002b38a067fc05 in __libc_start_main () from /lib64/libc.so.6
> #26 0x00000000004198a3 in _start ()
> (gdb)
>
> ________________________________________
> From: Klaij, Christiaan
> Sent: Thursday, December 07, 2017 12:02 PM
> To: petsc-users
> Cc: Fande Kong
> Subject: Re: [petsc-users] segfault after recent scientific linux upgrade
>
> Thanks Satish, I will give it a shot and let you know.
>
> Chris
> ________________________________________
> From: Satish Balay <balay at mcs.anl.gov>
> Sent: Wednesday, December 06, 2017 6:05 PM
> To: Klaij, Christiaan
> Cc: Fande Kong; petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] segfault after recent scientific linux upgrade
>
> petsc 3.7 - and 3.8 both default to superlu_dist snapshot:
>
> self.gitcommit = 'xsdk-0.2.0-rc1'
>
> If using petsc-3.7, you can use the latest maint-3.7 [i.e. 3.7.7+]
> [3.7.7 is the latest bugfix update to 3.7, so there should be no reason to stick to 3.7.5]
>
> But if you really want to stick to 3.7.5 you can use:
>
> --download-superlu_dist=1 --download-superlu_dist-commit=xsdk-0.2.0-rc1
>
> Satish
>
> On Wed, 6 Dec 2017, Klaij, Christiaan wrote:
>
> > Fande,
> >
> > Thanks, that's good to know. Upgrading to 3.8.x is definitely my
> > long-term plan, but is there anything I can do short-term to fix
> > the problem while keeping 3.7.5?
> >
> > Chris
> >
> > ________________________________
> > From: Fande Kong <fdkong.jd at gmail.com>
> > Sent: Tuesday, December 05, 2017 4:30 PM
> > To: Klaij, Christiaan
> > Cc: petsc-users at mcs.anl.gov
> > Subject: Re: [petsc-users] segfault after recent scientific linux upgrade
> >
> > I would suggest using PETSc-3.8.x; the bug should then go away. It is a known bug related to the reuse of the factorization pattern.
> >
> >
> > Fande,
> >
> > On Tue, Dec 5, 2017 at 8:07 AM, Klaij, Christiaan <C.Klaij at marin.nl> wrote:
> > I'm running production software with petsc-3.7.5 and, among
> > others, superlu_dist 5.1.3 on scientific linux 7.4.
> >
> > After a recent update of SL7.4, notably of the kernel and glibc,
> > we found that superlu is somehow broken. Below is a backtrace of a
> > serial example. Is this a known issue? Could you please advise on
> > how to proceed (preferably while keeping 3.7.5 for now)?
> >
> > Thanks,
> > Chris
> >
> > $ gdb ./refresco ./core.9810
> > GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
> > Copyright (C) 2013 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-redhat-linux-gnu".
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>...
> > Reading symbols from /home/cklaij/ReFRESCO/Dev/trunk/Suites/testSuite/FlatPlate_laminar/calcs/Grid64x64/refresco...done.
> > [New LWP 9810]
> > Missing separate debuginfo for /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/licensing-1.55.0/sll/lib64/libssl.so.10
> > Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/68/6a25d0a83d002183c835fa5694a8110c78d3bc.debug
> > Missing separate debuginfo for /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/licensing-1.55.0/sll/lib64/libcrypto.so.10
> > Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/68/d2958189303f421b1082abc33fd87338826c65.debug
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `./refresco'.
> > Program terminated with signal 11, Segmentation fault.
> > #0 0x00002ba501c132bc in mc64wd_dist (n=0x5213270, ne=0x2, ip=0x1,
> > irn=0x51af520, a=0x51ef260, iperm=0x1000, num=0x7ffc545b2d94,
> > jperm=0x51e7260, out=0x51eb260, pr=0x51ef260, q=0x51f3260, l=0x51f7260,
> > u=0x51fb270, d__=0x5203270)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/SuperLU_DIST_5.1.3/SRC/mc64ad_dist.c:2322
> > 2322 if (iperm[i__] != 0 || iperm[i0] == 0) {
> > Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libselinux-2.5-11.el7.x86_64 libstdc++-4.8.5-16.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 numactl-libs-2.0.9-6.el7_2.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> > (gdb) bt
> > #0 0x00002ba501c132bc in mc64wd_dist (n=0x5213270, ne=0x2, ip=0x1,
> > irn=0x51af520, a=0x51ef260, iperm=0x1000, num=0x7ffc545b2d94,
> > jperm=0x51e7260, out=0x51eb260, pr=0x51ef260, q=0x51f3260, l=0x51f7260,
> > u=0x51fb270, d__=0x5203270)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/SuperLU_DIST_5.1.3/SRC/mc64ad_dist.c:2322
> > #1 0x00002ba501c0ef2b in mc64ad_dist (job=0x5213270, n=0x2, ne=0x1,
> > ip=0x51af520, irn=0x51ef260, a=0x1000, num=0x7ffc545b2db0,
> > cperm=0x51fb270, liw=0x5187d10, iw=0x51c3130, ldw=0x51af520, dw=0x517b570,
> > icntl=0x51e7260, info=0x2ba501c2e556 <dldperm_dist+614>)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/SuperLU_DIST_5.1.3/SRC/mc64ad_dist.c:596
> > #2 0x00002ba501c2e556 in dldperm_dist (job=0, n=0, nnz=0, colptr=0x51af520,
> > adjncy=0x51ef260, nzval=0x1000, perm=0x4f00, u=0x1000, v=0x517b001)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/SuperLU_DIST_5.1.3/SRC/dldperm_dist.c:141
> > #3 0x00002ba501c26296 in pdgssvx_ABglobal (options=0x5213270, A=0x2,
> > ScalePermstruct=0x1, B=0x51af520, ldb=85914208, nrhs=4096, grid=0x516da30,
> > LUstruct=0x517af40, berr=0x1000,
> > stat=0x2ba500b36a7d <MatLUFactorNumeric_SuperLU_DIST+2349>, info=0x517af58)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/install/Linux-x86_64-Intel/SuperLU_DIST_5.1.3/SRC/pdgssvx_ABglobal.c:716
> > #4 0x00002ba500b36a7d in MatLUFactorNumeric_SuperLU_DIST (F=0x5213270, A=0x2,
> > info=0x1)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:419
> > #5 0x00002ba500b45a1a in MatLUFactorNumeric (fact=0x5213270, mat=0x2,
> > info=0x1)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/mat/interface/matrix.c:2996
> > #6 0x00002ba500e9e6c7 in PCSetUp_LU (pc=0x5213270)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/factor/lu/lu.c:172
> > #7 0x00002ba500ded084 in PCSetUp (pc=0x5213270)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/interface/precon.c:968
> > #8 0x00002ba500f2968d in KSPSetUp (ksp=0x5213270)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:390
> > #9 0x00002ba500f257be in KSPSolve (ksp=0x5213270, b=0x2, x=0x4193510)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:599
> > #10 0x00002ba500f3e142 in kspsolve_ (ksp=0x5213270, b=0x2, x=0x1,
> > __ierr=0x51af520)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/ftn-auto/itfuncf.c:261
> > #11 0x0000000000bccf71 in petsc_solvers::petsc_solvers_solve (
> > regname='massTransport', rhs_c=..., phi_c=..., tol=0.01, maxiter=500,
> > res0=-9.2559631349317831e+61, usediter=0, .tmp.REGNAME.len_V$1790=13)
> > at petsc_solvers.F90:580
> > #12 0x0000000000c2c9c5 in mass_momentum::mass_momentum_pressureprediction ()
> > at mass_momentum.F90:989
> > #13 0x0000000000c0ffc1 in mass_momentum::mass_momentum_core ()
> > at mass_momentum.F90:626
> > #14 0x0000000000c26a2c in mass_momentum::mass_momentum_systempcapply (
> > aa_system=76390912, xx_system=68983024, rr_system=68984544, ierr=0)
> > at mass_momentum.F90:919
> > #15 0x00002ba500eaa763 in ourshellapply (pc=0x48da200, x=0x41c98f0,
> > y=0x41c9ee0)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/shell/ftn-custom/zshellpcf.c:41
> > #16 0x00002ba500ea79be in PCApply_Shell (pc=0x5213270, x=0x2, y=0x1)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/impls/shell/shellpc.c:124
> > #17 0x00002ba500df1800 in PCApply (pc=0x5213270, x=0x2, y=0x1)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/pc/interface/precon.c:482
> > #18 0x00002ba500f2592a in KSPSolve (ksp=0x5213270, b=0x2, x=0x41c9ee0)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c:631
> > #19 0x00002ba500f3e142 in kspsolve_ (ksp=0x5213270, b=0x2, x=0x1,
> > __ierr=0x51af520)
> > at /home/cklaij/ReFRESCO/Dev/trunk/Libs/build/petsc-3.7.5/src/ksp/ksp/interface/ftn-auto/itfuncf.c:261
> > #20 0x0000000000c1b0ea in mass_momentum::mass_momentum_krylov ()
> > at mass_momentum.F90:777
> > #21 0x0000000000c0d242 in mass_momentum::mass_momentum_simple ()
> > at mass_momentum.F90:548
> > #22 0x0000000000c0841f in mass_momentum::mass_momentum_solve ()
> > at mass_momentum.F90:465
> > #23 0x000000000041b5ec in refresco () at refresco.F90:259
> > #24 0x000000000041999e in main ()
> > #25 0x00002ba508c98c05 in __libc_start_main () from /lib64/libc.so.6
> > #26 0x00000000004198a3 in _start ()
> > (gdb)
>
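Frame #0 in both backtraces stops at mc64ad_dist.c:2322, the line `if (iperm[i__] != 0 || iperm[i0] == 0)`. One possible next step, sketched here under assumptions, is to inspect the index variables of that frame from the core file in batch mode. The script name inspect_mc64.gdb is hypothetical; the executable and core file names are those from the original report, and in an optimized build the variables may show up as optimized out.

```shell
#!/bin/sh
# Hypothetical gdb command file: select the crashing frame and print the
# indices used on mc64ad_dist.c:2322.
cat > inspect_mc64.gdb <<'EOF'
frame 0
info locals
print i__
print i0
print iperm
EOF

# Run non-interactively only if gdb, the executable and the core file exist;
# batch mode also avoids the interactive pager prompts.
if command -v gdb >/dev/null 2>&1 && [ -x ./refresco ] && [ -f ./core.9810 ]; then
    gdb -batch -x inspect_mc64.gdb ./refresco ./core.9810
else
    echo "gdb, ./refresco or ./core.9810 not found; wrote inspect_mc64.gdb only"
fi
```

If i__ or i0 turn out to be far outside the matrix dimension, that would point to corrupted or uninitialised input to mc64wd_dist rather than to the line itself.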