[petsc-users] crash with PCASM in parallel

Daniel Stone daniel.stone at opengosim.com
Sat Nov 25 11:09:59 CST 2017


Thanks for the quick response.

I tried Valgrind. Apart from a couple of warnings in other parts of my
code (now fixed), it shows the same stack I described:
==22498== Invalid read of size 4
==22498==    at 0x55A5BFF: MatDestroy_MPIBAIJ_MatGetSubmatrices
(baijov.c:609)
==22498==    by 0x538A206: MatDestroy (matrix.c:1168)
==22498==    by 0x5F21F2F: PCSetUp_ILU (ilu.c:162)
==22498==    by 0x604898A: PCSetUp (precon.c:924)
==22498==    by 0x6189005: KSPSetUp (itfunc.c:379)
==22498==    by 0x618AB57: KSPSolve (itfunc.c:599)
==22498==    by 0x5FD4816: PCApply_ASM (asm.c:485)
==22498==    by 0x604204C: PCApply (precon.c:458)
==22498==    by 0x6055C76: pcapply_ (preconf.c:223)
==22498==    by 0x42F500: __cpr_linsolver_MOD_cprapply
(cpr_linsolver.F90:419)
==22498==    by 0x5F42431: ourshellapply (zshellpcf.c:41)
==22498==    by 0x5F3697A: PCApply_Shell (shellpc.c:115)
==22498==    by 0x604204C: PCApply (precon.c:458)
==22498==    by 0x61B74E7: KSP_PCApply (kspimpl.h:251)
==22498==    by 0x61B83C3: KSPInitialResidual (itres.c:67)
==22498==    by 0x6104EF9: KSPSolve_BCGS (bcgs.c:44)
==22498==    by 0x618B77E: KSPSolve (itfunc.c:656)
==22498==    by 0x62BB02D: SNESSolve_NEWTONLS (ls.c:224)
==22498==    by 0x6245706: SNESSolve (snes.c:3967)
==22498==    by 0x6265A58: snessolve_ (zsnesf.c:167)
==22498==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22498==

PETSc version: this is from include/petscversion.h:
#define PETSC_VERSION_RELEASE    0
#define PETSC_VERSION_MAJOR      3
#define PETSC_VERSION_MINOR      7
#define PETSC_VERSION_SUBMINOR   5
#define PETSC_VERSION_PATCH      0
#define PETSC_RELEASE_DATE       "Apr, 25, 2016"
#define PETSC_VERSION_DATE       "unknown"

This is the recommended version of PETSc for use with PFLOTRAN:
http://documentation.pflotran.org/user_guide/how_to/installation/linux.html#linux-install

Exact debugger output:
It's a graphical debugger so there isn't much to copy/paste.
The exact message is:

Memory error detected in MatDestroy_MPIBAIJ_MatGetSubmatrices
(baijov.c:609):

null pointer dereference or unaligned memory access.

I can provide screenshots if that would help.

-ksp_view_pre:
I tried this, but it doesn't seem to give information about the KSPs in
question. For context, this is part of an attempt to implement the
two-stage CPR-AMG preconditioner in PFLOTRAN, so the KSP and PC objects
involved are:

KSP: the linear solver inside a SNES, inside PFLOTRAN (BCGS), which has a PC:
   PC: a shell, the CPR implementation, which calls two more
       preconditioners, T1 and T2, in sequence:
      T1: another shell, which calls a KSP (GMRES) whose PC is HYPRE
          BoomerAMG
      T2: ASM; this is the problematic one (sketched below).
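
To make that concrete, here is a rough C sketch of how the pieces fit
together, written against the plain PETSc API rather than the actual
Fortran implementation in cpr_linsolver.F90. It simply combines T1 and T2
multiplicatively, shows T1 directly as a KSP rather than as the extra shell
wrapper described above, and glosses over the pressure-block restriction
the real CPR first stage uses; all names (CPRCtx, CPRApply, CPRSetUp) are
placeholders:

#include <petscksp.h>

/* Context for the CPR shell PC: first-stage KSP and second-stage PC,
   plus the full system matrix and two work vectors. */
typedef struct {
  KSP t1;        /* first stage: GMRES preconditioned by hypre BoomerAMG     */
  PC  t2;        /* second stage: ASM with ILU sub-solves (the failing part) */
  Mat A;         /* full system matrix                                       */
  Vec x1, r2;    /* work vectors                                             */
} CPRCtx;

/* Multiplicative two-stage apply: x1 = T1 r, then x = x1 + T2 (r - A x1). */
static PetscErrorCode CPRApply(PC pc, Vec r, Vec x)
{
  CPRCtx        *ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PCShellGetContext(pc, (void**)&ctx);CHKERRQ(ierr);
  ierr = KSPSolve(ctx->t1, r, ctx->x1);CHKERRQ(ierr);     /* x1 = T1 r     */
  ierr = MatMult(ctx->A, ctx->x1, ctx->r2);CHKERRQ(ierr); /* r2 = A x1     */
  ierr = VecAYPX(ctx->r2, -1.0, r);CHKERRQ(ierr);         /* r2 = r - A x1 */
  ierr = PCApply(ctx->t2, ctx->r2, x);CHKERRQ(ierr);      /* x  = T2 r2    */
  ierr = VecAXPY(x, 1.0, ctx->x1);CHKERRQ(ierr);          /* x += x1       */
  PetscFunctionReturn(0);
}

/* Build the shell PC and its two stages from the full matrix A. */
static PetscErrorCode CPRSetUp(MPI_Comm comm, Mat A, CPRCtx *ctx, PC *cpr)
{
  PC             pc1;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ctx->A = A;
  ierr = MatCreateVecs(A, &ctx->x1, &ctx->r2);CHKERRQ(ierr);

  /* T1: GMRES + hypre BoomerAMG */
  ierr = KSPCreate(comm, &ctx->t1);CHKERRQ(ierr);
  ierr = KSPSetType(ctx->t1, KSPGMRES);CHKERRQ(ierr);
  ierr = KSPSetOperators(ctx->t1, A, A);CHKERRQ(ierr);
  ierr = KSPGetPC(ctx->t1, &pc1);CHKERRQ(ierr);
  ierr = PCSetType(pc1, PCHYPRE);CHKERRQ(ierr);
  ierr = PCHYPRESetType(pc1, "boomeramg");CHKERRQ(ierr);

  /* T2: ASM, with (by default) ILU(0) on each subdomain */
  ierr = PCCreate(comm, &ctx->t2);CHKERRQ(ierr);
  ierr = PCSetType(ctx->t2, PCASM);CHKERRQ(ierr);
  ierr = PCSetOperators(ctx->t2, A, A);CHKERRQ(ierr);

  /* The outer shell PC handed to the SNES's KSP */
  ierr = PCCreate(comm, cpr);CHKERRQ(ierr);
  ierr = PCSetType(*cpr, PCSHELL);CHKERRQ(ierr);
  ierr = PCShellSetContext(*cpr, ctx);CHKERRQ(ierr);
  ierr = PCShellSetApply(*cpr, CPRApply);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

That matches the valgrind stack above: the failure is triggered the first
time the shell apply reaches PCApply on the ASM stage, when ASM sets up its
per-subdomain ILU solvers.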

-ksp_view_pre doesn't seem to give us any information about the ASM
preconditioner object or its ILU sub-KSPs; presumably it crashes before
getting there (a hypothetical sketch of how T2 could be viewed explicitly
follows the T1 output below). We do get a lot of output about
T1, for example:

KSP Object: T1 24 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using DEFAULT norm type for convergence test
PC Object: 24 MPI processes
  type: hypre
  PC has not been set up so information may be incomplete
    HYPRE BoomerAMG preconditioning
    HYPRE BoomerAMG: Cycle type V
    HYPRE BoomerAMG: Maximum number of levels 25
    HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
    HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
    HYPRE BoomerAMG: Threshold for strong coupling 0.25
    HYPRE BoomerAMG: Interpolation truncation factor 0.
    HYPRE BoomerAMG: Interpolation: max elements per row 0
    HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
    HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
    HYPRE BoomerAMG: Maximum row sums 0.9
    HYPRE BoomerAMG: Sweeps down         1
    HYPRE BoomerAMG: Sweeps up           1
    HYPRE BoomerAMG: Sweeps on coarse    1
    HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
    HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
    HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
    HYPRE BoomerAMG: Relax weight  (all)      1.
    HYPRE BoomerAMG: Outer relax weight (all) 1.
    HYPRE BoomerAMG: Using CF-relaxation
    HYPRE BoomerAMG: Not using more complex smoothers.
    HYPRE BoomerAMG: Measure type        local
    HYPRE BoomerAMG: Coarsen type        Falgout
    HYPRE BoomerAMG: Interpolation type  classical
  linear system matrix = precond matrix:
  Mat Object: 24 MPI processes
    type: mpiaij
    rows=1122000, cols=1122000
    total: nonzeros=7780000, allocated nonzeros=7780000
    total number of mallocs used during MatSetValues calls =0
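
As an aside: since -ksp_view_pre never reaches T2, one thing that could be
added (purely hypothetical, not in the current code) is an explicit dump of
the ASM PC and its ILU sub-KSPs once its setup succeeds, along the lines of:

/* Hypothetical helper (assumes <petscksp.h>, as in the sketch above):
   dump the state of the second-stage ASM PC explicitly. */
static PetscErrorCode CPRViewSecondStage(PC t2)
{
  KSP           *subksp;
  PetscInt       nlocal, first;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* Overall state of the ASM preconditioner */
  ierr = PCView(t2, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  /* Per-subdomain sub-KSPs; only meaningful once the ASM setup and its
     sub-solver setup succeed, which is exactly where it currently crashes */
  ierr = PCASMGetSubKSP(t2, &nlocal, &first, &subksp);CHKERRQ(ierr);
  if (nlocal) {ierr = KSPView(subksp[0], PETSC_VIEWER_STDOUT_SELF);CHKERRQ(ierr);}
  PetscFunctionReturn(0);
}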

Thanks,

Daniel Stone


On Fri, Nov 24, 2017 at 4:08 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   First run under valgrind.
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>
>    If that doesn't help send the exact output from the debugger (cut and
> paste) and the exact version of PETSc you are using.
> Also output from -ksp_view_pre
>
>   Barry
>
> > On Nov 24, 2017, at 8:03 AM, Daniel Stone <daniel.stone at opengosim.com>
> wrote:
> >
> > Hello,
> >
> > I'm getting a memory exception crash every time I try to run the ASM
> preconditioner in parallel; can anyone help?
> >
> > I'm using a debugger so I can give most of the stack:
> >
> > PCApply_ASM (asm.c:line 485)
> >   KSPSolve (itfunc.c:line 599)
> >     KSPSetUp (itfunc.c:line 379)
> >        PCSetUp (precon.c: 924)
> >           PCSetUp_ILU (ilu.c:line 162)
> >             MatDestroy  (matrix.c:line 1168)
> >               MatDestroy_MPIBAIJ_MatGetSubMatrices (baijov.c:line 609)
> >
> >
> > The problem line is then in  MatDestroy_MPIBAIJ_MatGetSubMatrices,
> > in the file baijov.c, line 609:
> >
> > if (!submatj->id) {
> >
> > At this point submatj has no value, address 0x0, so the attempt to
> access submatj->id
> > causes the memory error. We can see in the lines just above 609 where
> submatj is supposed to
> > come from; it should basically be an attribute of C->data, where C is
> the input matrix.
> >
> > Does anyone have any ideas where to start with getting this to work? I
> can provide a lot more information
> > from the debugger if needed.
> >
> > Many thanks in advance,
> >
> > Daniel Stone
> >
>
>