[petsc-users] crash with PCASM in parallel
Daniel Stone
daniel.stone at opengosim.com
Sat Nov 25 11:09:59 CST 2017
Thanks for the quick response.
I tried Valgrind. Apart from a couple of warnings in other parts of
my code (now fixed), it shows the same stack I described:
==22498== Invalid read of size 4
==22498== at 0x55A5BFF: MatDestroy_MPIBAIJ_MatGetSubmatrices (baijov.c:609)
==22498== by 0x538A206: MatDestroy (matrix.c:1168)
==22498== by 0x5F21F2F: PCSetUp_ILU (ilu.c:162)
==22498== by 0x604898A: PCSetUp (precon.c:924)
==22498== by 0x6189005: KSPSetUp (itfunc.c:379)
==22498== by 0x618AB57: KSPSolve (itfunc.c:599)
==22498== by 0x5FD4816: PCApply_ASM (asm.c:485)
==22498== by 0x604204C: PCApply (precon.c:458)
==22498== by 0x6055C76: pcapply_ (preconf.c:223)
==22498== by 0x42F500: __cpr_linsolver_MOD_cprapply (cpr_linsolver.F90:419)
==22498== by 0x5F42431: ourshellapply (zshellpcf.c:41)
==22498== by 0x5F3697A: PCApply_Shell (shellpc.c:115)
==22498== by 0x604204C: PCApply (precon.c:458)
==22498== by 0x61B74E7: KSP_PCApply (kspimpl.h:251)
==22498== by 0x61B83C3: KSPInitialResidual (itres.c:67)
==22498== by 0x6104EF9: KSPSolve_BCGS (bcgs.c:44)
==22498== by 0x618B77E: KSPSolve (itfunc.c:656)
==22498== by 0x62BB02D: SNESSolve_NEWTONLS (ls.c:224)
==22498== by 0x6245706: SNESSolve (snes.c:3967)
==22498== by 0x6265A58: snessolve_ (zsnesf.c:167)
==22498== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22498==
PETSc version: this is from include/petscversion.h:
#define PETSC_VERSION_RELEASE 0
#define PETSC_VERSION_MAJOR 3
#define PETSC_VERSION_MINOR 7
#define PETSC_VERSION_SUBMINOR 5
#define PETSC_VERSION_PATCH 0
#define PETSC_RELEASE_DATE "Apr, 25, 2016"
#define PETSC_VERSION_DATE "unknown"
This is the recommended version of PETSc for use with PFLOTRAN:
http://documentation.pflotran.org/user_guide/how_to/installation/linux.html#linux-install
Exact debugger output:
It's a graphical debugger, so there isn't much to copy/paste.
The exact message is:
Memory error detected in MatDestroy_MPIBAIJ_MatGetSubmatrices
(baijov.c:609):
null pointer dereference or unaligned memory access.
I can provide screenshots if that would help.
-ksp_view_pre:
I tried this, but it doesn't seem to give information about the KSPs in
question. To be clear, this is part of an attempt to implement the
two-stage CPR-AMG preconditioner in PFLOTRAN, so the KSP and PC objects
involved are:
KSP: the linear solver inside a SNES, inside PFLOTRAN (BCGS),
which has a PC:
PC: a shell, the CPR implementation, which calls two more preconditioners,
T1 and T2, in sequence:
T1: another shell, which calls a KSP (GMRES) whose PC is HYPRE BoomerAMG
T2: ASM; this is the problematic one. (A rough sketch of this nesting is
given below.)
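For reference, here is a rough sketch in PETSc's C API of how the T1 and
T2 pieces might be created. This is illustration only, not the actual
PFLOTRAN/CPR code: the function name SetupCPRStages is made up, the shell
apply routine that combines the two stages is omitted, and in the real
scheme T1 acts on a restricted pressure matrix rather than the full
matrix A.

#include <petscksp.h>

/* Sketch only: build the two CPR stages described above. */
PetscErrorCode SetupCPRStages(MPI_Comm comm, Mat A, KSP *t1, PC *t2)
{
  PC             pc;
  PetscErrorCode ierr;

  /* T1: GMRES preconditioned by hypre BoomerAMG */
  ierr = KSPCreate(comm, t1);CHKERRQ(ierr);
  ierr = KSPSetType(*t1, KSPGMRES);CHKERRQ(ierr);
  ierr = KSPSetOperators(*t1, A, A);CHKERRQ(ierr);
  ierr = KSPGetPC(*t1, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCHYPRE);CHKERRQ(ierr);
  ierr = PCHYPRESetType(pc, "boomeramg");CHKERRQ(ierr);

  /* T2: additive Schwarz; each subdomain solve defaults to ILU, which
     is the PCSetUp_ILU that appears in the crash stack */
  ierr = PCCreate(comm, t2);CHKERRQ(ierr);
  ierr = PCSetType(*t2, PCASM);CHKERRQ(ierr);
  ierr = PCSetOperators(*t2, A, A);CHKERRQ(ierr);
  ierr = PCASMSetOverlap(*t2, 1);CHKERRQ(ierr);
  return 0;
}

(PCASMGetSubKSP could then be used to query the per-subdomain ILU KSPs
directly from the code, if needed.)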
-ksp_view_pre doesn't seem to give us any information about the ASM
preconditioner object or its ILU sub-KSPs; presumably it crashes before
getting there. We do get a lot of output about T1, for example:
KSP Object: T1 24 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using DEFAULT norm type for convergence test
PC Object: 24 MPI processes
type: hypre
PC has not been set up so information may be incomplete
HYPRE BoomerAMG preconditioning
HYPRE BoomerAMG: Cycle type V
HYPRE BoomerAMG: Maximum number of levels 25
HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
HYPRE BoomerAMG: Threshold for strong coupling 0.25
HYPRE BoomerAMG: Interpolation truncation factor 0.
HYPRE BoomerAMG: Interpolation: max elements per row 0
HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
HYPRE BoomerAMG: Maximum row sums 0.9
HYPRE BoomerAMG: Sweeps down 1
HYPRE BoomerAMG: Sweeps up 1
HYPRE BoomerAMG: Sweeps on coarse 1
HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
HYPRE BoomerAMG: Relax weight (all) 1.
HYPRE BoomerAMG: Outer relax weight (all) 1.
HYPRE BoomerAMG: Using CF-relaxation
HYPRE BoomerAMG: Not using more complex smoothers.
HYPRE BoomerAMG: Measure type local
HYPRE BoomerAMG: Coarsen type Falgout
HYPRE BoomerAMG: Interpolation type classical
linear system matrix = precond matrix:
Mat Object: 24 MPI processes
type: mpiaij
rows=1122000, cols=1122000
total: nonzeros=7780000, allocated nonzeros=7780000
total number of mallocs used during MatSetValues calls =0
Thanks,
Daniel Stone
On Fri, Nov 24, 2017 at 4:08 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
> First run under valgrind. https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>
> If that doesn't help, send the exact output from the debugger (cut and
> paste) and the exact version of PETSc you are using.
> Also output from -ksp_view_pre
>
> Barry
>
> > On Nov 24, 2017, at 8:03 AM, Daniel Stone <daniel.stone at opengosim.com> wrote:
> >
> > Hello,
> >
> > I'm getting a memory exception crash every time I try to run the ASM
> > preconditioner in parallel; can anyone help?
> >
> > I'm using a debugger so I can give most of the stack:
> >
> > PCApply_ASM (asm.c:line 485)
> > KSPSolve (itfunc.c:line 599)
> > KSPSetUp (itfunc.c:line 379)
> > PCSetUp (precon.c:line 924)
> > PCSetUp_ILU (ilu.c:line 162)
> > MatDestroy (matrix.c:line 1168)
> > MatDestroy_MPIBAIJ_MatGetSubMatrices (baijov.c:line 609)
> >
> >
> > The problem line is then in MatDestroy_MPIBAIJ_MatGetSubMatrices,
> > in the file baijov.c, line 609:
> >
> > if (!submatj->id) {
> >
> > At this point submatj is NULL (address 0x0), so the attempt to access
> > submatj->id causes the memory error. We can see in the lines just above
> > 609 where submatj is supposed to come from; it should basically be an
> > attribute of C->data, where C is the input matrix.
> >
> > Does anyone have any ideas where to start with getting this to work? I
> > can provide a lot more information from the debugger if needed.
> >
> > Many thanks in advance,
> >
> > Daniel Stone
> >
>
>