[petsc-users] crash with PCASM in parallel

Daniel Stone daniel.stone at opengosim.com
Sat Nov 25 16:44:43 CST 2017


The PFLOTRAN documentation seems to want us to install a very old version of PETSc:
http://documentation.pflotran.org/user_guide/how_to/installation/linux.html#linux-install
The relevant part is:
"
Install PETSc

3.1. Clone petsc and check out the supported version:

git clone https://bitbucket.org/petsc/petsc petsc
cd petsc
git checkout xsdk-0.2.0
NOTE: PFLOTRAN currently uses a snapshot of PETSc ‘maint’ (release) branch.
The only supported snapshot/version is specified by the changeset-id above.
The supported version will change periodically as we need bug fixes or new
features and changes will be announced on the mailing lists. The supported
version of petsc is used on the buildbot automated testing system.
"
Doing git checkout xsdk-0.2.0 replaces petsc/src/mat/impls/baij/mpi/baijov.c
with a version that does contain the routine in question
(MatDestroy_MPIBAIJ_MatGetSubmatrices).
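
A quick way to double-check which tags carry the routine is to git grep
against a tag directly (a sketch mirroring Barry's session below; run it from
inside the petsc clone):

git grep MatDestroy_MPIBAIJ_MatGetSubmatrices xsdk-0.2.0 -- src/mat/impls/baij/mpi/baijov.c
git grep MatDestroy_MPIBAIJ_MatGetSubmatrices v3.8 -- src/mat/impls/baij/mpi/baijov.c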

I'm fairly certain this version is ancient. I've had trouble before with
installing the most recent version of PETSc and then trying to run PFLOTRAN
on top of it. I've asked my employer where we stand with supported PETSc
versions, and I also plan to try compiling PFLOTRAN on top of a more recent
PETSc to see whether it works this time.

On Sat, Nov 25, 2017 at 10:03 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
wrote:

>
>   I cannot find that routine
>
> ~/Src/petsc ((v3.6.4)) arch-basic
> $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
> ~/Src/petsc ((v3.6.4)) arch-basic
> $ git checkout v3.7.5
> Previous HEAD position was 401b1b531b... Increase patchlevel to 3.6.4
> HEAD is now at b827f1350a... Increase patchlevel to 3.7.5
> ~/Src/petsc ((v3.7.5)) arch-basic
> $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
> ~/Src/petsc ((v3.7.5)) arch-basic
> $ git checkout v3.7.0
> error: pathspec 'v3.7.0' did not match any file(s) known to git.
> ~/Src/petsc ((v3.7.5)) arch-basic
> $ git checkout v3.7
> Previous HEAD position was b827f1350a... Increase patchlevel to 3.7.5
> HEAD is now at ae618e6989... release: set v3.7 strings
> ~/Src/petsc ((v3.7)) arch-basic
> $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
> ~/Src/petsc ((v3.7)) arch-basic
> $ git checkout v3.8
> Previous HEAD position was ae618e6989... release: set v3.7 strings
> HEAD is now at 0e50f9e530... release: set v3.8 strings
> ~/Src/petsc ((v3.8)) arch-basic
> $ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
> ~/Src/petsc ((v3.8)) arch-basic
>
>
>
> Are you using the PETSc git repository and some particular branch or
> commit in it?
>
>
> > On Nov 25, 2017, at 11:09 AM, Daniel Stone <daniel.stone at opengosim.com>
> wrote:
> >
> > Thanks for the quick response.
> >
> > I tried Valgrind. Apart from a couple of warnings in other parts of my
> > code (now fixed), it shows the same stack I described:
> > ==22498== Invalid read of size 4
> > ==22498==    at 0x55A5BFF: MatDestroy_MPIBAIJ_MatGetSubmatrices (baijov.c:609)
> > ==22498==    by 0x538A206: MatDestroy (matrix.c:1168)
> > ==22498==    by 0x5F21F2F: PCSetUp_ILU (ilu.c:162)
> > ==22498==    by 0x604898A: PCSetUp (precon.c:924)
> > ==22498==    by 0x6189005: KSPSetUp (itfunc.c:379)
> > ==22498==    by 0x618AB57: KSPSolve (itfunc.c:599)
> > ==22498==    by 0x5FD4816: PCApply_ASM (asm.c:485)
> > ==22498==    by 0x604204C: PCApply (precon.c:458)
> > ==22498==    by 0x6055C76: pcapply_ (preconf.c:223)
> > ==22498==    by 0x42F500: __cpr_linsolver_MOD_cprapply (cpr_linsolver.F90:419)
> > ==22498==    by 0x5F42431: ourshellapply (zshellpcf.c:41)
> > ==22498==    by 0x5F3697A: PCApply_Shell (shellpc.c:115)
> > ==22498==    by 0x604204C: PCApply (precon.c:458)
> > ==22498==    by 0x61B74E7: KSP_PCApply (kspimpl.h:251)
> > ==22498==    by 0x61B83C3: KSPInitialResidual (itres.c:67)
> > ==22498==    by 0x6104EF9: KSPSolve_BCGS (bcgs.c:44)
> > ==22498==    by 0x618B77E: KSPSolve (itfunc.c:656)
> > ==22498==    by 0x62BB02D: SNESSolve_NEWTONLS (ls.c:224)
> > ==22498==    by 0x6245706: SNESSolve (snes.c:3967)
> > ==22498==    by 0x6265A58: snessolve_ (zsnesf.c:167)
> > ==22498==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> > ==22498==
> >
> > PETSc version: this is from include/petscversion.h:
> > #define PETSC_VERSION_RELEASE    0
> > #define PETSC_VERSION_MAJOR      3
> > #define PETSC_VERSION_MINOR      7
> > #define PETSC_VERSION_SUBMINOR   5
> > #define PETSC_VERSION_PATCH      0
> > #define PETSC_RELEASE_DATE       "Apr, 25, 2016"
> > #define PETSC_VERSION_DATE       "unknown"
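> >
> > (If it helps pin things down further, running something like
> >
> > git describe --always
> > git log -1 --oneline
> >
> > from the petsc directory should report the exact commit this build came from.)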
> >
> > This is the recommended version of PETSc for use with PFLOTRAN:
> > http://documentation.pflotran.org/user_guide/how_to/installation/linux.html#linux-install
> >
> > Exact debugger output:
> > It's a graphical debugger so there isn't much to copy/paste.
> > The exact message is:
> >
> > Memory error detected in MatDestroy_MPIBAIJ_MatGetSubmatrices
> > (baijov.c:609):
> >
> > null pointer dereference or unaligned memory access.
> >
> > I can provide screenshots if that would help.
> >
> > -ksp_view_pre:
> > I tried this, but it doesn't seem to give information about the KSPs in
> > question. To be clear, this is part of an attempt to implement the
> > two-stage CPR-AMG preconditioner in PFLOTRAN, so the KSP and PC objects
> > involved are:
> >
> > KSP: linear solver inside a SNES, inside PFLOTRAN (BCGS), which has a PC:
> >    PC: shell, the CPR implementation, which calls two more preconditioners,
> >        T1 and T2, in sequence:
> >       T1: another shell, which calls a KSP (GMRES), which has a PC which is
> >           HYPRE BoomerAMG
> >       T2: ASM, this is the problematic one.
> >
> > -ksp_view_pre doesn't seem to give us any information about the ASM
> > preconditioner object or its ILU sub-KSPs; presumably it crashes before
> > getting there. We do get a lot of output about T1, for example:
> >
> > KSP Object: T1 24 MPI processes
> >   type: gmres
> >     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >     GMRES: happy breakdown tolerance 1e-30
> >   maximum iterations=10000, initial guess is zero
> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >   left preconditioning
> >   using DEFAULT norm type for convergence test
> > PC Object: 24 MPI processes
> >   type: hypre
> >   PC has not been set up so information may be incomplete
> >     HYPRE BoomerAMG preconditioning
> >     HYPRE BoomerAMG: Cycle type V
> >     HYPRE BoomerAMG: Maximum number of levels 25
> >     HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
> >     HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
> >     HYPRE BoomerAMG: Threshold for strong coupling 0.25
> >     HYPRE BoomerAMG: Interpolation truncation factor 0.
> >     HYPRE BoomerAMG: Interpolation: max elements per row 0
> >     HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
> >     HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
> >     HYPRE BoomerAMG: Maximum row sums 0.9
> >     HYPRE BoomerAMG: Sweeps down         1
> >     HYPRE BoomerAMG: Sweeps up           1
> >     HYPRE BoomerAMG: Sweeps on coarse    1
> >     HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
> >     HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
> >     HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
> >     HYPRE BoomerAMG: Relax weight  (all)      1.
> >     HYPRE BoomerAMG: Outer relax weight (all) 1.
> >     HYPRE BoomerAMG: Using CF-relaxation
> >     HYPRE BoomerAMG: Not using more complex smoothers.
> >     HYPRE BoomerAMG: Measure type        local
> >     HYPRE BoomerAMG: Coarsen type        Falgout
> >     HYPRE BoomerAMG: Interpolation type  classical
> >   linear system matrix = precond matrix:
> >   Mat Object: 24 MPI processes
> >     type: mpiaij
> >     rows=1122000, cols=1122000
> >     total: nonzeros=7780000, allocated nonzeros=7780000
> >     total number of mallocs used during MatSetValues calls =0
> >
> > Thanks,
> >
> > Daniel Stone
> >
> >
> > On Fri, Nov 24, 2017 at 4:08 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >
> >   First run under valgrind.
> >   https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
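> >
> >   For a parallel run, something along these lines is the usual pattern
> > (the executable name and input file here are just placeholders for your
> > actual PFLOTRAN run):
> >
> >   mpiexec -n 24 valgrind --tool=memcheck -q --num-callers=20 \
> >     --log-file=valgrind.log.%p ./pflotran -pflotranin my_input.in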
> >
> >    If that doesn't help, send the exact output from the debugger (cut
> > and paste) and the exact version of PETSc you are using. Also send the
> > output from -ksp_view_pre.
> >
> >   Barry
> >
> > > On Nov 24, 2017, at 8:03 AM, Daniel Stone <daniel.stone at opengosim.com>
> wrote:
> > >
> > > Hello,
> > >
> > > I'm getting a memory exception crash every time I try to run the ASM
> > > preconditioner in parallel; can anyone help?
> > >
> > > I'm using a debugger so I can give most of the stack:
> > >
> > > PCApply_ASM (asm.c:line 485)
> > >   KSPSolve (itfunc.c:line 599)
> > >     KSPSetUp (itfunc.c:line 379)
> > >        PCSetUp (precon.c: 924)
> > >           PCSetUp_ILU (ilu.c:line 162)
> > >             MatDestroy  (matrix.c:line 1168)
> > >               MatDestroy_MPIBAIJ_MatGetSubmatrices (baijov.c:line 609)
> > >
> > >
> > > The problem line is then in MatDestroy_MPIBAIJ_MatGetSubmatrices,
> > > in the file baijov.c, at line 609:
> > >
> > > if (!submatj->id) {
> > >
> > > At this point submatj is NULL (address 0x0), so the attempt to access
> > > submatj->id causes the memory error. We can see in the lines just above
> > > 609 where submatj is supposed to come from: it should basically be an
> > > attribute of C->data, where C is the input matrix.
> > >
> > > Does anyone have any ideas where to start with getting this to work? I
> > > can provide a lot more information from the debugger if needed.
> > >
> > > Many thanks in advance,
> > >
> > > Daniel Stone
> > >
> >
> >
>
>