[petsc-users] crash with PCASM in parallel
Smith, Barry F.
bsmith at mcs.anl.gov
Sat Nov 25 16:03:09 CST 2017
I cannot find that routine
~/Src/petsc ((v3.6.4)) arch-basic
$ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
~/Src/petsc ((v3.6.4)) arch-basic
$ git checkout v3.7.5
Previous HEAD position was 401b1b531b... Increase patchlevel to 3.6.4
HEAD is now at b827f1350a... Increase patchlevel to 3.7.5
~/Src/petsc ((v3.7.5)) arch-basic
$ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
~/Src/petsc ((v3.7.5)) arch-basic
$ git checkout v3.7.0
error: pathspec 'v3.7.0' did not match any file(s) known to git.
~/Src/petsc ((v3.7.5)) arch-basic
$ git checkout v3.7
Previous HEAD position was b827f1350a... Increase patchlevel to 3.7.5
HEAD is now at ae618e6989... release: set v3.7 strings
~/Src/petsc ((v3.7)) arch-basic
$ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
~/Src/petsc ((v3.7)) arch-basic
$ git checkout v3.8
Previous HEAD position was ae618e6989... release: set v3.7 strings
HEAD is now at 0e50f9e530... release: set v3.8 strings
~/Src/petsc ((v3.8)) arch-basic
$ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
~/Src/petsc ((v3.8)) arch-basic
Are you using the PETSc git repository and, if so, a particular branch or commit in it?
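If it is a git clone, running, for example,

$ cd $PETSC_DIR && git describe --always && git rev-parse HEAD

and sending the output would pin down the exact commit. (These are just ordinary git commands, nothing PETSc-specific; any equivalent information is fine.)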
> On Nov 25, 2017, at 11:09 AM, Daniel Stone <daniel.stone at opengosim.com> wrote:
>
> Thanks for the quick response.
>
> I tried Valgrind. Apart from a couple of warnings elsewhere in my code (now fixed), it shows the same stack I described:
> ==22498== Invalid read of size 4
> ==22498== at 0x55A5BFF: MatDestroy_MPIBAIJ_MatGetSubmatrices (baijov.c:609)
> ==22498== by 0x538A206: MatDestroy (matrix.c:1168)
> ==22498== by 0x5F21F2F: PCSetUp_ILU (ilu.c:162)
> ==22498== by 0x604898A: PCSetUp (precon.c:924)
> ==22498== by 0x6189005: KSPSetUp (itfunc.c:379)
> ==22498== by 0x618AB57: KSPSolve (itfunc.c:599)
> ==22498== by 0x5FD4816: PCApply_ASM (asm.c:485)
> ==22498== by 0x604204C: PCApply (precon.c:458)
> ==22498== by 0x6055C76: pcapply_ (preconf.c:223)
> ==22498== by 0x42F500: __cpr_linsolver_MOD_cprapply (cpr_linsolver.F90:419)
> ==22498== by 0x5F42431: ourshellapply (zshellpcf.c:41)
> ==22498== by 0x5F3697A: PCApply_Shell (shellpc.c:115)
> ==22498== by 0x604204C: PCApply (precon.c:458)
> ==22498== by 0x61B74E7: KSP_PCApply (kspimpl.h:251)
> ==22498== by 0x61B83C3: KSPInitialResidual (itres.c:67)
> ==22498== by 0x6104EF9: KSPSolve_BCGS (bcgs.c:44)
> ==22498== by 0x618B77E: KSPSolve (itfunc.c:656)
> ==22498== by 0x62BB02D: SNESSolve_NEWTONLS (ls.c:224)
> ==22498== by 0x6245706: SNESSolve (snes.c:3967)
> ==22498== by 0x6265A58: snessolve_ (zsnesf.c:167)
> ==22498== Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==22498==
>
> PETSc version: this is from include/petscversion.h:
> #define PETSC_VERSION_RELEASE 0
> #define PETSC_VERSION_MAJOR 3
> #define PETSC_VERSION_MINOR 7
> #define PETSC_VERSION_SUBMINOR 5
> #define PETSC_VERSION_PATCH 0
> #define PETSC_RELEASE_DATE "Apr, 25, 2016"
> #define PETSC_VERSION_DATE "unknown"
>
> This is the recommended version of PETSc for using with PFLOTRAN:
> http://documentation.pflotran.org/user_guide/how_to/installation/linux.html#linux-install
>
> Exact debugger output:
> It's a graphical debugger so there isn't much to copy/paste.
> The exact message is:
>
> Memory error detected in MatDestroy_MPIBAIJ_MatGetSubmatrices (baijov.c:609):
>
> null pointer dereference or unaligned memory access.
>
> I can provide screenshots if that would help.
>
> -ksp_view_pre:
> I tried this, but it doesn't seem to give information about the KSPs in question. To be clear, this is
> part of an attempt to implement the two-stage CPR-AMG preconditioner in PFLOTRAN, so the
> KSP and PC objects involved are:
>
> KSP: the linear solver inside a SNES, inside PFLOTRAN (BCGS),
> which has a PC:
> PC: a shell, the CPR implementation, which calls two more preconditioners, T1 and T2, in sequence:
> T1: another shell, which calls a KSP (GMRES) whose PC is hypre BoomerAMG
> T2: ASM; this is the problematic one (a rough sketch of this wiring follows below).
>
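> Here is that stripped-down sketch, in case it helps (this is not our actual PFLOTRAN/CPR code; CPRCtx and
> CPRSetUp are just illustrative names, and the apply routine that calls T1 then T2 is omitted):
>
> #include <petscksp.h>
>
> typedef struct {
>   KSP t1_ksp;   /* T1: GMRES preconditioned by hypre BoomerAMG */
>   PC  t2_pc;    /* T2: ASM (the problematic one)               */
> } CPRCtx;
>
> static PetscErrorCode CPRSetUp(KSP outer_ksp, CPRCtx *ctx)
> {
>   PC             pc, t1pc;
>   PetscErrorCode ierr;
>
>   PetscFunctionBegin;
>   /* the outer KSP (BCGS inside the SNES) gets a shell PC */
>   ierr = KSPGetPC(outer_ksp, &pc);CHKERRQ(ierr);
>   ierr = PCSetType(pc, PCSHELL);CHKERRQ(ierr);
>   ierr = PCShellSetContext(pc, ctx);CHKERRQ(ierr);
>   /* PCShellSetApply(pc, ...) registers the apply routine (omitted here) */
>
>   /* T1: GMRES with hypre BoomerAMG */
>   ierr = KSPCreate(PETSC_COMM_WORLD, &ctx->t1_ksp);CHKERRQ(ierr);
>   ierr = KSPSetType(ctx->t1_ksp, KSPGMRES);CHKERRQ(ierr);
>   ierr = KSPGetPC(ctx->t1_ksp, &t1pc);CHKERRQ(ierr);
>   ierr = PCSetType(t1pc, PCHYPRE);CHKERRQ(ierr);
>   ierr = PCHYPRESetType(t1pc, "boomeramg");CHKERRQ(ierr);
>
>   /* T2: ASM; its sub-KSPs (ILU by default) are left untouched in this sketch */
>   ierr = PCCreate(PETSC_COMM_WORLD, &ctx->t2_pc);CHKERRQ(ierr);
>   ierr = PCSetType(ctx->t2_pc, PCASM);CHKERRQ(ierr);
>   PetscFunctionReturn(0);
> }
>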
> -ksp_view_pre doesn't seem to give us any information about the ASM preconditioner object
> or its ILU sub-KSPs; presumably it crashes before getting there. We do get a lot of output about
> T1, for example:
>
> KSP Object: T1 24 MPI processes
> type: gmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using DEFAULT norm type for convergence test
> PC Object: 24 MPI processes
> type: hypre
> PC has not been set up so information may be incomplete
> HYPRE BoomerAMG preconditioning
> HYPRE BoomerAMG: Cycle type V
> HYPRE BoomerAMG: Maximum number of levels 25
> HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
> HYPRE BoomerAMG: Threshold for strong coupling 0.25
> HYPRE BoomerAMG: Interpolation truncation factor 0.
> HYPRE BoomerAMG: Interpolation: max elements per row 0
> HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
> HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
> HYPRE BoomerAMG: Maximum row sums 0.9
> HYPRE BoomerAMG: Sweeps down 1
> HYPRE BoomerAMG: Sweeps up 1
> HYPRE BoomerAMG: Sweeps on coarse 1
> HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
> HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
> HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
> HYPRE BoomerAMG: Relax weight (all) 1.
> HYPRE BoomerAMG: Outer relax weight (all) 1.
> HYPRE BoomerAMG: Using CF-relaxation
> HYPRE BoomerAMG: Not using more complex smoothers.
> HYPRE BoomerAMG: Measure type local
> HYPRE BoomerAMG: Coarsen type Falgout
> HYPRE BoomerAMG: Interpolation type classical
> linear system matrix = precond matrix:
> Mat Object: 24 MPI processes
> type: mpiaij
> rows=1122000, cols=1122000
> total: nonzeros=7780000, allocated nonzeros=7780000
> total number of mallocs used during MatSetValues calls =0
>
> Thanks,
>
> Daniel Stone
>
>
> On Fri, Nov 24, 2017 at 4:08 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
> First run under valgrind. https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
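> (A typical invocation, following that FAQ page, looks something like
>
> $ mpiexec -n <nproc> valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./your_app -malloc off <your usual options>
>
> where the executable name, process count, and option list are placeholders for your setup.)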
>
> If that doesn't help send the exact output from the debugger (cut and paste) and the exact version of PETSc you are using.
> Also send the output from -ksp_view_pre.
>
> Barry
>
> > On Nov 24, 2017, at 8:03 AM, Daniel Stone <daniel.stone at opengosim.com> wrote:
> >
> > Hello,
> >
> > I'm getting a memory exception crash every time I try to run the ASM preconditioner in parallel; can anyone help?
> >
> > I'm using a debugger so I can give most of the stack:
> >
> > PCApply_ASM (asm.c:line 485)
> > KSPSolve (itfunc.c:line 599)
> > KSPSetUp (itfunc.c:line 379)
> > PCSetUp (precon.c:line 924)
> > PCSetUp_ILU (ilu.c:line 162)
> > MatDestroy (matrix.c:line 1168)
> > MatDestroy_MPIBAIJ_MatGetSubmatrices (baijov.c:line 609)
> >
> >
> > The problem line is then in MatDestroy_MPIBAIJ_MatGetSubmatrices,
> > in the file baijov.c, line 609:
> >
> > if (!submatj->id) {
> >
> > At this point submatj is NULL (address 0x0), so the attempt to access submatj->id
> > causes the memory error. We can see in the lines just above 609 where submatj is supposed to
> > come from: it should basically be an attribute of C->data, where C is the input matrix.
> >
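> > In other words the failure pattern is roughly this (a minimal stand-alone illustration; SubMatCtx and
> > MatPrivate are placeholder names, not the real baijov.c internals):
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> >
> > typedef struct { int id; }             SubMatCtx;   /* plays the role of submatj */
> > typedef struct { SubMatCtx *submatj; } MatPrivate;  /* plays the role of C->data */
> >
> > int main(void)
> > {
> >   MatPrivate *c       = calloc(1, sizeof(*c));  /* submatj is never filled in, so it stays NULL */
> >   SubMatCtx  *submatj = c->submatj;
> >   /* baijov.c:609 does "if (!submatj->id)" with no check on submatj itself,
> >      so a NULL submatj gives exactly the 0x0 dereference seen above */
> >   if (submatj && !submatj->id) printf("would clean up the submatrix here\n");
> >   free(c);
> >   return 0;
> > }
> >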
> > Does anyone have any ideas about where to start with getting this to work? I can provide a lot more information
> > from the debugger if needed.
> >
> > Many thanks in advance,
> >
> > Daniel Stone
> >
>
>