[petsc-users] crash with PCASM in parallel

Smith, Barry F. bsmith at mcs.anl.gov
Sat Nov 25 16:03:09 CST 2017


  I cannot find that routine:

~/Src/petsc ((v3.6.4)) arch-basic
$ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
~/Src/petsc ((v3.6.4)) arch-basic
$ git checkout v3.7.5
Previous HEAD position was 401b1b531b... Increase patchlevel to 3.6.4
HEAD is now at b827f1350a... Increase patchlevel to 3.7.5
~/Src/petsc ((v3.7.5)) arch-basic
$ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
~/Src/petsc ((v3.7.5)) arch-basic
$ git checkout v3.7.0
error: pathspec 'v3.7.0' did not match any file(s) known to git.
~/Src/petsc ((v3.7.5)) arch-basic
$ git checkout v3.7
Previous HEAD position was b827f1350a... Increase patchlevel to 3.7.5
HEAD is now at ae618e6989... release: set v3.7 strings
~/Src/petsc ((v3.7)) arch-basic
$ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
~/Src/petsc ((v3.7)) arch-basic
$ git checkout v3.8
Previous HEAD position was ae618e6989... release: set v3.7 strings
HEAD is now at 0e50f9e530... release: set v3.8 strings
~/Src/petsc ((v3.8)) arch-basic
$ git grep MatDestroy_MPIBAIJ_MatGetSubmatrices
~/Src/petsc ((v3.8)) arch-basic



Are you using the PETSc git repository and some particular branch or commit in it?


> On Nov 25, 2017, at 11:09 AM, Daniel Stone <daniel.stone at opengosim.com> wrote:
> 
> Thanks for the quick response. 
> 
> I tried Valgrind. Apart from a couple of warnings in other parts of my code (now fixed), it shows the same stack I described:
> ==22498== Invalid read of size 4
> ==22498==    at 0x55A5BFF: MatDestroy_MPIBAIJ_MatGetSubmatrices (baijov.c:609)
> ==22498==    by 0x538A206: MatDestroy (matrix.c:1168)
> ==22498==    by 0x5F21F2F: PCSetUp_ILU (ilu.c:162)
> ==22498==    by 0x604898A: PCSetUp (precon.c:924)
> ==22498==    by 0x6189005: KSPSetUp (itfunc.c:379)
> ==22498==    by 0x618AB57: KSPSolve (itfunc.c:599)
> ==22498==    by 0x5FD4816: PCApply_ASM (asm.c:485)
> ==22498==    by 0x604204C: PCApply (precon.c:458)
> ==22498==    by 0x6055C76: pcapply_ (preconf.c:223)
> ==22498==    by 0x42F500: __cpr_linsolver_MOD_cprapply (cpr_linsolver.F90:419)
> ==22498==    by 0x5F42431: ourshellapply (zshellpcf.c:41)
> ==22498==    by 0x5F3697A: PCApply_Shell (shellpc.c:115)
> ==22498==    by 0x604204C: PCApply (precon.c:458)
> ==22498==    by 0x61B74E7: KSP_PCApply (kspimpl.h:251)
> ==22498==    by 0x61B83C3: KSPInitialResidual (itres.c:67)
> ==22498==    by 0x6104EF9: KSPSolve_BCGS (bcgs.c:44)
> ==22498==    by 0x618B77E: KSPSolve (itfunc.c:656)
> ==22498==    by 0x62BB02D: SNESSolve_NEWTONLS (ls.c:224)
> ==22498==    by 0x6245706: SNESSolve (snes.c:3967)
> ==22498==    by 0x6265A58: snessolve_ (zsnesf.c:167)
> ==22498==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==22498== 
> 
> PETSc version: this is from include/petscversion.h:
> #define PETSC_VERSION_RELEASE    0
> #define PETSC_VERSION_MAJOR      3
> #define PETSC_VERSION_MINOR      7
> #define PETSC_VERSION_SUBMINOR   5
> #define PETSC_VERSION_PATCH      0
> #define PETSC_RELEASE_DATE       "Apr, 25, 2016"
> #define PETSC_VERSION_DATE       "unknown"
> 
> This is the recommended version of PETSc for using with PFLOTRAN:
> http://documentation.pflotran.org/user_guide/how_to/installation/linux.html#linux-install
> 
> Exact debugger output:
> It's a graphical debugger so there isn't much to copy/paste.
> The exact message is:
> 
> Memory error detected in MatDestroy_MPIBAIJ_MatGetSubmatrices (baijov.c:609):
> 
> null pointer dereference or unaligned memory access.
> 
> I can provide screenshots if that would help. 
> 
> -ksp_view_pre:
> I tried this, but it doesn't seem to give information about the KSPs in question. To be clear, this is
> part of an attempt to implement the two stage CPR-AMG preconditioner in PFLOTRAN, so the
> KSP and PC objects involved are:
> 
> KSP: linear solver inside a SNES, inside PFLOTRAN (BCGS),
> which has a PC:
>    PC: shell, the CPR implementation, which calls two more preconditioners, T1 and T2, in sequence:
>       T1: another shell, which calls a KSP (GMRES), which has a PC which is HYPRE BOOMERAMG
>       T2: ASM; this is the problematic one (a rough sketch of this wiring follows below).
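> 
> For reference, here is a rough C sketch of how that hierarchy can be wired with the PETSc API. It only
> shows the shell plumbing: T1 is simplified to a plain GMRES/BoomerAMG KSP rather than its own shell, the
> pressure-block restriction of a real CPR stage is left out, and all names (CPRShellCtx, CPRShellApply,
> CPRShellSetUp) are illustrative rather than the ones in our code:
> 
>     #include <petscksp.h>
> 
>     /* Context for the outer shell PC; field names are illustrative. */
>     typedef struct {
>       KSP t1_ksp;  /* T1: GMRES preconditioned by hypre BoomerAMG */
>       PC  t2_pc;   /* T2: ASM (its sub-KSPs default to ILU)       */
>       Vec work;    /* scratch vector for chaining the two stages  */
>     } CPRShellCtx;
> 
>     /* Apply T1 and then T2 in sequence; the residual correction a real
>        CPR combination performs between the stages is omitted here.    */
>     static PetscErrorCode CPRShellApply(PC pc, Vec x, Vec y)
>     {
>       CPRShellCtx    *ctx;
>       PetscErrorCode ierr;
> 
>       PetscFunctionBeginUser;
>       ierr = PCShellGetContext(pc, (void**)&ctx);CHKERRQ(ierr);
>       ierr = KSPSolve(ctx->t1_ksp, x, ctx->work);CHKERRQ(ierr);  /* T1 stage */
>       ierr = PCApply(ctx->t2_pc, ctx->work, y);CHKERRQ(ierr);    /* T2 stage */
>       PetscFunctionReturn(0);
>     }
> 
>     /* Build the two stages on an operator A and attach them to the shell PC. */
>     static PetscErrorCode CPRShellSetUp(MPI_Comm comm, Mat A, CPRShellCtx *ctx, PC shell)
>     {
>       PC             t1_pc;
>       PetscErrorCode ierr;
> 
>       PetscFunctionBeginUser;
>       /* T1: GMRES + hypre BoomerAMG */
>       ierr = KSPCreate(comm, &ctx->t1_ksp);CHKERRQ(ierr);
>       ierr = KSPSetType(ctx->t1_ksp, KSPGMRES);CHKERRQ(ierr);
>       ierr = KSPSetOperators(ctx->t1_ksp, A, A);CHKERRQ(ierr);
>       ierr = KSPGetPC(ctx->t1_ksp, &t1_pc);CHKERRQ(ierr);
>       ierr = PCSetType(t1_pc, PCHYPRE);CHKERRQ(ierr);
>       ierr = PCHYPRESetType(t1_pc, "boomeramg");CHKERRQ(ierr);
> 
>       /* T2: ASM, whose block solves are the ILU sub-KSPs that crash */
>       ierr = PCCreate(comm, &ctx->t2_pc);CHKERRQ(ierr);
>       ierr = PCSetType(ctx->t2_pc, PCASM);CHKERRQ(ierr);
>       ierr = PCSetOperators(ctx->t2_pc, A, A);CHKERRQ(ierr);
> 
>       ierr = MatCreateVecs(A, &ctx->work, NULL);CHKERRQ(ierr);
> 
>       /* Outer shell PC, the one the SNES/BCGS solve actually calls */
>       ierr = PCSetType(shell, PCSHELL);CHKERRQ(ierr);
>       ierr = PCShellSetContext(shell, ctx);CHKERRQ(ierr);
>       ierr = PCShellSetApply(shell, CPRShellApply);CHKERRQ(ierr);
>       PetscFunctionReturn(0);
>     }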
> 
> -ksp_view_pre doesn't seem to give us any information about the ASM preconditioner object
> or its ILU sub-KSPs; presumably it crashes before getting there. We do get a lot of output about
> T1, for example:
> 
> KSP Object: T1 24 MPI processes
>   type: gmres
>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>     GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>   left preconditioning
>   using DEFAULT norm type for convergence test
> PC Object: 24 MPI processes
>   type: hypre
>   PC has not been set up so information may be incomplete
>     HYPRE BoomerAMG preconditioning
>     HYPRE BoomerAMG: Cycle type V
>     HYPRE BoomerAMG: Maximum number of levels 25
>     HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>     HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
>     HYPRE BoomerAMG: Threshold for strong coupling 0.25
>     HYPRE BoomerAMG: Interpolation truncation factor 0.
>     HYPRE BoomerAMG: Interpolation: max elements per row 0
>     HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>     HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>     HYPRE BoomerAMG: Maximum row sums 0.9
>     HYPRE BoomerAMG: Sweeps down         1
>     HYPRE BoomerAMG: Sweeps up           1
>     HYPRE BoomerAMG: Sweeps on coarse    1
>     HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>     HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>     HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>     HYPRE BoomerAMG: Relax weight  (all)      1.
>     HYPRE BoomerAMG: Outer relax weight (all) 1.
>     HYPRE BoomerAMG: Using CF-relaxation
>     HYPRE BoomerAMG: Not using more complex smoothers.
>     HYPRE BoomerAMG: Measure type        local
>     HYPRE BoomerAMG: Coarsen type        Falgout
>     HYPRE BoomerAMG: Interpolation type  classical
>   linear system matrix = precond matrix:
>   Mat Object: 24 MPI processes
>     type: mpiaij
>     rows=1122000, cols=1122000
>     total: nonzeros=7780000, allocated nonzeros=7780000
>     total number of mallocs used during MatSetValues calls =0
> 
> Thanks,
> 
> Daniel Stone
> 
> 
> On Fri, Nov 24, 2017 at 4:08 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>   First run under valgrind.   https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> 
>    If that doesn't help, send the exact output from the debugger (cut and paste) and the exact version of PETSc you are using.
> Also the output from -ksp_view_pre.
> 
>   Barry
> 
> > On Nov 24, 2017, at 8:03 AM, Daniel Stone <daniel.stone at opengosim.com> wrote:
> >
> > Hello,
> >
> > I'm getting a memory exception crash every time I try to run the ASM preconditioner in parallel; can anyone help?
> >
> > I'm using a debugger so I can give most of the stack:
> >
> > PCApply_ASM (asm.c:line 485)
> >   KSPSolve (itfunc.c:line 599)
> >     KSPSetUp (itfunc.c:line 379)
> >        PCSetUp (precon.c: 924)
> >           PCSetUp_ILU (ilu.c:line 162)
> >             MatDestroy  (matrix.c:line 1168)
> >               MatDestroy_MPIBAIJ_MatGetSubMatrices (baijov.c:line 609)
> >
> >
> > The problem line is then in  MatDestroy_MPIBAIJ_MatGetSubMatrices,
> > in the file baijov.c, line 609:
> >
> > if (!submatj->id) {
> >
> > At this point submatj is NULL (address 0x0), so the attempt to access submatj->id
> > causes the memory error. We can see in the lines just above 609 where submatj is supposed to
> > come from: it should basically be an attribute of C->data, where C is the input matrix.
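> >
> > For illustration only (the types below are made up, not PETSc internals), this is the classic failure
> > mode: a context pointer recovered from a void* data field is dereferenced without a NULL guard, and the
> > one-line guard below is what would prevent the crash:
> >
> >     /* Toy stand-ins for the real objects, just to show the failure mode. */
> >     typedef struct { int id; } SubMatCtx;    /* stand-in for the submatrix bookkeeping */
> >     typedef struct { void *data; } ToyMat;   /* stand-in for Mat; 'data' can be NULL   */
> >
> >     static int destroy_ctx(ToyMat *C)
> >     {
> >       SubMatCtx *submatj = (SubMatCtx*)C->data;
> >       if (!submatj) return 1;                /* guard: without it the next line segfaults */
> >       if (!submatj->id) { /* ... release shared data ... */ }
> >       return 0;
> >     }
> >
> >     int main(void)
> >     {
> >       ToyMat C = { 0 };                      /* data == 0x0, the state valgrind reported */
> >       return destroy_ctx(&C);                /* returns 1 instead of crashing            */
> >     }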
> >
> > Does anyone have any ideas where to start with getting this to work? I can provide a lot more information
> > from the debugger if needed.
> >
> > Many thanks in advance,
> >
> > Daniel Stone
> >
> 
> 


