[petsc-users] CG+GAMG convergence issues in GHEP Krylov-Schur for some MPI runs
Matthew Knepley
knepley at gmail.com
Tue Nov 3 06:31:46 CST 2015
On Tue, Nov 3, 2015 at 2:07 AM, Denis Davydov <davydden at gmail.com> wrote:
> Dear all,
>
> I experience strange convergence problems in SLEPc for GHEP with
> Krylov-Schur and CG + GAMG.
> The issue appears to be contingent on the number of MPI cores used.
> Say for 8 cores there is no issue and for 4 cores there is an issue.
> When I substitute GAMG with Jacobi for the problematic number of cores --
> all works.
>
> To be more specific, I solve Ax = \lambda Bx for a sequence of A's, where A
> is a function of the eigenvectors.
> On each iteration step the eigensolver EPS is currently initialised from
> scratch, and so are the underlying
> ST, KSP and PC objects: -st_ksp_type cg -st_pc_type gamg -st_ksp_rtol 1e-12.
> For these particular matrices the issue appears on the 4th iteration, even
> though the matrix to be inverted (the mass/overlap matrix)
> is the same and does not change!
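>
> In PETSc/SLEPc terms, what I do on each iteration is roughly equivalent to
> the following (simplified sketch only: error checking and the deal.II
> wrappers are omitted, A and B stand for my assembled Mat objects, and the
> KSP/PC settings really come from the -st_* options above):
>
>   EPS eps; ST st; KSP ksp; PC pc;
>   EPSCreate(PETSC_COMM_WORLD, &eps);
>   EPSSetOperators(eps, A, B);
>   EPSSetProblemType(eps, EPS_GHEP);
>   EPSSetType(eps, EPSKRYLOVSCHUR);
>   EPSGetST(eps, &st);
>   STGetKSP(st, &ksp);
>   KSPSetType(ksp, KSPCG);                  /* -st_ksp_type cg       */
>   KSPSetTolerances(ksp, 1e-12, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);
>   KSPGetPC(ksp, &pc);
>   PCSetType(pc, PCGAMG);                   /* -st_pc_type gamg      */
>   EPSSetFromOptions(eps);
>   EPSSolve(eps);
>   /* ... extract eigenpairs ... */
>   EPSDestroy(&eps);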
>
I assume the issue is the SEGV below? I agree with Jose that you need to
run valgrind. An SEGV can result from
memory corruption in a distant part of the code. This seems very likely to
me since it is the same matrix coming in.
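
For an MPI run, the usual recipe from the PETSc FAQ is roughly

  mpirun -n 4 valgrind -q --tool=memcheck --num-callers=20 \
      --log-file=valgrind.log.%p ./dft <your usual options>

(with a --with-debugging=1 build, which you already have), and then look
through the per-rank log files for the first invalid read or write.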
Thanks,
Matt
> From my debugging info the A matrix has the same norm for the CG + GAMG and
> CG + Jacobi cases:
> DEAL:: frobenius_norm = 365.7
> DEAL:: linfty_norm = 19.87
> DEAL:: l1_norm = 19.87
> Just to be sure that there are no bugs on my side which would result in
> different mass matrices, I check that
> it has the same norm for CG + GAMG vs CG + Jacobi BEFORE I start the iteration:
> DEAL:: frobenius_norm = 166.4
> DEAL:: linfty_norm = 8.342
> DEAL:: l1_norm = 8.342
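>
> (These norms are printed by deal.II; just to rule out the wrappers, an
> independent check directly on the PETSc Mat would be something like
>
>   PetscReal nrm;
>   MatNorm(B, NORM_FROBENIUS, &nrm);   /* also NORM_1 / NORM_INFINITY */
>   PetscPrintf(PETSC_COMM_WORLD, "||B||_F = %g\n", (double)nrm);
>
> on each run.)
>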
> All the dependent scalar quantities I calculate on each iteration are
> identical for the two cases, which makes me believe that
> the solution path is the same up to a certain tolerance.
> The only outputs which differ slightly are the number of iterations for
> convergence in EPS (e.g. 113 vs 108) and the
> resulting maximum EPSComputeResidualNorm: 4.1524e-07 vs 2.9639e-08.
>
>
> Any ideas what the issue could be, especially given the fact that it does
> work for some numbers of cores and does not for others?
> Perhaps some default settings in the GAMG preconditioner? Although that does
> not explain why it works for the first 3 iterations
> and not on the 4th, as the mass matrix is unchanged...
>
> Lastly, I suppose ideally I should keep the eigensolver context between
> the iterations and just update the matrices with EPSSetOperators.
> Is it correct to assume that, since the B matrix does not change between
> iterations and I use the default shift transformation with zero shift
> (the operator is B^{-1}A), the GAMG preconditioner will not be re-initialised
> and thus I should save some time?
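>
> Schematically, what I have in mind is (sketch only; assemble_A and the loop
> bounds are placeholders for my deal.II code):
>
>   EPSCreate(PETSC_COMM_WORLD, &eps);       /* once, before the loop   */
>   EPSSetProblemType(eps, EPS_GHEP);
>   EPSSetFromOptions(eps);
>   for (iter = 0; iter < max_iter; ++iter) {
>     assemble_A(A);                         /* A changes, B stays fixed */
>     EPSSetOperators(eps, A, B);            /* reuse the same EPS/ST/KSP/PC */
>     EPSSolve(eps);
>     /* extract eigenpairs, update quantities, check convergence */
>   }
>   EPSDestroy(&eps);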
>
> p.s. The relevant error message is below. I have the same issue on a CentOS
> cluster, so it is not related to OS X.
>
> Kind regards,
> Denis
>
>
> ===
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
> batch system) has told this process to end
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [0]PETSC ERROR: INSTEAD the line number of the start of the function
> [0]PETSC ERROR: is given.
> [0]PETSC ERROR: [0] KSPSolve line 510
> /private/tmp/petsc20151102-50378-1t7b3in/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: [0] STMatSolve line 148
> /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/interface/stsles.c
> [0]PETSC ERROR: [0] STApply_Shift line 33
> /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/impls/shift/shift.c
> [0]PETSC ERROR: [0] STApply line 50
> /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/sys/classes/st/interface/stsolve.c
> [0]PETSC ERROR: [0] EPSGetStartVector line 726
> /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/interface/epssolve.c
> [0]PETSC ERROR: [0] EPSSolve_KrylovSchur_Symm line 41
> /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/impls/krylov/krylovschur/ks-symm.c
> [0]PETSC ERROR: [0] EPSSolve line 83
> /private/tmp/slepc20151102-3081-1xln4h0/slepc-3.6.1/src/eps/interface/epssolve.c
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Signal received
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.6.2, Oct, 02, 2015
> [0]PETSC ERROR:
> /Users/davydden/Desktop/work/C++/deal.ii-dft/build_debug~/dft on a real
> named MBP-Denis.fritz.box by davydden Tue Nov 3 07:02:47 2015
> [0]PETSC ERROR: Configure options CC=/usr/local/bin/mpicc
> CXX=/usr/local/bin/mpicxx F77=/usr/local/bin/mpif77
> FC=/usr/local/bin/mpif90 --with-shared-libraries=1 --with-pthread=0
> --with-openmp=0 --with-debugging=1 --with-ssl=0
> --with-superlu_dist-include=/usr/local/opt/superlu_dist/include/superlu_dist
> --with-superlu_dist-lib="-L/usr/local/opt/superlu_dist/lib -lsuperlu_dist"
> --with-superlu-include=/usr/local/Cellar/superlu43/4.3/include/superlu
> --with-superlu-lib="-L/usr/local/Cellar/superlu43/4.3/lib -lsuperlu"
> --with-fftw-dir=/usr/local/opt/fftw --with-netcdf-dir=/usr/local/opt/netcdf
> --with-suitesparse-dir=/usr/local/opt/suite-sparse
> --with-hdf5-dir=/usr/local/opt/hdf5 --with-metis-dir=/usr/local/opt/metis
> --with-parmetis-dir=/usr/local/opt/parmetis
> --with-scalapack-dir=/usr/local/opt/scalapack
> --with-mumps-dir=/usr/local/opt/mumps --with-x=0
> --prefix=/usr/local/Cellar/petsc/3.6.2/real --with-scalar-type=real
> --with-hypre-dir=/usr/local/opt/hypre
> --with-sundials-dir=/usr/local/opt/sundials
> --with-hwloc-dir=/usr/local/opt/hwloc
> [0]PETSC ERROR: #8 User provided function() line 0 in unknown file
> --------------------------------------------------------------------------
> mpirun noticed that process rank 3 with PID 96754 on node MBP-Denis exited
> on signal 6 (Abort trap: 6).
> --------------------------------------------------------------------------
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener