[petsc-users] [SLEPc] GD is not deterministic when using different number of cores

Jose E. Roman jroman at dsic.upv.es
Thu Nov 19 04:19:31 CST 2015


> El 19 nov 2015, a las 10:49, Denis Davydov <davydden at gmail.com> escribió:
> 
> Dear all,
> 
> I was trying to get some scaling results for the GD eigensolver as applied to density functional theory.
> Interestingly enough, the number of self-consistent iterations (solution of the coupled eigenvalue problem and Poisson equations)
> depends on the number of MPI cores used. In my case the number of iterations ranges from 19 to 24 for 2 to 160 MPI cores.
> That makes the whole scaling check useless, as the eigenproblem is solved a different number of times.
> 
> That is **not** the case when I use the Krylov-Schur eigensolver with zero shift, which makes me believe that I am missing some setting in GD to make it fully deterministic. The only non-deterministic part I am currently aware of is the initial subspace for the first SC iteration, but that is the case for both KS and GD. For subsequent iterations I provide the previously obtained eigenvectors as the initial subspace.
> 
> Certainly there will be some round-off error due to the different partition of DoFs for different numbers of MPI cores,
> but I don't expect it to have such a strong influence, especially given the fact that I don't see this problem with KS.
> 
> Below is the output of -eps_view for GD with -eps_type gd -eps_harmonic -st_pc_type bjacobi -eps_gd_krylov_start -eps_target -10.0
> I would appreciate any suggestions on how to address the issue.

The block Jacobi preconditioner differs when you change the number of processes. This will probably make GD iterate more when you use more processes.
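
One quick way to check this is to switch the ST preconditioner to point Jacobi, which applies exactly the same operation for any number of processes (e.g. -st_pc_type jacobi). A minimal sketch in C, assuming an already configured EPS object eps (the helper name is just illustrative):

  #include <slepceps.h>

  /* Illustrative helper, not part of SLEPc: make the ST preconditioner
     independent of the parallel partitioning. */
  PetscErrorCode UsePartitionIndependentPC(EPS eps)
  {
    PetscErrorCode ierr;
    ST             st;
    KSP            ksp;
    PC             pc;

    PetscFunctionBeginUser;
    ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
    ierr = STGetKSP(st,&ksp);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    /* point Jacobi acts identically for any number of processes */
    ierr = PCSetType(pc,PCJACOBI);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

If the iteration counts become (almost) independent of the number of processes with this change, the variation comes from the block Jacobi preconditioner and not from GD itself.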

> 
> As a side question, why does GD use KSP preonly? It could just as well use a proper linear solver to apply K^{-1} in the expansion stage.

You can achieve that with PCKSP. But if you are going to do that, why not use JD instead of GD?
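
If you do want an inner iterative solve, a minimal sketch of configuring JD through the ST's KSP could look like this (illustrative helper; the tolerance value is just a placeholder):

  #include <slepceps.h>

  PetscErrorCode UseJDWithInnerSolve(EPS eps)
  {
    PetscErrorCode ierr;
    ST             st;
    KSP            ksp;
    PC             pc;

    PetscFunctionBeginUser;
    ierr = EPSSetType(eps,EPSJD);CHKERRQ(ierr);
    ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
    ierr = STGetKSP(st,&ksp);CHKERRQ(ierr);
    /* use a real iterative solver for the correction equation instead of preonly */
    ierr = KSPSetType(ksp,KSPGMRES);CHKERRQ(ierr);
    ierr = KSPSetTolerances(ksp,1e-4,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCBJACOBI);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

The command-line equivalent would be along the lines of -eps_type jd -st_ksp_type gmres -st_ksp_rtol 1e-4.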


> I assume the Olsen variant is the default in SLEPc?

Yes.
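
For reference, a sketch of the usual formulation of the Olsen correction (my notation, not quoted from the SLEPc documentation): instead of expanding with the plain preconditioned residual t = K^{-1} r, one uses

  \[
    t = K^{-1}\left(r - \varepsilon\, x\right), \qquad
    \varepsilon = \frac{x^{*} K^{-1} r}{x^{*} K^{-1} x},
  \]

so that x^* t = 0 and the new direction does not collapse onto the current Ritz vector x when K is a very accurate approximation of the shifted operator.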

Jose


> 
> Kind regards,
> Denis
> 
> 
> EPS Object: 4 MPI processes
>  type: gd
>    Davidson: search subspace is B-orthogonalized
>    Davidson: block size=1
>    Davidson: type of the initial subspace: Krylov
>    Davidson: size of the subspace after restarting: 6
>    Davidson: number of vectors after restarting from the previous iteration: 0
>  problem type: generalized symmetric eigenvalue problem
>  extraction type: harmonic Ritz
>  selected portion of the spectrum: closest to target: -10 (in magnitude)
>  postprocessing eigenvectors with purification
>  number of eigenvalues (nev): 87
>  number of column vectors (ncv): 175
>  maximum dimension of projected problem (mpd): 175
>  maximum number of iterations: 57575
>  tolerance: 1e-10
>  convergence test: absolute
>  dimension of user-provided initial space: 87
> BV Object: 4 MPI processes
>  type: svec
>  175 columns of global length 57575
>  vector orthogonalization method: classical Gram-Schmidt
>  orthogonalization refinement: if needed (eta: 0.7071)
>  block orthogonalization method: Gram-Schmidt
>  non-standard inner product
>  Mat Object:   4 MPI processes
>    type: mpiaij
>    rows=57575, cols=57575
>    total: nonzeros=1.51135e+06, allocated nonzeros=1.51135e+06
>    total number of mallocs used during MatSetValues calls =0
>      not using I-node (on process 0) routines
>  doing matmult as a single matrix-matrix product
> DS Object: 4 MPI processes
>  type: gnhep
> ST Object: 4 MPI processes
>  type: precond
>  shift: -10
>  number of matrices: 2
>  all matrices have different nonzero pattern
>  KSP Object:  (st_)   4 MPI processes
>    type: preonly
>    maximum iterations=10000, initial guess is zero
>    tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>    left preconditioning
>    using DEFAULT norm type for convergence test
>  PC Object:  (st_)   4 MPI processes
>    type: bjacobi
>      block Jacobi: number of blocks = 4
>      Local solve is same for all blocks, in the following KSP and PC objects:
>    KSP Object:    (st_sub_)     1 MPI processes
>      type: preonly
>      maximum iterations=10000, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>      using NONE norm type for convergence test
>    PC Object:    (st_sub_)     1 MPI processes
>      type: ilu
>        ILU: out-of-place factorization
>        0 levels of fill
>        tolerance for zero pivot 2.22045e-14
>        matrix ordering: natural
>        factor fill ratio given 1, needed 1
>          Factored matrix follows:
>            Mat Object:             1 MPI processes
>              type: seqaij
>              rows=15557, cols=15557
>              package used to perform factorization: petsc
>              total: nonzeros=388947, allocated nonzeros=388947
>              total number of mallocs used during MatSetValues calls =0
>                not using I-node routines
>      linear system matrix = precond matrix:
>      Mat Object:       1 MPI processes
>        type: seqaij
>        rows=15557, cols=15557
>        total: nonzeros=388947, allocated nonzeros=388947
>        total number of mallocs used during MatSetValues calls =0
>          not using I-node routines
>    linear system matrix = precond matrix:
>    Mat Object:     4 MPI processes
>      type: mpiaij
>      rows=57575, cols=57575
>      total: nonzeros=1.51135e+06, allocated nonzeros=1.51135e+06
>      total number of mallocs used during MatSetValues calls =0
>        not using I-node (on process 0) routines
> 
> 
> 


