[petsc-users] [SLEPc] Number of iterations changes with MPI processes in Lanczos

Ale Foggia amfoggia at gmail.com
Wed Oct 24 07:59:02 CDT 2018


The functions called to set up the solver are (in this order): EPSCreate();
EPSSetOperators(); EPSSetProblemType(EPS_HEP); EPSSetType(EPSLANCZOS);
EPSSetWhichEigenpairs(EPS_SMALLEST_REAL); EPSSetFromOptions();
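
To make the sequence concrete, here is a minimal sketch of how these calls
are put together (the function name SolveSmallest and the matrix H are
illustrative placeholders; the actual Hamiltonian is assembled elsewhere):

#include <slepceps.h>

/* Minimal sketch of the setup listed above; H is assumed to be an already
   assembled Hermitian Mat, and all names here are only illustrative. */
PetscErrorCode SolveSmallest(Mat H, PetscReal *eigval, PetscInt *its)
{
  EPS            eps;
  PetscScalar    kr, ki;
  PetscErrorCode ierr;

  ierr = EPSCreate(PETSC_COMM_WORLD, &eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, H, NULL);CHKERRQ(ierr);     /* standard problem H*x = k*x */
  ierr = EPSSetProblemType(eps, EPS_HEP);CHKERRQ(ierr);   /* Hermitian eigenproblem */
  ierr = EPSSetType(eps, EPSLANCZOS);CHKERRQ(ierr);
  ierr = EPSSetWhichEigenpairs(eps, EPS_SMALLEST_REAL);CHKERRQ(ierr);
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);            /* picks up run-time options */

  ierr = EPSSolve(eps);CHKERRQ(ierr);
  ierr = EPSGetIterationNumber(eps, its);CHKERRQ(ierr);   /* iteration counts reported below */
  ierr = EPSGetEigenpair(eps, 0, &kr, &ki, NULL, NULL);CHKERRQ(ierr);
  *eigval = PetscRealPart(kr);

  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  return 0;
}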

The output of -eps_view for each run is:
=================================================================
EPS Object: 960 MPI processes
  type: lanczos
    LOCAL reorthogonalization
  problem type: symmetric eigenvalue problem
  selected portion of the spectrum: smallest real parts
  number of eigenvalues (nev): 1
  number of column vectors (ncv): 16
  maximum dimension of projected problem (mpd): 16
  maximum number of iterations: 291700777
  tolerance: 1e-08
  convergence test: relative to the eigenvalue
BV Object: 960 MPI processes
  type: svec
  17 columns of global length 2333606220
  vector orthogonalization method: modified Gram-Schmidt
  orthogonalization refinement: if needed (eta: 0.7071)
  block orthogonalization method: GS
  doing matmult as a single matrix-matrix product
  generating random vectors independent of the number of processes
DS Object: 960 MPI processes
  type: hep
  parallel operation mode: REDUNDANT
  solving the problem with: Implicit QR method (_steqr)
ST Object: 960 MPI processes
  type: shift
  shift: 0.
  number of matrices: 1
=================================================================
EPS Object: 1024 MPI processes
  type: lanczos
    LOCAL reorthogonalization
  problem type: symmetric eigenvalue problem
  selected portion of the spectrum: smallest real parts
  number of eigenvalues (nev): 1
  number of column vectors (ncv): 16
  maximum dimension of projected problem (mpd): 16
  maximum number of iterations: 291700777
  tolerance: 1e-08
  convergence test: relative to the eigenvalue
BV Object: 1024 MPI processes
  type: svec
  17 columns of global length 2333606220
  vector orthogonalization method: modified Gram-Schmidt
  orthogonalization refinement: if needed (eta: 0.7071)
  block orthogonalization method: GS
  doing matmult as a single matrix-matrix product
  generating random vectors independent of the number of processes
DS Object: 1024 MPI processes
  type: hep
  parallel operation mode: REDUNDANT
  solving the problem with: Implicit QR method (_steqr)
ST Object: 1024 MPI processes
  type: shift
  shift: 0.
  number of matrices: 1
=================================================================

I ran the same configurations again and got the same result in terms of
the number of iterations.

I also tried full reorthogonalization (always with the
-bv_reproducible_random option), but I still get a different number of
iterations: for 960 procs I get 172 iters, and for 1024 I get 362 iters.
The -eps_view output for this case (only for 960 procs; the other one has
the same information, except for the number of processes) is below,
followed by a sketch of how these options can be selected:
=================================================================
EPS Object: 960 MPI processes
  type: lanczos
    FULL reorthogonalization
  problem type: symmetric eigenvalue problem
  selected portion of the spectrum: smallest real parts
  number of eigenvalues (nev): 1
  number of column vectors (ncv): 16
  maximum dimension of projected problem (mpd): 16
  maximum number of iterations: 291700777
  tolerance: 1e-08
  convergence test: relative to the eigenvalue
BV Object: 960 MPI processes
  type: svec
  17 columns of global length 2333606220
  vector orthogonalization method: classical Gram-Schmidt
  orthogonalization refinement: if needed (eta: 0.7071)
  block orthogonalization method: GS
  doing matmult as a single matrix-matrix product
  generating random vectors independent of the number of processes
DS Object: 960 MPI processes
  type: hep
  parallel operation mode: REDUNDANT
  solving the problem with: Implicit QR method (_steqr)
ST Object: 960 MPI processes
  type: shift
  shift: 0.
  number of matrices: 1
=================================================================
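
For reference, a sketch of how these options can be selected at run time
(the launch line and the executable name ./my_solver are illustrative, not
copied from my job script):

mpiexec -n 960 ./my_solver -eps_lanczos_reorthog full -bv_reproducible_random -eps_view

If I read the manual correctly, the full reorthogonalization can
equivalently be requested in the code, right after EPSSetType() in the
sketch above:

  /* ask for FULL reorthogonalization instead of the LOCAL default */
  ierr = EPSLanczosSetReorthog(eps, EPS_LANCZOS_REORTHOG_FULL);CHKERRQ(ierr);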

On Wed., 24 Oct. 2018 at 10:52, Jose E. Roman (<jroman at dsic.upv.es>)
wrote:

> This is very strange. Make sure you call EPSSetFromOptions in the code. Do
> iteration counts also change for two different runs with the same number of
> processes?
> Maybe Lanczos with default options is too sensitive (by default it does
> not reorthogonalize). I suggest using Krylov-Schur or Lanczos with full
> reorthogonalization (EPSLanczosSetReorthog).
> Also, send the output of -eps_view to see if there is anything abnormal.
>
> Jose
>
>
> > On 24 Oct 2018, at 9:09, Ale Foggia <amfoggia at gmail.com> wrote:
> >
> > I've tried the option that you gave me, but I still get a different number
> of iterations when changing the number of MPI processes: I ran with 960
> procs and 1024 procs and got 435 and 176 iterations, respectively.
> >
> > On Tue., 23 Oct. 2018 at 16:48, Jose E. Roman (<jroman at dsic.upv.es>)
> wrote:
> >
> >
> > > On 23 Oct 2018, at 15:46, Ale Foggia <amfoggia at gmail.com> wrote:
> > >
> > >
> > >
> > > On Tue., 23 Oct. 2018 at 15:33, Jose E. Roman (<jroman at dsic.upv.es>)
> wrote:
> > >
> > >
> > > > On 23 Oct 2018, at 15:17, Ale Foggia <amfoggia at gmail.com>
> wrote:
> > > >
> > > > Hello Jose, thanks for your answer.
> > > >
> > > > On Tue., 23 Oct. 2018 at 12:59, Jose E. Roman (<
> jroman at dsic.upv.es>) wrote:
> > > > There is an undocumented option:
> > > >
> > > >   -bv_reproducible_random
> > > >
> > > > It will force the initial vector of the Krylov subspace to be the
> same irrespective of the number of MPI processes. This should be used for
> scaling analyses such as the one you are trying to do.
> > > >
> > > > What about when I'm not doing the scaling? Now I would like to request
> computing time for bigger problems; should I also use this option in that
> case? Because, what happens if I end up with a "bad" configuration?
> Meaning, I request an amount of time that would be enough according to the
> "correct" scaling, but the run takes double the time/iterations, like what
> happened before when changing from 960 to 1024 processes?
> > >
> > > When you increase the matrix size the spectrum of the matrix changes
> and probably also the convergence, so the computation time is not easy to
> predict in advance.
> > >
> > > Okay, I'll keep that in mind. I thought that, even if the spectrum
> changes, if I had a trend for 6 or 7 smaller cases I could more or less
> predict the time. It was working this way until I found this "iterations
> problem", which doubled the execution time for the same problem size. To
> be completely sure, do you suggest that I use this run-time option in
> production or not? Can you elaborate a bit on the effect of this option?
> Is the (huge) difference I got in the number of iterations something to
> be expected?
> >
> > Ideally, if you have a rough approximation of the eigenvector, you set it
> as the initial vector with EPSSetInitialSpace(). Otherwise, SLEPc generates
> a random initial vector, that is, it starts the search blindly. The
> difference between using one random vector or another may be large,
> depending on the problem. Krylov-Schur is usually less sensitive to the
> initial vector.
> >
> > Jose
> >
> > >
> > >
> > > >
> > > > An additional comment is that we strongly recommend using the
> default solver (Krylov-Schur), which will do Lanczos with implicit restart.
> It is generally faster and more stable.
> > > >
> > > > I will be doing Dynamical Lanczos, which means that I'll need the
> "matrix whose rows are the eigenvectors of the tridiagonal matrix" (so,
> according to the Lanczos Technical Report notation, I need the "matrix
> whose rows are the eigenvectors of T_m", which should be the same as the
> vectors y_i). I checked the Technical Report for Krylov-Schur as well, and
> I think I can get the same information from that solver too, but I'm not
> sure. Can you confirm this, please?
> > > > Also, since the vectors I want are given by V_m^(-1)*x_i = y_i
> (following the notation in the Report), my idea for getting them was to
> retrieve the invariant subspace V_m (with EPSGetInvariantSubspace), invert
> it, and then multiply it by the eigenvectors that I get with
> EPSGetEigenvector. Is there another, easier way (or one with fewer
> computations) to get this?
> > >
> > > In Krylov-Schur the tridiagonal matrix T_m becomes
> arrowhead-plus-tridiagonal. Apart from this, it should be equivalent. The
> relevant information can be obtained with EPSGetBV() and EPSGetDS(). But
> this is a "developer level" interface. We could help you get this running.
> Send a small problem matrix to slepc-maint together with a more detailed
> description of what you need to compute.
> > >
> > > Thanks! When I get to that part I'll write to slepc-maint for help.
> > >
> > >
> > > Jose
> > >
> > > >
> > > >
> > > > Jose
> > > >
> > > >
> > > >
> > > > > On 23 Oct 2018, at 12:13, Ale Foggia <amfoggia at gmail.com>
> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > I'm currently using the Lanczos solver (EPSLANCZOS) to get the
> smallest real eigenvalue (EPS_SMALLEST_REAL) of a Hermitian problem
> (EPS_HEP). Those are the only options I set for the solver. My aim is to be
> able to predict/estimate the time-to-solution. To do so, I was doing a
> scaling study of the code for different matrix sizes and different numbers
> of MPI processes. As I was not observing good scaling, I checked the
> number of iterations of the solver (given by EPSGetIterationNumber).
> I've found that for the **same matrix size** (that is, the same problem),
> when I change the number of MPI processes the number of iterations
> changes, and the behaviour is not monotonic. These are the numbers I've
> got:
> > > > >
> > > > > # procs   # iters
> > > > > 960          157
> > > > > 992          189
> > > > > 1024        338
> > > > > 1056        190
> > > > > 1120        174
> > > > > 2048        136
> > > > >
> > > > > I've checked the mailing list for a similar situation and I've
> found another person with the same problem, but with another solver
> ("[SLEPc] GD is not deterministic when using different number of cores",
> Nov 19 2015). However, I think the solution that person found (removing
> the "-eps_harmonic" option) does not apply to my problem.
> > > > >
> > > > > Can you give me any hint on what is the reason for this behaviour?
> Is there a way to prevent this? It's not possible to estimate/predict any
> time consumption for bigger problems if the number of iterations varies
> this much.
> > > > >
> > > > > Ale
> > > >
> >
>
>

