[petsc-users] [SLEPc] Number of iterations changes with MPI processes in Lanczos

Jose E. Roman jroman at dsic.upv.es
Wed Oct 24 08:48:15 CDT 2018


Everything seems correct. I don't know, maybe your problem is very sensitive? Is the eigenvalue tiny?
I would still try with Krylov-Schur.
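For reference, since your code already calls EPSSetFromOptions, the switch can be made either in the source or at run time without recompiling. A minimal sketch (the eps variable is the one from your setup sequence):

  /* In the source: select Krylov-Schur explicitly (it is also the default
     when EPSSetType is never called) */
  ierr = EPSSetType(eps,EPSKRYLOVSCHUR);CHKERRQ(ierr);

  /* Or at run time, overriding the EPSSetType call in the code because
     EPSSetFromOptions is called afterwards: pass -eps_type krylovschur */
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);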
Jose


> On 24 Oct 2018, at 14:59, Ale Foggia <amfoggia at gmail.com> wrote:
> 
> The functions called to set the solver are (in this order): EPSCreate(); EPSSetOperators(); EPSSetProblemType(EPS_HEP); EPSSetType(EPSLANCZOS); EPSSetWhichEigenpairs(EPS_SMALLEST_REAL); EPSSetFromOptions();
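> For completeness, a minimal sketch of that sequence (assuming #include <slepceps.h>, an already assembled Hermitian Mat H, and with the rest of the program omitted; variable names are illustrative):
> 
>   EPS            eps;
>   PetscInt       its;
>   PetscErrorCode ierr;
> 
>   ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr);
>   ierr = EPSSetOperators(eps,H,NULL);CHKERRQ(ierr);        /* standard problem, so B = NULL */
>   ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr);
>   ierr = EPSSetType(eps,EPSLANCZOS);CHKERRQ(ierr);
>   ierr = EPSSetWhichEigenpairs(eps,EPS_SMALLEST_REAL);CHKERRQ(ierr);
>   ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);             /* picks up -eps_view, -bv_reproducible_random, ... */
>   ierr = EPSSolve(eps);CHKERRQ(ierr);
>   ierr = EPSGetIterationNumber(eps,&its);CHKERRQ(ierr);    /* the iteration counts I report */
>   ierr = EPSDestroy(&eps);CHKERRQ(ierr);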
> 
> The output of -eps_view for each run is:
> =================================================================
> EPS Object: 960 MPI processes
>   type: lanczos
>     LOCAL reorthogonalization
>   problem type: symmetric eigenvalue problem
>   selected portion of the spectrum: smallest real parts
>   number of eigenvalues (nev): 1
>   number of column vectors (ncv): 16
>   maximum dimension of projected problem (mpd): 16
>   maximum number of iterations: 291700777
>   tolerance: 1e-08
>   convergence test: relative to the eigenvalue
> BV Object: 960 MPI processes
>   type: svec
>   17 columns of global length 2333606220
>   vector orthogonalization method: modified Gram-Schmidt
>   orthogonalization refinement: if needed (eta: 0.7071)
>   block orthogonalization method: GS
>   doing matmult as a single matrix-matrix product
>   generating random vectors independent of the number of processes
> DS Object: 960 MPI processes
>   type: hep
>   parallel operation mode: REDUNDANT
>   solving the problem with: Implicit QR method (_steqr)
> ST Object: 960 MPI processes
>   type: shift
>   shift: 0.
>   number of matrices: 1
> =================================================================
> EPS Object: 1024 MPI processes
>   type: lanczos
>     LOCAL reorthogonalization
>   problem type: symmetric eigenvalue problem
>   selected portion of the spectrum: smallest real parts
>   number of eigenvalues (nev): 1
>   number of column vectors (ncv): 16
>   maximum dimension of projected problem (mpd): 16
>   maximum number of iterations: 291700777
>   tolerance: 1e-08
>   convergence test: relative to the eigenvalue
> BV Object: 1024 MPI processes
>   type: svec
>   17 columns of global length 2333606220
>   vector orthogonalization method: modified Gram-Schmidt
>   orthogonalization refinement: if needed (eta: 0.7071)
>   block orthogonalization method: GS
>   doing matmult as a single matrix-matrix product
>   generating random vectors independent of the number of processes
> DS Object: 1024 MPI processes
>   type: hep
>   parallel operation mode: REDUNDANT
>   solving the problem with: Implicit QR method (_steqr)
> ST Object: 1024 MPI processes
>   type: shift
>   shift: 0.
>   number of matrices: 1
> =================================================================
> 
> I ran the same configurations again and got the same result in terms of the number of iterations.
> 
> I also tried full reorthogonalization (always with the -bv_reproducible_random option) but I still get a different number of iterations: for 960 procs I get 172 iters, and for 1024 I get 362 iters. The -eps_view output for this case (only for 960 procs; the other one has the same information, except for the number of processes) is:
> =================================================================
> EPS Object: 960 MPI processes
>   type: lanczos
>     FULL reorthogonalization
>   problem type: symmetric eigenvalue problem
>   selected portion of the spectrum: smallest real parts
>   number of eigenvalues (nev): 1
>   number of column vectors (ncv): 16
>   maximum dimension of projected problem (mpd): 16
>   maximum number of iterations: 291700777
>   tolerance: 1e-08
>   convergence test: relative to the eigenvalue
> BV Object: 960 MPI processes
>   type: svec
>   17 columns of global length 2333606220
>   vector orthogonalization method: classical Gram-Schmidt
>   orthogonalization refinement: if needed (eta: 0.7071)
>   block orthogonalization method: GS
>   doing matmult as a single matrix-matrix product
>   generating random vectors independent of the number of processes
> DS Object: 960 MPI processes
>   type: hep
>   parallel operation mode: REDUNDANT
>   solving the problem with: Implicit QR method (_steqr)
> ST Object: 960 MPI processes
>   type: shift
>   shift: 0.
>   number of matrices: 1
> =================================================================
> 
> On Wed, 24 Oct 2018 at 10:52, Jose E. Roman (<jroman at dsic.upv.es>) wrote:
> This is very strange. Make sure you call EPSSetFromOptions in the code. Do the iteration counts also change between two different runs with the same number of processes?
> Maybe Lanczos with default options is too sensitive (by default it does not reorthogonalize). Suggest using Krylov-Schur or Lanczos with full reorthogonalization (EPSLanczosSetReorthog).
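> A minimal sketch of the second alternative, placed before EPSSetFromOptions so that it can still be overridden from the command line:
> 
>   ierr = EPSSetType(eps,EPSLANCZOS);CHKERRQ(ierr);
>   ierr = EPSLanczosSetReorthog(eps,EPS_LANCZOS_REORTHOG_FULL);CHKERRQ(ierr);
>   /* run-time equivalent: -eps_lanczos_reorthog full */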
> Also, send the output of -eps_view to see if there is anything abnormal.
> 
> Jose
> 
> 
> > On 24 Oct 2018, at 9:09, Ale Foggia <amfoggia at gmail.com> wrote:
> > 
> > I've tried the option that you gave me but I still get a different number of iterations when changing the number of MPI processes: I ran with 960 procs and 1024 procs and got 435 and 176 iterations, respectively.
> > 
> > On Tue, 23 Oct 2018 at 16:48, Jose E. Roman (<jroman at dsic.upv.es>) wrote:
> > 
> > 
> > > On 23 Oct 2018, at 15:46, Ale Foggia <amfoggia at gmail.com> wrote:
> > > 
> > > 
> > > 
> > > On Tue, 23 Oct 2018 at 15:33, Jose E. Roman (<jroman at dsic.upv.es>) wrote:
> > > 
> > > 
> > > > On 23 Oct 2018, at 15:17, Ale Foggia <amfoggia at gmail.com> wrote:
> > > > 
> > > > Hello Jose, thanks for your answer.
> > > > 
> > > > On Tue, 23 Oct 2018 at 12:59, Jose E. Roman (<jroman at dsic.upv.es>) wrote:
> > > > There is an undocumented option:
> > > > 
> > > >   -bv_reproducible_random
> > > > 
> > > > It will force the initial vector of the Krylov subspace to be the same irrespective of the number of MPI processes. This should be used for scaling analyses such as the one you are trying to do.
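> > > > If you prefer to fix it in the code instead of passing it on the command line, a minimal sketch would be to push the option into the options database before EPSSetFromOptions:
> > > > 
> > > >   ierr = PetscOptionsSetValue(NULL,"-bv_reproducible_random",NULL);CHKERRQ(ierr);
> > > >   ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);   /* picked up when the solver configures its BV object */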
> > > > 
> > > > What about when I'm not doing the scaling? I would now like to request computing time for bigger problems; should I also use this option in that case? Because, what happens if I end up with a "bad" configuration? I mean: I request an amount of time that is enough according to the "correct" scaling, but then the run takes double the time/iterations, as happened before when going from 960 to 1024 processes?
> > > 
> > > When you increase the matrix size the spectrum of the matrix changes and probably also the convergence, so the computation time is not easy to predict in advance.
> > > 
> > > Okay, I'll keep that in mind. I thought that, even if the spectrum changes, having a trend over 6 or 7 smaller cases would let me predict the time more or less. It worked that way until I ran into this "iterations problem", which doubled the execution time for the same problem size. To be completely sure: do you suggest using this run-time option in production or not? Can you elaborate a bit on the effect of this option? Is the (huge) difference I got in the number of iterations something to be expected?
> > 
> > Ideally, if you have a rough approximation of the eigenvector, you set it as the initial vector with EPSSetInitialSpace(). Otherwise, SLEPc generates a random initial vector, that is, it starts the search blindly. The difference between using one random vector or another may be large, depending on the problem. Krylov-Schur is usually less sensitive to the initial vector.
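> > A minimal sketch of the first alternative, assuming the approximation is available in a Vec compatible with the operator A (names are illustrative):
> > 
> >   Vec v0;
> >   ierr = MatCreateVecs(A,&v0,NULL);CHKERRQ(ierr);
> >   /* ... fill v0 with the rough approximation of the eigenvector ... */
> >   ierr = EPSSetInitialSpace(eps,1,&v0);CHKERRQ(ierr);   /* call before EPSSolve */
> >   ierr = VecDestroy(&v0);CHKERRQ(ierr);                 /* the solver keeps its own reference */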
> > 
> > Jose
> > 
> > > 
> > > 
> > > > 
> > > > An additional comment is that we strongly recommend using the default solver (Krylov-Schur), which does Lanczos with implicit restart. It is generally faster and more stable.
> > > > 
> > > > I will be doing dynamical Lanczos, which means I'll need the "matrix whose rows are the eigenvectors of the tridiagonal matrix" (so, in the notation of the Lanczos Technical Report, the "matrix whose rows are the eigenvectors of T_m", which should be the same as the vectors y_i). I also checked the Technical Report for Krylov-Schur and I think I can get the same information from that solver too, but I'm not sure. Can you confirm this, please?
> > > > Also, since the vectors I want are given by y_i = V_m^(-1) x_i (following the notation of the Report), my idea was to retrieve the invariant subspace V_m (with EPSGetInvariantSubspace), invert it, and then multiply it by the eigenvectors that I get with EPSGetEigenvector. Is there an easier (or computationally cheaper) way to get this?
> > > 
> > > In Krylov-Schur the tridiagonal matrix T_m becomes arrowhead-plus-tridiagonal. Apart from this, it should be equivalent. The relevant information can be obtained with EPSGetBV() and EPSGetDS(). But this is a "developer level" interface. We could help you get this running. Send a small problem matrix to slepc-maint together with a more detailed description of what you need to compute.
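> > > Just to indicate the entry point, a minimal sketch of retrieving those two objects after EPSSolve; what exactly to extract from them is what we would work out through slepc-maint:
> > > 
> > >   BV bv;   /* basis vectors of the computation (V_m in your notation) */
> > >   DS ds;   /* projected dense problem (T_m, or its Krylov-Schur analogue) */
> > >   ierr = EPSGetBV(eps,&bv);CHKERRQ(ierr);
> > >   ierr = EPSGetDS(eps,&ds);CHKERRQ(ierr);
> > >   /* individual basis vectors can be accessed with BVGetColumn/BVRestoreColumn */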
> > > 
> > > Thanks! When I get to that part I'll write to slepc-maint for help.
> > > 
> > > 
> > > Jose
> > > 
> > > > 
> > > > 
> > > > Jose
> > > > 
> > > > 
> > > > 
> > > > > On 23 Oct 2018, at 12:13, Ale Foggia <amfoggia at gmail.com> wrote:
> > > > > 
> > > > > Hello, 
> > > > > 
> > > > > I'm currently using the Lanczos solver (EPSLANCZOS) to get the smallest real eigenvalue (EPS_SMALLEST_REAL) of a Hermitian problem (EPS_HEP). Those are the only options I set for the solver. My aim is to be able to predict/estimate the time-to-solution. To do so, I was doing a scaling study of the code for different matrix sizes and different numbers of MPI processes. As I was not observing good scaling, I checked the number of iterations of the solver (given by EPSGetIterationNumber). I've found that for the **same matrix size** (that is, the same problem), when I change the number of MPI processes the number of iterations changes, and the behaviour is not monotonic. These are the numbers I've got:
> > > > > 
> > > > > # procs   # iters
> > > > > 960          157
> > > > > 992          189
> > > > > 1024        338
> > > > > 1056        190
> > > > > 1120        174
> > > > > 2048        136
> > > > > 
> > > > > I've checked the mailing list for a similar situation and found another person with the same problem but with a different solver ("[SLEPc] GD is not deterministic when using different number of cores", Nov 19 2015); however, I think the solution that person found (removing the "-eps_harmonic" option) does not apply to my problem.
> > > > > 
> > > > > Can you give me any hint about the reason for this behaviour? Is there a way to prevent it? It's not possible to estimate/predict the time consumption for bigger problems if the number of iterations varies this much.
> > > > > 
> > > > > Ale
> > > > 
> > 
> 


