[petsc-users] [SLEPc] Performance of Krylov-Schur with MUMPS-based shift-and-invert

Mon Feb 19 12:36:18 CST 2018

> El 19 feb 2018, a las 19:15, Thibaut Appel <t.appel17 at imperial.ac.uk> escribió:
> 
> Good afternoon,
> 
> I am solving generalized eigenvalue problems {Ax = omegaBx} in complex arithmetic, where A is non-hermitian and B is singular. I think the only way to get round the singularity is to employ a shift-and-invert method, where I am using MUMPS to invert the shifted matrix.
> 
> I am using the Fortran interface of PETSc 3.8.3 and SLEPc 3.8.2 where my ./configure line was
> ./configure --with-fortran-kernels=1 --with-scalar-type=complex --with-blaslapack-dir=/home/linuxbrew/.linuxbrew/opt/openblas --PETSC_ARCH=cplx_dble_optim --with-cmake-dir=/home/linuxbrew/.linuxbrew/opt/cmake --with-mpi-dir=/home/linuxbrew/.linuxbrew/opt/openmpi --with-debugging=0 --download-scalapack --download-mumps --COPTFLAGS="-O3 -march=native" --CXXOPTFLAGS="-O3 -march=native" --FOPTFLAGS="-O3 -march=native"
> 
> My matrices A and B are assembled correctly in parallel and my preallocation is quasi-optimal in the sense that I don't have any called to mallocs but I may overestimate the required memory for some rows of the matrices. Here is how I setup the EPS problem and solve:
> 
>     CALL EPSSetProblemType(eps,EPS_GNHEP,ierr)
>     CALL EPSSetOperators(eps,MatA,MatB,ierr)
>     CALL EPSSetType(eps,EPSKRYLOVSCHUR,ierr)
>     CALL EPSSetDimensions(eps,nev,ncv,PETSC_DECIDE,ierr)
>     CALL EPSSetTolerances(eps,tol_ev,PETSC_DECIDE,ierr)
> 
>     CALL EPSSetFromOptions(eps,ierr)
>     CALL EPSSetTarget(eps,shift,ierr)
>     CALL EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE,ierr)
> 
>     CALL EPSGetST(eps,st,ierr)
>     CALL STGetKSP(st,ksp,ierr)
>     CALL KSPGetPC(ksp,pc,ierr)
> 
>     CALL STSetType(st,STSINVERT,ierr)
>     CALL KSPSetType(ksp,KSPPREONLY,ierr)
>     CALL PCSetType(pc,PCLU,ierr)
> 
>     CALL PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS,ierr)
>     CALL PCSetFromOptions(pc,ierr)
> 
>     CALL EPSSolve(eps,ierr)
>     CALL EPSGetIterationNumber(eps,iter,ierr)
>     CALL EPSGetConverged(eps,nev_conv,ierr)

The settings seem ok. You can use -eps_view to make sure that everything is set as you want.

> 
>     - Using one MPI process, it takes 1 hour and 22 minutes to retrieve 250 eigenvalues with a Krylov subspace of size 500, a tolerance of 10^-12 when the leading dimension of the matrices is 405000. My matrix A has 98,415,000 non-zero elements and B has 1,215,000 non zero elements. Would you be shocked by that computation time? I would have expected something much lower given the values of nev and ncv I have but could be completely wrong in my understanding of the Krylov-Schur method.

If you run with -log_view you will see the breakup in the different steps. Most probably a large percentage of the time is in the factorization of the matrix (MatLUFactorSym and MatLUFactorNum).

The matrix is quite dense (about 250 nonzero elements per row), so factorizing it is costly. You may want to try inexact shift-and-invert with an iterative method, but you will need a good preconditioner.

The time needed for the other steps may be reduced a little bit by setting a smaller subspace size, for instance with -eps_mpd 200

> 
>     - My goal is speed and reliability. Is there anything you notice in my EPS solver that could be improved or corrected? I remember an exchange with Jose E. Roman where he said that the parameters of MUMPS are not worth being changed, however I notice some people play with the -mat_mumps_cntl_1 and  -mat_mumps_cntl_3 which control the relative/absolute pivoting threshold?

Yes, you can try tuning MUMPS options. Maybe they are relevant in your application.

> 
>     - Would you advise the use of EPSSetTrueResidual and EPSSetBalance since I am using a spectral transformation?

Are you getting large residual norms?
I would not suggest using EPSSetTrueResidual() because it may prevent convergence, especially if the target is not very close to the wanted eigenvalues.
EPSSetBalance() sometimes helps, even in the case of using spectral transformation. It is intended for ill-conditioned problems where the obtained residuals are not so good.

> 
>     - Would you see anything that would prevent me from getting speedup in parallel executions?

I guess it will depend on how MUMPS scales for your problem.

Jose

> 
> Thank you very much in advance and I look forward to exchanging with you about these different points,
> 
> Thibaut
>