[petsc-users] SLEPc: GPU accelerated shift-invert

Jose E. Roman jroman at dsic.upv.es
Sat Mar 18 06:46:19 CDT 2023

When using an aijcusparse matrix, the cusparse solver is selected by default, i.e., as if you had added the option -st_pc_factor_mat_solver_type cusparse.
The problem is that CUSPARSE does not have functionality for computing the LU factorization on the GPU, as far as I know. So what PETSc does is factorize the matrix on the CPU (the largest cost) and then use the GPU for the triangular solves. In SLEPc computations, the number of triangular solves is usually small, so there is no gain in doing those on the GPU. Furthermore, these flops do not seem to be logged correctly, so they do not show up on the GPU side of the log.
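
For reference, the configuration discussed above could be spelled out as an options file fragment like the one below. This is only a sketch: the -vec_type, -mat_type, -log_view and -st_pc_factor_mat_solver_type options are the ones mentioned in this thread, while -st_type sinvert, -st_ksp_type preonly and -st_pc_type lu are an assumption about how the exact shift-and-invert solve is set up.

```
# Sketch of the setup in this thread (pass with -options_file <file>)
-vec_type cuda
-mat_type aijcusparse
# Assumed: exact shift-and-invert with a direct solve
-st_type sinvert
-st_ksp_type preonly
-st_pc_type lu
# What an aijcusparse matrix selects by default, written out explicitly
-st_pc_factor_mat_solver_type cusparse
# Logging used to see that MatLUFactorNum runs on the CPU
-log_view
-log_view_gpu_time
```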

Probably someone like Stefano or Junchao can provide more information about factorizations on the GPU.

You could try doing inexact shift-and-invert, i.e., using an iterative linear solver such as bcgs+ilu. In the case of ILU, it is implemented on the GPU with CUSPARSE. However, inexact shift-and-invert is not viable in many applications, depending on the distribution of eigenvalues, due to non-convergence of the KSP.
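
A minimal sketch of the inexact variant, as an options file fragment, might look like this. The tolerance value is an illustrative assumption, not a recommendation, and -st_ksp_converged_reason is added only so that non-convergence of the KSP is visible. Note also that plain ILU in PETSc is a sequential preconditioner, so a multi-process run would presumably need something like block Jacobi with ILU on each block instead.

```
# Sketch: inexact shift-and-invert with bcgs+ilu
-vec_type cuda
-mat_type aijcusparse
-st_type sinvert
# Iterative inner solver instead of a direct factorization
-st_ksp_type bcgs
-st_pc_type ilu
# Illustrative tolerance (assumption); the inner solves usually need to be fairly accurate
-st_ksp_rtol 1e-9
# Print why each inner solve stopped, to detect KSP non-convergence
-st_ksp_converged_reason
```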

A final alternative is to avoid shift-and-invert completely and use STFILTER. Again, this will not work in all cases. Basically, it trades a factorization for a huge number of matrix-vector products, which may be good for GPU computation. If you want, send me a matrix and I can do some tests.
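
As a rough sketch, an STFILTER run could be configured with an options file fragment like the following. The interval endpoints and the polynomial degree are placeholder values (assumptions) that would have to be chosen for the actual spectrum. As far as I recall, the filter is meant for symmetric (Hermitian) problems and computes the eigenvalues inside a given interval rather than near a target.

```
# Sketch: polynomial filter instead of shift-and-invert
-vec_type cuda
-mat_type aijcusparse
-st_type filter
# Placeholder interval containing the wanted eigenvalues (assumption)
-eps_interval 0.4,0.6
# Placeholder polynomial degree (assumption); higher degree means more matrix-vector products
-st_filter_degree 100
```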


> On 17 Mar 2023, at 21:13, Greg Kahanamoku-Meyer <gregory.meyer at berkeley.edu> wrote:
> Hi,
> I'm trying to accelerate a shift-invert eigensolve with a GPU, but the computation seems to be spending a lot of its time on the CPU. Looking at the output with "-log_view -log_view_gpu_time" I see that MatLUFactorNum is not using the GPU (GPU Mflops/s is 0), and is taking the majority of the computation time. Is LU factorization on the GPU supported? I am currently applying the command line options "-vec_type cuda -mat_type aijcusparse"; please let me know if there are other options I can apply to accelerate the LU factorization as well. I tried digging through the documentation but couldn't find a clear answer.
> Thanks in advance!
> Kind regards,
> Greg KM
> -- 
> Gregory D. Kahanamoku-Meyer
> PhD Candidate
> quantum computing | cryptography | high-performance computing
> Department of Physics
> University of California at Berkeley
