[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver

Chang Liu cliu at pppl.gov
Tue Oct 12 12:45:08 CDT 2021


Hi Mark,

The option I use is like

-pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type aijcusparse
-sub_pc_factor_mat_solver_type cusparse -sub_ksp_type preonly -sub_pc_type lu
-ksp_max_it 2000 -ksp_rtol 1.e-300 -ksp_atol 1.e-300

I think this one does both the factorization and the solve on the GPU.
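
In case it helps, here is a minimal sketch (mine, not taken from any PETSc
example) of how the same configuration could be set up through the C API
instead of the options database. Error checking is omitted, and ksp and A
are assumed to be created elsewhere.

#include <petscksp.h>

/* Sketch: programmatic equivalent of the options above. */
static PetscErrorCode ConfigureBJacobiCusparse(KSP ksp, Mat A)
{
  PC       pc, subpc;
  KSP     *subksp;
  PetscInt nlocal, i;

  PetscFunctionBeginUser;
  MatSetType(A, MATAIJCUSPARSE);          /* -mat_type aijcusparse (set before assembly) */
  KSPSetType(ksp, KSPFGMRES);             /* -ksp_type fgmres */
  KSPSetTolerances(ksp, 1.e-300, 1.e-300, PETSC_DEFAULT, 2000);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCBJACOBI);               /* -pc_type bjacobi */
  PCBJacobiSetTotalBlocks(pc, 16, NULL);  /* -pc_bjacobi_blocks 16 */
  KSPSetUp(ksp);                          /* set up before querying the sub-KSPs */
  PCBJacobiGetSubKSP(pc, &nlocal, NULL, &subksp);
  for (i = 0; i < nlocal; i++) {
    KSPSetType(subksp[i], KSPPREONLY);    /* -sub_ksp_type preonly */
    KSPGetPC(subksp[i], &subpc);
    PCSetType(subpc, PCLU);               /* -sub_pc_type lu */
    PCFactorSetMatSolverType(subpc, MATSOLVERCUSPARSE);
  }
  PetscFunctionReturn(0);
}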

You can check the runex72_aijcusparse.sh file in the PETSc install 
directory and try it yourself (that test does only the LU factorization, 
without an iterative solve).

Chang

On 10/12/21 1:17 PM, Mark Adams wrote:
> 
> 
> On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <cliu at pppl.gov> wrote:
> 
>     Hi Junchao,
> 
>     No, I only need it to be transferred within a node. I use the block-Jacobi
>     method and GMRES to solve the sparse matrix, so each direct solver takes
>     care of a sub-block of the whole matrix. In this way, I can use one GPU to
>     solve one sub-block, which is stored within one node.
> 
>     It was stated in the documentation that the cusparse solver is slow.
>     However, in my test using ex72.c, the cusparse solver is faster than
>     mumps or superlu_dist on CPUs.
> 
> 
> Are we talking about the factorization, the solve, or both?
> 
> We do not have an interface to cuSparse's LU factorization (I just
> learned a few weeks ago that it exists).
> Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type aijcusparse'?
> This would be the CPU factorization, which is the dominant cost.
> 
> 
>     Chang
> 
>     On 10/12/21 10:24 AM, Junchao Zhang wrote:
>      > Hi, Chang,
>      >     For the mumps solver, we usually transfer matrix and vector data
>      > within a compute node. For the idea you propose, it looks like we need
>      > to gather data within MPI_COMM_WORLD, right?
>      >
>      >     Mark, I remember you said the cusparse solve is slow and you would
>      > rather do it on the CPU. Is that right?
>      >
>      > --Junchao Zhang
>      >
>      >
>      > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
>      > <petsc-users at mcs.anl.gov> wrote:
>      >
>      >     Hi,
>      >
>      >     Currently, it is possible to use the mumps solver in PETSc with the
>      >     -mat_mumps_use_omp_threads option, so that multiple MPI processes
>      >     transfer the matrix and RHS data to the master rank, and the master
>      >     rank then calls mumps with OpenMP to solve the matrix.
>      >
>      >     I wonder if someone can develop a similar option for the cusparse
>      >     solver. Right now, this solver does not work with mpiaijcusparse. I
>      >     think a possible workaround is to transfer all the matrix data to
>      >     one MPI process and then upload the data to the GPU to solve. In
>      >     this way, one can use the cusparse solver for an MPI program.
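
[For illustration only, not part of the original message: one way the
"gather to one MPI process" step might be done with existing PETSc calls is
MatCreateSubMatrices; the sketch below is a rough example under that
assumption, with error checking and cleanup of the returned array omitted.]

#include <petscmat.h>

/* Rough sketch: gather the whole distributed matrix onto rank 0 as a
   sequential matrix, which could then be handed to a single-GPU
   factorization on that rank. */
static PetscErrorCode GatherMatrixToRankZero(Mat A, Mat *Aseq)
{
  MPI_Comm    comm;
  PetscMPIInt rank;
  PetscInt    M, N;
  IS          rows, cols;
  Mat        *submat;

  PetscFunctionBeginUser;
  PetscObjectGetComm((PetscObject)A, &comm);
  MPI_Comm_rank(comm, &rank);
  MatGetSize(A, &M, &N);
  /* rank 0 requests every row and column; the other ranks request nothing */
  ISCreateStride(PETSC_COMM_SELF, rank == 0 ? M : 0, 0, 1, &rows);
  ISCreateStride(PETSC_COMM_SELF, rank == 0 ? N : 0, 0, 1, &cols);
  MatCreateSubMatrices(A, 1, &rows, &cols, MAT_INITIAL_MATRIX, &submat);
  *Aseq = submat[0];   /* full sequential copy on rank 0, empty elsewhere */
  ISDestroy(&rows);
  ISDestroy(&cols);
  PetscFunctionReturn(0);
}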
>      >
>      >     Chang
>      >     --
>      >     Chang Liu
>      >     Staff Research Physicist
>      >     +1 609 243 3438
>      >     cliu at pppl.gov
>      >     Princeton Plasma Physics Laboratory
>      >     100 Stellarator Rd, Princeton NJ 08540, USA
>      >
> 
>     -- 
>     Chang Liu
>     Staff Research Physicist
>     +1 609 243 3438
>     cliu at pppl.gov
>     Princeton Plasma Physics Laboratory
>     100 Stellarator Rd, Princeton NJ 08540, USA
> 

-- 
Chang Liu
Staff Research Physicist
+1 609 243 3438
cliu at pppl.gov
Princeton Plasma Physics Laboratory
100 Stellarator Rd, Princeton NJ 08540, USA

