[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver

Mark Adams mfadams at lbl.gov
Tue Oct 12 13:06:52 CDT 2021


On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <cliu at pppl.gov> wrote:

> Hi Mark,
>
> The option I use is like
>
> -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type
> aijcusparse -sub_pc_factor_mat_solver_type cusparse -sub_ksp_type
> preonly -sub_pc_type lu -ksp_max_it 2000 -ksp_rtol 1.e-300 -ksp_atol
> 1.e-300
>
>
Note, if you use -log_view, the last column (the rows are methods like
MatFactorNumeric) shows the percentage of work done on the GPU.
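
For example, with a hypothetical launch of ex72 (the executable path, MPI
launcher, and rank count here are placeholders, not taken from this thread),
you can append -log_view to the same options:

  mpiexec -n 16 ./ex72 -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres \
      -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse \
      -sub_ksp_type preonly -sub_pc_type lu -log_view

and then check the GPU column of rows such as MatFactorNumeric and MatSolve
in the resulting log summary.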

Junchao:  *This* implies that we have a cuSparse LU factorization. Is that
correct? (I don't think we do.)

> I think this one does both factorization and solve on the GPU.
>
> You can check the runex72_aijcusparse.sh file in the PETSc install
> directory and try it yourself (this does only the LU factorization,
> without an iterative solve).
>
> Chang
>
> On 10/12/21 1:17 PM, Mark Adams wrote:
> >
> >
> > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <cliu at pppl.gov> wrote:
> >
> >     Hi Junchao,
> >
> >     No, I only need it to be transferred within a node. I use the
> >     block-Jacobi method and GMRES to solve the sparse matrix, so each
> >     direct solver will take care of a sub-block of the whole matrix. In
> >     this way, I can use one GPU to solve one sub-block, which is stored
> >     within one node.
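
As a minimal illustration of this setup (my own sketch, not code from this
thread; it simply mirrors the quoted command-line options through the PETSc
API and assumes the KSP's operator is set elsewhere):

  #include <petscksp.h>

  /* FGMRES with block-Jacobi; each block gets a direct LU sub-solver using
     the sequential cusparse solver type.  The sub-solver options
     (-sub_ksp_type preonly -sub_pc_type lu
      -sub_pc_factor_mat_solver_type cusparse) are left on the options
     database, exactly as in the quoted run. */
  static PetscErrorCode SetupBJacobiCusparse(KSP ksp, PetscInt nblocks)
  {
    PC pc;

    PetscFunctionBeginUser;
    PetscCall(KSPSetType(ksp, KSPFGMRES));
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCBJACOBI));
    PetscCall(PCBJacobiSetTotalBlocks(pc, nblocks, NULL));
    PetscCall(KSPSetFromOptions(ksp)); /* picks up the -sub_* options */
    PetscFunctionReturn(PETSC_SUCCESS);
  }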
> >
> >     It was stated in the documentation that the cusparse solver is slow.
> >     However, in my test using ex72.c, the cusparse solver is faster than
> >     mumps or superlu_dist on CPUs.
> >
> >
> > Are we talking about the factorization, the solve, or both?
> >
> > We do not have an interface to cuSparse's LU factorization (I just
> > learned that it exists a few weeks ago).
> > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type
> > aijcusparse'? That would be the CPU factorization, which is the
> > dominant cost.
> >
> >
> >     Chang
> >
> >     On 10/12/21 10:24 AM, Junchao Zhang wrote:
> >      > Hi, Chang,
> >      >     For the mumps solver, we usually transfer matrix and vector
> >      >     data within a compute node.  For the idea you propose, it
> >      >     looks like we need to gather data within MPI_COMM_WORLD, right?
> >      >
> >      >     Mark, I remember you said the cusparse solve is slow and you
> >      >     would rather do it on the CPU. Is that right?
> >      >
> >      > --Junchao Zhang
> >      >
> >      >
> >      > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
> >      > <petsc-users at mcs.anl.gov> wrote:
> >      >
> >      >     Hi,
> >      >
> >      >     Currently, it is possible to use the mumps solver in PETSc
> >      >     with the -mat_mumps_use_omp_threads option, so that multiple
> >      >     MPI processes will transfer the matrix and rhs data to the
> >      >     master rank, and then the master rank will call mumps with
> >      >     OpenMP to solve the matrix.
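
(For reference, a typical set of options for that mode, with an illustrative
thread count, looks something like

  -pc_type lu -pc_factor_mat_solver_type mumps -mat_mumps_use_omp_threads 4

run with OMP_NUM_THREADS set accordingly and with a PETSc build configured
for OpenMP.)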
> >      >
> >      >     I wonder if someone can develop a similar option for the
> >      >     cusparse solver. Right now, this solver does not work with
> >      >     mpiaijcusparse. I think a possible workaround is to transfer
> >      >     all the matrix data to one MPI process, and then upload the
> >      >     data to the GPU to solve. In this way, one can use the
> >      >     cusparse solver for an MPI program.
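
A rough sketch of that workaround idea (my own illustration, not existing
PETSc functionality: gather an MPIAIJ matrix onto rank 0, convert it to
AIJCUSPARSE, and solve it there with a sequential LU through the cusparse
solver type; the right-hand-side gather/scatter and error handling are
omitted):

  #include <petscksp.h>

  static PetscErrorCode GatherAndSolveOnOneGPU(Mat A, KSP *subksp)
  {
    PetscMPIInt rank;
    PetscInt    M, N;
    IS          rows, cols;
    Mat        *gathered;

    PetscFunctionBeginUser;
    PetscCallMPI(MPI_Comm_rank(PetscObjectComm((PetscObject)A), &rank));
    PetscCall(MatGetSize(A, &M, &N));
    /* rank 0 requests every row and column; the other ranks request nothing */
    PetscCall(ISCreateStride(PETSC_COMM_SELF, rank ? 0 : M, 0, 1, &rows));
    PetscCall(ISCreateStride(PETSC_COMM_SELF, rank ? 0 : N, 0, 1, &cols));
    PetscCall(MatCreateSubMatrices(A, 1, &rows, &cols, MAT_INITIAL_MATRIX, &gathered));
    *subksp = NULL;
    if (rank == 0) {
      Mat Aseq = gathered[0]; /* a sequential copy of the whole matrix */
      PC  pc;
      PetscCall(MatConvert(Aseq, MATSEQAIJCUSPARSE, MAT_INPLACE_MATRIX, &Aseq));
      PetscCall(KSPCreate(PETSC_COMM_SELF, subksp));
      PetscCall(KSPSetOperators(*subksp, Aseq, Aseq));
      PetscCall(KSPSetType(*subksp, KSPPREONLY));
      PetscCall(KSPGetPC(*subksp, &pc));
      PetscCall(PCSetType(pc, PCLU));
      PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERCUSPARSE));
      PetscCall(KSPSetUp(*subksp));
      /* KSPSolve(*subksp, bseq, xseq) would then run the factored solve;
         gathering b and scattering x back are left out of this sketch. */
    }
    PetscCall(ISDestroy(&rows));
    PetscCall(ISDestroy(&cols));
    PetscFunctionReturn(PETSC_SUCCESS);
  }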
> >      >
> >      >     Chang
> >      >     --
> >      >     Chang Liu
> >      >     Staff Research Physicist
> >      >     +1 609 243 3438
> >      >     cliu at pppl.gov
> >      >     Princeton Plasma Physics Laboratory
> >      >     100 Stellarator Rd, Princeton NJ 08540, USA
> >      >
> >
> >     --
> >     Chang Liu
> >     Staff Research Physicist
> >     +1 609 243 3438
> >     cliu at pppl.gov
> >     Princeton Plasma Physics Laboratory
> >     100 Stellarator Rd, Princeton NJ 08540, USA
> >
>
> --
> Chang Liu
> Staff Research Physicist
> +1 609 243 3438
> cliu at pppl.gov
> Princeton Plasma Physics Laboratory
> 100 Stellarator Rd, Princeton NJ 08540, USA
>

