[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver
Chang Liu
cliu at pppl.gov
Wed Oct 13 11:16:47 CDT 2021
Hi Mark,
'-mat_type aijcusparse' gives an mpiaijcusparse matrix that works with other
solvers, but with -pc_factor_mat_solver_type cusparse it gives an error.
Yes, what I want is to have mumps or superlu do the factorization, and then
do the rest, including the GMRES solve, on the GPU. Is that possible?
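Concretely, the kind of option combination I have in mind is something like
this (illustrative only; I am not claiming this works as-is):

  -mat_type aijcusparse -ksp_type gmres -pc_type lu \
    -pc_factor_mat_solver_type mumps

i.e. mumps (or superlu) does the LU factorization on the CPU, while the
matrix-vector products and the GMRES iterations run on the GPU through
cusparse.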
I have tried using aijcusparse with superlu_dist; it runs, but the iterative
solver still runs on the CPUs. I have contacted the SuperLU group and they
confirmed that this is the case right now. But if I set
-pc_factor_mat_solver_type cusparse, it seems the iterative solver does run
on the GPU.
Chang
On 10/13/21 12:03 PM, Mark Adams wrote:
>
>
> On Wed, Oct 13, 2021 at 11:10 AM Chang Liu <cliu at pppl.gov> wrote:
>
> Thank you Junchao for explaining this. I guess in my case the code is
> just calling a seq solver like superlu to do the factorization on GPUs.
>
> My idea is that I want to have a traditional MPI code utilize GPUs
> with cusparse. Right now cusparse does not support the mpiaij matrix,
>
>
> Sure it does: '-mat_type aijcusparse' will give you an
> mpiaijcusparse matrix with > 1 processes.
> (-mat_type mpiaijcusparse might also work with >1 proc).
>
> However, I see in grepping the repo that all the mumps and superlu tests
> use the aij or sell matrix types.
> MUMPS and SuperLU provide their own solves, I assume .... but you might
> want to do other matrix operations on the GPU. Is that the issue?
> Did you try -mat_type aijcusparse with MUMPS and/or SuperLU and have a
> problem? (There is no test with it, so it probably does not work.)
>
> Thanks,
> Mark
>
> so I want the code to have an mpiaij matrix when adding all the matrix
> terms, and then transform the matrix to seqaij when doing the factorization
> and solve. This involves sending the data to the master process, and I
> think the petsc mumps solver has something similar already.
>
> Chang
>
> On 10/13/21 10:18 AM, Junchao Zhang wrote:
> >
> >
> >
> > On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfadams at lbl.gov> wrote:
> >
> >
> >
> > On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <cliu at pppl.gov> wrote:
> >
> > Hi Mark,
> >
> > The option I use is like
> >
> > -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type
> > aijcusparse -sub_pc_factor_mat_solver_type cusparse -sub_ksp_type
> > preonly -sub_pc_type lu -ksp_max_it 2000 -ksp_rtol 1.e-300
> > -ksp_atol 1.e-300
> >
> >
> > Note, if you use -log_view, the last column (the rows are methods like
> > MatFactorNumeric) has the percent of work done on the GPU.
> >
> > Junchao: *This* implies that we have a cuSparse LU factorization. Is
> > that correct? (I don't think we do)
> >
> > No, we don't have cuSparse LU factorization. If you check
> > MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls
> > MatLUFactorSymbolic_SeqAIJ() instead.
> > So I don't understand Chang's idea. Do you want to make bigger blocks?
> >
> >
> > I think this one does both the factorization and the solve on the gpu.
> >
> > You can check the runex72_aijcusparse.sh file in the petsc install
> > directory and try it yourself (this does only the lu factorization,
> > without the iterative solve).
> >
> > Chang
> >
> > On 10/12/21 1:17 PM, Mark Adams wrote:
> > >
> > >
> > > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <cliu at pppl.gov> wrote:
> > >
> > > Hi Junchao,
> > >
> > > No, I only need it to be transferred within a node. I use the
> > > block-Jacobi method and GMRES to solve the sparse matrix, so each
> > > direct solver will take care of a sub-block of the whole matrix. In
> > > this way, I can use one GPU to solve one sub-block, which is stored
> > > within one node.
> > >
> > > It was stated in the documentation that the cusparse solver is slow.
> > > However, in my test using ex72.c, the cusparse solver is faster than
> > > mumps or superlu_dist on CPUs.
> > >
> > >
> > > Are we talking about the factorization, the solve, or both?
> > >
> > > We do not have an interface to cuSparse's LU factorization (I just
> > > learned that it exists a few weeks ago).
> > > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type
> > > aijcusparse'? This would be the CPU factorization, which is the
> > > dominant cost.
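> > >
> > > In other words (just to illustrate the distinction), the difference
> > > between
> > >
> > >     -ksp_type fgmres -pc_type lu -mat_type aijcusparse
> > >
> > > and
> > >
> > >     -ksp_type fgmres -pc_type lu -mat_type aijcusparse
> > >         -pc_factor_mat_solver_type cusparse
> > >
> > > is only whether the cusparse solver package is requested for the
> > > factorization at all.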
> > >
> > >
> > > Chang
> > >
> > > On 10/12/21 10:24 AM, Junchao Zhang wrote:
> > > > Hi, Chang,
> > > > For the mumps solver, we usually transfer matrix and vector data
> > > > within a compute node. For the idea you propose, it looks like we
> > > > need to gather data within MPI_COMM_WORLD, right?
> > > >
> > > > Mark, I remember you said the cusparse solve is slow and you would
> > > > rather do it on the CPU. Is that right?
> > > >
> > > > --Junchao Zhang
> > > >
> > > >
> > > > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
> > > > <petsc-users at mcs.anl.gov> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Currently, it is possible to use the mumps solver in PETSc with the
> > > > -mat_mumps_use_omp_threads option, so that multiple MPI processes will
> > > > transfer the matrix and rhs data to the master rank, and then the
> > > > master rank will call mumps with OpenMP to solve the matrix.
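> > > >
> > > > Roughly, the usage (illustrative; I may be misremembering the exact
> > > > form) looks like
> > > >
> > > >     -pc_type lu -pc_factor_mat_solver_type mumps -mat_mumps_use_omp_threads 4
> > > >
> > > > with the number giving how many OpenMP threads the master rank uses.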
> > > >
> > > > I wonder if someone can develop a similar option for the cusparse
> > > > solver. Right now, this solver does not work with mpiaijcusparse. I
> > > > think a possible workaround is to transfer all the matrix data to one
> > > > MPI process, and then upload the data to the GPU to solve. In this
> > > > way, one can use the cusparse solver for an MPI program.
> > > >
> > > > Chang
> > > > --
> > > > Chang Liu
> > > > Staff Research Physicist
> > > > +1 609 243 3438
> > > > cliu at pppl.gov
> > > > Princeton Plasma Physics Laboratory
> > > > 100 Stellarator Rd, Princeton NJ 08540, USA
> > > >
> > >
> > > --
> > > Chang Liu
> > > Staff Research Physicist
> > > +1 609 243 3438
> > > cliu at pppl.gov
> > > Princeton Plasma Physics Laboratory
> > > 100 Stellarator Rd, Princeton NJ 08540, USA
> > >
> >
> > --
> > Chang Liu
> > Staff Research Physicist
> > +1 609 243 3438
> > cliu at pppl.gov
> > Princeton Plasma Physics Laboratory
> > 100 Stellarator Rd, Princeton NJ 08540, USA
> >
>
> --
> Chang Liu
> Staff Research Physicist
> +1 609 243 3438
> cliu at pppl.gov
> Princeton Plasma Physics Laboratory
> 100 Stellarator Rd, Princeton NJ 08540, USA
>
--
Chang Liu
Staff Research Physicist
+1 609 243 3438
cliu at pppl.gov
Princeton Plasma Physics Laboratory
100 Stellarator Rd, Princeton NJ 08540, USA