[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver
Mark Adams
mfadams at lbl.gov
Wed Oct 13 19:29:41 CDT 2021
On Wed, Oct 13, 2021 at 1:53 PM Barry Smith <bsmith at petsc.dev> wrote:
>
> Chang,
>
> You are correct, there are no MPI + GPU direct solvers that I am aware of
> which currently do the triangular solves with MPI + GPU parallelism.
So SuperLU and MUMPS do MPI solves on the CPU. That is reasonable. I have
not been able to get decent performance with GPU triangular solves. Complex
code with low arithmetic intensity is not a good fit for GPUs: little work
and all latency.

Chang, you would find that GPU solves suck, and anyway machines these days
are configured with significant (high-quality) CPU resources. I doubt you
can get GPU solves to beat CPU solves, except perhaps for enormous problems.
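For concreteness, the CPU baseline meant here is something like the
following sketch (the executable name and rank count are placeholders):

    # MPI-parallel direct solve entirely on the CPU with MUMPS
    # (superlu_dist can be substituted for mumps)
    mpiexec -n 16 ./my_app -ksp_type preonly -pc_type lu \
        -pc_factor_mat_solver_type mumps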
> You are limited in that each individual triangular solve must be done on a
> single GPU. I can only suggest making each subdomain as big as possible to
> utilize each GPU as much as possible for the direct triangular solves.
>
> Barry
>
>
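As a sketch of the "as big as possible" suggestion (the executable name and
rank count are placeholders): run one MPI rank per GPU and let block Jacobi
default to one block per rank, so each sequential cuSPARSE factorization and
solve covers that rank's entire subdomain.

    # One MPI rank per GPU; block Jacobi defaults to one block per process,
    # so each (sequential) cuSPARSE LU covers the largest subdomain a
    # single GPU can hold
    mpiexec -n 4 ./my_app -mat_type aijcusparse -ksp_type fgmres \
        -pc_type bjacobi -sub_ksp_type preonly -sub_pc_type lu \
        -sub_pc_factor_mat_solver_type cusparse

This is essentially the option set quoted later in the thread, minus the
tolerances and the explicit -pc_bjacobi_blocks count.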
> > On Oct 13, 2021, at 12:16 PM, Chang Liu via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >
> > Hi Mark,
> >
> > '-mat_type aijcusparse' works (giving an mpiaijcusparse matrix) with
> > other solvers, but with -pc_factor_mat_solver_type cusparse it gives an
> > error.
> >
> > Yes, what I want is to have mumps or superlu do the factorization, and
> > then do the rest, including the GMRES solve, on the GPU. Is that
> > possible?
> >
> > I have tried to use aijcusparse with superlu_dist; it runs, but the
> > iterative solver is still running on the CPU. I have contacted the
> > superlu group and they confirmed that this is the case right now. But if
> > I set -pc_factor_mat_solver_type cusparse, it seems that the iterative
> > solver runs on the GPU.
> >
> > Chang
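For reference, the superlu_dist variant described above is roughly the
following sketch (executable and rank count are placeholders; it assumes the
block-Jacobi setup quoted later in this thread). Swapping cusparse in for
superlu_dist in the last option gives the second configuration, as sketched
earlier after Barry's note.

    # Sub-block factorization by SuperLU_DIST on the CPU: it runs, but the
    # iterative solver reportedly still runs on the CPU
    mpiexec -n 4 ./my_app -mat_type aijcusparse -ksp_type fgmres \
        -pc_type bjacobi -sub_ksp_type preonly -sub_pc_type lu \
        -sub_pc_factor_mat_solver_type superlu_dist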
> >
> > On 10/13/21 12:03 PM, Mark Adams wrote:
> >> On Wed, Oct 13, 2021 at 11:10 AM Chang Liu <cliu at pppl.gov> wrote:
> >> Thank you Junchao for explaining this. I guess in my case the code is
> >> just calling a seq solver like superlu to do factorization on GPUs.
> >> My idea is that I want to have a traditional MPI code to utilize GPUs
> >> with cusparse. Right now cusparse does not support mpiaij matrix,
> >> Sure it does: '-mat_type aijcusparse' will give you an mpiaijcusparse
> >> matrix with more than one process.
> >> (-mat_type mpiaijcusparse might also work with >1 proc).
> >> However, I see in grepping the repo that all the mumps and superlu
> >> tests use the aij or sell matrix type.
> >> MUMPS and SuperLU provide their own solves, I assume... but you might
> >> want to do other matrix operations on the GPU. Is that the issue?
> >> Did you try -mat_type aijcusparse with MUMPS and/or SuperLU and hit a
> >> problem? (There is no test with it, so it probably does not work.)
> >> Thanks,
> >> Mark
> >> so I
> >> want the code to have an mpiaij matrix when adding all the matrix
> >> terms, and then transform the matrix to seqaij when doing the
> >> factorization and solve. This involves sending the data to the master
> >> process, and I think the petsc mumps solver has something similar
> >> already.
> >> Chang
> >> On 10/13/21 10:18 AM, Junchao Zhang wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfadams at lbl.gov> wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <cliu at pppl.gov> wrote:
> >> >
> >> > Hi Mark,
> >> >
> >> > The option I use is like
> >> >
> >> > -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres
> >> > -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse
> >> > -sub_ksp_type preonly -sub_pc_type lu -ksp_max_it 2000
> >> > -ksp_rtol 1.e-300 -ksp_atol 1.e-300
> >> >
> >> >
> >> > Note: if you use -log_view, the last column (the rows are events
> >> > like MatFactorNumeric) gives the percentage of the work done on the
> >> > GPU.
> >> >
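As a sketch of that check (the executable, rank count, and solver options
are placeholders):

    # Append -log_view to whatever solver options are in use and look at
    # the factorization and solve rows (e.g. MatFactorNumeric); the GPU
    # columns at the end of the table show how much of that work ran on
    # the GPU
    mpiexec -n 4 ./my_app <solver options> -log_view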
> >> > Junchao: *This* implies that we have a cuSparse LU factorization.
> >> > Is that correct? (I don't think we do)
> >> >
> >> > No, we do not have a cuSparse LU factorization. If you check
> >> > MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls
> >> > MatLUFactorSymbolic_SeqAIJ() instead.
> >> > So I don't understand Chang's idea. Do you want to make bigger
> >> > blocks?
> >> >
> >> >
> >> > I think this one does both the factorization and the solve on the
> >> > GPU.
> >> >
> >> > You can check the runex72_aijcusparse.sh file in the petsc install
> >> > directory and try it yourself (this is only the lu factorization,
> >> > without an iterative solve).
> >> >
> >> > Chang
> >> >
> >> > On 10/12/21 1:17 PM, Mark Adams wrote:
> >> > >
> >> > >
> >> > > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <cliu at pppl.gov> wrote:
> >> > >
> >> > > Hi Junchao,
> >> > >
> >> > > No, I only need it to be transferred within a node. I use the
> >> > > block-Jacobi method and GMRES to solve the sparse matrix, so each
> >> > > direct solver takes care of a sub-block of the whole matrix. In
> >> > > this way, I can use one GPU to solve one sub-block, which is
> >> > > stored within one node.
> >> > >
> >> > > It was stated in the documentation that the cusparse solver is
> >> > > slow. However, in my test using ex72.c, the cusparse solver is
> >> > > faster than mumps or superlu_dist on CPUs.
> >> > >
> >> > >
> >> > > Are we talking about the factorization, the solve, or both?
> >> > >
> >> > > We do not have an interface to cuSparse's LU factorization (I
> >> > > just learned that it exists a few weeks ago).
> >> > > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type
> >> > > aijcusparse'? This would be the CPU factorization, which is the
> >> > > dominant cost.
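That combination, as a sketch (executable and rank count are placeholders),
would be:

    # LU factorization used as the preconditioner with the matrix stored
    # as aijcusparse; the numeric factorization itself still runs on the CPU
    mpiexec -n 1 ./my_app -pc_type lu -mat_type aijcusparse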
> >> > >
> >> > >
> >> > > Chang
> >> > >
> >> > > On 10/12/21 10:24 AM, Junchao Zhang wrote:
> >> > > > Hi, Chang,
> >> > > > For the mumps solver, we usually transfer the matrix and
> >> > > > vector data within a compute node. For the idea you propose,
> >> > > > it looks like we need to gather data within MPI_COMM_WORLD,
> >> > > > right?
> >> > > >
> >> > > > Mark, I remember you said the cusparse solve is slow and you
> >> > > > would rather do it on the CPU. Is that right?
> >> > > >
> >> > > > --Junchao Zhang
> >> > > >
> >> > > >
> >> > > > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
> >> > > > <petsc-users at mcs.anl.gov> wrote:
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > Currently, it is possible to use the mumps solver in PETSc
> >> > > > with the -mat_mumps_use_omp_threads option, so that multiple
> >> > > > MPI processes will transfer the matrix and rhs data to the
> >> > > > master rank, and then the master rank will call mumps with
> >> > > > OpenMP to solve the matrix.
> >> > > >
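For reference, a minimal sketch of that usage (executable, rank count, and
thread count are placeholders; it assumes a PETSc build configured with
OpenMP support):

    # The MPI ranks gather the matrix and rhs data, and MUMPS then factors
    # and solves using (here) 4 OpenMP threads
    mpiexec -n 16 ./my_app -ksp_type preonly -pc_type lu \
        -pc_factor_mat_solver_type mumps -mat_mumps_use_omp_threads 4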
> >> > > > I wonder if someone could develop a similar option for the
> >> > > > cusparse solver. Right now, this solver does not work with
> >> > > > mpiaijcusparse. I think a possible workaround is to transfer
> >> > > > all the matrix data to one MPI process, and then upload the
> >> > > > data to the GPU to solve. In this way, one could use the
> >> > > > cusparse solver in an MPI program.
> >> > > >
> >> > > > Chang
> >> > > > --
> >> > > > Chang Liu
> >> > > > Staff Research Physicist
> >> > > > +1 609 243 3438
> >> > > > cliu at pppl.gov
> >> > > > Princeton Plasma Physics Laboratory
> >> > > > 100 Stellarator Rd, Princeton NJ 08540, USA
> >> > > >
> >> > >
> >> > > --
> >> > > Chang Liu
> >> > > Staff Research Physicist
> >> > > +1 609 243 3438
> >> > > cliu at pppl.gov
> >> > > Princeton Plasma Physics Laboratory
> >> > > 100 Stellarator Rd, Princeton NJ 08540, USA
> >> > >
> >> >
> >> > --
> >> > Chang Liu
> >> > Staff Research Physicist
> >> > +1 609 243 3438
> >> > cliu at pppl.gov
> >> > Princeton Plasma Physics Laboratory
> >> > 100 Stellarator Rd, Princeton NJ 08540, USA
> >> >
> >> --
> >> Chang Liu
> >> Staff Research Physicist
> >> +1 609 243 3438
> >> cliu at pppl.gov
> >> Princeton Plasma Physics Laboratory
> >> 100 Stellarator Rd, Princeton NJ 08540, USA
> >
> > --
> > Chang Liu
> > Staff Research Physicist
> > +1 609 243 3438
> > cliu at pppl.gov
> > Princeton Plasma Physics Laboratory
> > 100 Stellarator Rd, Princeton NJ 08540, USA
>
>