[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver
Junchao Zhang
junchao.zhang at gmail.com
Wed Oct 13 09:18:24 CDT 2021
On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>
> On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <cliu at pppl.gov> wrote:
>
>> Hi Mark,
>>
>> The option I use is like
>>
>> -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type
>> aijcusparse -sub_pc_factor_mat_solver_type cusparse -sub_ksp_type
>> preonly -sub_pc_type lu -ksp_max_it 2000 -ksp_rtol 1.e-300 -ksp_atol
>> 1.e-300
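>>
>> A hypothetical full invocation (the executable name and process count
>> here are placeholders) would look something like:
>>
>>   mpiexec -n 16 ./your_petsc_app \
>>     -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres \
>>     -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse \
>>     -sub_ksp_type preonly -sub_pc_type lu \
>>     -ksp_max_it 2000 -ksp_rtol 1.e-300 -ksp_atol 1.e-300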
>>
>>
> Note: if you use -log_view, the last column (the rows are methods like
> MatFactorNumeric) shows the percentage of work done on the GPU.
>
> Junchao: *This* implies that we have a cuSparse LU factorization. Is
> that correct? (I don't think we do)
>
No, we don't have a cuSparse LU factorization. If you check
MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls
MatLUFactorSymbolic_SeqAIJ() instead.
So I don't understand Chang's idea. Do you want to make bigger blocks?
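
For context, a paraphrased sketch of that code path (not the exact PETSc
source, and not compilable on its own since it touches private Mat internals;
the error macros assume a recent PETSc) looks like:

  static PetscErrorCode MatLUFactorSymbolic_SeqAIJCUSPARSE(Mat B, Mat A, IS isrow,
                                                           IS iscol, const MatFactorInfo *info)
  {
    PetscFunctionBegin;
    /* the symbolic LU factorization is done by the ordinary CPU SeqAIJ code */
    PetscCall(MatLUFactorSymbolic_SeqAIJ(B, A, isrow, iscol, info));
    /* the numeric stage is hooked to the CUSPARSE version, which again reuses
       the CPU numeric factorization and then sets up the factors for the GPU
       triangular solves in MatSolve() */
    B->ops->lufactornumeric = MatLUFactorNumeric_SeqAIJCUSPARSE;
    PetscFunctionReturn(PETSC_SUCCESS);
  }

So with -sub_pc_factor_mat_solver_type cusparse the factorization itself still
runs on the CPU; only the triangular solves run on the GPU.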
>
> I think this one does both the factorization and the solve on the GPU.
>>
>> You can check the runex72_aijcusparse.sh file in the PETSc install
>> directory and try it yourself (this is only the LU factorization,
>> without an iterative solve).
>>
>> Chang
>>
>> On 10/12/21 1:17 PM, Mark Adams wrote:
>> >
>> >
>> > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <cliu at pppl.gov> wrote:
>> >
>> > Hi Junchao,
>> >
>> > No, I only need it to be transferred within a node. I use the
>> > block-Jacobi method and GMRES to solve the sparse matrix, so each
>> > direct solver takes care of a sub-block of the whole matrix. In this
>> > way, I can use one GPU to solve one sub-block, which is stored within
>> > one node.
>> >
>> > It was stated in the documentation that the cusparse solver is slow.
>> > However, in my test using ex72.c, the cusparse solver is faster than
>> > mumps or superlu_dist on CPUs.
>> >
>> >
>> > Are we talking about the factorization, the solve, or both?
>> >
>> > We do not have an interface to cuSparse's LU factorization (I just
>> > learned that it exists a few weeks ago).
>> > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type
>> > aijcusparse'? This would be the CPU factorization, which is the
>> > dominant cost.
>> >
>> >
>> > Chang
>> >
>> > On 10/12/21 10:24 AM, Junchao Zhang wrote:
>> > > Hi, Chang,
>> > > For the mumps solver, we usually transfer matrix and vector data
>> > > within a compute node. For the idea you propose, it looks like we
>> > > need to gather data within MPI_COMM_WORLD, right?
>> > >
>> > > Mark, I remember you said the cusparse solve is slow and you would
>> > > rather do it on the CPU. Is that right?
>> > >
>> > > --Junchao Zhang
>> > >
>> > >
>> > > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
>> > > <petsc-users at mcs.anl.gov> wrote:
>> > >
>> > > Hi,
>> > >
>> > > Currently, it is possible to use the mumps solver in PETSc with the
>> > > -mat_mumps_use_omp_threads option, so that multiple MPI processes
>> > > transfer the matrix and rhs data to the master rank, and then the
>> > > master rank calls mumps with OpenMP to solve the matrix.
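>> > >
>> > > As a concrete (hypothetical) example, assuming PETSc was configured
>> > > with MUMPS and OpenMP support, that existing path is selected with
>> > > options along the lines of:
>> > >
>> > >   -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps \
>> > >     -mat_mumps_use_omp_threads 4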
>> > >
>> > > I wonder if someone can develop a similar option for the cusparse
>> > > solver. Right now, this solver does not work with mpiaijcusparse. I
>> > > think a possible workaround is to transfer all the matrix data to one
>> > > MPI process, and then upload the data to the GPU to solve. In this
>> > > way, one can use the cusparse solver for an MPI program.
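>> > >
>> > > To make the proposed workaround concrete, below is a rough sketch
>> > > using existing PETSc calls (the function name is hypothetical, the
>> > > error handling uses the PetscCall() macros of recent PETSc, and
>> > > scattering the solution back is omitted); it only illustrates the
>> > > idea, it is not the requested feature:
>> > >
>> > >   #include <petscksp.h>
>> > >
>> > >   /* Gather a distributed matrix and rhs onto rank 0, convert the copy
>> > >      to the cuSPARSE format, and solve there with a direct (LU)
>> > >      factorization; the factorization itself still runs on the CPU,
>> > >      only the triangular solves use the GPU. */
>> > >   static PetscErrorCode GatherAndSolveOnRankZero(Mat A, Vec b)
>> > >   {
>> > >     PetscMPIInt rank;
>> > >     PetscInt    M, N;
>> > >     IS          rows, cols;
>> > >     Mat        *Aseq;
>> > >     Vec         bseq;
>> > >     VecScatter  tozero;
>> > >
>> > >     PetscFunctionBeginUser;
>> > >     PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
>> > >     PetscCall(MatGetSize(A, &M, &N));
>> > >     /* rank 0 requests every row and column, the other ranks request none */
>> > >     PetscCall(ISCreateStride(PETSC_COMM_SELF, rank ? 0 : M, 0, 1, &rows));
>> > >     PetscCall(ISCreateStride(PETSC_COMM_SELF, rank ? 0 : N, 0, 1, &cols));
>> > >     PetscCall(MatCreateSubMatrices(A, 1, &rows, &cols, MAT_INITIAL_MATRIX, &Aseq));
>> > >     /* gather the rhs onto rank 0 as well */
>> > >     PetscCall(VecScatterCreateToZero(b, &tozero, &bseq));
>> > >     PetscCall(VecScatterBegin(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD));
>> > >     PetscCall(VecScatterEnd(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD));
>> > >     if (rank == 0) {
>> > >       Mat Agpu;
>> > >       Vec x;
>> > >       KSP ksp;
>> > >       PC  pc;
>> > >       PetscCall(MatConvert(Aseq[0], MATSEQAIJCUSPARSE, MAT_INITIAL_MATRIX, &Agpu));
>> > >       PetscCall(VecDuplicate(bseq, &x));
>> > >       PetscCall(KSPCreate(PETSC_COMM_SELF, &ksp));
>> > >       PetscCall(KSPSetOperators(ksp, Agpu, Agpu));
>> > >       PetscCall(KSPSetType(ksp, KSPPREONLY));
>> > >       PetscCall(KSPGetPC(ksp, &pc));
>> > >       PetscCall(PCSetType(pc, PCLU));
>> > >       PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERCUSPARSE));
>> > >       PetscCall(KSPSolve(ksp, bseq, x));
>> > >       /* ... scatter x back to the distributed layout as needed ... */
>> > >       PetscCall(VecDestroy(&x));
>> > >       PetscCall(KSPDestroy(&ksp));
>> > >       PetscCall(MatDestroy(&Agpu));
>> > >     }
>> > >     PetscCall(MatDestroySubMatrices(1, &Aseq));
>> > >     PetscCall(VecScatterDestroy(&tozero));
>> > >     PetscCall(VecDestroy(&bseq));
>> > >     PetscCall(ISDestroy(&rows));
>> > >     PetscCall(ISDestroy(&cols));
>> > >     PetscFunctionReturn(PETSC_SUCCESS);
>> > >   }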
>> > >
>> > > Chang
>> > > --
>> > > Chang Liu
>> > > Staff Research Physicist
>> > > +1 609 243 3438
>> > > cliu at pppl.gov
>> > > Princeton Plasma Physics Laboratory
>> > > 100 Stellarator Rd, Princeton NJ 08540, USA
>> > >
>> >
>> > --
>> > Chang Liu
>> > Staff Research Physicist
>> > +1 609 243 3438
>> > cliu at pppl.gov
>> > Princeton Plasma Physics Laboratory
>> > 100 Stellarator Rd, Princeton NJ 08540, USA
>> >
>>
>> --
>> Chang Liu
>> Staff Research Physicist
>> +1 609 243 3438
>> cliu at pppl.gov
>> Princeton Plasma Physics Laboratory
>> 100 Stellarator Rd, Princeton NJ 08540, USA
>>
>