[petsc-users] [External] Re: request to add an option similar to use_omp_threads for mumps to cusparse solver

Chang Liu cliu at pppl.gov
Wed Oct 13 23:00:57 CDT 2021


Hi Junchao,

Yes that is what I want.

Chang

On 10/13/21 11:42 PM, Junchao Zhang wrote:
> 
> 
> 
> On Wed, Oct 13, 2021 at 8:58 PM Barry Smith <bsmith at petsc.dev> wrote:
> 
> 
>        Junchao,
> 
>           If I understand correctly, Chang is using the block Jacobi
>     method with a single block for a number of MPI ranks and a direct
>     solver for each block, so it uses PCSetUp_BJacobi_Multiproc(), which
>     is code Hong Zhang wrote a number of years ago for CPUs. For their
>     particular problems this preconditioner works well, but using an
>     iterative solver on the blocks does not work well.
> 
>           If we had complete MPI-GPU direct solvers he could just use
>     the current code with MPIAIJCUSPARSE on each block, but since we do
>     not, he would like to use a single GPU for each block. This means
>     that the diagonal blocks of the global parallel MPI matrix need to be
>     sent to a subset of the GPUs (one GPU per block, which has multiple
>     MPI ranks associated with the block). Similarly, for the triangular
>     solves the blocks of the right-hand side need to be shipped to the
>     appropriate GPU and the resulting solution shipped back to the
>     multiple GPUs. So Chang is absolutely correct, this is somewhat like
>     your code for MUMPS with OpenMP.
> 
> OK, I now understand the background.
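> 
> (A minimal sketch of the kind of gather Barry describes, pulling a block
> that lives on several ranks onto one of them as a sequential matrix. This
> is illustrative only, not an existing PETSc code path; the helper name,
> the row-range arguments, and the master flag are made up here.)
> 
> #include <petscmat.h>
> 
> /* Hypothetical helper: gather the diagonal block with global rows
>    [rstart, rend) onto the rank where master is PETSC_TRUE, as a
>    sequential matrix that a single-GPU factorization could then use.
>    Free the result later with MatDestroySubMatrices(1, &sub). */
> static PetscErrorCode GatherBlockToMaster(Mat A, PetscInt rstart, PetscInt rend,
>                                           PetscBool master, Mat **sub)
> {
>   IS             rows;
>   PetscInt       n = master ? (rend - rstart) : 0;
>   PetscErrorCode ierr;
> 
>   PetscFunctionBeginUser;
>   /* Each rank builds a sequential index set of the rows it wants; only
>      the master asks for the whole block, the others ask for nothing. */
>   ierr = ISCreateStride(PETSC_COMM_SELF, n, rstart, 1, &rows);CHKERRQ(ierr);
>   /* Collective on A: every rank receives one (possibly empty) SeqAIJ
>      submatrix; on the master, (*sub)[0] holds the entire block. */
>   ierr = MatCreateSubMatrices(A, 1, &rows, &rows, MAT_INITIAL_MATRIX, sub);CHKERRQ(ierr);
>   ierr = ISDestroy(&rows);CHKERRQ(ierr);
>   PetscFunctionReturn(0);
> }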
> 
>     One could use PCSetUp_BJacobi_Multiproc() and get the blocks on the
>     MPI ranks and then shrink each block down to a single GPU, but this
>     would be pretty inefficient; ideally one would go directly from the
>     big MPI matrix on all the GPUs to the submatrices on the subset of
>     GPUs. But this may be a large coding project.
> 
> I don't understand these sentences. Why do you say "shrink"? In my mind, 
> we just need to move each block (submatrix) living over multiple MPI 
> ranks to one of them and solve directly there.  In other words, we keep 
> the blocks' sizes; there is no shrinking or expanding.
> As mentioned before, cusparse does not provide LU factorization. So the 
> LU factorization would be done on the CPU, and the solve would be done on 
> the GPU. I assume Chang wants to gain from the (potentially) faster solve 
> (instead of the factorization) on the GPU.
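> 
> (And a rough sketch of the solve-phase traffic: ship the right-hand side
> to the one rank that drives the GPU, and the solution back afterwards.
> Again illustrative only; it gathers to rank 0 of the vector's
> communicator with VecScatterCreateToZero rather than to an arbitrary
> master rank, and the GPU solve itself is elided.)
> 
> #include <petscvec.h>
> 
> /* Hypothetical sketch: gather a distributed rhs b onto rank 0, solve
>    there (on the GPU), and scatter the solution back into x, which must
>    have the same parallel layout as b. */
> static PetscErrorCode SolveOnMaster(Vec b, Vec x)
> {
>   VecScatter     scat;
>   Vec            bseq;   /* full-length on rank 0, length 0 elsewhere */
>   PetscErrorCode ierr;
> 
>   PetscFunctionBeginUser;
>   ierr = VecScatterCreateToZero(b, &scat, &bseq);CHKERRQ(ierr);
>   ierr = VecScatterBegin(scat, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
>   ierr = VecScatterEnd(scat, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
> 
>   /* ... on rank 0: push bseq to the GPU, do the triangular solves, and
>      overwrite bseq with the solution ... */
> 
>   ierr = VecScatterBegin(scat, bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
>   ierr = VecScatterEnd(scat, bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
>   ierr = VecScatterDestroy(&scat);CHKERRQ(ierr);
>   ierr = VecDestroy(&bseq);CHKERRQ(ierr);
>   PetscFunctionReturn(0);
> }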
> 
> 
>        Barry
> 
>     Since the matrices being factored and solved directly are relatively
>     large, it is possible that the cusparse code could be reasonably
>     efficient (they are not the tiny problems one gets at the coarse
>     level of multigrid). Of course, this is speculation; I don't
>     actually know how much better the cusparse code would be for the
>     direct solve than a good CPU direct sparse solver.
> 
>      > On Oct 13, 2021, at 9:32 PM, Chang Liu <cliu at pppl.gov> wrote:
>      >
>      > Sorry I am not familiar with the details either. Can you please
>     check the code in MatMumpsGatherNonzerosOnMaster in mumps.c?
>      >
>      > Chang
>      >
>      > On 10/13/21 9:24 PM, Junchao Zhang wrote:
>      >> Hi Chang,
>      >>   I did the work in mumps. It is easy for me to understand
>     gathering matrix rows to one process.
>      >>   But how to gather blocks (submatrices) to form a large block? 
>     Can you draw a picture of that?
>      >>   Thanks
>      >> --Junchao Zhang
>      >> On Wed, Oct 13, 2021 at 7:47 PM Chang Liu via petsc-users
>      >> <petsc-users at mcs.anl.gov> wrote:
>      >>    Hi Barry,
>      >>    I think the mumps solver in petsc does support that. You can check the
>      >>    documentation on "-mat_mumps_use_omp_threads" at
>      >>    https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html
>      >>    and the code enclosed by #if defined(PETSC_HAVE_OPENMP_SUPPORT) in the
>      >>    functions MatMumpsSetUpDistRHSInfo and MatMumpsGatherNonzerosOnMaster in
>      >>    mumps.c
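>      >>    (For example, assuming PETSc is configured with OpenMP support, a run
>      >>    with -pc_type lu -pc_factor_mat_solver_type mumps
>      >>    -mat_mumps_use_omp_threads 4 gathers the matrix from groups of MPI
>      >>    ranks onto master ranks, which then call MUMPS with 4 OpenMP threads
>      >>    each.)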
>      >>    1. I understand it is ideal to do one MPI rank per GPU. However, I am
>      >>    working on an existing code that was developed based on MPI, and the
>      >>    number of MPI ranks is typically equal to the number of CPU cores. We
>      >>    don't want to change the whole structure of the code.
>      >>    2. What you have suggested has been coded in mumps.c. See
>     function
>      >>    MatMumpsSetUpDistRHSInfo.
>      >>    Regards,
>      >>    Chang
>      >>    On 10/13/21 7:53 PM, Barry Smith wrote:
>      >>     >
>      >>     >
>      >>     >> On Oct 13, 2021, at 3:50 PM, Chang Liu <cliu at pppl.gov> wrote:
>      >>     >>
>      >>     >> Hi Barry,
>      >>     >>
>      >>     >> That is exactly what I want.
>      >>     >>
>      >>     >> Back to my original question, I am looking for an approach to transfer
>      >>     >> matrix data from many MPI processes to "master" MPI processes, each of
>      >>     >> which takes care of one GPU, and then upload the data to the GPU to
>      >>     >> solve. One could just grab some of the code from mumps.c for
>      >>     >> aijcusparse.cu.
>      >>     >
>      >>     >    mumps.c doesn't actually do that. It never needs to copy the
>      >>     >    entire matrix to a single MPI rank.
>      >>     >
>      >>     >    It would be possible to write the code you suggest, but it is not
>      >>     >    clear that it makes sense:
>      >>     >
>      >>     > 1) For normal PETSc GPU usage there is one GPU per MPI rank, so
>      >>     > while your one GPU per big domain is solving its systems, the other
>      >>     > GPUs (with the other MPI ranks that share that domain) are doing
>      >>     > nothing.
>      >>     >
>      >>     > 2) For each triangular solve you would have to gather the right-hand
>      >>     > side from the multiple ranks to the single GPU to pass it to the GPU
>      >>     > solver, and then scatter the resulting solution back to all of its
>      >>     > subdomain ranks.
>      >>     >
>      >>     >    What I was suggesting was to assign an entire subdomain to a
>      >>     >    single MPI rank, so that it does everything on one GPU and can use
>      >>     >    the GPU solver directly. If all the major computations of a
>      >>     >    subdomain can fit and be done on a single GPU, then you would be
>      >>     >    utilizing all the GPUs you are using effectively.
>      >>     >
>      >>     >    Barry
>      >>     >
>      >>     >
>      >>     >
>      >>     >>
>      >>     >> Chang
>      >>     >>
>      >>     >> On 10/13/21 1:53 PM, Barry Smith wrote:
>      >>     >>>    Chang,
>      >>     >>>      You are correct: there are no MPI + GPU direct solvers that I am
>      >>     >>>    aware of that currently do the triangular solves with MPI + GPU
>      >>     >>>    parallelism. You are limited to having each individual triangular
>      >>     >>>    solve done on a single GPU. I can only suggest making each subdomain
>      >>     >>>    as big as possible to utilize each GPU as much as possible for the
>      >>     >>>    direct triangular solves.
>      >>     >>>     Barry
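>      >>     >>>
>      >>     >>>    (As a concrete illustration, assuming one MPI rank per GPU, this
>      >>     >>>    amounts to running with as many ranks as GPUs and one block per
>      >>     >>>    rank, e.g. -pc_type bjacobi -sub_pc_type lu
>      >>     >>>    -sub_pc_factor_mat_solver_type cusparse -mat_type aijcusparse,
>      >>     >>>    much like the options Chang quotes below.)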
>      >>     >>>> On Oct 13, 2021, at 12:16 PM, Chang Liu via petsc-users
>      >>     >>>> <petsc-users at mcs.anl.gov> wrote:
>      >>     >>>>
>      >>     >>>> Hi Mark,
>      >>     >>>>
>      >>     >>>> '-mat_type aijcusparse' works with mpiaijcusparse with other
>      >>     >>>> solvers, but with -pc_factor_mat_solver_type cusparse it will give
>      >>     >>>> an error.
>      >>     >>>>
>      >>     >>>> Yes, what I want is to have mumps or superlu do the factorization,
>      >>     >>>> and then do the rest, including the GMRES solve, on the GPU. Is that
>      >>     >>>> possible?
>      >>     >>>>
>      >>     >>>> I have tried to use aijcusparse with superlu_dist; it runs, but the
>      >>     >>>> iterative solver is still running on CPUs. I have contacted the
>      >>     >>>> superlu group and they confirmed that is the case right now. But if
>      >>     >>>> I set -pc_factor_mat_solver_type cusparse, it seems that the
>      >>     >>>> iterative solver is running on the GPU.
>      >>     >>>>
>      >>     >>>> Chang
>      >>     >>>>
>      >>     >>>> On 10/13/21 12:03 PM, Mark Adams wrote:
>      >>     >>>>> On Wed, Oct 13, 2021 at 11:10 AM Chang Liu <cliu at pppl.gov> wrote:
>      >>     >>>>>     Thank you Junchao for explaining this. I guess in my case the
>      >>     >>>>>     code is just calling a seq solver like superlu to do the
>      >>     >>>>>     factorization on GPUs.
>      >>     >>>>>     My idea is that I want to have a traditional MPI code utilize
>      >>     >>>>>     GPUs with cusparse. Right now cusparse does not support the
>      >>     >>>>>     mpiaij matrix,
>      >>     >>>>> Sure it does: '-mat_type aijcusparse' will give you an
>      >>     >>>>> mpiaijcusparse matrix with > 1 processes.
>      >>     >>>>> (-mat_type mpiaijcusparse might also work with >1 proc).
>      >>     >>>>> However, I see in grepping the repo that all the mumps and
>      >>     >>>>> superlu tests use the aij or sell matrix type.
>      >>     >>>>> MUMPS and SuperLU provide their own solves, I assume ... but you
>      >>     >>>>> might want to do other matrix operations on the GPU. Is that the
>      >>     >>>>> issue?
>      >>     >>>>> Did you try -mat_type aijcusparse with MUMPS and/or SuperLU and
>      >>     >>>>> have a problem? (There is no test with it, so it probably does not
>      >>     >>>>> work.)
>      >>     >>>>> Thanks,
>      >>     >>>>> Mark
>      >>     >>>>>     so I want the code to have an mpiaij matrix when adding all the
>      >>     >>>>>     matrix terms, and then transform the matrix to seqaij when doing
>      >>     >>>>>     the factorization and solve. This involves sending the data to
>      >>     >>>>>     the master process, and I think the petsc mumps solver has
>      >>     >>>>>     something similar already.
>      >>     >>>>>     Chang
>      >>     >>>>>     On 10/13/21 10:18 AM, Junchao Zhang wrote:
>      >>     >>>>>      >
>      >>     >>>>>      >
>      >>     >>>>>      >
>      >>     >>>>>      > On Tue, Oct 12, 2021 at 1:07 PM Mark Adams
>      >>     >>>>>      > <mfadams at lbl.gov> wrote:
>      >>     >>>>>      >
>      >>     >>>>>      >
>      >>     >>>>>      >
>      >>     >>>>>      >     On Tue, Oct 12, 2021 at 1:45 PM Chang Liu
>      >>     >>>>>      >     <cliu at pppl.gov> wrote:
>      >>     >>>>>      >
>      >>     >>>>>      >         Hi Mark,
>      >>     >>>>>      >
>      >>     >>>>>      >         The option I use is like
>      >>     >>>>>      >
>      >>     >>>>>      >         -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres
>      >>     >>>>>      >         -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse
>      >>     >>>>>      >         -sub_ksp_type preonly -sub_pc_type lu -ksp_max_it 2000
>      >>     >>>>>      >         -ksp_rtol 1.e-300 -ksp_atol 1.e-300
>      >>     >>>>>      >
>      >>     >>>>>      >
>      >>     >>>>>      >     Note, if you use -log_view, the last column (the rows are the
>      >>     >>>>>      >     methods, like MatFactorNumeric) has the percent of work on the GPU.
>      >>     >>>>>      >
>      >>     >>>>>      >     Junchao: *This* implies that we have a cuSparse LU factorization.
>      >>     >>>>>      >     Is that correct? (I don't think we do)
>      >>     >>>>>      >
>      >>     >>>>>      > No, we don't have cuSparse LU factorization. If you check
>      >>     >>>>>      > MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls
>      >>     >>>>>      > MatLUFactorSymbolic_SeqAIJ() instead.
>      >>     >>>>>      > So I don't understand Chang's idea. Do you want to make bigger
>      >>     >>>>>      > blocks?
>      >>     >>>>>      >
>      >>     >>>>>      >
>      >>     >>>>>      >         I think this one does both the factorization and the solve on
>      >>     >>>>>      >         the GPU.
>      >>     >>>>>      >
>      >>     >>>>>      >         You can check the runex72_aijcusparse.sh file in the petsc
>      >>     >>>>>      >         install directory and try it yourself (this is only the LU
>      >>     >>>>>      >         factorization, without an iterative solve).
>      >>     >>>>>      >
>      >>     >>>>>      >         Chang
>      >>     >>>>>      >
>      >>     >>>>>      >         On 10/12/21 1:17 PM, Mark Adams wrote:
>      >>     >>>>>      >          >
>      >>     >>>>>      >          >
>      >>     >>>>>      >          > On Tue, Oct 12, 2021 at 11:19 AM Chang Liu
>      >>     >>>>>      >          > <cliu at pppl.gov> wrote:
>      >>     >>>>>      >          >
>      >>     >>>>>      >          >     Hi Junchao,
>      >>     >>>>>      >          >
>      >>     >>>>>      >          >     No, I only need it to be transferred within a
>      >>     >>>>>      >          >     node. I use the block-Jacobi method and GMRES to solve the
>      >>     >>>>>      >          >     sparse matrix, so each direct solver will take care of a
>      >>     >>>>>      >          >     sub-block of the whole matrix. In this way, I can use one
>      >>     >>>>>      >          >     GPU to solve one sub-block, which is stored within one node.
>      >>     >>>>>      >          >
>      >>     >>>>>      >          >     It was stated in the documentation that the cusparse solver
>      >>     >>>>>      >          >     is slow. However, in my test using ex72.c, the cusparse
>      >>     >>>>>      >          >     solver is faster than mumps or superlu_dist on CPUs.
>      >>     >>>>>      >          >
>      >>     >>>>>      >          >
>      >>     >>>>>      >          > Are we talking about the factorization, the solve, or both?
>      >>     >>>>>      >          >
>      >>     >>>>>      >          > We do not have an interface to cuSparse's LU factorization (I
>      >>     >>>>>      >          > just learned that it exists a few weeks ago).
>      >>     >>>>>      >          > Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type
>      >>     >>>>>      >          > aijcusparse'? This would be the CPU factorization, which is
>      >>     >>>>>      >          > the dominant cost.
>      >>     >>>>>      >          >
>      >>     >>>>>      >          >
>      >>     >>>>>      >          >     Chang
>      >>     >>>>>      >          >
>      >>     >>>>>      >          >     On 10/12/21 10:24 AM, Junchao Zhang wrote:
>      >>     >>>>>      >          >      > Hi, Chang,
>      >>     >>>>>      >          >      >     For the mumps solver, we usually transfer matrix and
>      >>     >>>>>      >          >      > vector data within a compute node.  For the idea you
>      >>     >>>>>      >          >      > propose, it looks like we need to gather data within
>      >>     >>>>>      >          >      > MPI_COMM_WORLD, right?
>      >>     >>>>>      >          >      >
>      >>     >>>>>      >          >      >     Mark, I remember you said cusparse solve is slow and
>      >>     >>>>>      >          >      > you would rather do it on CPU. Is that right?
>      >>     >>>>>      >          >      >
>      >>     >>>>>      >          >      > --Junchao Zhang
>      >>     >>>>>      >          >      >
>      >>     >>>>>      >          >      >
>      >>     >>>>>      >          >      > On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users
>      >>     >>>>>      >          >      > <petsc-users at mcs.anl.gov> wrote:
>      >>     >>>>>      >          >      >
>      >>     >>>>>      >          >      >     Hi,
>      >>     >>>>>      >          >      >
>      >>     >>>>>      >          >      >     Currently, it is possible to use the mumps solver in
>      >>     >>>>>      >          >      >     PETSc with the -mat_mumps_use_omp_threads option, so
>      >>     >>>>>      >          >      >     that multiple MPI processes will transfer the matrix
>      >>     >>>>>      >          >      >     and rhs data to the master rank, and then the master
>      >>     >>>>>      >          >      >     rank will call mumps with OpenMP to solve the matrix.
>      >>     >>>>>      >          >      >
>      >>     >>>>>      >          >      >     I wonder if someone can develop a similar option for
>      >>     >>>>>      >          >      >     the cusparse solver. Right now, this solver does not
>      >>     >>>>>      >          >      >     work with mpiaijcusparse. I think a possible
>      >>     >>>>>      >          >      >     workaround is to transfer all the matrix data to one
>      >>     >>>>>      >          >      >     MPI process, and then upload the data to the GPU to
>      >>     >>>>>      >          >      >     solve. In this way, one can use the cusparse solver
>      >>     >>>>>      >          >      >     for an MPI program.
>      >>     >>>>>      >          >      >
>      >>     >>>>>      >          >      >     Chang
>      >>     >>>>>      >          >      >
>      >>     >>>>>      >          >
>      >>     >>>>>      >          >
>      >>     >>>>>      >
>      >>     >>>>>      >
>      >>     >>>>
>      >>     >>
>      >>     >
>      >
> 

-- 
Chang Liu
Staff Research Physicist
+1 609 243 3438
cliu at pppl.gov
Princeton Plasma Physics Laboratory
100 Stellarator Rd, Princeton NJ 08540, USA

