[petsc-dev] cuSparse vector performance issue

Mark Adams mfadams at lbl.gov
Sat Sep 25 21:25:16 CDT 2021

On Sat, Sep 25, 2021 at 8:12 PM Junchao Zhang <junchao.zhang at gmail.com>

> On Sat, Sep 25, 2021 at 4:45 PM Mark Adams <mfadams at lbl.gov> wrote:
>> I am testing my Landau code, which is MPI serial, but with many
>> independent MPI processes driving each GPU, in an MPI parallel harness code
>> (Landau ex2).
>> Vector operations with Kokkos Kernels and cuSparse are about the same (KK
>> is slightly faster) and a bit expensive with one process per GPU -- about
>> the same cost as my Jacobian construction, which is expensive but
>> optimized on the GPU. (I am using the adaptive ARKIMEX TS; I am guessing
>> it does a lot of vector ops, because I see a lot of them.)
>> With 14 or 15 processes, all doing the same MPI-serial problem, cuSparse
>> is about 2.5x more expensive than KK. KK only degrades by about 15% from
>> the one-process case. So KK is doing fine, but something bad is happening
>> with cuSparse.
> Do AIJKOKKOS and AIJCUSPARSE use different algorithms? I don't know. To
> know for sure, the best approach is to ask Peng at NVIDIA to help profile
> the code.

Yea, I could ask Peng if he has any thoughts.
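
In the meantime, a first pass with Nsight Systems might show where the time is going. A sketch of what such a run could look like (the launcher, binary name, and process count here are assumptions for a Summit-style job, not the actual command I ran):

```shell
# Hypothetical sketch: wrap one rank of the Landau driver (ex2) with the
# Nsight Systems CLI to get a kernel/API timeline per process.
nsys profile --stats=true -o landau_cusparse \
  jsrun -n 1 -g 1 ./ex2 <usual Landau options>
```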

I am also now having a problem with the snes/tests/ex13 scaling study (for
my ECP report).
The cuSparse version of GAMG is hanging on an 8-node job with a refinement
of 3. It works on one node with a refinement of 4, and on 8 nodes with a
refinement of 2.
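
For reference, the kind of run that hangs looks roughly like this. The backend-selection options (-dm_mat_type aijcusparse, -dm_vec_type cuda) are standard PETSc options, but the exact launcher invocation and rank count below are assumptions:

```shell
# Hypothetical sketch of the 8-node Summit-style ex13 run with the cuSparse
# backend; switching to -dm_mat_type aijkokkos -dm_vec_type kokkos would
# select the Kokkos Kernels path instead.
jsrun -n 48 -g 1 ./ex13 -dm_refine 3 \
  -dm_mat_type aijcusparse -dm_vec_type cuda \
  -pc_type gamg
```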
I recently moved from CUDA-10 to CUDA-11 on Summit because MPS seems to be
working with CUDA-11, whereas it was not working a while ago.
I think I will try going back to CUDA-10 and see if anything changes.
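
On Summit that switch is just a module swap; a sketch (the exact CUDA-10 version string available on the system is an assumption):

```shell
# Hypothetical sketch: check which CUDA module is loaded, then swap back
# to a CUDA 10 toolchain before rebuilding PETSc.
module -t list 2>&1 | grep cuda
module swap cuda cuda/10.1.243   # version string is an assumption
```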


>> Anyone have any thoughts on this?
>> Thanks,
>> Mark
