[petsc-dev] Kokkos/Crusher performance

Mon Jan 24 13:16:06 CST 2022

On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

>
>
> On Mon, Jan 24, 2022 at 12:55 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>>
>>
>> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>> Mark, I think you can benchmark individual vector operations, and once
>>> we get reasonable profiling results, we can move to solvers etc.
>>>
>>
>> Can you suggest a code to run or are you suggesting making a vector
>> benchmark code?
>>
> Make a vector benchmark code that tests the vector operations used
> in your solver.
> Also, we can run MatMult() to see if the profiling result is reasonable.
> Only once we get solid results on basic operations is it useful to
> run big codes.
>

So we have to make another throw-away code? Why not just look at the vector
ops in Mark's actual code?

   Matt

>
>>
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Mon, Jan 24, 2022 at 12:09 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>
>>>>>
>>>>>   Here, except for VecNorm, the GPU is used effectively, in that most of
>>>>> the time is spent doing real work on the GPU.
>>>>>
>>>>> VecNorm              402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393      0 0.00e+00    0 0.00e+00 100
>>>>>
>>>>> Even the dots are very effective; only the VecNorm flop rate over the
>>>>> full time is much, much lower than the VecDot rate. Is that somehow due
>>>>> to the use of GPU or CPU MPI in the allreduce?
>>>>>
>>>>
>>>> The VecNorm GPU rate is relatively high on Crusher and the CPU rate is
>>>> about the same as the other vec ops. I don't know what to make of that.
>>>>
>>>> But Crusher is clearly not crushing it.
>>>>
>>>> Junchao: Perhaps we should ask the Kokkos team if they have any experience
>>>> with Crusher that they can share. They could very well find some low-level
>>>> magic.
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Mark, can we compare with Spock?
>>>>>>
>>>>>
>>>>>  Looks much better. This puts two processes per GPU because there are
>>>>> only 4 GPUs.
>>>>> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
>>>>>
>>>>>
>>>>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/