[petsc-dev] Kokkos/Crusher performance
Mark Adams
mfadams at lbl.gov
Mon Jan 24 12:08:52 CST 2022
On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <bsmith at petsc.dev> wrote:
>
> Here, except for VecNorm, the GPU is used effectively in that most of the
> time is spent doing real work on the GPU.
>
> VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 9 1 0 0 33 30230 225393 0 0.00e+00 0 0.00e+00 100
>
> Even the dots are very effective; only the VecNorm flop rate over the full
> time is much, much lower than that of VecDot. Is that somehow due to the use
> of GPU or CPU MPI in the allreduce?
>
The VecNorm GPU rate is relatively high on Crusher and the CPU rate is
about the same as the other vec ops. I don't know what to make of that.
But Crusher is clearly not crushing it.
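
To put numbers on the comparison, here is a minimal sketch that pulls the two rates out of the VecNorm line quoted above. The column positions are an assumption based on the usual GPU-enabled -log_view layout (count, time max/ratio, flop max/ratio, messages, average length, reductions, five global and five stage percentages, total Mflop/s, GPU Mflop/s, transfer counts and sizes, GPU %F); check them against the header of the actual log before relying on them.

```python
# The VecNorm line from the quoted log, with the e-mail line wraps joined.
line = ("VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 "
        "4.0e+02 0 1 0 0 20 9 1 0 0 33 30230 225393 0 0.00e+00 0 "
        "0.00e+00 100")

fields = line.split()
name = fields[0]
values = [float(x) for x in fields[1:]]

# Assumed 0-based column positions within `values` (see the note above).
max_time = values[2]       # max time over ranks, in seconds
time_ratio = values[3]     # max/min time ratio over ranks
total_mflops = values[19]  # flop rate over the full event time
gpu_mflops = values[20]    # flop rate while running on the GPU

# How much faster the GPU-only rate is than the rate over the full time.
rate_ratio = gpu_mflops / total_mflops

# Time on the fastest rank, implied by the max time and the max/min ratio.
min_time = max_time / time_ratio

print(f"{name}: GPU rate / total rate = {rate_ratio:.2f}")
print(f"{name}: max time = {max_time:.3f} s, implied min time = {min_time:.4f} s")
```

A large gap between the two rates together with a large max/min time ratio would be consistent with ranks waiting in the reduction rather than computing, but that is only one possible reading of these numbers.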
Junchao: Perhaps we should ask the Kokkos developers if they have any experience
with Crusher that they can share. They could very well find some low-level magic.
>
>
> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
>
>
>> Mark, can we compare with Spock?
>>
>
> Looks much better. This puts two processes per GPU because there are only 4 GPUs.
> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
>
>
>