[petsc-dev] Kokkos/Crusher performance
Barry Smith
bsmith at petsc.dev
Mon Jan 24 11:44:14 CST 2022
Here, except for VecNorm, the GPU is used effectively, in that most of the time is spent doing real work on the GPU:
VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 9 1 0 0 33 30230 225393 0 0.00e+00 0 0.00e+00 100
Even the dots are very effective; only the VecNorm flop rate over the full time is much, much lower than the VecDot rate. Is that somehow due to the use of GPU or CPU MPI in the allreduce?
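To make the comparison in the VecNorm row concrete, here is a small Python sketch that parses one -log_view event line and derives a few ratios from it. The column names are assumed from the usual -log_view header with GPU logging enabled (Count, Time and Flop max/ratio, Mess, AvgLen, Reduct, global and stage percentages, Total and GPU Mflop/s, CpuToGpu/GpuToCpu count and size, GPU %F); check them against the header your PETSc version prints. The helpers `parse_event` and `summarize` are hypothetical. The main assumption is that Total Mflop/s divides the flops by the full event time while GPU Mflop/s divides the same flops by GPU kernel time, so their ratio roughly estimates how much of the event's time is spent in GPU kernels.

```python
# Assumed column layout of a PETSc -log_view event row with GPU logging on.
FIELDS = [
    "count_max", "count_ratio",
    "time_max", "time_ratio",
    "flop_max", "flop_ratio",
    "mess", "avg_len", "reduct",
    "global_T", "global_F", "global_M", "global_L", "global_R",
    "stage_T", "stage_F", "stage_M", "stage_L", "stage_R",
    "total_mflops", "gpu_mflops",
    "cpu_to_gpu_count", "cpu_to_gpu_size",
    "gpu_to_cpu_count", "gpu_to_cpu_size",
    "gpu_percent_flop",
]


def parse_event(line: str) -> tuple[str, dict[str, float]]:
    """Split one event row into its name and a dict of numeric columns."""
    name, *values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} numeric columns, got {len(values)}")
    return name, {key: float(v) for key, v in zip(FIELDS, values)}


def summarize(line: str) -> dict[str, float]:
    """Back-of-the-envelope ratios for one event (hypothetical helper)."""
    _, ev = parse_event(line)
    return {
        # Rough share of the event time spent in GPU kernels, assuming both
        # rates are computed from the same flop count.
        "gpu_time_fraction": ev["total_mflops"] / ev["gpu_mflops"],
        # Max-over-ranks time per call, in milliseconds.
        "ms_per_call": 1e3 * ev["time_max"] / ev["count_max"],
        # Global reductions logged per call.
        "reductions_per_call": ev["reduct"] / ev["count_max"],
        # Max/min time across ranks; well above 1 hints at ranks waiting.
        "time_imbalance": ev["time_ratio"],
    }


row = ("VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 "
       "0 1 0 0 20 9 1 0 0 33 30230 225393 0 0.00e+00 0 0.00e+00 100")

for key, value in summarize(row).items():
    print(f"{key}: {value:.3g}")
```

For the VecNorm row above this gives a GPU-time fraction of about 0.13, roughly one reduction per call, and a max/min time ratio of 6.1, which is consistent with most of the event's time going to something other than the kernel itself (the reduction, or waiting on slower ranks). This is only a rough reading; PETSc's exact aggregation of these columns across ranks may differ from the simple division assumed here.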
> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> Mark, can we compare with Spock?
>
> Looks much better. This puts two processes per GPU because there are only 4 GPUs.
> <jac_out_001_kokkos_Spock_6_1_notpl.txt>