[petsc-dev] Kokkos/Crusher performance

Barry Smith bsmith at petsc.dev
Mon Jan 24 11:44:14 CST 2022


  Here, except for VecNorm, the GPU is used effectively, in that most of the time is spent doing real work on the GPU.

VecNorm              402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393      0 0.00e+00    0 0.00e+00 100

Even the dots are very effective; only the VecNorm flop rate over the full time is much, much lower than that of VecDot. Is that somehow due to the use of GPU or CPU MPI in the allreduce?
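
To make the gap concrete, here is a minimal sketch that splits the VecNorm line above into named fields and compares the two flop-rate columns. The column order in FIELDS is an assumption based on the usual -log_view layout of a GPU-enabled build, and the field names and parse_event_line are made up for illustration; they are not PETSc API.

```python
# Assumed column order of one -log_view event line in a GPU-enabled PETSc build.
# These field names are illustrative only; they are not PETSc identifiers.
FIELDS = [
    "count_max", "count_ratio", "time_max", "time_ratio",
    "flop_max", "flop_ratio", "mess", "avg_len", "reduct",
    "glob_T", "glob_F", "glob_M", "glob_L", "glob_R",
    "stage_T", "stage_F", "stage_M", "stage_L", "stage_R",
    "total_mflops", "gpu_mflops",
    "cpu2gpu_count", "cpu2gpu_size", "gpu2cpu_count", "gpu2cpu_size",
    "gpu_pct_flop",
]


def parse_event_line(line):
    """Split one event line into a dict keyed by the assumed field names."""
    tokens = line.split()
    name, values = tokens[0], tokens[1:]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    row = {key: float(value) for key, value in zip(FIELDS, values)}
    row["event"] = name
    return row


line = (
    "VecNorm              402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 "
    "4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393      0 0.00e+00    "
    "0 0.00e+00 100"
)

row = parse_event_line(line)

# Ratio of the GPU-only flop rate to the flop rate over the full event time.
ratio = row["gpu_mflops"] / row["total_mflops"]

print(f"{row['event']}: total {row['total_mflops']:.0f} Mflop/s, "
      f"GPU {row['gpu_mflops']:.0f} Mflop/s, ratio {ratio:.2f}")
print(f"max/min time ratio across ranks: {row['time_ratio']}")
```

If the column reading is right, the GPU-only rate is roughly 7.5 times the rate over the full event time, and the max/min time ratio across ranks is 6.1. Both are consistent with most of the VecNorm wall time being spent outside the GPU computation itself, for example waiting in the reduction, but the numbers alone do not say where.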



> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> 
> 
> Mark, can we compare with Spock?
> 
>  Looks much better. This puts two processes/GPU because there are only 4.
> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
