[petsc-dev] Kokkos/Crusher perforance
Barry Smith
bsmith at petsc.dev
Mon Jan 24 10:03:12 CST 2022
Not clear how to interpret this. The "gpu" flop rates for dot and norm are a good amount higher (exactly where the logging functions are placed can affect this), but their overall flop rates are not much better. Scatter is better without GPU-aware MPI. How much of this is noise? We need statistics from multiple runs to tell. Certainly not satisfying.
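To see why the placement of the logging calls matters, here is a rough sketch of a dot product, using the real PETSc logging routines (PetscLogGpuTimeBegin/End, PetscLogGpuFlops) but otherwise schematic, not the actual Kokkos backend code. Only the bracketed window feeds the "gpu" rate, so the MPI reduction shows up in the event time but not in the GPU Mflop/s column:

#include <petscvec.h>

/* Hypothetical sketch, not the real Kokkos implementation:
   only the PetscLogGpu* and Vec calls are actual PETSc API. */
PetscErrorCode VecTDot_Sketch(Vec x, Vec y, PetscScalar *z)
{
  PetscErrorCode ierr;
  PetscInt       n;

  PetscFunctionBegin;
  ierr = VecGetLocalSize(x, &n);CHKERRQ(ierr);
  ierr = PetscLogGpuTimeBegin();CHKERRQ(ierr);
  /* ... launch the device kernel for the local dot product ... */
  ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr); /* only this window counts toward the GPU Mflop/s column */
  /* ... the MPI_Allreduce that forms the global value in *z runs here,
     outside the GPU timer: it inflates the event time but not the "gpu" rate ... */
  ierr = PetscLogGpuFlops(2.0*n);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}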
GPU MPI

Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
MatMult 400 1.0 8.4784e+00 1.1 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00 2 55 61 54 0 68 91100100 0 98667 139198 0 0.00e+00 0 0.00e+00 100
KSPSolve 2 1.0 1.2222e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03 3 60 61 54 60 100100100100100 75509 122610 0 0.00e+00 0 0.00e+00 100
VecTDot 802 1.0 1.3863e+00 1.3 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02 0 2 0 0 40 10 3 0 0 67 19186 48762 0 0.00e+00 0 0.00e+00 100
VecNorm 402 1.0 9.2933e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 6 1 0 0 33 14345 127332 0 0.00e+00 0 0.00e+00 100
VecAXPY 800 1.0 8.2405e-01 1.0 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 7 3 0 0 0 32195 62486 0 0.00e+00 0 0.00e+00 100
VecAYPX 398 1.0 8.6891e-01 1.6 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 6 1 0 0 0 15190 19019 0 0.00e+00 0 0.00e+00 100
VecPointwiseMult 402 1.0 3.5227e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 1 0 0 0 18922 39878 0 0.00e+00 0 0.00e+00 100
VecScatterBegin 400 1.0 1.1519e+00 2.1 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00 0 0 61 54 0 7 0100100 0 0 0 0 0.00e+00 0 0.00e+00 0
VecScatterEnd 400 1.0 1.5642e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 10 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
No GPU MPI

Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F

MatMult 400 1.0 8.1754e+00 1.0 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00 2 55 61 54 0 65 91100100 0 102324 133771 800 4.74e+02 800 4.74e+02 100
KSPSolve 2 1.0 1.2605e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03 2 60 61 54 60 100100100100100 73214 113908 800 4.74e+02 800 4.74e+02 100
VecTDot 802 1.0 2.0607e+00 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02 0 2 0 0 40 15 3 0 0 67 12907 25655 0 0.00e+00 0 0.00e+00 100
VecNorm 402 1.0 9.5100e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 6 1 0 0 33 14018 96704 0 0.00e+00 0 0.00e+00 100
VecAXPY 800 1.0 7.9864e-01 1.1 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 6 3 0 0 0 33219 65843 0 0.00e+00 0 0.00e+00 100
VecAYPX 398 1.0 8.0719e-01 1.7 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 5 1 0 0 0 16352 21253 0 0.00e+00 0 0.00e+00 100
VecPointwiseMult 402 1.0 3.7318e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 1 0 0 0 17862 38464 0 0.00e+00 0 0.00e+00 100
VecScatterBegin 400 1.0 1.4075e+00 1.8 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00 0 0 61 54 0 9 0100100 0 0 0 0 0.00e+00 800 4.74e+02 0
VecScatterEnd 400 1.0 6.3044e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 0 0 0 0 0 0 800 4.74e+02 0 0.00e+00 0
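To put numbers on the dot-product gap using only the tables above: the GPU MPI VecTDot line reports 19186 Mflop/s overall but 48762 Mflop/s on the device, so the kernels account for roughly 1.3863 s * (19186 / 48762) ≈ 0.55 s of the 1.39 s event; the remaining ~0.8 s is presumably reduction and synchronization overhead, which is why the higher "gpu" rate does not translate into a better overall rate. The scatters go the other way: VecScatterEnd drops from 1.56 s with GPU MPI to 0.63 s without, even though the non-GPU-aware run stages 800 host-device transfers (4.74e+02 MB total each way) per the log.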
> On Jan 24, 2022, at 10:25 AM, Mark Adams <mfadams at lbl.gov> wrote:
>
>
>> Mark,
>>
>> Can you run both with GPU aware MPI?
>
>
> Perlmutter fails with GPU-aware MPI. I think there are known problems with this that are being worked on.
>
> And here is Crusher with GPU-aware MPI.
>
> <jac_out_001_kokkos_Crusher_6_1_notpl.txt>