[petsc-dev] Kokkos/Crusher perforance

Barry Smith bsmith at petsc.dev
Mon Jan 24 10:03:12 CST 2022


Not clear how to interpret these. The "GPU" flop rates for dot and norm are a good amount higher (the exact placement of the logging calls can affect this), but their overall flop rates are not much better. Scatter is better without GPU-aware MPI. How much of this is noise? We need to see statistics from multiple runs. Certainly not satisfying.
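The noise question could be answered mechanically by collecting the same event line from several runs. A minimal sketch (the parser, field positions, and truncated sample lines are assumptions based on the event rows below, not a PETSc utility):

```python
# Hypothetical post-processing sketch: pull an event's max time out of
# several -log_view outputs so run-to-run noise can be quantified.
# Field positions follow the event lines shown in this message.
import statistics

def parse_event(line):
    """Return (event name, call count, max time in seconds) from a -log_view event line."""
    parts = line.split()
    return parts[0], int(parts[1]), float(parts[3])

# Two MatMult samples copied (truncated) from the logs below.
samples = [
    "MatMult 400 1.0 8.4784e+00 1.1 1.06e+11 1.0",
    "MatMult 400 1.0 8.1754e+00 1.0 1.06e+11 1.0",
]
times = [parse_event(s)[2] for s in samples]
print(f"MatMult mean={statistics.mean(times):.4f}s stdev={statistics.stdev(times):.4f}s")
```

With enough samples per configuration, the mean/stdev pair would show whether the GPU-aware vs. non-GPU-aware differences exceed the run-to-run spread.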

GPU MPI

MatMult              400 1.0 8.4784e+00 1.1 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00  2 55 61 54  0  68 91100100  0 98667  139198      0 0.00e+00    0 0.00e+00 100
KSPSolve               2 1.0 1.2222e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03  3 60 61 54 60 100100100100100 75509  122610      0 0.00e+00    0 0.00e+00 100
VecTDot              802 1.0 1.3863e+00 1.3 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40  10  3  0  0 67 19186   48762      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 9.2933e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   6  1  0  0 33 14345  127332      0 0.00e+00    0 0.00e+00 100
VecAXPY              800 1.0 8.2405e-01 1.0 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   7  3  0  0  0 32195   62486      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 8.6891e-01 1.6 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   6  1  0  0  0 15190   19019      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 3.5227e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  1  0  0  0 18922   39878      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.1519e+00 2.1 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00  0  0 61 54  0   7  0100100  0     0       0      0 0.00e+00    0 0.00e+00  0
VecScatterEnd        400 1.0 1.5642e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  10  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0


No GPU MPI

MatMult              400 1.0 8.1754e+00 1.0 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00  2 55 61 54  0  65 91100100   102324  133771    800 4.74e+02  800 4.74e+02 100
KSPSolve               2 1.0 1.2605e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03  2 60 61 54 60 100100100100100 73214  113908    800 4.74e+02  800 4.74e+02 100
VecTDot              802 1.0 2.0607e+00 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40  15  3  0  0 67 12907   25655      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 9.5100e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   6  1  0  0 33 14018   96704      0 0.00e+00    0 0.00e+00 100
VecAXPY              800 1.0 7.9864e-01 1.1 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   6  3  0  0  0 33219   65843      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 8.0719e-01 1.7 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   5  1  0  0  0 16352   21253      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 3.7318e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  1  0  0  0 17862   38464      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.4075e+00 1.8 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00  0  0 61 54  0   9  0100100  0     0       0      0 0.00e+00  800 4.74e+02  0
VecScatterEnd        400 1.0 6.3044e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5  0  0  0  0     0       0    800 4.74e+02    0 0.00e+00  0
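For reproducing the comparison, one way to toggle the two configurations is PETSc's runtime option `-use_gpu_aware_mpi`. A sketch only: the executable name, rank count, and DM type options are assumptions, not taken from the runs above.

```shell
# Hypothetical launch lines for the two configurations on Crusher.
# EXE and the srun layout are placeholders; -use_gpu_aware_mpi is a
# PETSc runtime option for forcing the choice.
EXE=./ex13   # assumed benchmark executable
COMMON="-log_view -dm_vec_type kokkos -dm_mat_type aijkokkos"

# GPU-aware MPI on (communication directly from device buffers):
echo srun -n8 $EXE $COMMON -use_gpu_aware_mpi 1

# GPU-aware MPI off (messages staged through host buffers, which shows
# up as the CpuToGpu/GpuToCpu transfer columns in -log_view):
echo srun -n8 $EXE $COMMON -use_gpu_aware_mpi 0
```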


> On Jan 24, 2022, at 10:25 AM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> 
> >   Mark,
> > 
> >      Can you run both with GPU aware MPI?
> 
> 
> Perlmutter fails with GPU-aware MPI. I think there are known problems with this that are being worked on.
> 
> And here is Crusher with GPU aware MPI.
>  
> <jac_out_001_kokkos_Crusher_6_1_notpl.txt>
