[petsc-dev] Kokkos/Crusher performance
Barry Smith
bsmith at petsc.dev
Tue Jan 25 21:19:51 CST 2022
So the MPI is killing you going from 8 to 64 nodes. (The GPU flop rate scales almost perfectly, but the overall flop rate is only about half of what it should be at 64.)
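A rough check against the level-6 KSPSolve lines quoted below (reading the two large rate columns as total Mflop/s and GPU-kernel Mflop/s; the little script is just a sketch of that arithmetic, not part of the logs):

# Scaling check using the level-6 KSPSolve rates quoted below
# (per run: total Mflop/s, then GPU-kernel Mflop/s).
rates = {8: (3798162, 5557106), 64: (24130606, 43326249)}
tot8, gpu8 = rates[8]
tot64, gpu64 = rates[64]
print(f"GPU rate speedup, 8 -> 64 nodes:   {gpu64/gpu8:.2f}x (ideal 8x)")
print(f"total rate speedup, 8 -> 64 nodes: {tot64/tot8:.2f}x (ideal 8x)")
print(f"total/GPU rate at 8 nodes:  {tot8/gpu8:.0%}")
print(f"total/GPU rate at 64 nodes: {tot64/gpu64:.0%}")

The GPU-kernel rate scales by about 7.8x from 8 to 64 nodes, while the overall KSPSolve rate scales by only about 6.4x and falls to roughly 56% of the GPU rate; that gap is where the MPI time shows up.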
> On Jan 25, 2022, at 9:24 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> It looks like we have our instrumentation and job configuration in decent shape so on to scaling with AMG.
> When using multiple nodes I got errors about table entries not found, which can be caused by a buggy MPI; the problem does go away when I turn GPU-aware MPI off.
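For reference, the switch in question is the runtime option -use_gpu_aware_mpi mentioned further down; a minimal sketch of a run with it disabled (the launcher, rank count, and executable name here are placeholders, not the actual job script) would be something like

srun -n 8 ./app -use_gpu_aware_mpi 0 -log_view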
> Jed's analysis, if I have this right, is that at 0.7 Tflops we are at about 35% of the theoretical peak with respect to memory bandwidth.
> I run out of memory at the next step in this study (7 levels of refinement), with 2M equations per GPU. This seems low to me, and we will see if we can fix it.
> So this 0.7 Tflops is with only 1/4M equations per GPU, so 35% is not terrible.
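One way a ~35% figure could come out, as a sketch only: the per-GCD bandwidth and the bytes-per-flop below are assumptions on my part (roughly 1.6 TB/s HBM per MI250X GCD, 8 GCDs per Crusher node, and about 6 bytes of memory traffic per flop for an assembled-matrix solve), not numbers from Jed's analysis.

# Bandwidth-limited roofline estimate; bandwidth and bytes/flop are assumptions.
hbm_bw_per_gcd = 1.6e12   # bytes/s per MI250X GCD (assumed)
gcds_per_node = 8         # 4 MI250X per Crusher node, 2 GCDs each
bytes_per_flop = 6.0      # rough SpMV-like intensity for an assembled matrix (assumed)
achieved = 0.7e12         # flop/s, the 0.7 Tflops from Jed's analysis

peak = hbm_bw_per_gcd * gcds_per_node / bytes_per_flop  # bandwidth-limited flop/s per node
print(f"bandwidth-limited peak per node: {peak/1e12:.1f} Tflop/s")
print(f"achieved / peak: {achieved/peak:.0%}")

That lands at about 33%, in the same ballpark as the 35% quoted, but it depends entirely on the assumed bytes per flop.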
> Here are the solve times for 001, 008, and 064 nodes, with 5 or 6 levels of refinement.
>
> out_001_kokkos_Crusher_5_1.txt:KSPSolve 10 1.0 1.2933e+00 1.0 4.13e+10 1.1 1.8e+05 8.4e+03 5.8e+02 3 87 86 78 48 100100100100100 248792 423857 6840 3.85e+02 6792 3.85e+02 100
> out_001_kokkos_Crusher_6_1.txt:KSPSolve 10 1.0 5.3667e+00 1.0 3.89e+11 1.0 2.1e+05 3.3e+04 6.7e+02 2 87 86 79 48 100100100100100 571572 700002 7920 1.74e+03 7920 1.74e+03 100
> out_008_kokkos_Crusher_5_1.txt:KSPSolve 10 1.0 1.9407e+00 1.0 4.94e+10 1.1 3.5e+06 6.2e+03 6.7e+02 5 87 86 79 47 100100100100100 1581096 3034723 7920 6.88e+02 7920 6.88e+02 100
> out_008_kokkos_Crusher_6_1.txt:KSPSolve 10 1.0 7.4478e+00 1.0 4.49e+11 1.0 4.1e+06 2.3e+04 7.6e+02 2 88 87 80 49 100100100100100 3798162 5557106 9367 3.02e+03 9359 3.02e+03 100
> out_064_kokkos_Crusher_5_1.txt:KSPSolve 10 1.0 2.4551e+00 1.0 5.40e+10 1.1 4.2e+07 5.4e+03 7.3e+02 5 88 87 80 47 100100100100100 11065887 23792978 8684 8.90e+02 8683 8.90e+02 100
> out_064_kokkos_Crusher_6_1.txt:KSPSolve 10 1.0 1.1335e+01 1.0 5.38e+11 1.0 5.4e+07 2.0e+04 9.1e+02 4 88 88 82 49 100100100100100 24130606 43326249 11249 4.26e+03 11249 4.26e+03 100
>
> On Tue, Jan 25, 2022 at 1:49 PM Mark Adams <mfadams at lbl.gov> wrote:
>
> Note that Mark's logs have been switching back and forth between -use_gpu_aware_mpi and changing the number of ranks -- we won't have that information if we do manual timing hacks. This is going to be a routine thing we'll need on the mailing list, and we need the provenance to go with it.
>
> GPU-aware MPI crashes sometimes, so to be safe while debugging I had it off. It works fine here, so it has been on in the latest tests.
> Here is a comparison.
>
> <tt.tar>