[petsc-dev] Kokkos/Crusher performance

Mark Adams mfadams at lbl.gov
Wed Jan 26 06:26:05 CST 2022


The GPU-aware MPI is dying going from 1 node to 8 nodes with 8 processes per
node. I will make a minimal reproducer, starting with 2 nodes and one process
on each node.
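
Something along these lines is what I mean by a minimal reproducer -- a sketch
only, assuming HIP device buffers as on Crusher; the message size and tag are
placeholders, not the actual test:

/* Sketch of a two-rank GPU-aware MPI check: pass a hipMalloc'd buffer
   straight to MPI_Send/MPI_Recv.  Run with one rank on each of two nodes. */
#include <mpi.h>
#include <hip/hip_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  const int n = 1 << 20;   /* placeholder: 1M doubles per message */
  int       rank;
  double   *dbuf;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (hipMalloc((void **)&dbuf, n * sizeof(double)) != hipSuccess) MPI_Abort(MPI_COMM_WORLD, 1);
  if (rank == 0) {
    if (hipMemset(dbuf, 0, n * sizeof(double)) != hipSuccess) MPI_Abort(MPI_COMM_WORLD, 1);
    MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);   /* device pointer handed directly to MPI */
  } else if (rank == 1) {
    MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank 1 received %d doubles from a device buffer\n", n);
  }
  hipFree(dbuf);
  MPI_Finalize();
  return 0;
}

If this passes with one rank per node, the next step would be to grow the
message sizes and rank counts toward the failing 8-node configuration.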


On Tue, Jan 25, 2022 at 10:19 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>   So MPI is what is killing you going from 8 to 64 nodes. (The GPU flop rate
> scales almost perfectly, but the overall flop rate is only half of what it
> should be at 64.)
>
> On Jan 25, 2022, at 9:24 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> It looks like we have our instrumentation and job configuration in decent
> shape, so on to scaling with AMG.
> Using multiple nodes I got errors about table entries not being found, which
> can be caused by a buggy MPI, and the problem does go away when I turn
> GPU-aware MPI off.
> Jed's analysis, if I have it right, is that at *0.7T* flop/s we are at about
> 35% of theoretical peak with respect to memory bandwidth.
> I run out of memory at the next step in this study (7 levels of refinement,
> which is 2M equations per GPU). That limit seems low to me and we will see
> if we can fix it.
> So this 0.7 Tflop/s is with only 1/4M equations per GPU, so 35% is not
> terrible.
> Here are the solve times with 001, 008 and 064 nodes, and 5 or 6 levels of
> refinement.
>
> out_001_kokkos_Crusher_5_1.txt:KSPSolve   10 1.0 1.2933e+00 1.0 4.13e+10 1.1 1.8e+05 8.4e+03 5.8e+02  3 87 86 78 48 100100100100100 248792   423857   6840 3.85e+02 6792 3.85e+02 100
> out_001_kokkos_Crusher_6_1.txt:KSPSolve   10 1.0 5.3667e+00 1.0 3.89e+11 1.0 2.1e+05 3.3e+04 6.7e+02  2 87 86 79 48 100100100100100 571572   *700002*   7920 1.74e+03 7920 1.74e+03 100
> out_008_kokkos_Crusher_5_1.txt:KSPSolve   10 1.0 1.9407e+00 1.0 4.94e+10 1.1 3.5e+06 6.2e+03 6.7e+02  5 87 86 79 47 100100100100100 1581096   3034723   7920 6.88e+02 7920 6.88e+02 100
> out_008_kokkos_Crusher_6_1.txt:KSPSolve   10 1.0 7.4478e+00 1.0 4.49e+11 1.0 4.1e+06 2.3e+04 7.6e+02  2 88 87 80 49 100100100100100 3798162   5557106   9367 3.02e+03 9359 3.02e+03 100
> out_064_kokkos_Crusher_5_1.txt:KSPSolve   10 1.0 2.4551e+00 1.0 5.40e+10 1.1 4.2e+07 5.4e+03 7.3e+02  5 88 87 80 47 100100100100100 11065887   23792978   8684 8.90e+02 8683 8.90e+02 100
> out_064_kokkos_Crusher_6_1.txt:KSPSolve   10 1.0 1.1335e+01 1.0 5.38e+11 1.0 5.4e+07 2.0e+04 9.1e+02  4 88 88 82 49 100100100100100 24130606   43326249   11249 4.26e+03 11249 4.26e+03 100
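
A quick reading aid for these lines, assuming the usual -log_view column
order (the flop column is the per-rank maximum, 571572 is the total MFlop/s,
and the bolded 700002 is the GPU-only MFlop/s): for the single-node 6-level
run, with 8 ranks on the node,

\[
  \frac{8 \times 3.89\times 10^{11}\ \mathrm{flop}}{5.37\ \mathrm{s}}
  \approx 5.8\times 10^{11}\ \mathrm{flop/s},
\]

which is consistent with the logged total rate of 571572 MFlop/s, and the
GPU-only rate of 700002 MFlop/s is the 0.7 Tflop/s figure discussed above.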
>
> On Tue, Jan 25, 2022 at 1:49 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>>
>>> Note that Mark's logs have been switching back and forth between toggling
>>> -use_gpu_aware_mpi and changing the number of ranks -- we won't have that
>>> information if we do manual timing hacks. This is going to be a routine
>>> thing we'll need on the mailing list, and we need the provenance to go with
>>> it.
>>>
>>
>> GPU-aware MPI crashes sometimes, so to be safe I had it off while
>> debugging. It works fine here, so it has been on in the latest tests.
>> Here is a comparison.
>>
>>
> <tt.tar>
>
>
>
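
On Jed's provenance point above, a small sketch of stamping the requested
setting into every run's output, assuming that reading the raw
-use_gpu_aware_mpi option is enough for this purpose (it records what was
asked for, not what the MPI library actually did):

/* Sketch (not the actual test code): echo the -use_gpu_aware_mpi setting
   into stdout so each -log_view can be matched to its MPI mode.
   The default value shown here is an assumption. */
#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscBool      gpuAwareMpi = PETSC_TRUE, set = PETSC_FALSE;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = PetscOptionsGetBool(NULL, NULL, "-use_gpu_aware_mpi", &gpuAwareMpi, &set);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "GPU-aware MPI: %s%s\n",
                     gpuAwareMpi ? "on" : "off",
                     set ? "" : " (option not set; assumed default)");CHKERRQ(ierr);
  /* ... set up and run the solve as usual ... */
  ierr = PetscFinalize();
  return ierr;
}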