[petsc-dev] Kokkos/Crusher performance

Mon Jan 24 13:57:13 CST 2022

My name has been called.

Mark, if you're having issues with Crusher, please contact Veronica Vergara
(vergaravg at ornl.gov). You can cc me (justin.chang at amd.com) on those emails.

On Mon, Jan 24, 2022 at 1:49 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>
> On Jan 24, 2022, at 2:46 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> Yea, CG/Jacobi is as close to a benchmark code as we could want. I could
> run this on one processor to get cleaner numbers.
>
> Is there a designated ECP technical support contact?
>
>
>    Mark, you've forgotten you work for DOE. There isn't a non-ECP
> technical support contact.
>
>    But if this is an AMD machine then maybe contact Matt's student Justin
> Chang?
>
>
>
>
>
> On Mon, Jan 24, 2022 at 2:18 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>   I think you should contact the Crusher ECP technical support team,
>> tell them you are getting dismal performance, and ask whether you should
>> expect better. Don't waste time flogging a dead horse.
>>
>> On Jan 24, 2022, at 2:16 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, Jan 24, 2022 at 12:55 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>> Mark, I think you can benchmark individual vector operations, and once
>>>>> we get reasonable profiling results, we can move to solvers etc.
>>>>>
>>>>
>>>> Can you suggest a code to run or are you suggesting making a vector
>>>> benchmark code?
>>>>
>>> Make a vector benchmark code, testing vector operations that would be
>>> used in your solver.
>>> Also, we can run MatMult() to see if the profiling result is reasonable.
>>> Only once we get solid results on basic operations is it useful to
>>> run big codes.
>>>
>>
>> So we have to make another throw-away code? Why not just look at the
>> vector ops in Mark's actual code?
>>
>>    Matt
>>
>>
>>>
>>>>
>>>>>
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Mon, Jan 24, 2022 at 12:09 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <bsmith at petsc.dev>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>   Here, except for VecNorm, the GPU is used effectively, in that most
>>>>>>> of the time is spent doing real work on the GPU.
>>>>>>>
>>>>>>> VecNorm              402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393      0 0.00e+00    0 0.00e+00 100
>>>>>>>
>>>>>>> Even the dots are very effective; only the VecNorm flop rate over
>>>>>>> the full time is much, much lower than that of VecDot. Is that somehow
>>>>>>> due to the use of GPU or CPU MPI in the allreduce?
>>>>>>>
>>>>>>
>>>>>> The VecNorm GPU rate is relatively high on Crusher and the CPU rate
>>>>>> is about the same as the other vec ops. I don't know what to make of that.
>>>>>>
>>>>>> But Crusher is clearly not crushing it.
>>>>>>
>>>>>> Junchao: Perhaps we should ask the Kokkos developers if they have any
>>>>>> experience with Crusher that they can share. They could very well find
>>>>>> some low-level magic.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Mark, can we compare with Spock?
>>>>>>>>
>>>>>>>
>>>>>>>  Looks much better. This puts two processes per GPU because there are
>>>>>>> only 4 GPUs.
>>>>>>> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
>>>>>>>
>>>>>>>
>>>>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>>
>>
>
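On reading the VecNorm line Barry quoted: the Python sketch below parses that row, assuming the 26-column layout that PETSc's -log_view legend uses when GPU logging is enabled (Count, Time, and Flop with max/min ratios, Mess, AvgLen, Reduct, global and stage percentages, Total Mflop/s, GPU Mflop/s, CPU-to-GPU and GPU-to-CPU copy counts and sizes, GPU %F). The column labels are hypothetical names chosen for illustration, and the layout should be checked against the legend printed by the PETSc version in use. If both Mflop/s figures are computed from the same flop total (GPU %F is 100 here), their ratio roughly approximates the share of the event's wall time spent inside GPU kernels. The remainder would then be launch overhead, the allreduce, and waiting on slower ranks, which the 6.1 max/min time ratio across ranks also hints at.

```python
# Hypothetical labels for the 26 numeric columns of a PETSc -log_view event row
# with GPU logging enabled (assumed layout; verify against your version's legend).
COLUMNS = [
    "count_max", "count_ratio", "time_max", "time_ratio",
    "flop_max", "flop_ratio", "mess", "avg_len", "reduct",
    "g_T", "g_F", "g_M", "g_L", "g_R",      # global %T %F %M %L %R
    "s_T", "s_F", "s_M", "s_L", "s_R",      # stage  %T %F %M %L %R
    "total_mflops", "gpu_mflops",
    "cpu2gpu_count", "cpu2gpu_size", "gpu2cpu_count", "gpu2cpu_size",
    "gpu_pct_flop",
]


def parse_event(line):
    """Split one -log_view event row into its name and a dict of numeric fields."""
    name, *fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} numeric fields, got {len(fields)}")
    return name, dict(zip(COLUMNS, map(float, fields)))


# The VecNorm row from the thread, rejoined onto one line.
VECNORM = ("VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02"
           " 0 1 0 0 20 9 1 0 0 33 30230 225393 0 0.00e+00 0 0.00e+00 100")

name, ev = parse_event(VECNORM)

# Assuming both rates divide the same flop count, total/GPU rate approximates
# (max GPU kernel time) / (max event wall time).
gpu_share = ev["total_mflops"] / ev["gpu_mflops"]

print(f"{name}: GPU kernels ~{gpu_share:.0%} of {ev['time_max']:.3f} s; "
      f"max/min time ratio across ranks = {ev['time_ratio']}")
```

Running it prints `VecNorm: GPU kernels ~13% of 0.441 s; max/min time ratio across ranks = 6.1`, i.e. under these assumptions roughly seven eighths of VecNorm's time is spent outside the GPU kernels.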