[petsc-dev] Kokkos/Crusher performance

Barry Smith bsmith at petsc.dev
Sat Jan 22 18:34:52 CST 2022


  I submit it is actually a good amount of additional work and requires real creativity and very good judgment; it is not a good intro or undergrad project, especially for someone without a huge amount of hands-on experience already. Look at who had to do the new SPEChpc multigrid benchmark. The last time I checked, Sam was not an undergrad: Senior Scientist, Lawrence Berkeley National Laboratory, cited by 11,194. I definitely do not plan to involve myself in any brand-new serious benchmarking studies in my current lifetime; doing one correctly is a massive undertaking, IMHO.

> On Jan 22, 2022, at 6:43 PM, Jed Brown <jed at jedbrown.org> wrote:
> 
> This isn't so much more or less work, but work in more useful places. Maybe this is a good undergrad or intro project to make a clean workflow for these experiments.
> 
> Barry Smith <bsmith at petsc.dev> writes:
> 
>>  Performance studies are enormously difficult to do well, which is why there are so few good ones out there. And unless you fall into the LINPACK benchmark or hit upon STREAM, the rewards of doing an excellent job are pretty thin. Even STREAM was not properly maintained for many years; you could not just get it and use it out of the box for a variety of purposes (which is why PETSc has its hacked-up versions). I submit a proper performance study is a full-time job, and everyone already has one of those.
>> 
>>> On Jan 22, 2022, at 2:11 PM, Jed Brown <jed at jedbrown.org> wrote:
>>> 
>>> Barry Smith <bsmith at petsc.dev> writes:
>>> 
>>>>> On Jan 22, 2022, at 12:15 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>>> Barry, when you did the tech reports, did you make an example to reproduce on other architectures? Like, run this one example (it'll run all the benchmarks across different sizes) and then run this script on the output to make all the figures?
>>>> 
>>>>  It is documented in https://www.overleaf.com/project/5ff8f7aca589b2f7eb81c579    You may need to dig through the submit scripts etc to find out exactly.
>>> 
>>> This runs a ton of small jobs, and each job doesn't really preload. But instead of loops in the job submission scripts, the loops could be inside the C code, which could directly output tabular data. This would run faster and be easier to submit and analyze.
>>> 
>>> https://gitlab.com/hannah_mairs/summit-performance/-/blob/master/summit-submissions/submit_gpu1.lsf
>>> 
>>> It would hopefully also avoid writing the size range manually over here in the analysis script, where it has to match the job submission exactly.
>>> 
>>> https://gitlab.com/hannah_mairs/summit-performance/-/blob/master/python/graphs.py#L8-9
>>> 
>>> 
>>> We'd make our lives a lot easier understanding new machines if we put just a fraction of the thought into the design of performance studies that we put into public library interfaces.
