[petsc-dev] [EXTERNAL] Re: building on Fugaku
Mark Adams
mfadams at lbl.gov
Wed Apr 21 09:17:46 CDT 2021
On Tue, Apr 20, 2021 at 9:06 PM Sreepathi, Sarat <sarat at ornl.gov> wrote:
> Already tried those but it didn't help. I have been trying to experiment
> with 48x1, 24x2 etc. and performance degraded for the climate workload.
>
I also have problems when using all 48 cores, both with my Kokkos Landau
code and with algebraic multigrid (AMG), which is basically Kokkos Kernels
(KK) matrix-vector products.
For AMG, 8 (threads) x 4 (MPI) was best, and thread speedup was moderate.
I don't know how well KK vectorizes, but in principle they should be able
to make that work (they can write any code they want in KK).
For Landau, I get great thread speedup. This code is MPI serial. I get the
same throughput with 32x1, 16x2, 8x4, and 4x8. It looks like I am not
getting any vectorization.
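For reference, a minimal sketch (not part of the original message) that
enumerates the ways a fixed core budget can be split in this way. It assumes
the first number is threads per rank and the second is the number of MPI
ranks, following the "8 (threads) x 4 (MPI)" convention above; the function
name `decompositions` is hypothetical.

```python
def decompositions(cores):
    """Return every (threads_per_rank, mpi_ranks) pair whose product is `cores`.

    Assumption: "AxB" means A threads per rank times B MPI ranks.
    """
    return [
        (cores // ranks, ranks)
        for ranks in range(1, cores + 1)
        if cores % ranks == 0  # keep only splits that use every core exactly once
    ]


if __name__ == "__main__":
    # 32 cores: the budget behind the 32x1, 16x2, 8x4, 4x8 runs mentioned above.
    for threads, ranks in decompositions(32):
        print(f"{threads}x{ranks}")
```

Calling `decompositions(48)` gives the corresponding list for a full 48-core
node.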
With the large (10 species) problem that I use as my test case, it runs very
slowly when I use all 48 cores in any configuration. With 2 species it does
not die; it is just not great. I have not looked at this in any detail.
Let us know if you find anything.
Thanks,
Mark