[petsc-dev] [EXTERNAL] Re: building on Fugaku

Mark Adams mfadams at lbl.gov
Wed Apr 21 09:17:46 CDT 2021


On Tue, Apr 20, 2021 at 9:06 PM Sreepathi, Sarat <sarat at ornl.gov> wrote:

> Already tried those but it didn't help. I have been trying to experiment
> with 48x1, 24x2 etc. and performance degraded for the climate workload.
>

I also have problems using all 48 cores, both in my Kokkos Landau code and
in the Kokkos Kernels (KK) matrix-vector products that (basically) make up
algebraic multigrid (AMG).

For AMG, 8 (threads) x 4 (MPI) was best, and the thread speedup was
moderate. I don't know how well KK vectorizes, but in principle they should
be able to make that work (they can write any code they want in KK).

For Landau, I get great thread speedup. This code is MPI serial. I get the
same throughput with 32x1, 16x2, 8x4 and 4x8. It looks like I am not
getting any vectorization.
With the large (10 species) problem that I use as my test case, it runs
very slowly when I use all 48 cores in any configuration. With 2 species
performance does not collapse, it is just not great, but I have not looked
at this in any detail.
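
For anyone reproducing these sweeps, here is a minimal sketch that lists
the possible splits of a core count. It assumes the NxM labels above mean
threads x MPI ranks, as in the AMG case; the helper name
thread_rank_splits is hypothetical and not part of PETSc or Kokkos.

```python
def thread_rank_splits(cores):
    """Return (threads, ranks) pairs with threads * ranks == cores.

    Pairs are ordered from most threads per rank to fewest.
    Assumption: "NxM" in the text means N threads per rank, M MPI ranks.
    """
    return [(cores // ranks, ranks)
            for ranks in range(1, cores + 1)
            if cores % ranks == 0]


if __name__ == "__main__":
    # 32 and 48 are the core counts discussed in this thread.
    for cores in (32, 48):
        labels = ", ".join(f"{t}x{r}" for t, r in thread_rank_splits(cores))
        print(f"{cores} cores: {labels}")
```

This only enumerates the configurations. It does not launch jobs or set
any thread-binding options.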

Let us know if you find anything.

Thanks,
Mark
