[petsc-users] Poor weak scaling when solving successive linear systems
Junchao Zhang
jczhang at mcs.anl.gov
Tue Jun 12 15:07:46 CDT 2018
Hello, Michael,
Sorry for the delay. I am actively doing experiments with your example
code. I tested it on a cluster with 36 cores/node. To distribute MPI ranks
evenly among nodes, I used 216 and 1728 ranks instead of 125 and 1000. So
far I have these findings:
1) It is not a strict weak scaling test, since with 1728 ranks the solver
needs more KSP iterations and hence more calls to MatSOR and related
functions (see the sketch after this list).
2) If I use half the cores per node but double the number of nodes (keeping
the total number of MPI ranks the same), performance improves by 60-70%.
This implies memory bandwidth plays an important role in performance.
3) I see that you define the outermost two layers of grid nodes as
boundary. Boundary processes therefore have fewer nonzeros than interior
processes, which is a source of load imbalance, and it gets worse on
coarser grids. I still need to confirm whether this caused the poor scaling
and the large VecScatter delays in the experiment; the sketch below shows
one way to inspect the per-rank nonzero distribution.
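
To illustrate how one could check 1) and 3), here is a minimal sketch (not
taken from your code; the routine name ReportSolveBalance and the assumption
of an already assembled KSP 'ksp' and Mat 'A' are mine) that prints the
iteration count of the last solve and each rank's local nonzero count, so
the 216- and 1728-rank runs can be compared:

#include <petscksp.h>

/* Hypothetical helper: call after KSPSolve() to report the iteration count
   and each rank's local nonzeros. Uneven nz_used across ranks would confirm
   the boundary-vs-interior load imbalance. */
PetscErrorCode ReportSolveBalance(KSP ksp, Mat A)
{
  PetscErrorCode ierr;
  PetscInt       its;
  MatInfo        info;
  PetscMPIInt    rank;
  MPI_Comm       comm;

  PetscFunctionBeginUser;
  ierr = PetscObjectGetComm((PetscObject)ksp,&comm);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
  ierr = KSPGetIterationNumber(ksp,&its);CHKERRQ(ierr);  /* iterations of the last solve */
  ierr = MatGetInfo(A,MAT_LOCAL,&info);CHKERRQ(ierr);    /* this rank's share of the matrix */
  ierr = PetscPrintf(comm,"KSP iterations: %D\n",its);CHKERRQ(ierr);
  ierr = PetscSynchronizedPrintf(comm,"[rank %d] local nonzeros: %.0f\n",rank,(double)info.nz_used);CHKERRQ(ierr);
  ierr = PetscSynchronizedFlush(comm,PETSC_STDOUT);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Alternatively, running with -ksp_monitor and -log_view gives the iteration
counts and load-balance ratios without any code changes.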
Thanks.
--Junchao Zhang
On Tue, Jun 12, 2018 at 12:42 AM, Michael Becker <
michael.becker at physik.uni-giessen.de> wrote:
> Hello,
>
> any new insights yet?
>
> Michael
>
>
>
> Am 04.06.2018 um 21:56 schrieb Junchao Zhang:
>
> Michael, I can compile and run your test. I am now profiling it. Thanks.
>
> --Junchao Zhang
>
>
>