[petsc-users] Poor weak scaling when solving successive linear systems

Junchao Zhang jczhang at mcs.anl.gov
Tue Jun 12 15:07:46 CDT 2018


Hello, Michael,
  Sorry for the delay. I am actively doing experiments with your example
code. I tested it on a cluster with 36 cores/node. To distribute MPI ranks
evenly among nodes, I used 216 and 1728 ranks instead of 125 and 1000. So
far I have these findings:
 1) It is not a strict weak scaling test, since with 1728 ranks it needs
more KSP iterations and more calls to MatSOR and other functions.
 2) If I use half the cores per node but double the number of nodes (keeping
the number of MPI ranks the same), the performance is 60-70% better. This
implies that memory bandwidth plays an important role in the performance.
 3) I see that you define the outermost two layers of grid nodes as the
boundary. Boundary processors have fewer nonzeros than interior processors,
which is a source of load imbalance, and it gets worse on the coarser grids.
But I still need to confirm that this is what causes the poor scaling and the
large VecScatter delays in the experiment; a quick per-rank check is sketched
below.
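
To measure that imbalance directly, here is a minimal sketch that could be
called after the matrix is assembled. It assumes only that the assembled Mat
is available (called A here, a hypothetical name); MatGetInfo with MAT_LOCAL
reports each rank's local nonzero count, and the max/min ratio shows how
uneven the distribution is:

  #include <petscmat.h>

  /* Hypothetical helper (not part of the posted example): print the smallest
     and largest per-rank nonzero counts to expose boundary/interior imbalance. */
  PetscErrorCode ReportNonzeroImbalance(Mat A)
  {
    PetscErrorCode ierr;
    MatInfo        info;
    PetscLogDouble nz,nzmin,nzmax;
    MPI_Comm       comm;

    PetscFunctionBeginUser;
    ierr = PetscObjectGetComm((PetscObject)A,&comm);CHKERRQ(ierr);
    ierr = MatGetInfo(A,MAT_LOCAL,&info);CHKERRQ(ierr);  /* local part only */
    nz   = info.nz_used;
    ierr = MPI_Allreduce(&nz,&nzmin,1,MPI_DOUBLE,MPI_MIN,comm);CHKERRQ(ierr);
    ierr = MPI_Allreduce(&nz,&nzmax,1,MPI_DOUBLE,MPI_MAX,comm);CHKERRQ(ierr);
    ierr = PetscPrintf(comm,"Local nonzeros: min %g, max %g, max/min %g\n",
                       (double)nzmin,(double)nzmax,(double)(nzmax/nzmin));CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

The ratio column that -log_view prints for MatSOR and MatMult gives a similar
picture without modifying the code.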

 Thanks.


--Junchao Zhang

On Tue, Jun 12, 2018 at 12:42 AM, Michael Becker <
michael.becker at physik.uni-giessen.de> wrote:

> Hello,
>
> any new insights yet?
>
> Michael
>
>
>
> On 04.06.2018 at 21:56, Junchao Zhang wrote:
>
> Michael, I can compile and run your test. I am now profiling it. Thanks.
>
> --Junchao Zhang
>
>
>