[petsc-users] Poor weak scaling when solving successive linear systems

Junchao Zhang jczhang at mcs.anl.gov
Tue May 29 16:41:46 CDT 2018


The log files contain a line like "Average time for zero size MPI_Send():
1.84231e-05". It looks like you ran on a cluster with a very slow network. A
typical machine should give less than 1/10 of the latency you have. An easy
check is to run the code on a machine with a faster network and
see what happens.
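
For comparing several log files quickly, here is a minimal Python sketch that
pulls that latency line out of the -log_view text and converts it to
microseconds. The function name zero_size_send_latency and the sample string
are illustrative only and not part of PETSc. The "1/10" target is just the
rule of thumb stated above, not a measured reference value.

```python
import re

# Matches the line quoted above; \s* tolerates a line break after the colon.
LATENCY_RE = re.compile(
    r"Average time for zero size MPI_Send\(\):\s*"
    r"([0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)"
)


def zero_size_send_latency(log_text):
    """Return the reported zero-size MPI_Send() time in seconds, or None."""
    match = LATENCY_RE.search(log_text)
    return float(match.group(1)) if match else None


# Sample text copied from the log line quoted in this thread.
sample = "Average time for zero size MPI_Send(): 1.84231e-05"
latency = zero_size_send_latency(sample)
print(f"reported latency: {latency * 1e6:.1f} us")          # 18.4 us
print(f"rule-of-thumb target: < {latency * 1e5:.1f} us")    # 1/10 of the above
```

Running the same function over the "-1.txt" and "-2.txt" logs would show
whether the latency changes when the nodes are only half filled.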

Also, how many cores and NUMA domains does a compute node have? I could not
figure out how you distributed the 125 MPI ranks evenly.
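
To make the question concrete: 125 = 5^3, so only a few node counts give
every node the same number of ranks. The short Python sketch below lists them.
The helper name even_node_counts is made up for illustration, and the sketch
ignores NUMA domains and the actual core count per node, which I do not know
for this cluster.

```python
def even_node_counts(ranks):
    """Node counts that give every node the same number of MPI ranks."""
    return [n for n in range(1, ranks + 1) if ranks % n == 0]


for nodes in even_node_counts(125):
    print(f"{nodes} node(s) x {125 // nodes} rank(s) per node")
```

For 125 ranks this leaves only 1, 5, 25, or 125 nodes, so a node with a
core count that is not a multiple of 5 cannot be fully and evenly populated.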

--Junchao Zhang

On Tue, May 29, 2018 at 6:18 AM, Michael Becker <
Michael.Becker at physik.uni-giessen.de> wrote:

> Hello again,
>
> Here are the updated log_view files for 125 and 1000 processors. I ran
> both problems twice: the first time with all processors on each node
> allocated ("-1.txt"), the second time with only half the processors per
> node on twice as many nodes ("-2.txt").
>
> On May 24, 2018, at 12:24 AM, Michael Becker <Michael.Becker at physik.uni-giessen.de> wrote:
>
> I noticed that for every individual KSP iteration, six vector objects are created and destroyed (with CG, more with e.g. GMRES).
>
>    Hmm, it is certainly not intended that vectors be created and destroyed within each KSPSolve(). Could you please point us to the code that makes you think they are being created and destroyed? We create all the work vectors at KSPSetUp() and destroy them in KSPReset(), not during the solve. Not that this would make a measurable difference.
>
>
> I mean this, right in the log_view output:
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions       Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> ...
>
> --- Event Stage 1: First Solve
>
> ...
>
> --- Event Stage 2: Remaining Solves
>
> Vector                   23904          23904   1295501184                 0.
>
> I logged the exact number of KSP iterations over the 999 timesteps, and it's
> exactly 23904/6 = 3984.
>
> Michael
>
>
>
> On 24.05.2018 at 19:50, Smith, Barry F. wrote:
>
>   Please send the log file for 1000 with cg as the solver.
>
>    You should make a bar chart of each event for the two cases to see which ones are taking more time and which are taking less (we cannot tell with the two logs you sent us since they are for different solvers.)
>
>
>
>
> On May 24, 2018, at 12:24 AM, Michael Becker <Michael.Becker at physik.uni-giessen.de> wrote:
>
> I noticed that for every individual KSP iteration, six vector objects are created and destroyed (with CG, more with e.g. GMRES).
>
>    Hmm, it is certainly not intended that vectors be created and destroyed within each KSPSolve(). Could you please point us to the code that makes you think they are being created and destroyed? We create all the work vectors at KSPSetUp() and destroy them in KSPReset(), not during the solve. Not that this would make a measurable difference.
>
>
>
>
> This seems kind of wasteful. Is it supposed to be like this? Is this even the reason for my problems? Apart from that, everything seems quite normal to me (but I'm not the expert here).
>
>
> Thanks in advance.
>
> Michael
>
>
>
> <log_view_125procs.txt><log_view_1000procs.txt>
>
>
>