[petsc-dev] Imbalance in MG

Junchao Zhang jczhang at mcs.anl.gov
Wed Jun 27 11:24:55 CDT 2018


Mark,
   Can I conclude that with default settings, GAMG is inherently imbalanced
(because of idle processors and biased communication) and nonscalable (more
idle processors with more cores)?
   Are "-pc_gamg_process_eq_limit 200 -pc_gamg_coarse_eq_limit
200*ncores" good options that make GAMG load-balanced, in the sense that
there are at least 200 equations on each processor at the coarsest level?
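   For concreteness, on the 216-rank run discussed below, that would be
something like this (a sketch; the executable name is a placeholder, and I
computed 200*ncores = 43200 by hand since the option takes a plain integer):

  mpiexec -n 216 ./myapp -pc_type gamg \
      -pc_gamg_process_eq_limit 200 \
      -pc_gamg_coarse_eq_limit 43200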

--Junchao Zhang

On Tue, Jun 26, 2018 at 7:37 PM, Mark Adams <mfadams at lbl.gov> wrote:

>
>
> On Fri, Jun 22, 2018 at 3:26 PM Junchao Zhang <jczhang at mcs.anl.gov> wrote:
>
>> I instrumented PCMGMCycle_Private() to pull out some info about the
>> matrices and VecScatters used in MatSOR_MPIAIJ, MatMultTranspose_MPIAIJ,
>> etc. at each multigrid level to see how imbalanced they are. In my test, I
>> have a 6 x 6 x 6 = 216 processor grid. Each processor has 30 x 30 x 30 grid
>> points. The code uses a 7-point stencil. Except for some boundary points,
>> it looks like the problem is perfectly balanced. From the output, I can see
>> processors communicate with more and more neighbors as they enter coarser
>> grids. For example, non-boundary processors first have their 6
>> face-neighbors, then 18 neighbors once edge-neighbors join in, then all 26
>> once vertex-neighbors do, and then even more neighbors. At some level, the
>> grid is only on the first few processors and the others are idle.
>>
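>> For reference, the rows/nonzeros numbers shown further below came from a
>> helper along these lines (a rough sketch of the instrumentation, not the
>> exact code; the send/recv counts additionally require poking into the
>> VecScatter, which I elide here):
>>
>>   /* rough sketch (assumes petscmat.h); prints one line per rank, in rank order */
>>   static PetscErrorCode ReportMatBalance(Mat A,const char *label)
>>   {
>>     PetscErrorCode ierr;
>>     PetscMPIInt    rank;
>>     PetscInt       m;
>>     MatInfo        info;
>>     MPI_Comm       comm;
>>
>>     PetscFunctionBegin;
>>     ierr = PetscObjectGetComm((PetscObject)A,&comm);CHKERRQ(ierr);
>>     ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
>>     ierr = MatGetLocalSize(A,&m,NULL);CHKERRQ(ierr);     /* local rows */
>>     ierr = MatGetInfo(A,MAT_LOCAL,&info);CHKERRQ(ierr);  /* local nonzeros in info.nz_used */
>>     ierr = PetscSynchronizedPrintf(comm,"%s: on rank %d mat has %D rows, %D nonzeros\n",
>>                                    label,(int)rank,m,(PetscInt)info.nz_used);CHKERRQ(ierr);
>>     ierr = PetscSynchronizedFlush(comm,PETSC_STDOUT);CHKERRQ(ierr);
>>     PetscFunctionReturn(0);
>>   }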
>
> That is all expected. The 'stencils' get larger on coarser grids and the
> code reduces the number of active processors on coarse grids when there is
> not enough parallelism available.
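> (You can watch this happen with -info: PCSetUp_GAMG logs each level's
> size and the number of active processes as it coarsens, though the exact
> format of those messages depends on the PETSc version.)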
>
>
>> The communication pattern is also imbalanced. For example, at level 3, I
>> have
>>
>>  2172 Entering MG level 3
>>  ...
>>  2605 doing MatRestrict
>>  2606 MatMultTranspose_MPIAIJ: on rank 0 mat has 59 rows, 284 nonzeros, send 188 to 33 nbrs, recv 10 from 1 nbrs
>>  2607 MatMultTranspose_MPIAIJ: on rank 1 mat has 61 rows, 459 nonzeros, send 237 to 38 nbrs, recv 25 from 2 nbrs
>>  2608 MatMultTranspose_MPIAIJ: on rank 2 mat has 62 rows, 519 nonzeros, send 245 to 28 nbrs, recv 28 from 2 nbrs
>>  2609 MatMultTranspose_MPIAIJ: on rank 3 mat has 62 rows, 521 nonzeros, send 316 to 47 nbrs, recv 15 from 1 nbrs
>>  2610 MatMultTranspose_MPIAIJ: on rank 4 mat has 62 rows, 525 nonzeros, send 411 to 62 nbrs, recv 28 from 2 nbrs
>>  2611 MatMultTranspose_MPIAIJ: on rank 5 mat has 70 rows, 526 nonzeros, send 424 to 49 nbrs, recv 26 from 2 nbrs
>>  2612 MatMultTranspose_MPIAIJ: on rank 6 mat has 63 rows, 503 nonzeros, send 259 to 41 nbrs, recv 28 from 4 nbrs
>>  2613 MatMultTranspose_MPIAIJ: on rank 7 mat has 64 rows, 374 nonzeros, send 349 to 62 nbrs, recv 32 from 4 nbrs
>>  2614 MatMultTranspose_MPIAIJ: on rank 8 mat has 67 rows, 461 nonzeros, send 354 to 51 nbrs, recv 29 from 4 nbrs
>>  2615 MatMultTranspose_MPIAIJ: on rank 9 mat has 67 rows, 462 nonzeros, send 274 to 42 nbrs, recv 31 from 4 nbrs
>>  2616 MatMultTranspose_MPIAIJ: on rank 10 mat has 67 rows, 458 nonzeros, send 359 to 62 nbrs, recv 30 from 4 nbrs
>>  2617 MatMultTranspose_MPIAIJ: on rank 11 mat has 70 rows, 482 nonzeros, send 364 to 51 nbrs, recv 25 from 4 nbrs
>>  2618 MatMultTranspose_MPIAIJ: on rank 12 mat has 61 rows, 469 nonzeros, send 274 to 42 nbrs, recv 29 from 3 nbrs
>>  2619 MatMultTranspose_MPIAIJ: on rank 13 mat has 64 rows, 454 nonzeros, send 359 to 62 nbrs, recv 32 from 3 nbrs
>>  2620 MatMultTranspose_MPIAIJ: on rank 14 mat has 64 rows, 556 nonzeros, send 365 to 51 nbrs, recv 34 from 3 nbrs
>>  2621 MatMultTranspose_MPIAIJ: on rank 15 mat has 64 rows, 542 nonzeros, send 322 to 31 nbrs, recv 36 from 3 nbrs
>>  2622 MatMultTranspose_MPIAIJ: on rank 16 mat has 64 rows, 531 nonzeros, send 411 to 44 nbrs, recv 34 from 3 nbrs
>>  2623 MatMultTranspose_MPIAIJ: on rank 17 mat has 70 rows, 497 nonzeros, send 476 to 36 nbrs, recv 28 from 4 nbrs
>>  2624 MatMultTranspose_MPIAIJ: on rank 18 mat has 61 rows, 426 nonzeros, send 0 to 0 nbrs, recv 30 from 4 nbrs
>>  2625 MatMultTranspose_MPIAIJ: on rank 19 mat has 64 rows, 521 nonzeros, send 0 to 0 nbrs, recv 31 from 4 nbrs
>>  ...
>>
>> The machine has 36 cores per node. It acts as if the first 18 processors,
>> on the first node, are sending small messages to the remaining processors.
>> Obviously, there is no way for this to be balanced. Does anyone have a
>> good explanation for this, and are there options to get rid of the
>> imbalance? For example, no idle processors, spread-out communication, etc.
>> Thanks.
>>
>
> There are processors that are deactivated on coarse grids. If you want to
> minimize this process reduction then use "-pc_gamg_process_eq_limit 1".
> This will reduce the number of active processors only when there are more
> processors than equations, in which case we have no choice because we
> partition matrices by (whole) rows. The default is 50, and this is pretty
> low; I usually run with about 200. But this is very architecture-,
> problem-, and metric-specific, so you can do a parameter sweep and measure
> where your problem/machine performs best.
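>
> Something along these lines works for the sweep (a sketch; adjust the
> executable, process count, and other options to your setup, and pull the
> timing out of the -log_view table however you like):
>
>   for eqlim in 50 100 200 400 800 ; do
>     mpiexec -n 216 ./myapp -pc_type gamg \
>       -pc_gamg_process_eq_limit $eqlim -log_view | grep KSPSolve
>   done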
>
> Mark
>
>
>>
>> --Junchao Zhang
>>
>