<div dir="ltr">Mark, <div> Can I conclude that with default settings, GAMG is inherently imbalanced (because of idle processors, biased communincation) and nonscalable (more idle processors with more cores)?</div><div> Are "-<span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">pc_gamg_process_eq_limit 200 <span style="font-size:small;text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">-</span><span style="text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">pc_gamg_coarse_eq_limit 200*ncores" good options that make GAMG <span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">load-balanced in sense that there are</span> at least 200 equations on each processor at the coarsest level ?</span></span></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">--Junchao Zhang</div></div></div>
<br><div class="gmail_quote">On Tue, Jun 26, 2018 at 7:37 PM, Mark Adams <span dir="ltr"><<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><span class=""><div dir="ltr">On Fri, Jun 22, 2018 at 3:26 PM Junchao Zhang <<a href="mailto:jczhang@mcs.anl.gov" target="_blank">jczhang@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I instrumented PCMGMCycle_<wbr>Private() to pull out some info about the matrices and VecScatters used in MatSOR_MPIAIJ, <wbr>MatMultTranspose_MPIAIJ etc at each multigrid level to see how imbalanced they are. In my test, I have a 6 x 6 x 6 = 216 processor grid. Each processor has 30 x 30 x 30 grid points. The code uses 7-point stencil. Except for some boundary points, it looks the problem is perfectly balanced. From the output, I can see processors communicate with more and more neighbors as they enter coarser grids. For example, non-boundary processors first have 6 face-neighbors, then 18 edge-neighbors, then 26 vertex-neighbors, and then even more neighbors. At some level, the grid is only on the first few processors and others are idle. </div></div></blockquote><div><br></div></span><div>That is all expected. The 'stencils' get larger on coarser grids and the code reduces the number of active processors on coarse grids when there is not enough parallelism available.</div><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>The communication pattern is also imbalanced. 
>> The communication pattern is also imbalanced. For example, at level 3, I have
>>
>>  Entering MG level 3
>>  ...
>>  doing MatRestrict
>>  MatMultTranspose_MPIAIJ: on rank 0 mat has 59 rows, 284 nonzeros, send 188 to 33 nbrs, recv 10 from 1 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 1 mat has 61 rows, 459 nonzeros, send 237 to 38 nbrs, recv 25 from 2 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 2 mat has 62 rows, 519 nonzeros, send 245 to 28 nbrs, recv 28 from 2 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 3 mat has 62 rows, 521 nonzeros, send 316 to 47 nbrs, recv 15 from 1 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 4 mat has 62 rows, 525 nonzeros, send 411 to 62 nbrs, recv 28 from 2 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 5 mat has 70 rows, 526 nonzeros, send 424 to 49 nbrs, recv 26 from 2 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 6 mat has 63 rows, 503 nonzeros, send 259 to 41 nbrs, recv 28 from 4 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 7 mat has 64 rows, 374 nonzeros, send 349 to 62 nbrs, recv 32 from 4 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 8 mat has 67 rows, 461 nonzeros, send 354 to 51 nbrs, recv 29 from 4 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 9 mat has 67 rows, 462 nonzeros, send 274 to 42 nbrs, recv 31 from 4 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 10 mat has 67 rows, 458 nonzeros, send 359 to 62 nbrs, recv 30 from 4 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 11 mat has 70 rows, 482 nonzeros, send 364 to 51 nbrs, recv 25 from 4 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 12 mat has 61 rows, 469 nonzeros, send 274 to 42 nbrs, recv 29 from 3 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 13 mat has 64 rows, 454 nonzeros, send 359 to 62 nbrs, recv 32 from 3 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 14 mat has 64 rows, 556 nonzeros, send 365 to 51 nbrs, recv 34 from 3 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 15 mat has 64 rows, 542 nonzeros, send 322 to 31 nbrs, recv 36 from 3 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 16 mat has 64 rows, 531 nonzeros, send 411 to 44 nbrs, recv 34 from 3 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 17 mat has 70 rows, 497 nonzeros, send 476 to 36 nbrs, recv 28 from 4 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 18 mat has 61 rows, 426 nonzeros, send 0 to 0 nbrs, recv 30 from 4 nbrs
>>  MatMultTranspose_MPIAIJ: on rank 19 mat has 64 rows, 521 nonzeros, send 0 to 0 nbrs, recv 31 from 4 nbrs
>>  ...
>>
>> The machine has 36 cores per node. It acts as if the first 18 processors on the first node are sending small messages to the remaining processors. Obviously, there is no way for this to be balanced. Does someone have a good explanation for this, and know of options to get rid of the imbalance, for example no idle processors, spread-out communication, etc.?
>> Thanks.
>
> There are processors that are deactivated on coarse grids. If you want to minimize this process reduction, then use "-pc_gamg_process_eq_limit 1". This will reduce the number of active processors only when there are more processors than equations, in which case we have no choice, because we partition matrices by (whole) rows. The default is 50, and this is pretty low; I usually run with about 200. But this is very architecture-, problem-, and metric-specific, so you can do a parameter sweep and measure where your problem/machine performs best.
>
> Mark
>
>> --Junchao Zhang
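As a companion to the parameter-sweep suggestion above, here is a rough sketch of what such a sweep could look like in code. The candidate values, the helper name SweepProcEqLimit, and the assumption that the matrix A and vectors b, x are already assembled are all illustrative; error checking is omitted. Each value could equally be tested by rerunning with -pc_gamg_process_eq_limit <n> and comparing -log_view output.

#include <petscksp.h>
#include <petsctime.h>

/* Illustrative sweep: build a fresh GAMG-preconditioned solver for each
 * candidate process-equation limit and time setup + solve. */
static PetscErrorCode SweepProcEqLimit(Mat A, Vec b, Vec x)
{
  const PetscInt limits[] = {50, 100, 200, 400}; /* illustrative candidates */
  PetscInt       i;

  PetscFunctionBeginUser;
  for (i = 0; i < (PetscInt)(sizeof(limits) / sizeof(limits[0])); i++) {
    KSP            ksp;
    PC             pc;
    PetscLogDouble t0, t1;

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCGAMG);
    KSPSetFromOptions(ksp);             /* pick up any other command-line options */
    PCGAMGSetProcEqLim(pc, limits[i]);  /* the sweep value takes precedence        */
    VecSet(x, 0.0);                     /* same zero initial guess for each run    */
    PetscTime(&t0);
    KSPSetUp(ksp);                      /* GAMG setup cost depends on the limit too */
    KSPSolve(ksp, b, x);
    PetscTime(&t1);
    PetscPrintf(PETSC_COMM_WORLD, "process_eq_limit %d: setup+solve %g s\n",
                (int)limits[i], (double)(t1 - t0));
    KSPDestroy(&ksp);
  }
  PetscFunctionReturn(0);
}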