<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 22, 2016 at 8:06 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br>

   I suggest focusing on asm. </blockquote><div><br></div><div>OK, I will switch from gasm to asm; gasm does not work anyway.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Having blocks that span multiple processes seems like overkill for a smoother? </blockquote><div><br></div><div>No, because it is a pain to have the math convolved with the parallel decomposition strategy (i.e., I can't tell an application how to partition their problem). If an aggregate spans processor boundaries, which is fine and needed, and let's say we have a pretty uniform problem, then if the block gets split up, the subdomain size H is small in part of the domain and convergence could suffer along processor boundaries.  And having the math change as the parallel decomposition changes is annoying. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">(Major league overkill) in fact doesn't one want multiple blocks per process, i.e., pretty small blocks?<br></blockquote><div><br></div><div>No, it is just doing what would be done in serial.  If the cost of moving the data across processors is a problem, then that is a tradeoff to consider.</div><div><br></div><div>And I think you are misunderstanding me.  There are lots of blocks per process (the aggregates are, say, 3^D in size).  
And many of the aggregates/blocks along the processor boundary will be split between processors, resulting in small blocks and a weak ASM PC on processor boundaries.</div><div><br></div><div>I can understand ASM not being general and not letting blocks span processor boundaries, but I don't think the extra matrix communication costs are a big deal (done just once), and the vector communication costs are not bad, since they probably do not involve (too many) new processors to communicate with.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<span class=""><font color="#888888"><br>

   Barry<br>

</font></span><div class=""><div class="h5"><br>

> On Jun 22, 2016, at 7:51 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br>

><br>

> I'm trying to get block smoothers to work for gamg.  We (Garth) tried this and got this error:<br>

><br>

><br>

>  - Another option is use '-pc_gamg_use_agg_gasm true' and use '-mg_levels_pc_type gasm'.<br>

><br>

><br>

> Running in parallel, I get<br>

><br>

>      ** Max-trans not allowed because matrix is distributed<br>

>  ----<br>

><br>

> First, what is the difference between asm and gasm?<br>

><br>

> Second, I need to fix this to get block smoothers. This used to work.  Did we lose the capability to have blocks that span processor subdomains?<br>

><br>

> gamg only aggregates across processor subdomains within one layer, so maybe I could use one layer of overlap in some way?<br>

><br>

> Thanks,<br>

> Mark<br>

><br>

<br>

</div></div></blockquote></div><br></div></div>