<p>Matt, it makes sense for small subdomains in 3D with wide stencils. It's not difficult to cut memory use by a third or so with 8 subdomains. Of course, you also need a preconditioner that makes sense on multicore and a threading scheme with decent memory-locality guarantees on NUMA.</p>
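<p>A rough way to see where a figure like "a third" could come from, as a minimal sketch and not a measurement: assume each subdomain owns an n^3 block of cells padded by a ghost layer of width w on every face, and compare eight separately ghosted blocks against one shared (2n)^3 block that only carries ghosts on its outer boundary. The cubic layout, the 2x2x2 grouping, and the names <code>ghosted_cells</code> and <code>memory_saving</code> are assumptions made for illustration.</p>

```python
def ghosted_cells(n, w):
    """Cells stored for an n^3 owned block padded by w ghost cells on every face."""
    return (n + 2 * w) ** 3


def memory_saving(n, w):
    """Fractional storage reduction from merging a 2x2x2 group of n^3 subdomains.

    Assumption: storage is proportional to owned plus ghost cells, nothing else.
    """
    separate = 8 * ghosted_cells(n, w)  # eight processes, each with its own ghost layer
    merged = ghosted_cells(2 * n, w)    # one shared block, ghosts on the outer boundary only
    return 1.0 - merged / separate


if __name__ == "__main__":
    # Hypothetical subdomain sizes and stencil widths, chosen only for illustration.
    for n, w in [(10, 2), (16, 2), (32, 1)]:
        print(f"n={n:3d} w={w}: saving = {memory_saving(n, w):.1%}")
```

<p>Under these assumptions the saving is largest when the subdomains are small and the stencil is wide, and it shrinks quickly as n grows relative to w.</p>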
<p><blockquote type="cite">On Mar 15, 2011 1:54 AM, "Matthew Knepley" &lt;<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>&gt; wrote:<br><br><p><font color="#500050">On Mon, Mar 14, 2011 at 7:48 PM, Lisandro Dalcin &lt;<a href="mailto:dalcinl@gmail.com" target="_blank">dalcinl@gmail.com</a>&gt; wrote:</font></p>
<div class="gmail_quote"><p><font color="#500050">><br>> On 14 March 2011 20:48, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>> ><br>> > Eric,<br>
> ><br>
> > With th...</font></p><div>I have never ever seen convincing evidence of this. First, you would need enough bandwidth to satisfy</div><div>2+ cores. This is almost never the case. But suppose you do have this. Then you would need a convincing</div>
<div>reason to use threads instead of MPI processes, which would mean data reuse. But there is very little</div><div>reuse here; it is mostly streaming.</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
--<p><font color="#500050"><br>> Lisandro Dalcin<br>> ---------------<br>> CIMEC (INTEC/CONICET-UNL)<br>> Predio CONICET-Santa Fe<br>> Colecto...</font></p></blockquote></div><font color="#888888"><br><br clear="all">
<br>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener<br>
</font></blockquote></p>