<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Dec 10, 2014 at 7:34 PM, Justin Chang <span dir="ltr">&lt;<a href="mailto:jychang48@gmail.com" target="_blank">jychang48@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi all,<br><br>So I am trying to run a speed-up (i.e., strong scaling) study by solving a diffusion problem much like in SNES ex12.c, and plan on using up to 1k cores on LANL's Mustang HPC system. However, it seems that DMPlexDistribute is taking an extremely long time. I am using -petscpartitioner_type parmetis on the command line, but it seems to take up over 50% of the code execution time. Is this normal or is there a "better" way to conduct such a study?<br></div></div></blockquote><div><br></div><div>0) What mesh are you using? The most scalable way of running right now is to read and distribute a coarse mesh, and then use regular refinement in parallel.</div><div><br></div><div>1) This is pure overhead in the sense that it is one-to-many communication, and it is done only once, so most people do not report the time.</div><div><br></div><div>2) I agree it is too slow. There is a branch in next that completely reworks distribution. We have run it on up to 8K cores on Hector, and it is faster.</div><div><br></div><div>3) Early next year we plan to have parallel startup working, where each process reads a chunk of the mesh and the mesh is then redistributed for load balance.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div>Thanks,<br>Justin<br></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div>
</div></div>