<div dir="ltr"><div>I am manually creating a structured tetrahedral mesh within my code and using the DMPlexCreateFromDAG function to make a DMPlex out of it. If I go with your suggestion, do I simply call DMRefine(...) after the mesh is distributed? I ask because regular refinement is present in the PETSc 3.5.2 SNES ex12.c but not in the PETSc development version (which I am using).<br><br></div>Thanks,<br>Justin<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Dec 11, 2014 at 4:07 AM, Matthew Knepley <span dir="ltr"><<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="">On Wed, Dec 10, 2014 at 7:34 PM, Justin Chang <span dir="ltr"><<a href="mailto:jychang48@gmail.com" target="_blank">jychang48@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi all,<br><br>I am trying to run a speed-up (i.e., strong scaling) study by solving a diffusion problem much like the one in SNES ex12.c, and plan on using up to 1k cores on LANL's Mustang HPC system. However, DMPlexDistribute seems to be taking an extremely long time. I am using -petscpartitioner_type parmetis on the command line, but DMPlexDistribute seems to account for over 50% of the total execution time. Is this normal, or is there a "better" way to conduct such a study?<br></div></div></blockquote><div><br></div></span><div>0) What mesh are you using? The most scalable way of running now is to read and distribute a coarse mesh and use regular refinement in parallel.</div><div><br></div><div>1) This is pure overhead in the sense that it's one-to-many communication, and it's done once, so most people do not report the time.</div><div><br></div><div>2) I agree it's too slow. 
There is a branch in next that completely reworks distribution. We have run it up to 8K cores on Hector and</div><div>    it is faster.</div><div><br></div><div>3) Early next year we plan to have parallel startup working, where each process reads a chunk of the mesh, and then it redistributes</div><div>    for load balance.</div><div><br></div><div>  Thanks,</div><div><br></div><div>     Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div>Thanks,<br>Justin<span class="HOEnZb"><font color="#888888"><br></font></span></div><span class="HOEnZb"><font color="#888888">

</font></span></blockquote></div><span class="HOEnZb"><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div>

</font></span></div></div>

</blockquote></div><br></div>