https://www.mcs.anl.gov/petsc/documentation/faq.html#computers

In particular, looking at the results of the parallel run I see

Average time to get PetscTime(): 3.933e-07
Average time for MPI_Barrier(): 0.00498015
Average time for zero size MPI_Send(): 0.000194207

So the communication times are huge: 4.9 milliseconds for a synchronization across the twenty nodes. A millisecond is an eternity in parallel computing. It is not clear to me that this system is appropriate for tightly coupled parallel simulations.

  Barry
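[For reference, a minimal standalone sketch of how averages like the MPI_Barrier() and zero-size MPI_Send() numbers above can be measured with plain MPI, independent of PETSc and libmesh. This is not PETSc's own benchmark; the file name, repetition count, and choice of rank pairing are arbitrary illustrations.]

/* latency.c - rough MPI latency check; build with mpicc and run with,
 * e.g., mpiexec -n 640 ./latency (the name and -n value are illustrative). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int           rank, size, i;
  const int     reps = 100;
  double        t0, barrier = 0.0, send = 0.0;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* average time per MPI_Barrier() call */
  MPI_Barrier(MPI_COMM_WORLD);
  t0 = MPI_Wtime();
  for (i = 0; i < reps; i++) MPI_Barrier(MPI_COMM_WORLD);
  barrier = (MPI_Wtime() - t0) / reps;

  /* zero-size ping-pong between the first and last rank; half the
     round-trip time approximates the one-way send latency */
  if (size > 1 && rank == 0) {
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
      MPI_Send(NULL, 0, MPI_BYTE, size - 1, 0, MPI_COMM_WORLD);
      MPI_Recv(NULL, 0, MPI_BYTE, size - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    send = (MPI_Wtime() - t0) / (2.0 * reps);
  } else if (size > 1 && rank == size - 1) {
    for (i = 0; i < reps; i++) {
      MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Send(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    }
  }

  if (rank == 0) printf("avg MPI_Barrier(): %g s, zero-size send latency: %g s\n", barrier, send);
  MPI_Finalize();
  return 0;
}

If a bare test like this also shows milliseconds per barrier, the slowdown is likely in the network/MPI configuration (for example, the MPI inside the container not actually using InfiniBand) rather than in PETSc or libmesh.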
On Feb 3, 2021, at 2:40 PM, Luciano Siqueira <luciano.siqueira@usp.br> wrote:
<div class=""><p class="">Here are the (attached) output of -log_view for both cases. The
beginning of the files has some info from the libmesh app.<br class="">
</p><p class="">Running in 1 node, 32 cores: 01_node_log_view.txt</p><p class="">Running in 20 nodes, 32 cores each (640 cores in total):
01_node_log_view.txt</p><p class="">Thanks!</p><p class="">Luciano.<br class="">
</p>
<div class="moz-cite-prefix">Em 03/02/2021 16:43, Matthew Knepley
escreveu:<br class="">
</div>
<blockquote type="cite" cite="mid:CAMYG4GkuQd-DCU25Q=uc2kFJpGzYvEZxUhj6=y9i6ChN6tLpfw@mail.gmail.com" class="">
<meta http-equiv="content-type" content="text/html; charset=UTF-8" class="">
<div dir="ltr" class="">
<div dir="ltr" class="">On Wed, Feb 3, 2021 at 2:42 PM Luciano Siqueira
<<a href="mailto:luciano.siqueira@usp.br" moz-do-not-send="true" class="">luciano.siqueira@usp.br</a>>
wrote:<br class="">
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Hello,<br class="">
<br class="">
I'm evaluating the performance of an application in a
distributed <br class="">
environment and I notice that it's much slower when running
in many <br class="">
nodes/cores when compared to a single node with a fewer
cores.<br class="">
<br class="">
When running the application in 20 nodes, the Main Stage
time reported <br class="">
in PETSc's log is up to 10 times slower than it is when
running the same <br class="">
application in only 1 node, even with fewer cores per node.<br class="">
<br class="">
The application I'm running is an example code provided by
libmesh:<br class="">
<br class="">
<a href="http://libmesh.github.io/examples/introduction_ex4.html" rel="noreferrer" target="_blank" moz-do-not-send="true" class="">http://libmesh.github.io/examples/introduction_ex4.html</a><br class="">
<br class="">
The application runs inside a Singularity container, with
openmpi-4.0.3 <br class="">
and PETSc 3.14.3. The distributed processes are managed by
slurm <br class="">
17.02.11 and each node is equipped with two Intel CPU Xeon
E5-2695v2 Ivy <br class="">
Bridge (12c @2,4GHz) and 128Gb of RAM, all communications
going through <br class="">
infiniband.<br class="">
<br class="">
My questions are: Is the slowdown expected? Should the
application be <br class="">
specially tailored to work well in distributed environments?<br class="">
<br class="">
Also, where (maybe in PETSc documentation/source-code) can I
find <br class="">
information on how PETSc handles MPI communications? Do the
KSP solvers <br class="">
favor one-to-one process communication over broadcast
messages or <br class="">
vice-versa? I suspect inter-process communication must be
the cause of <br class="">
the poor performance when using many nodes, but not as much
as I'm seeing.<br class="">
<br class="">
Thank you in advance!<br class="">
</blockquote>
<div class=""><br class="">
</div>
<div class="">We can't say anything about the performance without some
data. Please send us the output</div>
<div class="">of -log_view for both cases.</div>
<div class=""><br class="">
</div>
<div class=""> Thanks,</div>
<div class=""><br class="">
</div>
<div class=""> Matt</div>
<div class=""> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Luciano.<br class="">
<br class="">
</blockquote>
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
<01_node_log_view.txt>
<20_node_log_view.txt>