Here are the (attached) outputs of -log_view for both cases. The
beginning of each file has some info from the libMesh app.

Running on 1 node, 32 cores: 01_node_log_view.txt
Running on 20 nodes, 32 cores each (640 cores in total): 20_nodes_log_view.txt
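The logs were generated with -log_view. In case it helps, here is a
minimal sketch (not the example's actual code, just an illustration
using PETSc's standard logging API) of how the same summary can be
produced programmatically instead of via the command-line option:

    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      ierr = PetscLogDefaultBegin();CHKERRQ(ierr);  /* start collecting event timings */

      /* ... set up and solve, as introduction_ex4 does ... */

      /* print the same summary that -log_view writes at PetscFinalize() */
      ierr = PetscLogView(PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

Either way, the summary is what follows the libMesh output at the top
of the attached files.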
Thanks!

Luciano.
<div class="moz-cite-prefix">Em 03/02/2021 16:43, Matthew Knepley
escreveu:<br>
</div>
<blockquote type="cite"
cite="mid:CAMYG4GkuQd-DCU25Q=uc2kFJpGzYvEZxUhj6=y9i6ChN6tLpfw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">On Wed, Feb 3, 2021 at 2:42 PM Luciano Siqueira
<<a href="mailto:luciano.siqueira@usp.br"
moz-do-not-send="true">luciano.siqueira@usp.br</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Hello,<br>
<br>
>> I'm evaluating the performance of an application in a distributed
>> environment and I notice that it is much slower when running on many
>> nodes/cores than on a single node with fewer cores.
>>
>> When running the application on 20 nodes, the Main Stage time reported
>> in PETSc's log is up to 10 times longer than when running the same
>> application on only 1 node, even with fewer cores per node.
>>
>> The application I'm running is an example code provided by libMesh:
>>
>> http://libmesh.github.io/examples/introduction_ex4.html
>>
>> The application runs inside a Singularity container, with OpenMPI 4.0.3
>> and PETSc 3.14.3. The distributed processes are managed by Slurm
>> 17.02.11, and each node has two Intel Xeon E5-2695 v2 (Ivy Bridge)
>> CPUs (12 cores @ 2.4 GHz) and 128 GB of RAM, with all communication
>> going through InfiniBand.
>>
>> My questions are: Is the slowdown expected? Should the application be
>> specially tailored to work well in distributed environments?
>>
>> Also, where (maybe in the PETSc documentation/source code) can I find
>> information on how PETSc handles MPI communication? Do the KSP solvers
>> favor point-to-point communication over broadcast messages, or vice
>> versa? I suspect inter-process communication must be the cause of the
>> poor performance when using many nodes, but not as much as I'm seeing.
>>
>> Thank you in advance!
>
> We can't say anything about the performance without some data. Please
> send us the output of -log_view for both cases.
>
>   Thanks,
>
>      Matt
>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Luciano.<br>
<br>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before
they begin their experiments is infinitely more
interesting than any results to which their
experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="http://www.cse.buffalo.edu/~knepley/"
target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>