<div dir="ltr">I agree with Matt: I would expect flat 64 to be faster, but this code has global metadata that would have to be replicated in a full-scale run. We are just doing single-socket tests now (I think).<div><br></div><div>We have been tracking down what look like compiler bugs, and we have only taken a peek at performance to make sure we are not wasting our time with threads.</div><div><br></div><div>I agree <span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">16x4 VS 64 would be interesting to see.</span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span style="font-size:12.8px">Mark</span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><span 
style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 13, 2018 at 2:02 PM, Kong, Fande <span dir="ltr"><<a href="mailto:fande.kong@inl.gov" target="_blank">fande.kong@inl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Curious about the comparison of 16x4 VS 64.<span class="HOEnZb"><font color="#888888"><br><br></font></span></div><span class="HOEnZb"><font color="#888888">Fande,<br></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 13, 2018 at 11:44 AM, Bakytzhan Kallemov <span dir="ltr"><<a href="mailto:bkallemov@lbl.gov" target="_blank">bkallemov@lbl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<p>Hi,<br>
</p>
<p>I am not sure about the flat 64 run, <br>
</p>
<p>unfortunately I did not save the logs, since it is easy to rerun, but
for 16 processes, here is the plot I got of KSPSolve time
for different numbers of threads<br>
</p>
<p>Baky<br>
</p><div><div class="m_3408613283238315751h5">
<br>
<div class="m_3408613283238315751m_2936039630870271338moz-cite-prefix">On 02/13/2018 10:28 AM, Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Tue, Feb 13, 2018 at 11:30 AM,
Smith, Barry F. <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span>
wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>
> On Feb 13, 2018, at 10:12 AM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>>
wrote:<br>
><br>
> FYI, we were able to get hypre with threads working
on KNL on Cori by going down to -O1 optimization. We are
getting about 2x speedup with 4 threads and 16 MPI
processes per socket. Not bad.<br>
<br>
</span> In other words, using 16 MPI processes with 4
threads per process is twice as fast as running with 64
MPI processes? Could you send the -log_view output for
these two cases?</blockquote>
<div><br>
</div>
<div>Is that what you mean? I took it to mean</div>
<div><br>
</div>
<div>  We ran 16 MPI processes and got time T.</div>
<div>  We ran 16 MPI processes with 4 threads each and got
time T/2.</div>
<div><br>
</div>
<div>I would likely eat my shirt if 16x4 was 2x faster than
64.</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span><br>
><br>
> The error, flatlined or slightly diverging hypre
solves, occurred even in flat MPI runs with openmp=1.<br>
<br>
</span> But the answers are wrong as soon as you turn on
OpenMP?<br>
<br>
Thanks<br>
<span class="m_3408613283238315751m_2936039630870271338HOEnZb"><font color="#888888"><br>
Barry<br>
</font></span>
<div class="m_3408613283238315751m_2936039630870271338HOEnZb">
<div class="m_3408613283238315751m_2936039630870271338h5"><br>
<br>
><br>
> We are going to test the Haswell nodes next.<br>
><br>
> On Thu, Jan 25, 2018 at 4:16 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>>
wrote:<br>
> Baky (cc'ed) is getting a strange error on
Cori/KNL at NERSC. Using maint it runs fine with
-with-openmp=0, it runs fine with -with-openmp=1 and
gamg, but with hypre and -with-openmp=1, even running
with flat MPI, the solver seems to flatline (see attached
and notice that the residual starts to creep after a
few time steps).<br>
><br>
> Maybe you can suggest a hypre test that I can
run?<br>
><br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div class="m_3408613283238315751m_2936039630870271338gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before
they begin their experiments is infinitely more
interesting than any results to which their
experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__www.caam.rice.edu_-257Emk51_&d=DwMDaQ&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=ZS-L-0QdKGNAOTdfVWHmsv4U3pZmyEvneNWi1bnUBUc&s=Mo_G_7aeedH4UA8ZyO0DCM5xoWvnNhyUcFeSSgPnrBE&e=" target="_blank">https://www.cse.buffalo.edu/~k<wbr>nepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>