<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p><br>
</p>
<br>
<div class="moz-cite-prefix">On 10/10/2017 12:47 PM, Mark Adams
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CADOhEh72OR-T_8rqMzK8HSMO7rTWQFph=roY8dC10cRkRTVF9w@mail.gmail.com">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote"><br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
What are you comparing? Are you using, say, 32 MPI
processes and 2 threads, or 16 MPI processes and 4 threads?
How are you controlling the number of OpenMP threads: with the
OpenMP environment variable? Which parts of the code's run time
are you comparing? You should just run with -log_view and
compare the times for PCApply() and PCSetUp() between, say, 64
MPI processes/1 thread and 32 MPI processes/2 threads, and
send us the output for those two cases.<br>
</blockquote>
<div><br>
</div>
<div>These folks don't use many MPI processes. I'm not sure
what the optimal configuration is with Chombo-Crunch when
using all of Cori.</div>
<div><br>
</div>
<div>Baky: how many MPI processes per socket are you aiming
for on Cori-KNL?</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
Right now I am testing on a single KNL node, going from flat MPI
(64 processes + 1 thread) to 2 processes + 32 threads for comparison.<br>
But as you can see from the plot in my previous mail, we have a
sweet spot at 16+4 (16 MPI processes with 4 threads each), and we
scale that configuration accordingly when running<br>
on 8k nodes.<br>
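<p>For the single-node sweep described above, here is a minimal sketch in Python (assumed helper tooling, not part of PETSc) that pulls the PCSetUp and PCApply max times out of two saved -log_view outputs, for example flat 64+1 versus 32+2, and reports the hybrid/flat ratio. It assumes the -log_view column layout noted in the docstring. The script and log file names are hypothetical.</p>
<pre>
```python
"""Compare PCSetUp/PCApply times from two saved PETSc -log_view outputs.

Assumed layout (check against your own output): each event row starts with
the event name, then the max count, the count ratio, and the max time in
seconds, i.e. the time is the fourth whitespace-separated field.
"""

import sys

EVENTS = ("PCSetUp", "PCApply")


def event_times(log_text, events=EVENTS):
    """Return {event: max time in seconds} for the requested events.

    If an event appears in several logging stages, the last row wins.
    """
    times = {}
    for line in log_text.splitlines():
        fields = line.split()
        if not fields or fields[0] not in events:
            continue  # blank line, header, or an event we don't track
        try:
            times[fields[0]] = float(fields[3])
        except (IndexError, ValueError):
            continue  # name matched but the row is not a timing row
    return times


def compare(flat_log, hybrid_log, events=EVENTS):
    """Return {event: (flat_time, hybrid_time, hybrid/flat ratio)}."""
    flat = event_times(flat_log, events)
    hybrid = event_times(hybrid_log, events)
    result = {}
    for name in events:
        if name in flat and name in hybrid:
            ratio = hybrid[name] / flat[name] if flat[name] else float("inf")
            result[name] = (flat[name], hybrid[name], ratio)
    return result


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_logs.py log_64x1.txt log_32x2.txt
    with open(sys.argv[1]) as f_flat, open(sys.argv[2]) as f_hybrid:
        rows = compare(f_flat.read(), f_hybrid.read())
    for name, (t_flat, t_hyb, ratio) in rows.items():
        print("%-8s flat %.3e s  hybrid %.3e s  hybrid/flat %.2f"
              % (name, t_flat, t_hyb, ratio))
```
</pre>
<p>A ratio below 1 means the hybrid run was faster for that event. Matching event names exactly keeps related events such as PCSetUpOnBlocks from being picked up by mistake.</p>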
<br>
<blockquote type="cite"
cite="mid:CADOhEh72OR-T_8rqMzK8HSMO7rTWQFph=roY8dC10cRkRTVF9w@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<span class=""><br>
><br>
> It seems that it made no difference, so perhaps I
am doing something wrong or my build is not configured
right.<br>
><br>
> Do you have an example that uses threads
when running hybrid and shows an advantage?<br>
<br>
</span> There is no reason to think that using threads
on KNL is faster than just using MPI processes. Despite
what the NERSC/LBL web pages may say, a website saying
something doesn't make it true.<br>
<div class="HOEnZb">
<div class="h5"><br>
<br>
><br>
> I'd like to test it and make sure that my libraries
are configured correctly before I start to investigate
further.<br>
><br>
><br>
> Thanks,<br>
><br>
> Baky<br>
><br>
><br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>