<div><br></div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jul 23, 2020 at 9:35 PM Satish Balay <<a href="mailto:balay@mcs.anl.gov">balay@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">On Thu, 23 Jul 2020, Jeff Hammond wrote:<br>

<br>

> Open-MPI refuses to let users oversubscribe without an extra flag to<br>

> mpirun.<br>

<br>

Yes, and with this flag the run goes through, but performance still degrades in oversubscribe mode.<br>

<br>

> I think Intel MPI has an option for blocking poll that supports<br>

> oversubscription “nicely”.<br>

<br>

What option is this? Is it a compile-time option or something for mpiexec?<br>

</blockquote><div dir="auto"><br></div><div dir="auto"><div><a href="https://software.intel.com/content/www/us/en/develop/articles/tuning-the-intel-mpi-library-advanced-techniques.html">https://software.intel.com/content/www/us/en/develop/articles/tuning-the-intel-mpi-library-advanced-techniques.html</a></div><br></div><div dir="auto"><h2 style="box-sizing:border-box;font-family:intel-clear,tahoma,Helvetica,helvetica,Arial,sans-serif;font-weight:200;line-height:24px;margin:0px 0px 10px;font-size:20px;padding:10px 0px 0px;width:auto;color:rgb(83,86,90)">Apply wait mode to oversubscribed jobs<a class="inpage-nav-anchor" id="inpage-nav-undefined" style="box-sizing:border-box;font-family:intel-clear,tahoma,Helvetica,helvetica,Arial,sans-serif;color:rgb(0,113,197)"></a></h2><p style="font-size:16px;box-sizing:border-box;margin:0px auto 20px 0px;line-height:22px;font-family:intel-clear,tahoma,Helvetica,helvetica,Arial,sans-serif;padding-bottom:0px;max-width:none;color:rgb(83,86,90)">This option is particularly relevant for oversubscribed MPI jobs. The goal is to enable the wait mode of the progress engine in order to wait for messages without polling the fabric(s). This can save CPU cycles but decreases the message-response rate (latency), so it should be used with caution. 
To enable wait mode simply use:</p><pre style="box-sizing:border-box;direction:ltr;white-space:pre-wrap;word-break:initial;word-wrap:break-word;line-height:1.42857143;font-family:Monaco,Menlo,Consolas,"Courier New",monospace;padding:10.5px;margin-top:0px;margin-bottom:11px;overflow:auto;border:1px solid rgb(204,204,204);border-top-left-radius:4px;border-top-right-radius:4px;border-bottom-right-radius:4px;border-bottom-left-radius:4px;font-size:15px;max-height:400px;background-color:rgb(245,245,245);color:rgb(149,149,149)"><code style="box-sizing:border-box;font-family:Monaco,Menlo,Consolas,"Courier New",monospace;font-size:inherit;padding:0px;border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;background-color:transparent;color:inherit">I_MPI_WAIT_MODE=1</code></pre></div><div dir="auto"><br></div><div dir="auto">Jeff</div><div dir="auto"><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)"><br>

Satish<br>

<br>

> MPICH might have a “no local” option that<br>

> disables shared memory, in which case nemesis over libfabric with the<br>

> sockets or TCP provider _might_ do the right thing. But you should ask<br>

> MPICH people for details.<br>

> <br>

> Jeff<br>

> <br>

> On Thu, Jul 23, 2020 at 12:40 PM Jed Brown <<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>> wrote:<br>

> <br>

> > I think we should default to ch3:nemesis when --download-mpich, and only<br>

> > do ch3:sock when requested (which we would do in CI).<br>

> ><br>

> > Satish Balay via petsc-dev <<a href="mailto:petsc-dev@mcs.anl.gov" target="_blank">petsc-dev@mcs.anl.gov</a>> writes:<br>

> ><br>

> > > Primarily because ch3:sock performance does not degrade in oversubscribe<br>

> > mode - which is developer friendly - i.e. on your laptop.<br>

> > ><br>

> > > And folks doing optimized runs should use a properly tuned MPI for their<br>

> > setup anyway.<br>

> > ><br>

> > > In this case --download-mpich-device=ch3:nemesis is likely appropriate<br>

> > if using --download-mpich [and not using a separate/optimized MPI]<br>

> > ><br>

> > > Having defaults that satisfy all use cases is not practical.<br>

> > ><br>

> > > Satish<br>

> > ><br>

> > > On Wed, 22 Jul 2020, Matthew Knepley wrote:<br>

> > ><br>

> > >> We default to ch3:sock. Scott MacLachlan just had a long thread on the<br>

> > >> Firedrake list where it ended up that reconfiguring using ch3:nemesis<br>

> > had a<br>

> > >> 2x performance boost on his 16-core proc, and noticeable effect on the 4<br>

> > >> core speedup.<br>

> > >><br>

> > >> Why do we default to sock?<br>

> > >><br>

> > >>   Thanks,<br>

> > >><br>

> > >>      Matt<br>

> > >><br>

> > >><br>

> ><br>

> <br>

</blockquote></div></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>