<div dir="ltr">&gt; PETSc can use threads or the &quot;HMPI&quot; approach. We are in the process of unifying the code for all the threading models, at which point PETSc will be able to use your OpenMP threads.<div><br></div>


<div>Jed, does this mean that, in the future, PETSc will automatically recognize my OpenMP threads at the core level and use them for parallelization, even if the binary is run with 1 MPI process per node?<br><br><div class="gmail_quote">


On Fri, Apr 20, 2012 at 12:42 PM, Jed Brown <span dir="ltr">&lt;<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="HOEnZb"><div class="h5"><p> PETSc can use threads or the &quot;HMPI&quot; approach. We are in the process of unifying the code for all the threading models, at which point PETSc will be able to use your OpenMP threads.</p>


<p>&gt; On Apr 20, 2012 1:50 PM, &quot;Mohammad Mirzadeh&quot; &lt;<a href="mailto:mirzadeh@gmail.com" target="_blank">mirzadeh@gmail.com</a>&gt; wrote:<br>

&gt;&gt;<br>

&gt;&gt; Thanks, Aron. That would work, but at the cost of leaving cores idle once the threads join and the rest of the MPI-based code is running, correct?<br>

&gt;&gt;<br>

&gt;&gt; On Fri, Apr 20, 2012 at 11:44 AM, Aron Ahmadia &lt;<a href="mailto:aron.ahmadia@kaust.edu.sa" target="_blank">aron.ahmadia@kaust.edu.sa</a>&gt; wrote:<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; If I use, say, Np = 16 processes on one node, MPI runs 16 copies of the code on a single node (which has 16 cores). How does OpenMP figure out how to fork? Does it fork 16 threads per MPI process, for a total of 256 threads, or is it smart enough to fork a total of 16 threads per node, i.e. 1 thread per core? I&#39;m a bit confused about how the job is scheduled when MPI and OpenMP are mixed.<br>


&gt;&gt;&gt;<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; This is one important use for OMP_NUM_THREADS. If you&#39;re trying to increase the amount of memory per process, you should map one process per node and set OMP_NUM_THREADS to the number of OpenMP threads you&#39;d like. There are now lots of tutorials and even textbooks on hybrid programming techniques that you should look to for more information (or you could try <a href="http://scicomp.stackexchange.com" target="_blank">scicomp.stackexchange.com</a>).<br>


&gt;&gt;&gt;<br>

&gt;&gt;&gt; Cheers,<br>

&gt;&gt;&gt; Aron<br>

&gt;&gt;<br>

&gt;&gt;<br>

</p>

</div></div></blockquote></div><br></div></div>
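<div>
<p>A rough sketch of the arithmetic behind the 16-versus-256 question in the quoted thread. It assumes that each MPI process forks its own OpenMP team independently, with no coordination across processes on the same node, so the per-node thread count is the product of the two numbers. The helper names <code>omp_threads_from_env</code> and <code>threads_per_node</code> are hypothetical, and the <code>default</code> argument only stands in for the runtime's own implementation-defined default when OMP_NUM_THREADS is unset.</p>
<pre>
```python
import os


def omp_threads_from_env(default):
    """Return the thread count requested via OMP_NUM_THREADS.

    `default` is an assumption standing in for the runtime's own
    implementation-defined default when the variable is unset.
    """
    value = os.environ.get("OMP_NUM_THREADS")
    if not value:
        return default
    # OMP_NUM_THREADS may be a comma-separated list for nested regions;
    # the first entry applies to the outermost parallel region.
    return int(value.split(",")[0])


def threads_per_node(ranks_per_node, omp_num_threads):
    """Each MPI process forks its own team, so the counts multiply."""
    return ranks_per_node * omp_num_threads


CORES_PER_NODE = 16  # node size from the question in this thread

# 16 MPI processes per node, each forking 16 threads: oversubscribed.
total = threads_per_node(16, 16)
print(total, "threads,", total // CORES_PER_NODE, "per core")

# 1 MPI process per node with OMP_NUM_THREADS=16: one thread per core.
total = threads_per_node(1, 16)
print(total, "threads,", total // CORES_PER_NODE, "per core")
```
</pre>
<p>Under these assumptions, keeping the product of MPI processes per node and OMP_NUM_THREADS equal to the core count avoids oversubscription, which is the layout suggested in the quoted reply.</p>
</div>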