<div>I assume you mean that both pthreads and MPI offer &quot;very little potential performance improvement on a system like this&quot;?</div><div><br></div><div>I work for a small company (in Huntsville) of mechanical engineers with a limited budget, where such machines are their staple. I&#39;m looking at PETSc and SLEPc to help with some of their large-scale problems (e.g., generalized eigenvalue problems for matrices ranging from 10K x 10K to over 100K x 100K).<br>

</div><div><br></div><div>I&#39;m also looking at ways to parallelize our own codes, hence the pthreads vs. MPI question.</div><div><br></div><div>---John</div><br><div class="gmail_quote">On Sun, Jul 10, 2011 at 2:23 PM, Jed Brown <span dir="ltr">&lt;<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im"><div class="gmail_quote">On Sun, Jul 10, 2011 at 13:15, John Chludzinski <span dir="ltr">&lt;<a href="mailto:jchludzinski@gmail.com" target="_blank">jchludzinski@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div>&quot;If you don&#39;t have a real MPI installed&quot;?<br></div><div><br></div><div>I installed (on Cygwin) using: ./configure CC=gcc FC=gfortran --download-mpich=1 PETSC_ARCH=arch-cygwin-gnu</div><div><br></div><div>


I&#39;ve compiled some example MPI code using mpicc. And I&#39;ve run the generated executable with: mpiexec -n &lt;some int&gt; &lt;executable&gt;.</div><div><br></div><div>&quot;ps&quot; says it created n-number of processes.</div>


<div><br></div><div>But it is on a 2-proc/4-core Windows box running Cygwin (of course, not configured as a cluster).</div><div><br></div><div>Do I have &quot;real MPI&quot; installed?</div></blockquote></div><br></div><div>

Yes.</div>

<div><br></div><div>Note that threading offers very little potential performance improvement on a system like this. It becomes more important if you have many &quot;fat&quot; nodes, for example if each node has 4 sockets with 12 cores per socket and you want to &quot;strong scale&quot; such that the subdomains become very small (less than 10k unknowns if inexpensive preconditioners are working, also depending on the network and intra-node bandwidth).</div>


</blockquote></div><br>