<div>I assume you mean that neither pthreads nor MPI offers much potential performance improvement "on a system like this"?</div><div><br></div><div>I work for a small company (in Huntsville) of mechanical engineers with a limited budget, where such machines are their staple. I'm looking at PETSc and SLEPc to help with some of their large-scale problems (e.g., generalized eigenvalue problems for matrices from 10K x 10K to over 100K x 100K).<br>
</div><div><br></div><div>I'm also looking at ways to parallelize our own codes, hence pthreads vs. MPI?</div><div><br></div><div>---John</div><br><div class="gmail_quote">On Sun, Jul 10, 2011 at 2:23 PM, Jed Brown <span dir="ltr"><<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im"><div class="gmail_quote">On Sun, Jul 10, 2011 at 13:15, John Chludzinski <span dir="ltr"><<a href="mailto:jchludzinski@gmail.com" target="_blank">jchludzinski@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>"If you don't have a real MPI installed"?<br></div><div><br></div><div>I installed (on Cygwin) using: ./configure CC=gcc FC=gfortran --download-mpich=1 PETSC_ARCH=arch-cygwin-gnu</div><div><br></div><div>
I've compiled some example MPI code using mpicc. And I've run the generated executable with: mpiexec -n &lt;some int&gt; &lt;executable&gt;.</div><div><br></div><div>"ps" says it created n-number of processes.</div>
<div><br></div><div>But it is on a 2-proc/4-core Windows box running Cygwin (of course, not configured as a cluster).</div><div><br></div><div>Do I have "real MPI" installed?</div></blockquote></div><br></div><div>
Yes.</div>
<div><br></div><div>Note that threading offers very little potential performance improvement on a system like this. It becomes more important if you have many "fat" nodes, for example if each node has 4 sockets with 12 cores per socket and you want to "strong scale" such that the subdomains become very small (less than 10k unknowns if inexpensive preconditioners are working, also depending on the network and intra-node bandwidth).</div>
</blockquote></div><br>