<p>Aron,</p>

<p>1. It's all NUMA.</p>

<p>2. You don't get to repartition the matrix because that is unnatural and not a local optimization.</p>

<p>3. Because of 2, the algorithms are different, so a direct comparison is not meaningful. Even so, I do not buy that you can get the same throughput on the kernel that is natural and makes sense as a local optimization.</p>

<p>Jed</p>

<p><blockquote type="cite">On Nov 12, 2010 7:02 PM, "Aron Ahmadia" &lt;<a href="mailto:aron.ahmadia@kaust.edu.sa">aron.ahmadia@kaust.edu.sa</a>&gt; wrote:<br><br><p><font color="#500050">&gt; A partial counter-point is that MatSolve with OpenMP is unlikely to be near the throughput of MPI-...</font></p>

I am just going to throw down the gauntlet here and say that I can<br>

reproduce or beat in (my choice of) OpenMP or pthreads on a reasonably<br>

UMA multi-core processor anything that can be implemented in MPI.<br>

<font color="#888888"><br>

-Aron<br>

</font></blockquote></p>