On Tue, Jun 16, 2009 at 1:13 PM, Alex Peyser <peyser.alex@gmail.com> wrote:

<div class="im">On Tuesday 16 June 2009 01:53:35 pm Matthew Knepley wrote:<br>
> On Tue, Jun 16, 2009 at 12:38 PM, xiaoyin ji<br>
</div>> <<a href="mailto:sapphire.jxy@gmail.com">sapphire.jxy@gmail.com</a><mailto:<a href="mailto:sapphire.jxy@gmail.com">sapphire.jxy@gmail.com</a>>> wrote: Hi there,<br>
<div><div></div><div class="h5">><br>
> > > I'm using a PETSc MATMPIAIJ matrix with the KSP solvers. PETSc seems
> > > to run noticeably faster if I set the number of CPUs close to the
> > > number of compute nodes in the job file. By default an MPIAIJ matrix
> > > is distributed across processes, and the KSP solver communicates at
> > > every step; but since several CPUs on each node share the same memory
> > > while KSP may still communicate through the network card, this can
> > > hurt performance. Is there any way to detect which CPUs are sharing
> > > the same memory? Thanks a lot.
> >
> > The interface for this is mpirun or the job submission mechanism.
> >
> >    Matt
> >
> > > Best,
> > > Xiaoyin Ji
> >
> > --
> > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to which
> > their experiments lead. -- Norbert Wiener
>
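To expand on that one-line answer: placement itself is controlled by
mpirun or the batch system, but if you want to check from inside the
code which ranks actually ended up sharing a node, comparing MPI
processor names is one portable way to do it. A rough sketch in plain
MPI (illustrative only; this is not something PETSc provides):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: count how many ranks report the same processor name as we
 * do; ranks with equal names are on the same node (board). */
int main(int argc, char **argv)
{
    int  rank, size, len, i, peers = 0;
    char name[MPI_MAX_PROCESSOR_NAME];
    char *all;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    memset(name, 0, sizeof(name));
    MPI_Get_processor_name(name, &len);

    /* Gather every rank's host name onto every rank */
    all = (char *) malloc((size_t) size * MPI_MAX_PROCESSOR_NAME);
    MPI_Allgather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
                  all,  MPI_MAX_PROCESSOR_NAME, MPI_CHAR, MPI_COMM_WORLD);

    for (i = 0; i < size; i++)
        if (!strcmp(name, all + (size_t) i * MPI_MAX_PROCESSOR_NAME))
            peers++;
    printf("rank %d on %s: %d rank(s) on this node\n", rank, name, peers);

    free(all);
    MPI_Finalize();
    return 0;
}

Once you know the layout, the actual placement is done with the
launcher's options (e.g. Open MPI's -npernode, or whatever your batch
system offers), not inside PETSc.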
> I had a question about the best approach for this. Most of the time is
> spent inside the BLAS, correct? So wouldn't you maximize your
> operations by running one MPI/PETSc job per board (per shared memory)
> and using a multi-threaded BLAS that matches your board? You should cut
> communication by a factor roughly proportional to the number of threads
> per board, and the BLAS itself should optimize most of your operations
> across the board better than relying on higher-order parallelism.

This is a common misconception. In fact, most of the time is spent in
MatVec or BLAS1 operations, neither of which benefits from a
multithreaded BLAS.
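To make the arithmetic concrete, here is a toy CSR matrix-vector
product (an illustrative sketch with a made-up helper, not PETSc's
actual MatMult): every nonzero costs two flops but streams roughly 12
bytes of matrix data (an 8-byte value plus a 4-byte column index), so
the kernel is limited by memory bandwidth, which all cores on a board
share. Extra BLAS threads just contend for the same memory bus.

#include <stdio.h>

/* Toy CSR MatVec: 2 flops per nonzero vs ~12 bytes streamed, hence
 * memory-bandwidth bound. */
static void csr_matvec(int m, const int *rowptr, const int *col,
                       const double *val, const double *x, double *y)
{
    int i, j;
    for (i = 0; i < m; i++) {
        double sum = 0.0;
        for (j = rowptr[i]; j < rowptr[i + 1]; j++)
            sum += val[j] * x[col[j]]; /* val[j], col[j] come from memory */
        y[i] = sum;
    }
}

int main(void)
{
    /* 2x2 example: [[4, 1], [0, 3]] * [1, 2] = [6, 6] */
    int    rowptr[] = {0, 2, 3};
    int    col[]    = {0, 1, 1};
    double val[]    = {4.0, 1.0, 3.0};
    double x[]      = {1.0, 2.0}, y[2];

    csr_matvec(2, rowptr, col, val, x, y);
    printf("y = [%g, %g]\n", y[0], y[1]);
    return 0;
}

You can check the actual breakdown for your runs with -log_summary,
which reports the time spent in MatMult and the vector operations.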
   Matt

> Regards,
> Alex Peyser

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener