Hmm, I'll look more carefully, but I think your binding is incorrect:<div><br></div><div>numactl -l --cpunodebind=$OMPI_COMM_WORLD_LOCAL_RANK ex2 $KSP_ARGS</div><div><br></div><div>NUMA node numbers are not the same as MPI local ranks, so the local rank should not be passed directly to --cpunodebind.<br>
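<div>One way to separate the two numberings is sketched below. It is only an illustration, not necessarily the binding intended in this thread: the helper name numa_node_for_rank is hypothetical, the block mapping of local ranks onto nodes is an assumption, and counting NUMA nodes through /sys/devices/system/node assumes Linux. OMPI_COMM_WORLD_LOCAL_RANK and OMPI_COMM_WORLD_LOCAL_SIZE are set by Open MPI's mpirun.</div>

```shell
#!/bin/sh
# Hypothetical helper: map a node-local MPI rank to a NUMA node index.
# Assumed block distribution: the first local_size/num_nodes ranks go to
# node 0, the next group to node 1, and so on.
numa_node_for_rank() {
    local_rank=$1
    local_size=$2
    num_nodes=$3
    echo $(( local_rank * num_nodes / local_size ))
}

# Only bind when launched under Open MPI's mpirun, which sets these variables.
if [ -n "${OMPI_COMM_WORLD_LOCAL_RANK:-}" ]; then
    # Count NUMA nodes from sysfs (Linux-specific assumption).
    num_nodes=$(( $(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l) ))
    [ "$num_nodes" -gt 0 ] || num_nodes=1
    node=$(numa_node_for_rank "$OMPI_COMM_WORLD_LOCAL_RANK" \
                              "${OMPI_COMM_WORLD_LOCAL_SIZE:-1}" "$num_nodes")
    # Same numactl options as the command above, but with a NUMA node index.
    exec numactl -l --cpunodebind="$node" ex2 $KSP_ARGS
fi
```

<div>With 8 ranks per node and 2 NUMA nodes, this sends local ranks 0-3 to node 0 and ranks 4-7 to node 1. Whether a block or a round-robin mapping is better depends on how mpirun places ranks on cores.</div>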

<br><div class="gmail_quote">On Fri, Feb 24, 2012 at 12:11, Nystrom, William D <span dir="ltr">&lt;<a href="mailto:wdn@lanl.gov">wdn@lanl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>

<div style="direction:ltr;font-family:Arial;color:#000000;font-size:14pt">Hi Jed,<br>

<br>

Attached is a gzipped tarball of the stuff I used to run these two test problems with<br>

numactl.  Actually, I hacked them a bit this morning because I was running them in<br>

our test framework for doing acceptance testing of new systems.  But the scripts in<br>

the tarball should give you all the info you need.  There is a top level script called<br>

runPetsc that just invokes mpirun from openmpi and calls the wrapper scripts for using<br>

numactl.  You could actually dispense with the top level script and just invoke the<br>

mpirun commands yourself.  I include it as an easy way to document what I did.<br>

The runPetscProb_1 script runs petsc on the gpus using numactl to control the affinities<br>

of the gpus to the cpu numa nodes.  The runPetscProb_2 script runs petsc on the cpus<br>

using numactl.  Note that both of those wrapper scripts are using openmpi variables.<br>

I'm not sure how one would do the same thing with another flavor of mpi.  But I imagine<br>

it is possible.  Also, I'm not sure if there are other more elegant ways to run with numactl<br>

than using the wrapper script approach.  Perhaps there is but this is what we have been<br>

doing.<br>

<br>

I've also included a Perl script called numa-maps that is useful for actually checking the<br>

affinities that you get while running in order to make sure that numactl is doing what<br>

you think it is doing.  I'm not sure where this script comes from.  I find it on some systems<br>

and not on others.<br>
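<br>
On Linux systems where the numa-maps script is not available, a rough check can be made directly against /proc/PID/numa_maps. The sketch below is an assumption-laden illustration, not the numa-maps script itself: the function name numa_pages_per_node is hypothetical, and it only sums the per-node page counts (the N0=..., N1=... fields) across all mappings of a process.<br>

```shell
# Hypothetical helper: read numa_maps-formatted text on stdin and print
# the total number of pages resident on each NUMA node.
numa_pages_per_node() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^N[0-9]+=[0-9]+$/) {
                # Field looks like N0=123: node name, then page count.
                split($i, kv, "=")
                pages[kv[1]] += kv[2]
            }
    }
    END { for (n in pages) print n, pages[n] }' | sort
}
```

Usage, for a running process whose id is in $PID: numa_pages_per_node < /proc/$PID/numa_maps<br>
If the binding works as intended, most pages of each rank should show up on the node it was bound to.<br>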

<br>

I've also included logs with the output of cpuinfo, nvidia-smi and uname -a to answer any<br>

questions you had about the system I was running on.<br>

<br>

Finally, I've included runPetscProb_1.log and runPetscProb_2.log, which contain the<br>

log_summary output for my latest runs on the gpu and cpu respectively.  Using numactl<br>

reduced the runtime for the gpu case as well but not as much as for the cpu case.  So<br>

the final result was that running the same problem while using all of the gpu resources<br>

on a node was about 2.5x faster than using all of the cpu resources on the same<br>

number of nodes.<br>

<br>

Let me know if you need any more info.  I'm planning to use this stuff to help test<br>

a new gpu cluster that we have just started acceptance testing on.  It has the same<br>

basic hardware as the testbed cluster for these results but has 308 nodes.  That<br>

should be interesting and fun.<div class="im"><br>

<br>

Thanks,<br>

<br>

Dave<br>

<br>

<div>

<div style="font-family:Tahoma;font-size:13px"><font><span style="font-size:10pt">--

<br>

Dave Nystrom<br>

LANL HPC-5<br>

Phone: <a href="tel:505-667-7913" value="+15056677913" target="_blank">505-667-7913</a><br>

Email: <a href="mailto:wdn@lanl.gov" target="_blank">wdn@lanl.gov</a><br>

Smail: Mail Stop B272<br>

       Group HPC-5<br>

       Los Alamos National Laboratory<br>

       Los Alamos, NM 87545<br>

</span></font><br>

</div>

</div>

</div><div style="font-family:Times New Roman;color:rgb(0,0,0);font-size:16px">

<hr>

<div style="direction:ltr"><font color="#000000" face="Tahoma"><b>From:</b> <a href="mailto:petsc-dev-bounces@mcs.anl.gov" target="_blank">petsc-dev-bounces@mcs.anl.gov</a> [<a href="mailto:petsc-dev-bounces@mcs.anl.gov" target="_blank">petsc-dev-bounces@mcs.anl.gov</a>] on behalf of Jed Brown [<a href="mailto:jedbrown@mcs.anl.gov" target="_blank">jedbrown@mcs.anl.gov</a>]<br>


<b>Sent:</b> Thursday, February 23, 2012 10:43 PM<div class="im"><br>

<b>To:</b> For users of the development version of PETSc<br>

</div><b>Cc:</b> Dave Nystrom<div class="im"><br>

<b>Subject:</b> Re: [petsc-dev] Understanding Some Parallel Results with PETSc<br>

</div></font><br>

</div><div><div></div><div class="h5">

<div></div>

<div>

<div class="gmail_quote">On Thu, Feb 23, 2012 at 23:41, Dave Nystrom <span dir="ltr">

&lt;<a href="mailto:dnystrom1@comcast.net" target="_blank">dnystrom1@comcast.net</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div>I could also send you my mpi/numactl command lines for gpu and cpu when I am<br>

back in the office.</div>

</blockquote>

</div>

<br>

<div>Yes, please.</div>

</div>

</div></div></div>

</div>

</div>


</blockquote></div><br></div>