[petsc-dev] Understanding Some Parallel Results with PETSc

Jed Brown jedbrown at mcs.anl.gov
Fri Feb 24 14:36:52 CST 2012


Hmm, I'll look more carefully, but I think your binding is incorrect.

numactl -l --cpunodebind=$OMPI_COMM_WORLD_LOCAL_RANK ex2 $KSP_ARGS

NUMA node numbering is different from MPI ranks.
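
Something along these lines would keep the binding on real NUMA nodes (a minimal sketch, not your actual script: the wrapper name is made up, and the round-robin rank-to-node mapping is an assumption):

    #!/bin/sh
    # numawrap.sh (hypothetical): bind each local rank to a valid NUMA node.
    # Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK, and "numactl --hardware"
    # reports "available: N nodes (...)"; the modulo keeps the node index
    # in range however many ranks share the machine.
    NNODES=$(numactl --hardware | awk '/^available:/ {print $2}')
    NODE=$(( OMPI_COMM_WORLD_LOCAL_RANK % NNODES ))
    exec numactl -l --cpunodebind=$NODE "$@"

invoked as

    mpirun -np 16 ./numawrap.sh ex2 $KSP_ARGS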

On Fri, Feb 24, 2012 at 12:11, Nystrom, William D <wdn at lanl.gov> wrote:

>  Hi Jed,
>
> Attached is a gzipped tarball of the stuff I used to run these two test
> problems with numactl.  Actually, I hacked them a bit this morning
> because I was running them in our test framework for doing acceptance
> testing of new systems, but the scripts in the tarball should give you
> all the info you need.  There is a top-level script called runPetsc
> that just invokes mpirun from Open MPI and calls the wrapper scripts
> for using numactl.  You could actually dispense with the top-level
> script and just invoke the mpirun commands yourself; I include it as an
> easy way to document what I did.
> The runPetscProb_1 script runs PETSc on the GPUs, using numactl to
> control the affinities of the GPUs to the CPU NUMA nodes.  The
> runPetscProb_2 script runs PETSc on the CPUs using numactl.  Note that
> both of those wrapper scripts use Open MPI environment variables; I'm
> not sure how one would do the same thing with another flavor of MPI,
> but I imagine it is possible.  Also, I'm not sure if there are other,
> more elegant ways to run with numactl than the wrapper-script approach.
> Perhaps there are, but this is what we have been doing.
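>
> A wrapper of that sort can be as simple as the following sketch
> (illustrative only; the real scripts are in the tarball, and the
> GPU-to-NUMA mapping below is an assumption about the hardware, not a
> description of ours):
>
>     #!/bin/sh
>     # Sketch: give each local rank its own GPU, then bind the rank to
>     # the NUMA node that GPU is attached to (map is machine-specific).
>     RANK=$OMPI_COMM_WORLD_LOCAL_RANK
>     export CUDA_VISIBLE_DEVICES=$RANK
>     case $RANK in
>         0|1) NODE=0 ;;   # assuming GPUs 0 and 1 hang off socket 0
>         *)   NODE=1 ;;   # assuming the rest hang off socket 1
>     esac
>     exec numactl -l --cpunodebind=$NODE "$@"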
>
> I've also included a Perl script called numa-maps that is useful for
> checking the affinities you actually get while running, to make sure
> that numactl is doing what you think it is doing.  I'm not sure where
> this script comes from; I find it on some systems and not on others.
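>
> (Absent that script, the same information is available straight from
> the kernel: for one rank's pid,
>
>     cat /proc/<pid>/numa_maps
>
> shows, for each mapping, how many pages live on each NUMA node via its
> N0=..., N1=... fields, which is enough to confirm the binding took
> effect.)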
>
> I've also included logs with the output of cpuinfo, nvidia-smi, and
> uname -a to answer any questions you had about the system I was
> running on.
>
> Finally, I've included runPetscProb_1.log and runPetscProb_2.log, which
> contain the -log_summary output for my latest runs on the GPU and CPU
> respectively.  Using numactl reduced the runtime for the GPU case as
> well, but not as much as for the CPU case.  So the final result was
> that running the same problem using all of the GPU resources on a node
> was about 2.5x faster than using all of the CPU resources on the same
> number of nodes.
>
> Let me know if you need any more info.  I'm planning to use this stuff
> to help test a new GPU cluster that we have just started acceptance
> testing on.  It has the same basic hardware as the testbed cluster for
> these results but has 308 nodes.  That should be interesting and fun.
>
>
> Thanks,
>
> Dave
>
>  --
> Dave Nystrom
> LANL HPC-5
> Phone: 505-667-7913
> Email: wdn at lanl.gov
> Smail: Mail Stop B272
>        Group HPC-5
>        Los Alamos National Laboratory
>        Los Alamos, NM 87545
>
> ------------------------------
> From: petsc-dev-bounces at mcs.anl.gov [petsc-dev-bounces at mcs.anl.gov]
> on behalf of Jed Brown [jedbrown at mcs.anl.gov]
> Sent: Thursday, February 23, 2012 10:43 PM
> To: For users of the development version of PETSc
> Cc: Dave Nystrom
> Subject: Re: [petsc-dev] Understanding Some Parallel Results with PETSc
>
> On Thu, Feb 23, 2012 at 23:41, Dave Nystrom <dnystrom1 at comcast.net> wrote:
>
>> I could also send you my mpi/numactl command lines for gpu and cpu
>> when I am back in the office.
>>
>
> Yes, please.
>

