[petsc-dev] Understanding Some Parallel Results with PETSc

Thu Feb 23 23:41:33 CST 2012

I could also send you my mpi/numactl command lines for gpu and cpu when I am
back in the office.

Dave Nystrom writes:
 > Jed Brown writes:
 >  > On Thu, Feb 23, 2012 at 18:53, Nystrom, William D <wdn at lanl.gov> wrote:
 >  > 
 >  > > Rerunning the CPU case with numactl results in a 25x speedup and
 >  > > log_summary
 >  > > results that look reasonable to me now.
 >  > >
 >  > 
 >  > What command are you using for this?  We usually use the affinity options to
 >  > mpiexec instead of using numactl/taskset manually.
 > 
 > I was using openmpi-1.5.4 as installed by the system admins on our testbed
 > cluster.  I talked to a couple of our openmpi developers and they indicated
 > that the affinity stuff was broken in that version but should be fixed when
 > 1.5.5 and 1.6 come out - which should be within the next month.
 > 
 > I also tried mvapich2-1.7 built with slurm and tried using the affinity stuff
 > with srun.  That also did not seem to work.  But I should probably revisit
 > that and try to make sure that I really understand how to use srun.
 > 
 > I was pretty surprised that getting the numa stuff right made such a huge
 > difference.  I'm also wondering if getting the affinity right will make much
 > of a difference for the gpu case.
 > 
 >  > Did you also set a specific memory policy?
 > 
 > I'm not sure what you mean by the above question but I'm kind of new to all
 > this numa stuff.
 > 
 >  > Which Linux kernel is this?
 > 
 > The OS was the latest beta of TOSS2.  If I remember, I can check next time I
 > am in my office.  It is probably RHEL6.