[petsc-dev] Understanding Some Parallel Results with PETSc
Dave Nystrom
dnystrom1 at comcast.net
Thu Feb 23 23:37:02 CST 2012
Jed Brown writes:
> On Thu, Feb 23, 2012 at 18:53, Nystrom, William D <wdn at lanl.gov> wrote:
>
> > Rerunning the CPU case with numactl results in a 25x speedup and
> > log_summary results that look reasonable to me now.
> >
>
> What command are you using for this? We usually use the affinity options to
> mpiexec instead of using numactl/taskset manually.
I was using openmpi-1.5.4 as installed by the system admins on our testbed
cluster. I talked to a couple of our openmpi developers and they indicated
that the processor affinity options to mpiexec are broken in that version but
should be fixed when 1.5.5 and 1.6 come out - which should be within the next
month.
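To illustrate what I mean by doing the binding manually with numactl versus
letting mpiexec do it - this is just a sketch, not necessarily my exact
command line, and the wrapper script name, the executable name and the
assumption of two NUMA nodes per host are all illustrative:

  #!/bin/sh
  # bind.sh (hypothetical wrapper): pick the NUMA node from the local rank.
  # OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI for each process on a host.
  node=$(( OMPI_COMM_WORLD_LOCAL_RANK % 2 ))   # assumes 2 NUMA nodes per host
  exec numactl --cpunodebind=$node --membind=$node "$@"

  # manual binding through the wrapper
  mpirun -np 16 ./bind.sh ./my_app -log_summary

  # the mpiexec way, once the binding options work in 1.5.5/1.6
  mpirun -np 16 --bind-to-core --report-bindings ./my_app -log_summary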
I also tried mvapich2-1.7 built with slurm and tried using the cpu binding
options to srun. That also did not seem to work, but I should probably
revisit that and make sure that I really understand how to use srun.
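When I revisit it, I believe the relevant srun option is --cpu_bind, along
these lines - the task count is illustrative and I have not yet verified that
this behaves correctly with mvapich2-1.7:

  # bind each task to its cores and report the bindings
  srun -n 16 --cpu_bind=verbose,cores ./my_app -log_summary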
I was pretty surprised that getting the numa stuff right made such a huge
difference. I'm also wondering if getting the affinity right will make much
of a difference for the gpu case.
> Did you also set a specific memory policy?
I'm not sure what you mean by the above question, but I'm kind of new to all
this numa stuff.
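Looking at the numactl man page, I am guessing that by memory policy you mean
options like the following - is that right?

  numactl --localalloc ./my_app      # allocate on the node where the process runs
  numactl --membind=0 ./my_app       # allocate only from node 0
  numactl --interleave=all ./my_app  # interleave pages across all nodes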
> Which Linux kernel is this?
The OS was the latest beta of TOSS2, which I believe is based on RHEL 6. If I
remember, I can check the exact kernel version next time I am in my office.
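Something like the following on one of the nodes should pin down the kernel
and distribution versions:

  uname -r
  cat /etc/redhat-release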