[mpich-discuss] MPICH2-1.0.8 performance issues on Opteron Cluster
Dmitry V Golovashkin
Dmitry.Golovashkin at sas.com
Mon Jan 5 13:27:23 CST 2009
It might be a good idea to accompany each mpich2 release with extensive
performance benchmarks on popular MPI-based numerical libraries. A few that
come to mind:
http://www.netlib.org/scalapack
http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview
http://www.mcs.anl.gov/petsc/petsc-as/
For instance, generate a couple of large random matrices and run O(n^3)
ScaLAPACK routines (pdgemm, etc.) to demonstrate that the newest mpich2
release is indeed an improvement.
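Something along the following lines would already catch regressions in pdgemm.
This is only a rough sketch: the 2x2 process grid, the problem and block sizes,
and the trailing-underscore Fortran name mangling are assumptions that vary by
platform; compile with mpicc and link against ScaLAPACK/BLACS plus a BLAS such
as ACML.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* BLACS / ScaLAPACK entry points (Fortran symbols assumed to carry a
 * trailing underscore; adjust to your toolchain). */
extern void Cblacs_get(int icontxt, int what, int *val);
extern void Cblacs_gridinit(int *icontxt, const char *layout, int nprow, int npcol);
extern void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol, int *myrow, int *mycol);
extern void Cblacs_gridexit(int icontxt);
extern int  numroc_(const int *n, const int *nb, const int *iproc,
                    const int *isrcproc, const int *nprocs);
extern void descinit_(int *desc, const int *m, const int *n, const int *mb,
                      const int *nb, const int *irsrc, const int *icsrc,
                      const int *ictxt, const int *lld, int *info);
extern void pdgemm_(const char *transa, const char *transb,
                    const int *m, const int *n, const int *k,
                    const double *alpha,
                    const double *a, const int *ia, const int *ja, const int *desca,
                    const double *b, const int *ib, const int *jb, const int *descb,
                    const double *beta,
                    double *c, const int *ic, const int *jc, const int *descc);

int main(int argc, char **argv)
{
    int N = 4096, NB = 64;          /* global matrix size and block size */
    int nprow = 2, npcol = 2;       /* process grid; must match mpiexec -n */
    int izero = 0, ione = 1, ctxt, myrow, mycol, info;
    double alpha = 1.0, beta = 0.0;

    MPI_Init(&argc, &argv);
    Cblacs_get(-1, 0, &ctxt);
    Cblacs_gridinit(&ctxt, "Row", nprow, npcol);
    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    /* Local dimensions of the block-cyclically distributed N x N matrices. */
    int mloc = numroc_(&N, &NB, &myrow, &izero, &nprow);
    int nloc = numroc_(&N, &NB, &mycol, &izero, &npcol);
    int lld  = mloc > 1 ? mloc : 1;

    int desc[9];
    descinit_(desc, &N, &N, &NB, &NB, &izero, &izero, &ctxt, &lld, &info);

    double *A = malloc((size_t)mloc * nloc * sizeof *A);
    double *B = malloc((size_t)mloc * nloc * sizeof *B);
    double *C = malloc((size_t)mloc * nloc * sizeof *C);
    for (long i = 0; i < (long)mloc * nloc; i++) {
        A[i] = drand48();           /* random local blocks */
        B[i] = drand48();
        C[i] = 0.0;
    }

    double t0 = MPI_Wtime();
    pdgemm_("N", "N", &N, &N, &N, &alpha, A, &ione, &ione, desc,
            B, &ione, &ione, desc, &beta, C, &ione, &ione, desc);
    double t1 = MPI_Wtime();

    if (myrow == 0 && mycol == 0)
        printf("pdgemm N=%d: %.3f s, %.2f GFLOP/s\n",
               N, t1 - t0, 2.0 * N * (double)N * N / (t1 - t0) / 1e9);

    free(A); free(B); free(C);
    Cblacs_gridexit(ctxt);
    MPI_Finalize();
    return 0;
}

Timing the same binary under each release and channel on the same nodes would
make a slowdown like the ones reported below immediately visible.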
> Do you know if this is the case with your apps?
Always one proc per core, no threads (export OMP_NUM_THREADS=1), no CPU contention.
> Would you be able to try the latest alpha version of 1.1?
I would be glad to help when I am back; I am on vacation until Jan 16. :-)
Thank you!
On Mon, 2009-01-05 at 12:15 -0500, Darius Buntinas wrote:
> James, Dmitry,
>
> Would you be able to try the latest alpha version of 1.1?
>
> http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.1a2/src/mpich2-1.1a2.tar.gz
>
> Nemesis is the default channel in 1.1, so you don't have to specify
> --with-device= when configuring.
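> With 1.0.8 you had to request it explicitly, roughly like this (the
> install prefix is only an example):
>
>   ./configure --prefix=/opt/mpich2-nemesis --with-device=ch3:nemesis
>   make && make install
>
> With 1.1 a plain ./configure is enough.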
>
> Note that if you have more than one process and/or thread per core,
> nemesis won't perform well. This is because nemesis does active polling
> (but we expect to have a non-polling option for the final release). Do
> you know if this is the case with your apps?
>
> Thanks,
> -d
>
> On 01/05/2009 09:15 AM, Dmitry V Golovashkin wrote:
> > We had similar experiences with nemesis in a prior mpich2 release
> > (ScaLAPACK-ish applications on a multicore Linux cluster).
> > The resulting times were markedly slower. The nemesis channel was an
> > experimental feature back then, so I attributed the slower performance
> > to a possible misconfiguration.
> > Is it possible for non-ANL folks to submit a new ticket?
> >
> >
> >
> > On Mon, 2009-01-05 at 09:00 -0500, James S Perrin wrote:
> >> Hi,
> >> I thought I'd just mention that I too have found that our software
> >> performs poorly with nemesis compared to ssm on our multi-core machines.
> >> I've tried it on both a 2x dual-core AMD x64 machine and a 2x quad-core
> >> Xeon x64 machine; it's roughly 30% slower. I've not yet been able to do
> >> any analysis of where the nemesis version is losing out.
> >>
> >> The software performs mainly point-to-point communication in a
> >> master/slave model. As the software is interactive, the slaves call
> >> MPI_Iprobe while waiting for commands. Having compiled against the ssm
> >> version would have no effect, would it?
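> >>
> >> Concretely, the slave-side wait loop looks roughly like the sketch below
> >> (the command tag, message layout, and shutdown convention are only
> >> illustrative, not our actual protocol):
> >>
> >> #include <mpi.h>
> >> #include <unistd.h>
> >>
> >> #define TAG_COMMAND 1   /* illustrative tag for master->slave commands */
> >>
> >> void slave_wait_loop(void)
> >> {
> >>     int done = 0;
> >>     while (!done) {
> >>         int flag = 0;
> >>         MPI_Status status;
> >>         /* Poll for a pending command from the master (rank 0). */
> >>         MPI_Iprobe(0, TAG_COMMAND, MPI_COMM_WORLD, &flag, &status);
> >>         if (flag) {
> >>             int command;
> >>             MPI_Recv(&command, 1, MPI_INT, 0, TAG_COMMAND,
> >>                      MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> >>             if (command < 0)    /* illustrative shutdown convention */
> >>                 done = 1;
> >>             /* ... otherwise dispatch the command here ... */
> >>         } else {
> >>             usleep(1000);       /* brief back-off between probes */
> >>         }
> >>     }
> >> }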
> >>
> >> Regards
> >> James
> >>
> >> Sarat Sreepathi wrote:
> >>> Hello,
> >>>
> >>> We got a new 10-node Opteron cluster in our research group. Each node
> >>> has two quad-core Opterons. I installed MPICH2-1.0.8 with the PathScale
> >>> (3.2) compilers and three device configurations (nemesis, ssm, sock). I
> >>> built and tested using the Linpack (HPL) benchmark with the ACML 4.2
> >>> BLAS library for the three different device configurations.
> >>>
> >>> I observed some unexpected results: the 'nemesis' configuration gave
> >>> the worst performance. For the same problem parameters, the 'sock'
> >>> version was faster and the 'ssm' version hung. For further analysis, I
> >>> obtained screenshots from the Ganglia monitoring tool for the three
> >>> different runs. As you can see from the attached screenshots, the
> >>> 'nemesis' version consumes more 'system CPU' according to Ganglia. The
> >>> 'ssm' version fares slightly better, but it hangs towards the end.
> >>>
> >>> I may be missing something trivial here, but can anyone account for
> >>> this discrepancy? Isn't the 'nemesis' or 'ssm' device recommended for
> >>> this cluster configuration? Your help is greatly appreciated.
> >
>