[mpich-discuss] MPICH2-1.0.8 performance issues on Opteron Cluster

Anthony Chan chan at mcs.anl.gov
Thu Jan 8 09:32:17 CST 2009


The Jumpshot pictures that you sent earlier have some red "events"
that are linked by a red line.  If you didn't put those red events
in your code yourself, they could be related to communicator
creation/destruction.  You can _right_ click on one of the red
bubbles to find out what they are (or check the legend window).

A.Chan
----- "James Perrin" <James.S.Perrin at manchester.ac.uk> wrote:

> Hi,
> 
>    No, I'm not using any dynamic processes.
> 
> Quoting "Rajeev Thakur" <thakur at mcs.anl.gov>:
> 
> > Are you using MPI-2 dynamic process functions (connect-accept or
> > spawn)? It is possible that for dynamically connected processes on
> > the same machine, Nemesis communication goes over TCP instead of
> > shared memory (Darius can confirm), whereas with ssm it does not.
> >
> > Rajeev
> >
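For readers unfamiliar with the MPI-2 dynamic process functions Rajeev mentions, a minimal spawn sketch looks roughly like this. The child executable name `./worker` is a hypothetical placeholder, and building requires an MPI installation (`mpicc`); this is an illustrative sketch, not code from the thread:

```c
/* parent.c - hedged sketch of MPI-2 dynamic process creation.
 * "./worker" is a hypothetical child executable. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm children;   /* intercommunicator to the spawned group */
    int errcodes[4];

    MPI_Init(&argc, &argv);

    /* Spawn 4 copies of ./worker; they connect back via 'children'. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &children, errcodes);

    /* Traffic over 'children' is the dynamically connected path that,
     * per the discussion above, may go over TCP under nemesis even
     * when parent and children share a machine. */
    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}
```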
> >
> >> -----Original Message-----
> >> From: mpich-discuss-bounces at mcs.anl.gov
> >> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of James S Perrin
> >> Sent: Wednesday, January 07, 2009 11:00 AM
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: Re: [mpich-discuss] MPICH2-1.0.8 performance
> >> issues on Opteron Cluster
> >>
> >> Hi,
> >> 	I've just tried out 1.1a2 and get similar results as
> >> 1.0.8 for both
> >> nemesis and ssm.
> >>
> >> Regards
> >> James
> >>
> >> PS Zoom view in image is 0.21s of course!
> >>
> >> James S Perrin wrote:
> >> > Darius,
> >> >
> >> >     I will try out the 1.1 version shortly. Attached are two
> >> > images from jumpshot of the same section of code using nemesis
> >> > and ssm. I've set the view to be the same length of time (2.1s)
> >> > for comparison. It seems to me that the Isends and Irecvs from
> >> > the master to the slaves (and vice versa) are what are causing
> >> > the slowdown when using nemesis. These messages are quite small,
> >> > ~1k. The purple events are Allreduce/Allgather operations
> >> > between the slaves.
> >> >
> >> > Regards
> >> > James
> >> >
> >> > Darius Buntinas wrote:
> >> >> James, Dmitry,
> >> >>
> >> >> Would you be able to try the latest alpha version of 1.1?
> >> >>
> >> >>
> >> http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.1a2/src/mpich2-1.1a2.tar.gz
> >> >>
> >> >>
> >> >> Nemesis is the default channel in 1.1, so you don't have to
> >> >> specify --with-device= when configuring.
> >> >>
> >> >> Note that if you have more than one process and/or thread per
> >> >> core, nemesis won't perform well.  This is because nemesis does
> >> >> active polling (but we expect to have a non-polling option for
> >> >> the final release).  Do you know if this is the case with your
> >> >> apps?
> >> >>
> >> >> Thanks,
> >> >> -d
> >> >>
> >> >> On 01/05/2009 09:15 AM, Dmitry V Golovashkin wrote:
> >> >>> We have similar experiences with nemesis in a prior mpich2
> >> >>> release (scalapack-ish applications on a multicore Linux
> >> >>> cluster). The resultant times were remarkably slower. The
> >> >>> nemesis channel was an experimental feature back then; I
> >> >>> attributed the slower performance to a possible
> >> >>> misconfiguration.
> >> >>> Is it possible to submit a new ticket (for non-ANL folks)?
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Mon, 2009-01-05 at 09:00 -0500, James S Perrin wrote:
> >> >>>> Hi,
> >> >>>>     I thought I'd just mention that I too have found that our
> >> >>>> software performs poorly with nemesis compared to ssm on our
> >> >>>> multi-core machines. I've tried it on both a 2x dual-core AMD
> >> >>>> x64 machine and a 2x quad-core Xeon x64 machine. It's roughly
> >> >>>> 30% slower. I've not yet been able to do any analysis as to
> >> >>>> where the nemesis version is losing out.
> >> >>>>
> >> >>>>     The software performs mainly point-to-point communication
> >> >>>> in a master-and-slaves model. As the software is interactive,
> >> >>>> the slaves call MPI_Iprobe while waiting for commands. Having
> >> >>>> compiled against the ssm version would have no effect, would
> >> >>>> it?
> >> >>>>
> >> >>>> Regards
> >> >>>> James
> >> >>>>
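The MPI_Iprobe command-wait loop James describes would look something like the sketch below. The tag, source rank, and back-off interval are illustrative assumptions, not taken from his code, and it needs an MPI installation to build. Under nemesis, each MPI_Iprobe call actively polls shared-memory queues, which is consistent with the extra system CPU reported in this thread:

```c
/* Sketch of a slave waiting for interactive commands via MPI_Iprobe.
 * CMD_TAG, MASTER, and the 1 ms sleep are illustrative assumptions. */
#include <mpi.h>
#include <unistd.h>

#define CMD_TAG 42
#define MASTER  0

static void wait_for_command(int *cmd)
{
    int flag = 0;
    MPI_Status status;

    /* Poll for a pending command without blocking the process. */
    while (!flag) {
        MPI_Iprobe(MASTER, CMD_TAG, MPI_COMM_WORLD, &flag, &status);
        if (!flag)
            usleep(1000);  /* back off briefly so polling doesn't pin a core */
    }
    MPI_Recv(cmd, 1, MPI_INT, MASTER, CMD_TAG, MPI_COMM_WORLD, &status);
}
```

Even with a sleep between probes, each MPI_Iprobe still pays the channel's polling cost, so a channel that spins (nemesis) can show markedly higher system CPU than one that doesn't.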
> >> >>>> Sarat Sreepathi wrote:
> >> >>>>> Hello,
> >> >>>>>
> >> >>>>> We got a new 10-node Opteron cluster in our research group.
> >> >>>>> Each node has two quad-core Opterons. I installed
> >> >>>>> MPICH2-1.0.8 with Pathscale (3.2) compilers and three device
> >> >>>>> configurations (nemesis, ssm, sock). I built and tested
> >> >>>>> using the Linpack (HPL) benchmark with the ACML 4.2 BLAS
> >> >>>>> library for the three different device configurations.
> >> >>>>>
> >> >>>>> I observed some unexpected results, as the 'nemesis'
> >> >>>>> configuration gave the worst performance. For the same
> >> >>>>> problem parameters, the 'sock' version was faster and the
> >> >>>>> 'ssm' version hangs. For further analysis, I obtained
> >> >>>>> screenshots from the Ganglia monitoring tool for the three
> >> >>>>> different runs. As you can see from the attached
> >> >>>>> screenshots, the 'nemesis' version is consuming more
> >> >>>>> 'system cpu' according to Ganglia. The 'ssm' version fares
> >> >>>>> slightly better, but it hangs towards the end.
> >> >>>>>
> >> >>>>> I may be missing something trivial here, but can anyone
> >> >>>>> account for this discrepancy? Isn't the 'nemesis' or 'ssm'
> >> >>>>> device recommended for this cluster configuration? Your
> >> >>>>> help is greatly appreciated.
> >>
> >> --
> >> ------------------------------------------------------------------------
> >>    James S. Perrin
> >>    Visualization
> >>
> >>    Research Computing Services
> >>    Devonshire House, University Precinct
> >>    The University of Manchester
> >>    Oxford Road, Manchester, M13 9PL
> >>
> >>    t: +44 (0) 161 275 6945
> >>    e: james.perrin at manchester.ac.uk
> >>    w: www.manchester.ac.uk/researchcomputing
> >> ------------------------------------------------------------------------
> >>   "The test of intellect is the refusal to belabour the obvious"
> >>   - Alfred Bester
> >> ------------------------------------------------------------------------
> >>
> >
> >
