[mpich-discuss] MPICH2-1.0.8 performance issues on Opteron Cluster

James S Perrin james.s.perrin at manchester.ac.uk
Fri Jan 9 05:19:08 CST 2009


Hi,
	In the key they are listed as MPE_Comm_init and MPE_Comm_finalize (if I 
zoom in, an orange blob can be seen too).

	My application uses "modules" connected together to form a data 
processing pipeline, each module being a separate algorithm or method. 
When a module is first instantiated it creates a communicator of its own 
for use between the slaves. (This is to allow each module to run on a 
subset of the slaves.)
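
For illustration only, here is a minimal sketch (in C, with placeholder 
names such as slave_comm and participates; this is not the actual 
application code) of how such a per-module communicator over a subset of 
the slaves might be created with MPI_Comm_split:

    #include <mpi.h>

    /* Sketch: build a communicator containing only the slaves that take
     * part in a given module.  'slave_comm' and 'participates' are
     * placeholder names, not taken from the real application. */
    MPI_Comm make_module_comm(MPI_Comm slave_comm, int participates)
    {
        MPI_Comm module_comm;
        int rank;

        MPI_Comm_rank(slave_comm, &rank);
        /* Non-participating ranks pass MPI_UNDEFINED and receive
         * MPI_COMM_NULL back. */
        MPI_Comm_split(slave_comm,
                       participates ? 0 : MPI_UNDEFINED,
                       rank, &module_comm);
        return module_comm;
    }

Each module would call something like this once when first instantiated 
and keep the resulting communicator for its own traffic.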

Regards
James

Anthony Chan wrote:
> The jumpshot pictures that you sent earlier have some red "events"
> that are linked by a red line.  If you didn't put those red events
> in your code, they could be related to communicator creation/destruction?
> You can _right_ click on one of the red bubbles to find out what
> they are (or check the legend window).
> 
> A.Chan
> ----- "James Perrin" <James.S.Perrin at manchester.ac.uk> wrote:
> 
>> Hi,
>>
>>    No, I'm not using any dynamic processes.
>>
>> Quoting "Rajeev Thakur" <thakur at mcs.anl.gov>:
>>
>>> Are you using MPI-2 dynamic process functions (connect-accept or spawn)?
>>> It is possible that for dynamically connected processes on the same
>>> machine, Nemesis communication goes over TCP instead of shared memory
>>> (Darius can confirm), whereas with ssm it does not.
>>>
>>> Rajeev
>>>
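
For readers unfamiliar with the term, this is a minimal sketch of the 
MPI-2 dynamic-process route Rajeev is asking about (the executable name 
"worker" and the count of 4 are placeholders, not anything from this 
thread; James confirms above that his code does not use this):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;

        MPI_Init(&argc, &argv);
        /* Spawn 4 extra processes running a (placeholder) "worker"
         * binary; they are reached through the returned
         * intercommunicator. */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm,
                       MPI_ERRCODES_IGNORE);
        /* Traffic over 'intercomm' is the case where, per Rajeev's note
         * above, nemesis may use TCP even between processes on the
         * same node. */
        MPI_Finalize();
        return 0;
    }
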
>>>
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of James S Perrin
>>>> Sent: Wednesday, January 07, 2009 11:00 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: Re: [mpich-discuss] MPICH2-1.0.8 performance issues on
>>>> Opteron Cluster
>>>>
>>>> Hi,
>>>> 	I've just tried out 1.1a2 and get similar results to 1.0.8 for both
>>>> nemesis and ssm.
>>>>
>>>> Regards
>>>> James
>>>>
>>>> PS: The zoom view in the image is 0.21s, of course!
>>>>
>>>> James S Perrin wrote:
>>>>> Darius,
>>>>>
>>>>>     I will try out the 1.1 version shortly. Attached are two images
>>>>> from jumpshot of the same section of code using nemesis and ssm. I've
>>>>> set the view to be the same length of time (2.1s) for comparison. It
>>>>> seems to me that the Isends and Irecvs from the master to the slaves
>>>>> (and vice versa) are what are causing the slowdown when using nemesis.
>>>>> These messages are quite small (~1k). The purple events are
>>>>> Allreduce/Allgather operations between the slaves.
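
As a rough sketch only (all names, the tag values, and the 1 KB message 
size are placeholders; this is not James's code), the pattern described 
above, small nonblocking master-to-slave messages plus collectives among 
the slaves, might look something like:

    #include <mpi.h>

    #define TAG_CMD    1         /* placeholder tag */
    #define MAX_SLAVES 64        /* placeholder upper bound */

    /* Master: post a small nonblocking send of 'cmd' to every slave. */
    void send_to_slaves(const char cmd[1024], int nslaves)
    {
        MPI_Request reqs[MAX_SLAVES];
        int s;

        for (s = 1; s <= nslaves; s++)
            MPI_Isend((void *) cmd, 1024, MPI_CHAR, s, TAG_CMD,
                      MPI_COMM_WORLD, &reqs[s - 1]);
        MPI_Waitall(nslaves, reqs, MPI_STATUSES_IGNORE);
    }

    /* Slave: combine a local value with the other slaves only, using the
     * slaves' own communicator ('slave_comm' is a placeholder). */
    double combine(double local, MPI_Comm slave_comm)
    {
        double global;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, slave_comm);
        return global;
    }
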
>>>>>
>>>>> Regards
>>>>> James
>>>>>
>>>>> Darius Buntinas wrote:
>>>>>> James, Dmitry,
>>>>>>
>>>>>> Would you be able to try the latest alpha version of 1.1?
>>>>>>
>>>>>>
>>>>>> http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.1a2/src/mpich2-1.1a2.tar.gz
>>>>>>
>>>>>> Nemesis is the default channel in 1.1, so you don't have to specify
>>>>>> --with-device= when configuring.
>>>>>>
>>>>>> Note that if you have more than one process and/or thread per core,
>>>>>> nemesis won't perform well.  This is because nemesis does active
>>>>>> polling (but we expect to have a non-polling option for the final
>>>>>> release).  Do you know if this is the case with your apps?
>>>>>>
>>>>>> Thanks,
>>>>>> -d
>>>>>>
>>>>>> On 01/05/2009 09:15 AM, Dmitry V Golovashkin wrote:
>>>>>>> We have similar experiences with nemesis in a prior mpich2 release
>>>>>>> (scalapack-ish applications on a multicore Linux cluster).
>>>>>>> The resultant times were markedly slower. The nemesis channel was an
>>>>>>> experimental feature back then, so I attributed the slower
>>>>>>> performance to a possible misconfiguration.
>>>>>>> Is it possible to submit a new ticket (for non-ANL folks)?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 2009-01-05 at 09:00 -0500, James S Perrin wrote:
>>>>>>>> Hi,
>>>>>>>>     I thought I'd just mention that I too have found that our
>>>>>>>> software performs poorly with nemesis compared to ssm on our
>>>>>>>> multi-core machines. I've tried it on both 2x dual-core AMD x64 and
>>>>>>>> 2x quad-core Xeon x64 machines. It's roughly 30% slower. I've not
>>>>>>>> yet been able to do any analysis as to where the nemesis version is
>>>>>>>> losing out.
>>>>>>>>
>>>>>>>>     The software performs mainly point-to-point communication in a
>>>>>>>> master-and-slaves model. As the software is interactive, the slaves
>>>>>>>> call MPI_Iprobe while waiting for commands. Having compiled against
>>>>>>>> the ssm version would have no effect, would it?
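
A minimal sketch (placeholder names and tag; not the actual code) of the 
kind of slave-side wait loop described above:

    #include <mpi.h>

    #define TAG_CMD 1            /* placeholder tag */

    /* Slave: poll for the next command from the master (rank 0) with
     * MPI_Iprobe, then receive it.  Other interactive housekeeping could
     * be done between probes. */
    void wait_for_command(char *buf, int buflen)
    {
        int flag = 0;
        MPI_Status status;

        while (!flag)
            MPI_Iprobe(0, TAG_CMD, MPI_COMM_WORLD, &flag, &status);

        MPI_Recv(buf, buflen, MPI_CHAR, 0, TAG_CMD, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

A loop like this keeps a core busy while the slave is idle, which is 
worth bearing in mind alongside Darius's note elsewhere in the thread 
that nemesis itself polls actively.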
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> James
>>>>>>>>
>>>>>>>> Sarat Sreepathi wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> We got a new 10-node Opteron cluster in our research group. Each
>>>>>>>>> node has two quad-core Opterons. I installed MPICH2-1.0.8 with
>>>>>>>>> Pathscale (3.2) compilers and three device configurations
>>>>>>>>> (nemesis, ssm, sock). I built and tested using the Linpack (HPL)
>>>>>>>>> benchmark with the ACML 4.2 BLAS library for the three different
>>>>>>>>> device configurations.
>>>>>>>>>
>>>>>>>>> I observed some unexpected results, as the 'nemesis' configuration
>>>>>>>>> gave the worst performance. For the same problem parameters, the
>>>>>>>>> 'sock' version was faster and the 'ssm' version hangs. For further
>>>>>>>>> analysis, I obtained screenshots from the Ganglia monitoring tool
>>>>>>>>> for the three different runs. As you can see from the attached
>>>>>>>>> screenshots, the 'nemesis' version is consuming more 'system CPU'
>>>>>>>>> according to Ganglia. The 'ssm' version fares slightly better but
>>>>>>>>> it hangs towards the end.
>>>>>>>>>
>>>>>>>>> I may be missing something trivial here, but can anyone account
>>>>>>>>> for this discrepancy? Isn't the 'nemesis' device or 'ssm' device
>>>>>>>>> recommended for this cluster configuration? Your help is greatly
>>>>>>>>> appreciated.

-- 
------------------------------------------------------------------------
   James S. Perrin
   Visualization

   Research Computing Services
   Devonshire House, University Precinct
   The University of Manchester
   Oxford Road, Manchester, M13 9PL

   t: +44 (0) 161 275 6945
   e: james.perrin at manchester.ac.uk
   w: www.manchester.ac.uk/researchcomputing
------------------------------------------------------------------------
  "The test of intellect is the refusal to belabour the obvious"
  - Alfred Bester
------------------------------------------------------------------------


