[mpich-discuss] MPICH2-1.0.8 performance issues on Opteron Cluster

James S Perrin james.s.perrin at manchester.ac.uk
Wed Jan 7 11:00:20 CST 2009

	I've just tried out 1.1a2 and get similar results to 1.0.8 for both 
nemesis and ssm.


PS The zoom view in the image is 0.21s, of course!

James S Perrin wrote:
> Darius,
>     I will try out the 1.1 version shortly. Attached are two images from 
> jumpshot of the same section of code using nemesis and ssm. I've set the 
> view to be the same length of time (2.1s) for comparison. It seems to me 
> that the Isends and Irecvs from the master to the slaves (and vice 
> versa) are what cause the slowdown when using nemesis. These 
> messages are quite small, ~1 KB. The purple events are Allreduce and 
> Allgather operations between the slaves.
> Regards
> James
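
For reference, the master/slave exchange in question looks roughly like the
sketch below (a simplified illustration, not our actual code): the master
posts nonblocking ~1 KB command sends and matching receives for the replies,
then waits on all of them.

    #include <mpi.h>

    #define CMD_BYTES 1024   /* command/reply messages are roughly 1 KB */
    #define TAG_CMD   1
    #define TAG_REPLY 2

    /* Master (rank 0) sends a command to every slave with MPI_Isend and
     * posts an MPI_Irecv for each reply, then waits for completion. */
    static void master_exchange(int nprocs, char cmd[CMD_BYTES],
                                char replies[][CMD_BYTES])
    {
        MPI_Request reqs[2 * (nprocs - 1)];
        int i, n = 0;

        for (i = 1; i < nprocs; i++) {
            MPI_Isend(cmd, CMD_BYTES, MPI_CHAR, i, TAG_CMD,
                      MPI_COMM_WORLD, &reqs[n++]);
            MPI_Irecv(replies[i - 1], CMD_BYTES, MPI_CHAR, i, TAG_REPLY,
                      MPI_COMM_WORLD, &reqs[n++]);
        }
        MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
    }
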
> Darius Buntinas wrote:
>> James, Dmitry,
>> Would you be able to try the latest alpha version of 1.1?
>> http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.1a2/src/mpich2-1.1a2.tar.gz 
>> Nemesis is the default channel in 1.1, so you don't have to specify
>> --with-device= when configuring.
>> Note that if you have more than one process and/or thread per core,
>> nemesis won't perform well.  This is because nemesis does active polling
>> (but we expect to have a non-polling option for the final release).  Do
>> you know if this is the case with your apps?
>> Thanks,
>> -d
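
(For anyone wanting to reproduce the comparison: the channel is selected at
configure time, roughly as below; the install prefixes are just examples.)

    # MPICH2 1.0.8: the channel has to be chosen explicitly
    ./configure --prefix=/opt/mpich2-1.0.8-ssm     --with-device=ch3:ssm
    ./configure --prefix=/opt/mpich2-1.0.8-nemesis --with-device=ch3:nemesis

    # MPICH2 1.1a2: ch3:nemesis is the default, so no --with-device needed
    ./configure --prefix=/opt/mpich2-1.1a2
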
>> On 01/05/2009 09:15 AM, Dmitry V Golovashkin wrote:
>>> We have had similar experiences with nemesis in a prior mpich2 release
>>> (ScaLAPACK-ish applications on a multicore Linux cluster).
>>> The resulting times were markedly slower. The nemesis channel was an
>>> experimental feature back then, so I attributed the slower performance to a
>>> possible misconfiguration.
>>> Is it possible to submit a new ticket (for non-ANL folks)?
>>> On Mon, 2009-01-05 at 09:00 -0500, James S Perrin wrote:
>>>> Hi,
>>>>     I thought I'd just mention that I too have found that our 
>>>> software performs poorly with nemesis compared to ssm on our 
>>>> multi-core machines. I've tried it on both 2x dual-core AMD x64 and 
>>>> 2x quad-core Xeon x64 machines. It's roughly 30% slower. I've not 
>>>> been able to do any analysis as yet as to where the nemesis version 
>>>> is losing out.
>>>>     The software performs mainly point-to-point communication in a 
>>>> master/slave model. As the software is interactive, the slaves 
>>>> call MPI_Iprobe while waiting for commands. Having been compiled 
>>>> against the ssm version wouldn't have any effect, would it?
>>>> Regards
>>>> James
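
To illustrate the above: the slaves' wait loop is essentially an MPI_Iprobe
poll along the lines of the sketch below (simplified, not the real code), so
on an oversubscribed node it spins on top of whatever polling the channel
itself does internally.

    #include <mpi.h>

    #define TAG_CMD 1

    /* Slave side: poll for an incoming command from the master (rank 0)
     * with MPI_Iprobe, doing interactive front-end work in between, and
     * receive the command once one is pending.  The loop itself busy-polls. */
    static void slave_wait_for_command(char *cmd, int cmd_bytes)
    {
        int flag = 0;
        MPI_Status status;

        while (!flag) {
            MPI_Iprobe(0, TAG_CMD, MPI_COMM_WORLD, &flag, &status);
            if (!flag) {
                /* handle GUI events, render, etc. while idle */
            }
        }
        MPI_Recv(cmd, cmd_bytes, MPI_CHAR, 0, TAG_CMD, MPI_COMM_WORLD, &status);
    }
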
>>>> Sarat Sreepathi wrote:
>>>>> Hello,
>>>>> We got a new 10-node Opteron cluster in our research group. Each 
>>>>> node has two quad-core Opterons. I installed MPICH2-1.0.8 with the 
>>>>> PathScale (3.2) compilers and three device configurations 
>>>>> (nemesis, ssm, sock). I built and tested using the Linpack (HPL) 
>>>>> benchmark with the ACML 4.2 BLAS library for the three different device 
>>>>> configurations.
>>>>> I observed some unexpected results as the 'nemesis' configuration 
>>>>> gave the worst performance. For the same problem parameters, the 
>>>>> 'sock' version was faster and the 'ssm' version hung. For further 
>>>>> analysis, I obtained screenshots from the Ganglia monitoring tool 
>>>>> for the three different runs. As you can see from the attached 
>>>>> screenshots, the 'nemesis' version is consuming more 'system cpu' 
>>>>> according to Ganglia. The 'ssm' version fared slightly better, but 
>>>>> it hung towards the end.
>>>>> I may be missing something trivial here, but can anyone account for 
>>>>> this discrepancy? Isn't the 'nemesis' or 'ssm' device 
>>>>> recommended for this cluster configuration? Your help is greatly 
>>>>> appreciated.

   James S. Perrin

   Research Computing Services
   Devonshire House, University Precinct
   The University of Manchester
   Oxford Road, Manchester, M13 9PL

   t: +44 (0) 161 275 6945
   e: james.perrin at manchester.ac.uk
   w: www.manchester.ac.uk/researchcomputing
  "The test of intellect is the refusal to belabour the obvious"
  - Alfred Bester
