[mpich-discuss] MPICH2-1.0.8 performance issues on Opteron Cluster

James S Perrin james.s.perrin at manchester.ac.uk
Wed Jan 7 11:00:20 CST 2009


Hi,
	I've just tried out 1.1a2 and get results similar to 1.0.8 for both 
nemesis and ssm.

Regards
James

PS The zoom view in the image is 0.21s, of course!

James S Perrin wrote:
> Darius,
> 
>     I will try out the 1.1 version shortly. Attached are two images from 
> jumpshot of the same section of code using nemesis and ssm. I've set the 
> view to be the same length of time (2.1s) for comparison. It seems to me 
> that the Isends and Irecvs from the master to the slaves (and vice 
> versa) are what are causing the slowdown when using nemesis. These 
> messages are quite small (~1 KB). The purple events are Allreduce and 
> Allgather operations between the slaves.
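For reference, the master-to-slave exchange described above follows the standard nonblocking point-to-point pattern. The sketch below is illustrative only, not the application's actual code; the ~1 KB buffer size and the tag value are assumptions:

```c
#include <mpi.h>
#include <string.h>

/* Sketch of a small (~1 KB) nonblocking master<->slave exchange.
 * CMD_BYTES and CMD_TAG are illustrative assumptions. */
#define CMD_BYTES 1024
#define CMD_TAG   42

int main(int argc, char **argv)
{
    int rank, size;
    char sendbuf[CMD_BYTES], recvbuf[CMD_BYTES];
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    memset(sendbuf, 0, CMD_BYTES);

    if (rank == 0 && size > 1) {
        /* Master: post nonblocking send/recv to a slave, then wait. */
        MPI_Isend(sendbuf, CMD_BYTES, MPI_CHAR, 1, CMD_TAG,
                  MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(recvbuf, CMD_BYTES, MPI_CHAR, 1, CMD_TAG,
                  MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    } else if (rank == 1) {
        /* Slave: mirror the exchange. */
        MPI_Irecv(recvbuf, CMD_BYTES, MPI_CHAR, 0, CMD_TAG,
                  MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, CMD_BYTES, MPI_CHAR, 0, CMD_TAG,
                  MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```

With many small messages like these, per-message overhead in the channel's shared-memory path tends to dominate, which is why the choice of device (nemesis vs ssm) shows up so clearly in the trace.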
> 
> Regards
> James
> 
> Darius Buntinas wrote:
>> James, Dmitry,
>>
>> Would you be able to try the latest alpha version of 1.1?
>>
>> http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.1a2/src/mpich2-1.1a2.tar.gz 
>>
>>
>> Nemesis is the default channel in 1.1, so you don't have to specify
>> --with-device= when configuring.
>>
>> Note that if you have more than one process and/or thread per core,
>> nemesis won't perform well.  This is because nemesis does active polling
>> (but we expect to have a non-polling option for the final release).  Do
>> you know if this is the case with your apps?
>>
>> Thanks,
>> -d
>>
>> On 01/05/2009 09:15 AM, Dmitry V Golovashkin wrote:
>>> We have similar experiences with nemesis in a prior mpich2 release.
>>> (scalapack-ish applications on multicore linux cluster).
>>> The resulting times were markedly slower. The nemesis channel was an
>>> experimental feature back then, so I attributed the slower performance 
>>> to a possible misconfiguration.
>>> Is it possible to submit a new ticket (for non-ANL folks)?
>>>
>>>
>>>
>>> On Mon, 2009-01-05 at 09:00 -0500, James S Perrin wrote:
>>>> Hi,
>>>>     I thought I'd just mention that I too have found that our 
>>>> software performs poorly with nemesis compared to ssm on our 
>>>> multi-core machines. I've tried it on both a 2xDual core AMD x64 and 
>>>> 2xQuad core Xeon x64 machines. It's roughly 30% slower. I've not 
>>>> been able to do any analysis as yet as to where the nemesis version 
>>>> is losing out.
>>>>
>>>>     The software performs mainly point-to-point communication in a 
>>>> master and slaves model. As the software is interactive the slaves 
>>>> call MPI_Iprobe while waiting for commands. Having compiled against 
>>>> the ssm version would have no effect, would it?
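The interactive wait described here is typically an MPI_Iprobe loop along the lines of the sketch below (the function name and the usleep() backoff interval are assumptions, not the application's actual code). Note that a tight MPI_Iprobe loop is itself a form of active polling, which may interact badly with nemesis's own polling when cores are shared:

```c
#include <mpi.h>
#include <unistd.h>

/* Sketch of a slave waiting for a command from the master (rank 0)
 * via MPI_Iprobe. The backoff interval is an illustrative assumption. */
void wait_for_command(void)
{
    int flag = 0;
    MPI_Status status;

    while (!flag) {
        /* Nonblocking check for an incoming command from the master. */
        MPI_Iprobe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
        if (!flag)
            usleep(1000);  /* back off so the poll loop doesn't spin */
    }
    /* ...MPI_Recv the message described by 'status' and act on it... */
}
```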
>>>>
>>>> Regards
>>>> James
>>>>
>>>> Sarat Sreepathi wrote:
>>>>> Hello,
>>>>>
>>>>> We got a new 10-node Opteron cluster in our research group. Each 
>>>>> node has two quad core Opterons. I installed MPICH2-1.0.8 with 
>>>>> Pathscale(3.2) compilers and three device configurations 
>>>>> (nemesis,ssm,sock). I built and tested using the Linpack(HPL) 
>>>>> benchmark with ACML 4.2 BLAS library for the three different device 
>>>>> configurations.
>>>>>
>>>>> I observed some unexpected results as the 'nemesis' configuration 
>>>>> gave the worst performance. For the same problem parameters, the 
>>>>> 'sock' version was faster and the 'ssm' version hangs. For further 
>>>>> analysis, I obtained screenshots from the Ganglia monitoring tool 
>>>>> for the three different runs. As you can see from the attached 
>>>>> screenshots, the 'nemesis' version is consuming more 'system cpu' 
>>>>> according to Ganglia. The 'ssm' version fares slightly better but 
>>>>> it hangs towards the end.
>>>>>
>>>>> I may be missing something trivial here but can anyone account for 
>>>>> this discrepancy? Isn't the 'nemesis' or 'ssm' device 
>>>>> recommended for this cluster configuration? Your help is greatly 
>>>>> appreciated.
> 
> 

-- 
------------------------------------------------------------------------
   James S. Perrin
   Visualization

   Research Computing Services
   Devonshire House, University Precinct
   The University of Manchester
   Oxford Road, Manchester, M13 9PL

   t: +44 (0) 161 275 6945
   e: james.perrin at manchester.ac.uk
   w: www.manchester.ac.uk/researchcomputing
------------------------------------------------------------------------
  "The test of intellect is the refusal to belabour the obvious"
  - Alfred Bester
------------------------------------------------------------------------


