[mpich-discuss] MPICH2-1.0.8 performance issues on Opteron Cluster

James S Perrin james.s.perrin at manchester.ac.uk
Wed Jan 7 09:25:27 CST 2009


Darius,

	I will try out the 1.1 version shortly. Attached are two images from 
Jumpshot showing the same section of code using nemesis and ssm. I've set 
both views to the same length of time (2.1s) for comparison. It seems to 
me that the Isends and Irecvs from the master to the slaves (and vice 
versa) are what cause the slowdown when using nemesis. These messages are 
quite small (~1 KB). The purple events are Allreduce/Allgather operations 
between the slaves.
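
For context, here is a minimal, self-contained sketch (not our actual 
code) of the kind of traffic visible in the traces: the master posts 
nonblocking ~1 KB commands and reply receives for each slave, and the 
slaves then do a collective among themselves. The buffer sizes, tags and 
slave sub-communicator are illustrative assumptions only.

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size, i;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        char cmd[1024] = {0}, reply[1024] = {0};   /* ~1 KB messages */

        /* slaves share a communicator for their own collectives */
        MPI_Comm slaves;
        MPI_Comm_split(MPI_COMM_WORLD, rank == 0 ? 0 : 1, rank, &slaves);

        if (rank == 0) {
            /* master: nonblocking command out / reply back, per slave */
            MPI_Request *reqs = malloc(2 * (size - 1) * sizeof(MPI_Request));
            char *replies = malloc((size - 1) * 1024);
            for (i = 1; i < size; i++) {
                MPI_Isend(cmd, 1024, MPI_CHAR, i, 0, MPI_COMM_WORLD,
                          &reqs[2 * (i - 1)]);
                MPI_Irecv(replies + (i - 1) * 1024, 1024, MPI_CHAR, i, 1,
                          MPI_COMM_WORLD, &reqs[2 * (i - 1) + 1]);
            }
            MPI_Waitall(2 * (size - 1), reqs, MPI_STATUSES_IGNORE);
            free(reqs);
            free(replies);
        } else {
            /* slave: receive a command, send back a reply */
            MPI_Request req;
            MPI_Irecv(cmd, 1024, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            MPI_Isend(reply, 1024, MPI_CHAR, 0, 1, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);

            /* slave-only collective, like the purple phases in the trace */
            int local = rank, sum = 0;
            MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, slaves);
        }

        MPI_Comm_free(&slaves);
        MPI_Finalize();
        return 0;
    }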

Regards
James

Darius Buntinas wrote:
> James, Dmitry,
> 
> Would you be able to try the latest alpha version of 1.1?
> 
> http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.1a2/src/mpich2-1.1a2.tar.gz
> 
> Nemesis is the default channel in 1.1, so you don't have to specify
> --with-device= when configuring.
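
For example, a 1.1a2 build picking up the default nemesis channel would 
look roughly like this (the install prefix is just a placeholder):

    ./configure --prefix=/opt/mpich2-1.1a2
    make && make install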
> 
> Note that if you have more than one process and/or thread per core,
> nemesis won't perform well.  This is because nemesis does active polling
> (but we expect to have a non-polling option for the final release).  Do
> you know if this is the case with your apps?
> 
> Thanks,
> -d
> 
> On 01/05/2009 09:15 AM, Dmitry V Golovashkin wrote:
>> We had similar experiences with nemesis in a prior MPICH2 release
>> (ScaLAPACK-ish applications on a multicore Linux cluster). The
>> resulting times were markedly slower. The nemesis channel was an
>> experimental feature back then, so I attributed the slower performance
>> to a possible misconfiguration.
>> Is it possible to submit a new ticket (for non-ANL folks)?
>>
>>
>>
>> On Mon, 2009-01-05 at 09:00 -0500, James S Perrin wrote:
>>> Hi,
>>> 	I thought I'd just mention that I too have found that our software 
>>> performs poorly with nemesis compared to ssm on our multi-core machines. 
>>> I've tried it on both a 2x dual-core AMD x64 machine and a 2x quad-core 
>>> Xeon x64 machine. It's roughly 30% slower. I've not yet been able to do 
>>> any analysis of where the nemesis version is losing out.
>>>
>>> 	The software performs mainly point-to-point communication in a 
>>> master/slave model. As the software is interactive, the slaves call 
>>> MPI_Iprobe while waiting for commands. Having compiled against the ssm 
>>> build shouldn't have any effect, should it?
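
A minimal sketch of the kind of slave-side wait loop described above; the 
usleep() back-off is an illustrative assumption, not part of the original 
code, but it is relevant here because an actively polling channel such as 
nemesis keeps the core busy if the loop spins on MPI_Iprobe without 
backing off:

    #include <mpi.h>
    #include <unistd.h>

    /* Poll for a command from the master (rank 0), yielding briefly
     * between polls, then receive it. */
    static void wait_for_command(char *cmd, int len)
    {
        int flag = 0;
        MPI_Status status;
        while (!flag) {
            MPI_Iprobe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
            if (!flag)
                usleep(1000);   /* back off ~1 ms between polls */
        }
        MPI_Recv(cmd, len, MPI_CHAR, 0, status.MPI_TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }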
>>>
>>> Regards
>>> James
>>>
>>> Sarat Sreepathi wrote:
>>>> Hello,
>>>>
>>>> We got a new 10-node Opteron cluster in our research group. Each node 
>>>> has two quad-core Opterons. I installed MPICH2-1.0.8 with the PathScale 
>>>> (3.2) compilers and three device configurations (nemesis, ssm, sock). I 
>>>> built and tested using the Linpack (HPL) benchmark with the ACML 4.2 
>>>> BLAS library for each of the three device configurations.
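
Separate builds like the ones described above would typically be 
configured along these lines; the install prefixes are placeholders, 
while the device names are the ch3 channel selectors documented for 
MPICH2 1.0.8:

    ./configure --prefix=/opt/mpich2-1.0.8-nemesis --with-device=ch3:nemesis
    ./configure --prefix=/opt/mpich2-1.0.8-ssm     --with-device=ch3:ssm
    ./configure --prefix=/opt/mpich2-1.0.8-sock    --with-device=ch3:sock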
>>>>
>>>> I observed some unexpected results, as the 'nemesis' configuration gave 
>>>> the worst performance. For the same problem parameters, the 'sock' 
>>>> version was faster and the 'ssm' version hung. For further analysis, I 
>>>> obtained screenshots from the Ganglia monitoring tool for the three 
>>>> different runs. As you can see from the attached screenshots, the 
>>>> 'nemesis' version consumes more 'system CPU' according to Ganglia. The 
>>>> 'ssm' version fares slightly better, but it hangs towards the end.
>>>>
>>>> I may be missing something trivial here, but can anyone account for 
>>>> this discrepancy? Isn't the 'nemesis' or 'ssm' device recommended for 
>>>> this cluster configuration? Your help is greatly appreciated.

-- 
------------------------------------------------------------------------
   James S. Perrin
   Visualization

   Research Computing Services
   Devonshire House, University Precinct
   The University of Manchester
   Oxford Road, Manchester, M13 9PL

   t: +44 (0) 161 275 6945
   e: james.perrin at manchester.ac.uk
   w: www.manchester.ac.uk/researchcomputing
------------------------------------------------------------------------
  "The test of intellect is the refusal to belabour the obvious"
  - Alfred Bester
------------------------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: js-nemesis.jpg
Type: image/jpeg
Size: 136425 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20090107/128249ad/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: js-ssm.jpg
Type: image/jpeg
Size: 125646 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20090107/128249ad/attachment-0001.jpg>
