[mpich-discuss] MPICH2-1.0.8 performance issues on Opteron Cluster
Darius Buntinas
buntinas at mcs.anl.gov
Mon Jan 5 11:15:55 CST 2009
James, Dmitry,
Would you be able to try the latest alpha version of 1.1?
http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.1a2/src/mpich2-1.1a2.tar.gz
Nemesis is the default channel in 1.1, so you don't have to specify
--with-device= when configuring.
Note that if you have more than one process and/or thread per core,
nemesis won't perform well. This is because nemesis does active polling
(but we expect to have a non-polling option for the final release). Do
you know if this is the case with your apps?
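One rough way to check is to have each rank print which CPU it is
currently running on. The sketch below is only a quick diagnostic (it
assumes Linux with a glibc that provides sched_getcpu(), and it gives a
snapshot only, since unpinned processes can migrate between samples):

    #define _GNU_SOURCE
    #include <mpi.h>
    #include <sched.h>      /* sched_getcpu(), glibc on Linux */
    #include <stdio.h>
    #include <unistd.h>     /* gethostname() */

    int main(int argc, char **argv)
    {
        int rank;
        char host[64];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        gethostname(host, sizeof(host));
        /* Ranks that repeatedly report the same host and CPU are
         * sharing a core -- the case where active polling hurts. */
        printf("rank %d: host %s, cpu %d\n", rank, host, sched_getcpu());
        MPI_Finalize();
        return 0;
    }

If two or more ranks on the same host keep landing on the same CPU,
that's the oversubscribed case described above.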
Thanks,
-d
On 01/05/2009 09:15 AM, Dmitry V Golovashkin wrote:
> We had similar experiences with nemesis in a prior MPICH2 release
> (ScaLAPACK-ish applications on a multicore Linux cluster).
> The resulting times were markedly slower. The nemesis channel was an
> experimental feature back then, so I attributed the slower performance
> to a possible misconfiguration.
> Is it possible for non-ANL folks to submit a new ticket?
>
> On Mon, 2009-01-05 at 09:00 -0500, James S Perrin wrote:
>> Hi,
>> I thought I'd just mention that I too have found that our software
>> performs poorly with nemesis compared to ssm on our multi-core machines.
>> I've tried it on both a 2x dual-core AMD x64 machine and a 2x quad-core
>> Xeon x64 machine; it's roughly 30% slower. I haven't yet been able to do
>> any analysis of where the nemesis version is losing out.
>>
>> The software performs mainly point-to-point communication in a
>> master/slave model. As the software is interactive, the slaves call
>> MPI_Iprobe while waiting for commands. The fact that the application was
>> compiled against the ssm version shouldn't have any effect, should it?
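>>
>> For reference, the wait loop the slaves sit in is roughly the following
>> (simplified; the actual function name and command handling differ):
>>
>>     #include <mpi.h>
>>
>>     /* Simplified version of the slaves' idle loop: poll for a command
>>      * from the master (rank 0), handling interactive events in between. */
>>     static int wait_for_command(void)
>>     {
>>         int cmd, flag = 0;
>>         MPI_Status status;
>>
>>         while (!flag) {
>>             MPI_Iprobe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
>>             /* ...process GUI/interactive events while idle... */
>>         }
>>         MPI_Recv(&cmd, 1, MPI_INT, 0, status.MPI_TAG, MPI_COMM_WORLD,
>>                  MPI_STATUS_IGNORE);
>>         return cmd;
>>     }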
>>
>> Regards
>> James
>>
>> Sarat Sreepathi wrote:
>>> Hello,
>>>
>>> We got a new 10-node Opteron cluster in our research group. Each node
>>> has two quad-core Opterons. I installed MPICH2-1.0.8 with the Pathscale
>>> (3.2) compilers and three device configurations (nemesis, ssm, sock). I
>>> built and tested using the Linpack (HPL) benchmark with the ACML 4.2
>>> BLAS library for each of the three device configurations.
>>>
>>> I observed some unexpected results: the 'nemesis' configuration gave
>>> the worst performance. For the same problem parameters, the 'sock'
>>> version was faster and the 'ssm' version hangs. For further analysis, I
>>> obtained screenshots from the Ganglia monitoring tool for the three
>>> different runs. As you can see from the attached screenshots, the
>>> 'nemesis' version is consuming more 'system CPU' according to Ganglia.
>>> The 'ssm' version fares slightly better, but it hangs towards the end.
>>>
>>> I may be missing something trivial here, but can anyone account for this
>>> discrepancy? Isn't the 'nemesis' or 'ssm' device recommended for this
>>> cluster configuration? Your help is greatly appreciated.
>