[mpich-discuss] MPICH2-1.0.8 performance issues on Opteron Cluster

Sarat Sreepathi sarat_s at ncsu.edu
Tue Dec 16 21:49:09 CST 2008


This is a problem with internode communication.
Each node has 8 cores, so I ran the following tests with 4 and 8 MPI tasks.
Please note that this is a smaller problem size (N=30000) than in
the earlier run.

HPL (N = 30000)                 sock        nemesis
8 on one node                   4.70E+01    4.74E+01
4 on one node                   2.60E+01    2.61E+01
8 - 4 x 2 nodes (internode)     4.39E+01    3.06E+01
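
As a rough illustration of the gap, the nemesis/sock ratio for each row
can be computed with a short Python sketch. The dictionary layout and the
function name are hypothetical; the numbers are copied from the table above.

```python
# HPL results copied from the table above (N = 30000).
# The structure and names here are illustrative only.
results = {
    "8 on one node": {"sock": 4.70e+01, "nemesis": 4.74e+01},
    "4 on one node": {"sock": 2.60e+01, "nemesis": 2.61e+01},
    "8 as 4 x 2 nodes": {"sock": 4.39e+01, "nemesis": 3.06e+01},
}


def nemesis_vs_sock(row):
    """Return the nemesis result as a fraction of the sock result."""
    return row["nemesis"] / row["sock"]


ratios = {name: nemesis_vs_sock(row) for name, row in results.items()}

for name, ratio in ratios.items():
    print(f"{name:18s} nemesis/sock = {ratio:.3f}")
```

On these numbers the two devices are within about 1% of each other on a
single node, while nemesis reaches roughly 70% of the sock result in the
two-node case.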


The case with 8 MPI tasks, 4 on each node (nemesis), is the one that
shows significant 'System CPU' in the image below.
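
The system-CPU share can also be cross-checked directly on a node. What
follows is a minimal sketch, assuming a Linux node where the first line of
/proc/stat has the usual "cpu user nice system idle ..." layout. The
function names and the sample tick counts are hypothetical, not
measurements from this cluster, and this is not necessarily how Ganglia
computes its own metric.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into tick counts."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep at most user, nice, system, idle, iowait, irq, softirq, steal.
    # Later fields (guest time) are already included in user time.
    return [int(x) for x in fields[1:9]]


def system_fraction(before, after):
    """Fraction of elapsed CPU ticks spent in kernel (system) mode."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    # Index 2 is the 'system' field.
    return deltas[2] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the first (aggregate) line of /proc/stat."""
    with open(path) as f:
        return f.readline()


def sample(interval=5.0):
    """Measure the system-CPU fraction on this node over `interval` seconds."""
    before = parse_cpu_line(read_cpu_line())
    time.sleep(interval)
    after = parse_cpu_line(read_cpu_line())
    return system_fraction(before, after)


# Made-up sample lines, used only to show the arithmetic.
demo_before = parse_cpu_line("cpu 1000 0 200 8800 0 0 0 0")
demo_after = parse_cpu_line("cpu 1600 0 500 8900 0 0 0 0")
demo_fraction = system_fraction(demo_before, demo_after)
```

Calling sample() on a compute node while xhpl is running would give one
number per run to compare between the sock and nemesis builds.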



Thanks,
Sarat.

Darius Buntinas wrote:
> I'd like to see if the problem has to do with internode or intranode
> communication.
>
> Can you try running 10 processes on one node with sock and nemesis?
>
> If nemesis is still doing worse, please try 5 processes on one node.
>
> Thanks,
> -d
>
> On 12/16/2008 05:50 PM, Sarat Sreepathi wrote:
>   
>> Hello,
>>
>> We got a new 10-node Opteron cluster in our research group. Each node
>> has two quad-core Opterons. I installed MPICH2-1.0.8 with the PathScale
>> (3.2) compilers in three device configurations (nemesis, ssm, sock). I
>> built and tested the Linpack (HPL) benchmark with the ACML 4.2 BLAS
>> library for each of the three device configurations.
>>
>> I observed some unexpected results: the 'nemesis' configuration gave
>> the worst performance. For the same problem parameters, the 'sock'
>> version was faster and the 'ssm' version hangs. For further analysis, I
>> obtained screenshots from the Ganglia monitoring tool for the three
>> different runs. As you can see from the attached screenshots, the
>> 'nemesis' version consumes more 'system cpu' according to Ganglia.
>> The 'ssm' version fares slightly better, but it hangs towards the end.
>>
>> I may be missing something trivial here, but can anyone account for this
>> discrepancy? Isn't the 'nemesis' or 'ssm' device recommended for this
>> cluster configuration? Your help is greatly appreciated.
>>
>> Thanks,
>> Sarat.
>>
>> Details:
>> HPL built with AMD ACML 4.2 BLAS libraries
>> HPL output for a problem size N=60000:
>> nemesis - 1.653e+02 Gflops
>> ssm - hangs
>> sock - 2.029e+02 Gflops
>>
>> c2master:~ # mpich2version
>> MPICH2 Version:         1.0.8
>> MPICH2 Release date:    Unknown, built on Fri Dec 12 16:31:15 EST 2008
>> MPICH2 Device:          ch3:nemesis
>> MPICH2 configure:       --with-device=ch3:nemesis --enable-f77
>> --enable-f90 --enable-cxx
>> --prefix=/usr/local/mpich2-1.0.8-pathscale-k8-nemesis
>> MPICH2 CC:      pathcc -march=opteron -O3
>> MPICH2 CXX:     pathCC -march=opteron -O3
>> MPICH2 F77:     pathf90 -march=opteron -O3
>> MPICH2 F90:     pathf90 -march=opteron -O3
>>
>> and similar configuration using ch3:ssm and ch3:sock devices.
>>
>> > nohup mpiexec -machinefile ./mf -n 80 ./xhpl < /dev/null &
>> Machine file used:
>> > cat mf
>> c2node2:8
>> c2node3:8
>> c2node4:8
>> c2node5:8
>> c2node6:8
>> c2node7:8
>> c2node8:8
>> c2node9:8
>> c2node10:8
>> c2node11:8
>>
>> c2master:~ # uname -a
>> Linux c2master 2.6.22.18-0.2-default #1 SMP 2008-06-09 13:53:20 +0200
>> x86_64 x86_64 x86_64 GNU/Linux
>> Processor: Quad-Core AMD Opteron(tm) Processor 2350 - 2 GHz
>>
>> -- 
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Sarat Sreepathi
>> Doctoral Student
>> Dept. of Computer Science
>> North Carolina State University
>> sarat_s at ncsu.edu ~ (919)645-7775
>> http://www.sarats.com
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>


-------------- next part --------------
A non-text attachment was scrubbed...
Name: moz-screenshot-2.jpg
Type: image/jpeg
Size: 10924 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20081216/b37aa781/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: moz-screenshot-3.jpg
Type: image/jpeg
Size: 8190 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20081216/b37aa781/attachment-0001.jpg>

