<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body bgcolor="#ffffff" text="#000000">
<tt>Hello,<br>
<br>
We got a new 10-node Opteron cluster in our research group. Each node
has two quad-core Opterons. I built MPICH2-1.0.8 with the
PathScale (3.2) compilers in three device configurations
(nemesis, ssm, sock), and tested each build with the Linpack (HPL)
benchmark linked against the ACML 4.2 BLAS library.<br>
<br>
I observed some unexpected results: the 'nemesis' configuration gave
the worst performance. For the same problem parameters, the 'sock'
version was faster, and the 'ssm' version hung. For further analysis, I
obtained screenshots from the Ganglia monitoring tool for the three
runs. As the attached screenshots show, the 'nemesis' version consumed
more system CPU according to Ganglia. The 'ssm' version fared slightly
better but hung towards the end.<br>
<br>
I may be missing something trivial here, but can anyone account for this
discrepancy? Isn't the 'nemesis' or 'ssm' device recommended for
this cluster configuration? Your help is greatly appreciated.<br>
</tt><tt><br>
Thanks,<br>
Sarat.</tt><br>
<tt><br>
<u><b>Details:</b></u><br>
HPL built with AMD ACML 4.2 BLAS libraries<br>
HPL Output for a problem size N=60000<br>
<b>nemesis - 1.653e+02 Gflops<br>
ssm - hangs<br>
sock - 2.029e+02 Gflops</b><br>
<br>
</tt><tt>c2master:~ # mpich2version <br>
MPICH2 Version: 1.0.8<br>
MPICH2 Release date: Unknown, built on Fri Dec 12 16:31:15 EST 2008<br>
MPICH2 Device: ch3:nemesis<br>
MPICH2 configure: --with-device=ch3:nemesis --enable-f77
--enable-f90 --enable-cxx
--prefix=/usr/local/mpich2-1.0.8-pathscale-k8-nemesis<br>
MPICH2 CC: pathcc -march=opteron -O3 <br>
MPICH2 CXX: pathCC -march=opteron -O3 <br>
MPICH2 F77: pathf90 -march=opteron -O3 <br>
MPICH2 F90: pathf90 -march=opteron -O3 <br>
<br>
and similar configuration using ch3:ssm and ch3:sock devices.<br>
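For reference, the other two builds were configured along these lines (a sketch: only the nemesis configure line above is taken verbatim, and the ssm/sock install prefixes are assumed analogues of it):<br>
<br>
```shell
# ssm build (shared memory within a node, sockets across nodes);
# prefix is an assumed analogue of the nemesis install path
./configure --with-device=ch3:ssm --enable-f77 --enable-f90 --enable-cxx \
    --prefix=/usr/local/mpich2-1.0.8-pathscale-k8-ssm
make && make install

# sock build (TCP sockets only, including within a node)
./configure --with-device=ch3:sock --enable-f77 --enable-f90 --enable-cxx \
    --prefix=/usr/local/mpich2-1.0.8-pathscale-k8-sock
make && make install
```
<br>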
<br>
<b>> nohup mpiexec -machinefile ./mf -n 80 ./xhpl < /dev/null
&</b><br>
<b>Machine file used:</b><br>
> cat mf<br>
c2node2:8<br>
c2node3:8<br>
c2node4:8<br>
c2node5:8<br>
c2node6:8<br>
c2node7:8<br>
c2node8:8<br>
c2node9:8<br>
c2node10:8<br>
c2node11:8<br>
<br>
c2master:~ # uname -a<br>
Linux c2master 2.6.22.18-0.2-default #1 SMP 2008-06-09 13:53:20 +0200
x86_64 x86_64 x86_64 GNU/Linux<br>
Processor: Quad-Core AMD Opteron(tm) Processor 2350 - 2 GHz<br>
</tt><br>
<pre class="moz-signature" cols="72">--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sarat Sreepathi
Doctoral Student
Dept. of Computer Science
North Carolina State University
<a class="moz-txt-link-abbreviated" href="mailto:sarat_s@ncsu.edu">sarat_s@ncsu.edu</a> ~ (919)645-7775
<a class="moz-txt-link-freetext" href="http://www.sarats.com">http://www.sarats.com</a>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
</pre>
</body>
</html>