On Wed, Dec 22, 2010 at 5:40 PM, Barry Smith <bsmith@mcs.anl.gov> wrote:

> On Dec 22, 2010, at 9:55 AM, Yongjun Chen wrote:
>
> > Satish,
> >
> > I have reconfigured PETSc with --download-mpich=1 and --with-device=ch3:sock. The results show that the speedup now keeps increasing as the number of cores goes from 1 to 16. However, the maximum speedup is still only around 6.0 with 16 cores. The new log files are attached.
> >
> > (1)
> >
> > I checked the configuration of the first server again. It is a shared-memory machine with:
> >
> > Processors: 4 CPUs x 4 cores/CPU, each core at 2500 MHz
> >
> > Memory: 16 x 2 GB DDR2-333 modules, dual channel, 64-bit data width, so the memory bandwidth of one dual-channel pair is 64/8 * 166 * 2 * 2 = 5.4 GB/s.
>
> Wait a minute. You have 16 cores that share 5.4 GB/s???? This is not enough for iterative solvers; in fact, it is absolutely terrible for iterative solvers. You really want 5.4 GB/s PER core! This machine is absolutely inappropriate for iterative solvers. No package can give you good speedups on this machine.
>
> Barry

Barry, there are 16 memory modules; every two modules make up one dual-channel pair, so the machine has 8 dual channels, and each dual channel has a memory bandwidth of 5.4 GB/s.

Yongjun

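A rough back-of-envelope check of these numbers, assuming the nominal DDR2-333 peak rates and that all 16 cores stream memory at the same time:

    per dual-channel pair:  64 bit / 8 * 166 MHz * 2 (DDR) * 2 (channels)  ~ 5.3 GB/s
    whole machine:          8 dual-channel pairs * 5.3 GB/s                ~ 42 GB/s
    per core:               42 GB/s / 16 cores                             ~ 2.7 GB/s

These are theoretical peaks; sustained bandwidth measured with a benchmark such as STREAM is usually noticeably lower.
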
> > It seems that each core can get about 2.7 GB/s of memory bandwidth, which should fulfill the basic requirement for sparse iterative solvers.
> >
> > Is this correct? Does a shared-memory computer offer no benefit for PETSc when the memory bandwidth is limited?

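Whether roughly 2.7 GB/s per core is "enough" can be estimated crudely: a sparse matrix-vector product in AIJ format streams on the order of 12 bytes per nonzero (an 8-byte value plus a 4-byte column index) while performing about 2 flops per nonzero, i.e. roughly 6 bytes of memory traffic per flop. At 2.7 GB/s per core that caps sustained SpMV performance near 0.4-0.5 GFlop/s per core regardless of the 2.5 GHz clock rate, and in practice the cores contend for the same memory controllers, which would be consistent with the speedup leveling off well below 16 on 16 cores.
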
> >
> > (2)
> >
> > Besides, we would like to continue our work by employing a matrix partitioning/reordering algorithm, such as METIS or ParMETIS, to improve the speedup of the program. (The current program works without any matrix decomposition.)
> >
> > Matt, as you said in http://lists.mcs.anl.gov/pipermail/petsc-users/2007-January/001017.html, "Reordering a matrix can result in fewer iterations for an iterative solver".
> >
> > Do you think matrix partitioning/reordering will help for this program? Or do you have any further suggestions?
> >
> > Any comments are very welcome! Thank you!

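One possible starting point for the partitioning question above is PETSc's MatPartitioning interface, which can call ParMETIS to compute a new row distribution and then redistribute the matrix accordingly. The following is only a rough sketch, not code from this thread: it assumes a parallel AIJ matrix A already exists and that PETSc was configured with ParMETIS, and exact names and signatures (for example MatCreateSubMatrix versus the older MatGetSubMatrix, or PetscCall versus CHKERRQ) differ between PETSc releases.

#include <petscmat.h>

/* Sketch: repartition a parallel AIJ matrix A with ParMETIS and move the rows
   to their new owners.  Classic ierr/CHKERRQ error handling is used. */
PetscErrorCode RepartitionWithParMetis(Mat A, Mat *Anew)
{
  MatPartitioning part;
  IS              owners, newnum, rows;
  PetscInt       *count;
  PetscMPIInt     size, rank;
  PetscErrorCode  ierr;

  ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = PetscMalloc1(size, &count);CHKERRQ(ierr);

  ierr = MatPartitioningCreate(PETSC_COMM_WORLD, &part);CHKERRQ(ierr);
  ierr = MatPartitioningSetAdjacency(part, A);CHKERRQ(ierr);        /* use A's own graph      */
  ierr = MatPartitioningSetType(part, MATPARTITIONINGPARMETIS);CHKERRQ(ierr);
  ierr = MatPartitioningApply(part, &owners);CHKERRQ(ierr);         /* new owner of each row  */

  ierr = ISPartitioningToNumbering(owners, &newnum);CHKERRQ(ierr);  /* new global numbering   */
  ierr = ISPartitioningCount(owners, size, count);CHKERRQ(ierr);    /* rows per new owner     */
  ierr = ISInvertPermutation(newnum, count[rank], &rows);CHKERRQ(ierr);
  ierr = MatCreateSubMatrix(A, rows, rows, MAT_INITIAL_MATRIX, Anew);CHKERRQ(ierr);

  ierr = ISDestroy(&rows);CHKERRQ(ierr);
  ierr = ISDestroy(&newnum);CHKERRQ(ierr);
  ierr = ISDestroy(&owners);CHKERRQ(ierr);
  ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr);
  ierr = PetscFree(count);CHKERRQ(ierr);
  return 0;
}

Reordering to reduce iteration counts (as in the linked post) is a separate step; MatGetOrdering() with, e.g., MATORDERINGRCM followed by MatPermute() is one way to experiment with it. Note that neither repartitioning nor reordering raises the memory-bandwidth ceiling discussed earlier in the thread.
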
> >
> > On Mon, Dec 20, 2010 at 11:04 PM, Satish Balay <balay@mcs.anl.gov> wrote:
> > On Mon, 20 Dec 2010, Yongjun Chen wrote:
> >
> > > Matt, Barry, thanks a lot for your reply! I will try mpich hydra first and see what I can get.
> >
> > hydra is just the process manager.
> >
> > Also, --download-mpich uses a slightly older version, with device=ch3:sock for portability and valgrind reasons [development].
> >
> > You might want to install the latest MPICH manually with the default device=ch3:nemesis and recheck.
> >
> > satish
> >
> > <log_ch3sock_jacobi_bicg_4cpus.txt> <log_ch3sock_jacobi_bicg_8cpus.txt> <log_ch3sock_jacobi_bicg_12cpus.txt> <log_ch3sock_jacobi_bicg_16cpus.txt>