Hello all, especially Dr. Matt,<br><br>
<div><span class="gmail_quote">On 4/16/08, <b class="gmail_sendername">Matthew Knepley</b> <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:</span>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">On Tue, Apr 15, 2008 at 7:19 PM, Randall Mackie <<a href="mailto:rlmackie862@gmail.com">rlmackie862@gmail.com</a>> wrote:<br>
> I'm running my PETSc code on a cluster of quad core Xeon's connected<br>> by Infiniband. I hadn't much worried about the performance, because<br>> everything seemed to be working quite well, but today I was actually<br>
> comparing performance (wall clock time) for the same problem, but on<br>> different combinations of CPUS.<br>><br>> I find that my PETSc code is quite scalable until I start to use<br>> multiple cores/cpu.<br>
><br>> For example, the run time doesn't improve by going from 1 core/cpu<br>> to 4 cores/cpu, and I find this to be very strange, especially since<br>> looking at top or Ganglia, all 4 cpus on each node are running at 100%<br>
> almost<br>> all of the time. I would have thought if the cpus were going all out,<br>> that I would still be getting much more scalable results.<br><br>Those are really coarse measures. There is absolutely no way that all cores<br>
are going 100%. It's easy to show by hand. Take the peak flop rate and<br>this gives you the bandwidth needed to sustain that computation (if<br>everything is perfect, like axpy). You will find that the chip bandwidth<br>is far below this. A nice analysis is in<br>
<br><a href="http://www.mcs.anl.gov/~kaushik/Papers/pcfd99_gkks.pdf">http://www.mcs.anl.gov/~kaushik/Papers/pcfd99_gkks.pdf</a><br><br>> We are using mvapich-0.9.9 with infiniband. So, I don't know if<br>> this is a cluster/Xeon issue, or something else.<br>
<br>This is actually mathematics! How satisfying. The only way to improve<br>this is to change the data structure (e.g. use blocks) or change the<br>algorithm (e.g. use spectral elements and unassembled structures)</blockquote>
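<div>For reference, the "by hand" bandwidth estimate quoted above might be sketched roughly as follows. This is only an illustration: the function names are made up for this note, the peak flop rate and memory bandwidth are placeholder numbers rather than measurements of any real Xeon, and the traffic model assumes double precision with two loads and one store per axpy entry.</div>

```python
# Rough estimate of the memory bandwidth that y = a*x + y (axpy) would need
# in order to run at a processor's peak flop rate.
# All hardware numbers below are hypothetical placeholders.

def axpy_bandwidth_needed(peak_flops_per_s, bytes_per_value=8):
    """Bytes/s of memory traffic needed to sustain axpy at the given flop rate."""
    flops_per_entry = 2          # one multiply and one add per vector entry
    values_moved_per_entry = 3   # load x[i], load y[i], store y[i]
    bytes_per_flop = values_moved_per_entry * bytes_per_value / flops_per_entry
    return peak_flops_per_s * bytes_per_flop


def sustainable_fraction(peak_flops_per_s, memory_bandwidth_bytes_per_s):
    """Fraction of peak that the memory system can feed for axpy (capped at 1)."""
    needed = axpy_bandwidth_needed(peak_flops_per_s)
    return min(1.0, memory_bandwidth_bytes_per_s / needed)


if __name__ == "__main__":
    peak = 10e9       # assumed peak: 10 Gflop/s (placeholder)
    bandwidth = 6e9   # assumed memory bandwidth: 6 GB/s (placeholder)
    print("bandwidth needed (GB/s):", axpy_bandwidth_needed(peak) / 1e9)
    print("fraction of peak sustainable:", sustainable_fraction(peak, bandwidth))
```

<div>If several cores share the same memory bus, the same available bandwidth is split among them, which is consistent with the run time not improving when more cores per CPU are used.</div>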
<div> </div>
<div>Would you please explain a bit about "unassembled structures"? </div>
<div>Does the Discontinuous Galerkin method fall into this category?</div>
<div> </div>
<div>Thanks and regards,</div>
<div>Amjad Ali.</div><br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">Matt<br><br>> Anybody with experience on this?<br>><br>> Thanks, Randy M.<br>><br>><br><br>
<br><br>--<br>What most experimenters take for granted before they begin their<br>experiments is infinitely more interesting than any results to which<br>their experiments lead.<br>-- Norbert Wiener<br><br></blockquote></div>
<br>