<div dir="ltr">Yes, I'm looking at weak scalability right now. I'm using BiCGSTAB with BoomerAMG (all default options except rtol = 1e-12). I've not looked into MF/s yet, but I'll certainly do so to see if I have any problem there. So far, just timing KSPSolve, I get [0.231, 0.238, 0.296, 0.451, 0.599] seconds per KSP iteration for p = [1, 4, 16, 64, 256] with almost 93K nodes (matrix rows) per process. That's not bad, I guess, but the time per iteration still grows by a factor of about 3 at 256 processes. The problem is, I don't know how good or bad this is. In fact, I'm not even sure that's a valid question to ask, since the answer may be very problem-dependent.<div>
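To put a rough number on this, here is a quick sketch (plain Python, using only the timings above) that computes the weak-scaling efficiency t(1)/t(P) per KSP iteration:

```python
# Per-iteration KSPSolve timings reported above, for the given process counts.
procs = [1, 4, 16, 64, 256]
secs_per_iter = [0.231, 0.238, 0.296, 0.451, 0.599]

# Weak-scaling efficiency: t(1)/t(P). 1.0 is ideal; lower means lost scalability.
for p, t in zip(procs, secs_per_iter):
    print(f"P={p:3d}  t/iter={t:.3f}s  efficiency={secs_per_iter[0]/t:.2f}")
```

By this measure, efficiency drops to roughly 0.39 at 256 processes, which is just another way of stating the factor-of-3 growth above.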
<br></div><div>Something I just thought about: how crucial is the matrix structure (node ordering) for KSP solvers? My nodes start out with a bad numbering, and I do partitioning here to get a better one. <br><div><div><br></div><div><br><br><div class="gmail_quote">
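On reading the -log_summary numbers Matt mentions below: a small sketch of the check that an event's aggregate MF/s should grow like P. The MF/s values here are hypothetical placeholders standing in for the -log_summary output, not numbers from my runs:

```python
# Hypothetical aggregate MFlops/s for MatMult at each process count
# (illustrative values only, standing in for the -log_summary MF/s column).
procs = [1, 4, 16, 64]
mflops = [450.0, 1700.0, 6100.0, 18000.0]

# Ideal scaling is MF/s(P) = P * MF/s(1); a fraction well below 1.0
# suggests the routine has run out of memory bandwidth.
for p, mf in zip(procs, mflops):
    print(f"P={p:2d}  MF/s={mf:8.0f}  fraction of ideal={mf/(p*mflops[0]):.2f}")
```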
On Fri, May 18, 2012 at 4:47 PM, Matthew Knepley <span dir="ltr"><<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Fri, May 18, 2012 at 7:43 PM, Mohammad Mirzadeh <span dir="ltr"><<a href="mailto:mirzadeh@gmail.com" target="_blank">mirzadeh@gmail.com</a>></span> wrote:<br></div><div class="gmail_quote"><div class="im">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">I see; that's a fair point. So I have my timing results obtained via -log_summary; what should I be looking at for MatMult? Wall-clock timings, or MFlops/s? I'm sorry, but I'm not sure which measure to use to determine scalability.</div>
</blockquote><div><br></div></div><div>Time is only meaningful in isolation if I know how big your matrix is, but you can obviously take the ratio to see how it is scaling. I am assuming you are looking at weak scalability, so the time should remain constant. MF/s tells you how the routine is performing independent of problem size, and thus is an easy way to see what is happening. It should scale like P, and when that drops off you have insufficient bandwidth. VecMDot is a good way to look at the latency of reductions (assuming you use GMRES). There is indeed no good guide to this. Barry should write one.</div><div><br></div><div> Matt</div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">
<div>
Also, is there any general meaningful advice one could give? in terms of using the resources, compiler flags (beyond -O3), etc?</div><div><br></div><div>Thanks,</div><div>Mohammad<br><br><div class="gmail_quote">
On Fri, May 18, 2012 at 4:18 PM, Matthew Knepley <span dir="ltr"><<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div>On Fri, May 18, 2012 at 7:06 PM, Mohammad Mirzadeh <span dir="ltr"><<a href="mailto:mirzadeh@gmail.com" target="_blank">mirzadeh@gmail.com</a>></span> wrote:<br></div></div><div class="gmail_quote">
<div><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Hi guys,<div><br></div><div>I'm trying to generate scalability plots for my code and do profiling and fine-tuning. In doing so, I have noticed that some of the factors affecting my results are rather subtle. For example, I found the other day that using all of the cores on a single node is somewhat (50-60%) slower than using only half of the cores, which I suspect is due to memory bandwidth and/or other hardware-related issues. </div>
<div><br></div><div>So I thought I'd ask whether there is any example in PETSc whose scalability has been tested and documented. Basically, I want to use such an example as a benchmark to compare my results against. My own test code is currently a linear Poisson solver on an adaptive quadtree grid and involves non-trivial geometry (well, basically a circle for the boundary, but still not a simple box).</div>
</div></blockquote><div><br></div></div></div><div>Unfortunately, I do not even know what that means. We can't guarantee a certain level of performance, because it depends not only on the hardware but on how you use it (as is evident in your case). In a perfect world, we would have an abstract model of the computation (available for MatMult) and of your machine (not available anywhere), and we would automatically work out the consequences and tell you what to expect. Instead, today we tell you to look at a few key indicators, like the MatMult event, to see what is going on. When MatMult stops scaling, you have run out of bandwidth.</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div>Thanks,</div><div>Mohammad</div></div><span><font color="#888888">
</font></span></blockquote></div><span><font color="#888888"><br><br clear="all"><span><font color="#888888"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener<br>
</font></span></font></span></blockquote></div><br></div></div>
</blockquote></div></div><div class="HOEnZb"><div class="h5"><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener<br>
</div></div></blockquote></div><br></div></div></div></div>