<div class="gmail_quote">On Thu, Jan 5, 2012 at 09:41, TAY wee-beng <span dir="ltr">&lt;<a href="mailto:zonexo@gmail.com">zonexo@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div id=":43s">I just ran with -log_summary and attached the text file, running on 8 and 16 processors. My most important concern is whether the load is balanced across the processors.<br>

<br>

In the 16-processor case, the time ratio for many events is higher than 1, reaching up to 6.8 for VecScatterEnd </div></blockquote><div><br></div><div>This takes about 1% of the run time and it&#39;s scaling well, so don&#39;t worry about it.</div>
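<div><br></div><div>A rough sketch of how to weigh such a ratio, assuming the Ratio column is the maximum time over processes divided by the minimum. The per-process times and the total run time below are made up for illustration and are not taken from the attached log; the function names are hypothetical.</div>

```python
# Hypothetical per-process times (seconds) for one logged event.
# These numbers are invented for illustration only.
def imbalance_ratio(times):
    """Max/min ratio across processes (assumed meaning of the Ratio column)."""
    return max(times) / min(times)


def share_of_runtime(times, total_runtime):
    """Fraction of the total run time taken by the slowest process in this event."""
    return max(times) / total_runtime


scatter_end = [0.05, 0.10, 0.20, 0.34]  # made-up times on 4 processes
total = 34.0                            # made-up total run time in seconds

ratio = imbalance_ratio(scatter_end)
share = share_of_runtime(scatter_end, total)
print(f"ratio = {ratio:.1f}, share of run time = {share:.1%}")
```

<div>A large ratio on an event with a tiny share of the run time costs little in absolute terms, so it helps to look at both numbers together.</div>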

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":43s">and 132.1 (?) for MatAssemblyBegin.</div></blockquote><div><br></div><div>This is about 2% of run time, but it&#39;s not scaling. Do you compute a lot of matrix entries on processes that don&#39;t own the rows?</div>
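<div><br></div><div>To check this, you could count how many of the rows each process inserts into fall outside its own range. The sketch below is plain Python and only mimics a contiguous block-row split in which the first few ranks get one extra row; that layout is an assumption, and in PETSc the real range comes from MatGetOwnershipRange(). The helper names are hypothetical.</div>

```python
def ownership_range(n_rows, size, rank):
    """Rows [start, end) owned by one rank under an assumed contiguous split."""
    base, extra = divmod(n_rows, size)
    start = rank * base + min(rank, extra)
    # The first `extra` ranks each own one additional row.
    end = start + base + (1 if rank in range(extra) else 0)
    return start, end


def count_off_process(rows_set, n_rows, size, rank):
    """Number of inserted rows that this rank does not own."""
    start, end = ownership_range(n_rows, size, rank)
    owned = range(start, end)
    return sum(1 for r in rows_set if r not in owned)


# Rank 1 of 4 owns rows 3..5 of a 10-row matrix but also inserts into rows 2 and 6.
print(ownership_range(10, 4, 1))
print(count_off_process([2, 3, 4, 5, 6], 10, 4, 1))
```

<div>Entries set in rows owned by another process have to be communicated during assembly, so a large count here would be consistent with MatAssemblyBegin not scaling.</div>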

<div><br></div><div>Most of your solve time is going into PCSetUp() and PCApply(), both of which are getting more expensive as you add processes. They take more than 10x the time spent in MatMult(), and MatMult() takes slightly less time on more processes, so the increase isn&#39;t entirely due to memory issues.</div>

<div><br></div><div>What methods are you using?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":43s"> However, for the flops, the ratios are 1 and 1.1. So which is more important to look at: time or flops?</div>

</blockquote></div><br><div>If you would rather do a lot of flops than solve the problem in a reasonable amount of time, you might as well use dense methods. ;-)</div>