<div dir="ltr">I suspect that it is catching load imbalance and just not reporting it correctly. I've added a barrier in the code.<div><br></div><div>Here are the two log files.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 28, 2015 at 7:48 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
VecAssemblyBegin() serves as a barrier unless you set the vector option VEC_IGNORE_OFF_PROC_ENTRIES, so I am not surprised that it "appears" to take a lot of time. BUT the balance between the fastest and slowest listed in your table below is 1.0, which is very surprising; it indicates that every process supposedly spent the same amount of time within VecAssemblyBegin(). Note that for VecAssemblyEnd() the balance is 2.3, which is what I would commonly expect. Please send me ALL the output from -log_summary for these cases. The version of PETSc shouldn't matter for this issue.<br>
<div class="HOEnZb"><div class="h5"><br>
> On May 28, 2015, at 4:59 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br>
><br>
> We are seeing some large times spent in VecAssemblyBegin:<br>
><br>
> VecAssemblyBegin 242 1.0 7.9796e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.3e+02 12 0 0 0 5 76 0 0 0 10 0<br>
> VecAssemblyEnd 242 1.0 5.6624e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
><br>
> This is with 64K cores on Edison. On 128K cores (weak speedup) we see:<br>
><br>
> VecAssemblyBegin 248 1.0 2.3615e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.4e+02 17 0 0 0 4 87 0 0 0 10 0<br>
> VecAssemblyEnd 248 1.0 6.8855e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
><br>
> We are working on using older versions of PETSc to make sure this is a PETSc issue but does anyone have any thoughts on this?<br>
><br>
> Mark<br>
<br>
</div></div></blockquote></div><br></div>
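<div>For anyone reading the quoted rows, here is a minimal sketch of pulling out the two numbers under discussion: the maximum time and the max/min "balance" ratio. It assumes the usual -log_summary column order after the event name (count max, count ratio, time max, time ratio, then flops and the rest), which is not shown in this thread. The helper name parse_event_row is made up for illustration and is not part of PETSc.</div>

```python
# Sketch only: assumes the usual -log_summary column order after the event name:
#   count max, count ratio, time max (s), time ratio (max/min), flops max, ...
# parse_event_row is a hypothetical helper, not a PETSc function.

def parse_event_row(row):
    """Return the event name, call count, max time and max/min time ratio."""
    # Drop a leading "> " quote marker if the row was copied from a reply.
    tokens = row.lstrip("> ").split()
    name, fields = tokens[0], tokens[1:]
    return {
        "name": name,
        "count": int(fields[0]),
        "time_max": float(fields[2]),
        "time_ratio": float(fields[3]),
    }

# The two 64K-core rows quoted in this thread.
rows = [
    "> VecAssemblyBegin 242 1.0 7.9796e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.3e+02 12 0 0 0 5 76 0 0 0 10 0",
    "> VecAssemblyEnd 242 1.0 5.6624e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0",
]

for row in rows:
    event = parse_event_row(row)
    per_call = event["time_max"] / event["count"]
    print(event["name"], event["time_max"], event["time_ratio"], per_call)
```

<div>A ratio of 1.0 on an event that takes this long is the surprising part; the per-call time is only there to give a sense of scale.</div>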