<div dir="ltr">Thanks. This code does not have any communication in VecAssembly, so I added the IGNORE stuff and added a barrier before this section of code.  I suspect that VecAssembly is catching load imbalance but not reporting it for some reason.</div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 29, 2015 at 10:54 AM, Jed Brown <span dir="ltr"><<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> writes:<br>

<br>

> I am suspecting that it is catching load imbalance and just not reporting<br>

> it correctly. I've added a barrier in the code.<br>

><br>

> Here are the two log files.<br>

<br>

</span>Mark, there has always been a worst-case O(n*p) algorithm in<br>

VecStashScatterBegin_Private:<br>

<br>

  for (i=0; i<stash->n; i++) {<br>

    /* if indices are NOT locally sorted, need to start search at the beginning */<br>

    if (lastidx > (idx = stash->idx[i])) j = 0;<br>

    lastidx = idx;<br>

    for (; j<size; j++) {<br>

      if (idx >= owners[j] && idx < owners[j+1]) {<br>

        nprocs[2*j]++; nprocs[2*j+1] = 1; owner[i] = j; break;<br>

      }<br>

    }<br>

  }<br>

<br>

The branch jed/mat-assembly-perf has a scalable implementation.  Can you<br>

try it (either in that branch or in 'next')?<br>

</blockquote></div><br></div>
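<div dir="ltr">For reference, the quoted loop scans the ownership ranges linearly and restarts from rank 0 whenever the stash indices are not sorted, which is where the worst-case O(n*p) cost comes from. One way to avoid that is a binary search over the owners array, giving O(n log p) regardless of ordering. The sketch below is only an illustration under that assumption; find_owner is a hypothetical helper, not a PETSc function, and it is not claimed to be what the jed/mat-assembly-perf branch does.</div>

```c
/* Hypothetical helper (not part of PETSc): return the rank j such that
   owners[j] <= idx < owners[j+1], or -1 if idx is outside [owners[0], owners[size]).
   Assumes owners has size+1 entries in nondecreasing order. */
static int find_owner(const int *owners, int size, int idx)
{
  int lo = 0, hi = size;

  if (idx < owners[0] || idx >= owners[size]) return -1;

  /* Invariant: owners[lo] <= idx < owners[hi] */
  while (hi - lo > 1) {
    int mid = lo + (hi - lo) / 2;
    if (idx < owners[mid]) hi = mid;
    else lo = mid;
  }
  return lo;
}
```

<div dir="ltr">In the quoted loop this would replace the inner for loop over j: call find_owner(owners, size, stash->idx[i]) once per stash entry and update nprocs and owner[i] with the result, with no dependence on whether the indices are locally sorted. Ranks that own no entries (owners[j] == owners[j+1]) are skipped naturally.</div>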