[petsc-dev] new P^1.5 algorithm in VecAssembleBegin?

Jed Brown jed at jedbrown.org
Fri May 29 14:48:24 CDT 2015

Barry Smith <bsmith at mcs.anl.gov> writes:

>> On May 29, 2015, at 2:29 PM, Jed Brown <jed at jedbrown.org> wrote:
>> Barry Smith <bsmith at mcs.anl.gov> writes:
>>>  I cannot explain why the load balance would be 1.0 unless, by
>>>  unlikely coincidence on the 248 different calls to the function
>>>  different processes are the ones waiting so that the sum of the
>>>  waits on different processes matches over the 248 calls. Possible
>>>  but
>> Uh, it's the same reason VecNorm often shows significant load imbalance.
>    Uh, I don't understand. It shows NO imbalance but huge
>    times. Normally I would expect a large imbalance and huge times. So
>    I cannot explain why it has no imbalance. 1.0 means no imbalance.

Sorry, I mixed two comments.  There are two non-scalable operations:
determining ownership for outgoing entries (the loop I showed) and the
huge MPI_Allreduce.  Our timers can never observe load imbalance in the
ownership determination because all of that work is done after the timer
has started, and the timer can't stop until the MPI_Allreduce completes.
The MPI_Allreduce is really expensive because it involves 1-2 MB from
each of 128k cores.  (MPI_Reduce_scatter_block is much better.)
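
To make the size argument concrete, here is a rough sketch in plain C (no
MPI calls, so it runs anywhere).  It assumes, and this is only an
assumption since the message does not say what is being reduced, that the
reduced buffer holds one fixed-size entry per destination rank, and that
"128k" means 131072.  The helper names and the counts layout are
hypothetical.  An all-reduce leaves every rank holding the whole summed
array, while MPI_Reduce_scatter_block with a block of one entry leaves
each rank holding only its own column sum.

```c
#include <stddef.h>

/* counts[src * nranks + dest] = number of messages src sends to dest.
 * This layout is hypothetical and chosen only for illustration. */

/* What a rank actually needs: the sum of its own column.  This is the
 * single value a reduce-scatter with block size 1 would hand to `dest`. */
int incoming_for_rank(const int *counts, int nranks, int dest)
{
    int total = 0;
    for (int src = 0; src < nranks; src++)
        total += counts[(size_t)src * (size_t)nranks + (size_t)dest];
    return total;
}

/* Bytes of reduced result each rank ends up holding after an all-reduce
 * over an array with one entry per rank. */
size_t allreduce_result_bytes(size_t nranks, size_t entry_bytes)
{
    return nranks * entry_bytes;
}

/* Bytes of reduced result each rank holds after a reduce-scatter with
 * one entry per rank: just its own entry. */
size_t reduce_scatter_block_result_bytes(size_t entry_bytes)
{
    return entry_bytes;
}
```

Under those assumptions, allreduce_result_bytes(131072, 8) is 1048576
bytes, i.e. 1 MB per rank, which is at least consistent with the 1-2 MB
figure above, whereas the reduce-scatter variant delivers 8 bytes per
rank.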

If the incoming load imbalance is small relative to the cost of the
ownership determination plus the MPI_Allreduce, then we see a ratio of
1.0.  Putting a barrier before the timed region (sort of) guarantees
that, but if it was already the case, the barrier won't change anything.
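
The effect can be seen in a toy model, sketched here in plain C.  This is
not PETSc's logging code; the function and its parameters are hypothetical.
It assumes each rank starts its timer when it arrives, does some local
work, and then enters a collective that cannot finish on any rank until the
slowest rank has entered it, and that the reported ratio is the maximum
timed interval divided by the minimum.

```c
#include <stddef.h>

/* Toy model of a timed region that ends in a synchronizing collective.
 *   arrive[i]       time at which rank i starts its timer
 *   work[i]         local work rank i does before the collective
 *   collective_cost time the collective takes once every rank has entered
 * Returns max/min of the per-rank timed intervals.  Requires n > 0. */
double timed_imbalance(const double *arrive, const double *work,
                       size_t n, double collective_cost)
{
    /* The collective completes everywhere at the same time: when the
     * slowest rank is ready, plus the cost of the collective itself. */
    double finish = 0.0;
    for (size_t i = 0; i < n; i++) {
        double ready = arrive[i] + work[i];
        if (ready > finish) finish = ready;
    }
    finish += collective_cost;

    /* Each rank's timer covers [arrive[i], finish], so differences in
     * work[] are hidden; only differences in arrive[] can show up. */
    double tmax = finish - arrive[0];
    double tmin = tmax;
    for (size_t i = 1; i < n; i++) {
        double t = finish - arrive[i];
        if (t > tmax) tmax = t;
        if (t < tmin) tmin = t;
    }
    return tmax / tmin;
}
```

In this model, four ranks that all arrive at time 0 with local work of
0.1, 5.0, 0.2 and 3.0 seconds give a ratio of exactly 1.0, even though the
work is badly unbalanced.  Only unequal arrival times move the ratio away
from 1.0, and a preceding barrier equalizes those.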
