[petsc-users] Time cost by Vec Assembly

Barry Smith bsmith at mcs.anl.gov
Sat Oct 8 12:20:11 CDT 2016


> On Oct 7, 2016, at 11:30 PM, Jed Brown <jed at jedbrown.org> wrote:
> 
> Barry Smith <bsmith at mcs.anl.gov> writes:
>>    There is still something wonky here, whether it is the MPI implementation or how PETSc handles the assembly. Without any values that need to be communicated, it is unacceptable that these calls take so long. If we understood __exactly__ why the performance suddenly drops so dramatically we could perhaps fix it. I do not understand why.
> 
> I guess it's worth timing.  If they don't have MPI_Reduce_scatter_block
> then it falls back to a big MPI_Allreduce.  After that, it's all
> point-to-point messaging that shouldn't suck and there actually
> shouldn't be anything to send or receive anyway.  The BTS implementation
> should be much smarter; it literally reduces to a barrier in this case.
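
   For reference, a minimal sketch (plain MPI C, assuming an MPI that provides MPI_Reduce_scatter_block, i.e. MPI-2.2 or later; the buffer sizes are placeholders) that times the native collective against the MPI_Allreduce fallback described above:

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(int argc, char **argv)
  {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* one entry per rank, mimicking the per-rank counts exchanged during assembly */
    const int recvcount = 1;
    int *sendbuf = calloc((size_t)size * recvcount, sizeof(int));
    int *recvbuf = calloc(recvcount, sizeof(int));
    int *allbuf  = calloc((size_t)size * recvcount, sizeof(int));

    /* native collective: each rank gets the reduced block it owns */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    MPI_Reduce_scatter_block(sendbuf, recvbuf, recvcount, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    /* fallback: reduce everything everywhere, then keep only our own slice */
    MPI_Barrier(MPI_COMM_WORLD);
    double t2 = MPI_Wtime();
    MPI_Allreduce(sendbuf, allbuf, size * recvcount, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    memcpy(recvbuf, allbuf + rank * recvcount, recvcount * sizeof(int));
    double t3 = MPI_Wtime();

    if (!rank) printf("reduce_scatter_block %g s, allreduce fallback %g s\n", t1 - t0, t3 - t2);

    free(sendbuf); free(recvbuf); free(allbuf);
    MPI_Finalize();
    return 0;
  }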

   Could it be that the length of the data (in the 64k-processor case) is now larger than the "eager" limit, so instead of just sending all the data in the message up the tree, it sends some of the data and waits for confirmation before sending more, leading to a really bad state? Perhaps there are some MPI environment variables that could be tuned.
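
   One way to test that hypothesis is a simple ping-pong sweep like the sketch below (plain MPI C; the message sizes and repetition count are arbitrary): the round-trip time jumps at the size where the implementation switches from the eager to the rendezvous protocol. The knobs that move that threshold are implementation specific, e.g. MPICH exposes MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE and Open MPI has per-BTL eager_limit parameters.

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) MPI_Abort(MPI_COMM_WORLD, 1);  /* run with exactly 2 ranks */

    /* sweep message sizes from 1 B to 1 MB; a discontinuity in round-trip
       time marks the eager/rendezvous switch */
    for (int bytes = 1; bytes <= 1 << 20; bytes <<= 1) {
      char *buf = calloc((size_t)bytes, 1);
      const int reps = 100;
      MPI_Barrier(MPI_COMM_WORLD);
      double t0 = MPI_Wtime();
      for (int i = 0; i < reps; i++) {
        if (rank == 0) {
          MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
          MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
          MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
      }
      double t1 = MPI_Wtime();
      if (!rank) printf("%8d bytes: %g us round trip\n", bytes, 1e6 * (t1 - t0) / reps);
      free(buf);
    }
    MPI_Finalize();
    return 0;
  }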


