Hi, <br><br>I am trying to time MPI_Allreduce. As expected, each process reports a different elapsed time. Should I call MPI_Barrier before and after the MPI_Allreduce call and time the whole sequence (which would add the MPI_Barrier overhead to the measurement)? Or is it accurate enough to simply take the maximum of the timings collected from each process?
<br><br>Thank you so much in advance,<br>Memo