[petsc-users] MPI Communication times

Manuel Valera mvalera-w at sdsu.edu
Fri Mar 22 16:08:26 CDT 2019


Hello,

I repeated the timings with the -log_sync option, and now I get, for 200
processors / 20 nodes:

------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

VecScatterBarrie    3014 1.0 5.6771e+01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
VecScatterBegin     3014 1.0 3.1684e+01 2.0 0.00e+00 0.0 4.2e+06 1.1e+06 2.8e+01  4  0 63 56  0   4  0 63 56  0     0
VecScatterEnd       2976 1.0 1.1383e+02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0

With 100 processors / 10 nodes:

VecScatterBarrie    3010 1.0 7.4430e+01 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
VecScatterBegin     3010 1.0 3.8504e+01 2.4 0.00e+00 0.0 1.6e+06 2.0e+06 2.8e+01  4  0 71 66  0   4  0 71 66  0     0
VecScatterEnd       2972 1.0 8.5158e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0

And with 20 processors / 1 node:

VecScatterBarrie    2596 1.0 4.0614e+01 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
VecScatterBegin     2596 1.0 1.4970e+01 1.3 0.00e+00 0.0 1.2e+05 4.0e+06 3.0e+01  1  0 81 61  0   1  0 81 61  0     0
VecScatterEnd       2558 1.0 1.4903e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0

Can you help me interpret this? What I see is the End portion taking more
relative time while Begin stays about the same beyond one node; also, the
Barrier and Begin counts are the same every time. But how do I estimate
communication times from here?

Thanks,


On Wed, Mar 20, 2019 at 3:24 PM Zhang, Junchao <jczhang at mcs.anl.gov> wrote:

> Forgot to mention that long VecScatter time might also be due to local memory
> copies. If the communication pattern has a lot of local-to-local (self to
> self) scatter, which often happens thanks to locality, then the memory copy
> time is counted in VecScatter. You can analyze your code's communication
> pattern to see whether this is the case.
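
A minimal sketch of such a self-to-self scatter (a made-up fragment using the
standard Vec/IS/VecScatter API; the sizes and index set are hypothetical, not
taken from the model): every requested entry is owned by the calling process,
so VecScatterBegin/End reduce to local memory copies, yet that copy time is
still logged under the VecScatter events.

  Vec            x, y;
  IS             is;
  VecScatter     scat;
  PetscInt       rstart, rend;
  PetscErrorCode ierr;

  ierr = VecCreateMPI(PETSC_COMM_WORLD, 1000, PETSC_DETERMINE, &x); CHKERRQ(ierr);
  ierr = VecGetOwnershipRange(x, &rstart, &rend); CHKERRQ(ierr);
  ierr = VecCreateSeq(PETSC_COMM_SELF, rend - rstart, &y); CHKERRQ(ierr);
  /* each rank asks only for entries it already owns: nothing crosses the network */
  ierr = ISCreateStride(PETSC_COMM_SELF, rend - rstart, rstart, 1, &is); CHKERRQ(ierr);
  ierr = VecScatterCreate(x, is, y, NULL, &scat); CHKERRQ(ierr);
  ierr = VecScatterBegin(scat, x, y, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr); /* pure local copy */
  ierr = VecScatterEnd(scat, x, y, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr);
  ierr = ISDestroy(&is); CHKERRQ(ierr);
  ierr = VecScatterDestroy(&scat); CHKERRQ(ierr);
  ierr = VecDestroy(&y); CHKERRQ(ierr);
  ierr = VecDestroy(&x); CHKERRQ(ierr);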
>
> --Junchao Zhang
>
>
> On Wed, Mar 20, 2019 at 4:44 PM Zhang, Junchao via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>>
>>
>> On Wed, Mar 20, 2019 at 4:18 PM Manuel Valera <mvalera-w at sdsu.edu> wrote:
>>
>>> Thanks for your answer. So, for example, I have a log for 200 cores across
>>> 10 nodes that reads:
>>>
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ---   Total
>>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>> ------------------------------------------------------------------------------------------------------------------------
>>> VecScatterBegin     3014 1.0 4.5550e+01 2.6 0.00e+00 0.0 4.2e+06 1.1e+06 2.8e+01  4  0 63 56  0   4  0 63 56  0     0
>>> VecScatterEnd       2976 1.0 1.2143e+02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0
>>>
>>> While for 20 nodes at one node I have:
>>>
>>  What does that mean?
>>
>>> VecScatterBegin     2596 1.0 2.9142e+01 2.1 0.00e+00 0.0 1.2e+05 4.0e+06 3.0e+01  2  0 81 61  0   2  0 81 61  0     0
>>> VecScatterEnd       2558 1.0 8.0344e+01 7.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>>>
>>> Where do I see the max/min ratio here? And why is the End step all 0.0e+00
>>> in both, yet it still grows from 3% to 14% of total time? It seems I would
>>> need to run again with the -log_sync option; is this correct?
>>>
>> e.g., the 2.1 and 7.9 (the ratio column right after the time) are the
>> max/min ratios. MPI send/recv are in VecScatterBegin(); VecScatterEnd()
>> only does MPI_Wait, which is why it has zero messages. Yes, run with
>> -log_sync and see what happens.
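
A rough sketch of that split (a made-up fragment, not the model's actual code):
the messages are posted in VecScatterBegin(), so independent local work placed
between Begin and End can overlap the communication, and VecScatterEnd() then
only waits for completion, which is why it reports zero messages.

  /* assuming a scatter context 'scat', a global Vec 'x', and a local/ghosted Vec 'xl' */
  ierr = VecScatterBegin(scat, x, xl, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr);
  /*   posts the nonblocking MPI sends/receives and does the local copies      */

  /* ... computation that does not need the ghost values can go here (overlap) */

  ierr = VecScatterEnd(scat, x, xl, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr);
  /*   essentially waits (MPI_Wait) on those requests: no new messages sent    */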
>>
>>
>>> A different question: can't I estimate the total communication time if I
>>> have a typical communication time per MPI message and multiply it by the
>>> number of MPI messages reported in the log? Or does it not work like that?
>>>
>> Probably not, because you have multiple processes doing send/recv at the
>> same time; they might saturate the bandwidth. PETSc also overlaps
>> computation with communication.
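
For illustration only, a back-of-the-envelope version of that estimate using
the 200-core numbers above and a purely assumed effective bandwidth of 10 GB/s
per node (a hypothetical figure, not measured on this cluster):

  total data moved ~ 4.2e6 messages x 1.1e6 bytes ~ 4.6e12 bytes
  ideal transfer   ~ 4.6e12 bytes / (10 nodes x 1.0e10 bytes/s) ~ 46 s

This ignores message latency, network contention, load imbalance, overlap with
computation, and the local copies mentioned above, so it can differ
substantially from the VecScatter times actually reported in the log.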
>>
>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Mar 20, 2019 at 2:02 PM Zhang, Junchao <jczhang at mcs.anl.gov>
>>> wrote:
>>>
>>>> See the "Mess   AvgLen  Reduct" numbers in each log stage. Mess is the
>>>> total number of messages sent in an event over all processes. AvgLen is
>>>> the average message length. Reduct is the number of global reductions.
>>>> Each event like VecScatterBegin/End has a maximal execution time over
>>>> all processes, and a max/min ratio. %T is sum(execution time of the event
>>>> on each process)/sum(execution time of the stage on each process). %T
>>>> indicates how expensive the event is; it is a number you should pay
>>>> attention to.
>>>> If your code is imbalanced (i.e., has a big max/min ratio), then the
>>>> performance numbers are skewed and become misleading because some
>>>> processes are just waiting for others. In that case, besides -log_view,
>>>> you can add -log_sync, which adds an extra MPI_Barrier for each event so
>>>> that all processes start it at the same time. With that, the numbers are
>>>> easier to interpret. src/vec/vscat/examples/ex4.c is a tiny example of
>>>> VecScatter logging.
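
A minimal way to try this from the command line (the executable name is just a
placeholder):

  mpiexec -n 200 ./model -log_view -log_sync

and, if it helps, a small sketch (a made-up fragment, not from the model) that
puts the halo exchange in its own log stage, so the stage %T of
VecScatterBegin/End is measured against just that phase:

  PetscLogStage  commstage;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister("Halo exchange", &commstage); CHKERRQ(ierr);
  ierr = PetscLogStagePush(commstage); CHKERRQ(ierr);
  ierr = VecScatterBegin(scat, x, xl, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr);
  ierr = VecScatterEnd(scat, x, xl, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr);
  ierr = PetscLogStagePop(); CHKERRQ(ierr);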
>>>>
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Wed, Mar 20, 2019 at 2:58 PM Manuel Valera via petsc-users <
>>>> petsc-users at mcs.anl.gov> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am working on timing my model, which we made MPI-scalable using PETSc
>>>>> DMDAs. I want to know more about the output log and how to calculate
>>>>> total communication times for my runs. So far I see we have "MPI
>>>>> Messages" and "MPI Message Lengths" in the log, along with the
>>>>> VecScatterEnd and VecScatterBegin reports.
>>>>>
>>>>> My question is: how do I interpret these numbers to get a rough estimate
>>>>> of how much overhead comes just from MPI communication in my model runs?
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>>