[petsc-users] MPI Communication times
Manuel Valera
mvalera-w at sdsu.edu
Fri Mar 22 16:54:02 CDT 2019
No, it is the same problem run with different numbers of processors; I have
data in increments of 20 processors / 1 node up to 20 nodes, and
additionally for 1 processor.
On Fri, Mar 22, 2019 at 2:48 PM Zhang, Junchao <jczhang at mcs.anl.gov> wrote:
> Did you change problem size with different runs?
>
> On Fri, Mar 22, 2019 at 4:09 PM Manuel Valera <mvalera-w at sdsu.edu> wrote:
>
>> Hello,
>>
>> I repeated the timings with the -log_sync option and now I get, for 200
>> processors / 20 nodes:
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ---   Total
>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> VecScatterBarrie    3014 1.0 5.6771e+01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
>> VecScatterBegin     3014 1.0 3.1684e+01 2.0 0.00e+00 0.0 4.2e+06 1.1e+06 2.8e+01  4  0 63 56  0   4  0 63 56  0     0
>> VecScatterEnd       2976 1.0 1.1383e+02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0
>>
>> With 100 processors / 10 nodes:
>>
>> VecScatterBarrie    3010 1.0 7.4430e+01 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
>> VecScatterBegin     3010 1.0 3.8504e+01 2.4 0.00e+00 0.0 1.6e+06 2.0e+06 2.8e+01  4  0 71 66  0   4  0 71 66  0     0
>> VecScatterEnd       2972 1.0 8.5158e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
>>
>> And with 20 processors / 1 node:
>>
>> VecScatterBarrie    2596 1.0 4.0614e+01 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>> VecScatterBegin     2596 1.0 1.4970e+01 1.3 0.00e+00 0.0 1.2e+05 4.0e+06 3.0e+01  1  0 81 61  0   1  0 81 61  0     0
>> VecScatterEnd       2558 1.0 1.4903e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>
>> Can you help me interpret this? What I see is the End portion taking more
>> relative time while Begin stays about the same beyond one node; also, the
>> Barrier and Begin counts are the same every time. But how do I estimate
>> communication times from here?
>>
>> Thanks,
>>
>>
>> On Wed, Mar 20, 2019 at 3:24 PM Zhang, Junchao <jczhang at mcs.anl.gov>
>> wrote:
>>
>>> Forgot to mention that a long VecScatter time might also be due to local
>>> memory copies. If the communication pattern has a large local-to-local (self
>>> to self) scatter, which often happens thanks to locality, then the memory
>>> copy time is counted in VecScatter. You can analyze your code's
>>> communication pattern to see if that is the case.
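>>> If you want to see that effect in isolation, here is a minimal toy sketch
>>> (my own illustration, not taken from your code, assuming the standard
>>> Vec/IS/VecScatter interfaces): the source and destination index sets are
>>> identical, so the scatter is purely local copies, yet with -log_view the
>>> time still shows up under VecScatterBegin/End while the message count
>>> stays essentially zero.
>>>
>>> #include <petscvec.h>
>>> int main(int argc, char **argv)
>>> {
>>>   Vec            x, y;
>>>   IS             is;
>>>   VecScatter     sct;
>>>   PetscInt       n = 1 << 20, rstart, rend;
>>>   PetscErrorCode ierr;
>>>
>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>>>   ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n, &x);CHKERRQ(ierr);
>>>   ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
>>>   ierr = VecSet(x, 1.0);CHKERRQ(ierr);
>>>   ierr = VecGetOwnershipRange(x, &rstart, &rend);CHKERRQ(ierr);
>>>   /* same index set on both sides: a self-to-self (memcpy-only) scatter */
>>>   ierr = ISCreateStride(PETSC_COMM_WORLD, rend - rstart, rstart, 1, &is);CHKERRQ(ierr);
>>>   ierr = VecScatterCreate(x, is, y, is, &sct);CHKERRQ(ierr);
>>>   ierr = VecScatterBegin(sct, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
>>>   ierr = VecScatterEnd(sct, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
>>>   ierr = VecScatterDestroy(&sct);CHKERRQ(ierr);
>>>   ierr = ISDestroy(&is);CHKERRQ(ierr);
>>>   ierr = VecDestroy(&y);CHKERRQ(ierr);
>>>   ierr = VecDestroy(&x);CHKERRQ(ierr);
>>>   ierr = PetscFinalize();
>>>   return ierr;
>>> }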
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Mar 20, 2019 at 4:44 PM Zhang, Junchao via petsc-users <
>>> petsc-users at mcs.anl.gov> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Mar 20, 2019 at 4:18 PM Manuel Valera <mvalera-w at sdsu.edu>
>>>> wrote:
>>>>
>>>>> Thanks for your answer. So, for example, I have a log for 200 cores
>>>>> across 10 nodes that reads:
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ---   Total
>>>>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>> VecScatterBegin     3014 1.0 4.5550e+01 2.6 0.00e+00 0.0 4.2e+06 1.1e+06 2.8e+01  4  0 63 56  0   4  0 63 56  0     0
>>>>> VecScatterEnd       2976 1.0 1.2143e+02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0
>>>>>
>>>>> While for 20 nodes at one node I have:
>>>>>
>>>> What does that mean?
>>>>
>>>>> VecScatterBegin     2596 1.0 2.9142e+01 2.1 0.00e+00 0.0 1.2e+05 4.0e+06 3.0e+01  2  0 81 61  0   2  0 81 61  0     0
>>>>> VecScatterEnd       2558 1.0 8.0344e+01 7.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>>>>>
>>>>> Where do I see the max/min ratio in here? And why is the End step all
>>>>> 0.0e+00 in both, yet it still grows from 3% to 14% of total time? It seems I
>>>>> would need to run again with the -log_sync option, is this correct?
>>>>>
>>>> The max/min ratios are the "Ratio" values, e.g., 2.1 and 7.9. MPI send/recv
>>>> are in VecScatterBegin(); VecScatterEnd() only does MPI_Wait, which is why it
>>>> has zero messages. Yes, run with -log_sync and see what happens.
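>>>> For example, something like this (assuming your executable is called
>>>> ./mymodel; substitute your own binary and options):
>>>>
>>>>   mpiexec -n 200 ./mymodel -log_view -log_sync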
>>>>
>>>>
>>>>> A different question: couldn't I estimate the total communication time
>>>>> from a typical communication time per MPI message times the number of MPI
>>>>> messages reported in the log? Or does it not work like that?
>>>>>
>>>> That probably won't work, because you have multiple processes doing
>>>> send/recv at the same time and they might saturate the bandwidth. PETSc also
>>>> does computation/communication overlapping.
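>>>> To illustrate what that overlap looks like, here is a small sketch (a
>>>> made-up 512x512 DMDA, not your model) of the split-phase ghost update that
>>>> makes it possible: work that needs no ghost values can go between the
>>>> Begin and End calls.
>>>>
>>>> #include <petscdmda.h>
>>>> int main(int argc, char **argv)
>>>> {
>>>>   DM             da;
>>>>   Vec            g, l;
>>>>   PetscErrorCode ierr;
>>>>
>>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>>>>   ierr = DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
>>>>                       DMDA_STENCIL_STAR, 512, 512, PETSC_DECIDE, PETSC_DECIDE,
>>>>                       1, 1, NULL, NULL, &da);CHKERRQ(ierr);
>>>>   ierr = DMSetFromOptions(da);CHKERRQ(ierr);
>>>>   ierr = DMSetUp(da);CHKERRQ(ierr);
>>>>   ierr = DMCreateGlobalVector(da, &g);CHKERRQ(ierr);
>>>>   ierr = DMCreateLocalVector(da, &l);CHKERRQ(ierr);
>>>>   ierr = VecSet(g, 1.0);CHKERRQ(ierr);
>>>>
>>>>   /* starts the ghost update; for a DMDA this is the VecScatterBegin part */
>>>>   ierr = DMGlobalToLocalBegin(da, g, INSERT_VALUES, l);CHKERRQ(ierr);
>>>>   /* ... interior computation that needs no ghost values can overlap here ... */
>>>>   /* finishes the update; this is the VecScatterEnd / MPI_Wait part */
>>>>   ierr = DMGlobalToLocalEnd(da, g, INSERT_VALUES, l);CHKERRQ(ierr);
>>>>
>>>>   ierr = VecDestroy(&l);CHKERRQ(ierr);
>>>>   ierr = VecDestroy(&g);CHKERRQ(ierr);
>>>>   ierr = DMDestroy(&da);CHKERRQ(ierr);
>>>>   ierr = PetscFinalize();
>>>>   return ierr;
>>>> }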
>>>>
>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 20, 2019 at 2:02 PM Zhang, Junchao <jczhang at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>> See the "Mess AvgLen Reduct" numbers in each log stage. Mess is
>>>>>> the total number of messages sent in an event over all processes, AvgLen
>>>>>> is the average message length, and Reduct is the number of global reductions.
>>>>>> Each event, like VecScatterBegin/End, has a maximal execution time over
>>>>>> all processes and a max/min ratio. %T is sum(execution time of the event
>>>>>> on each process)/sum(execution time of the stage on each process). %T
>>>>>> indicates how expensive the event is; it is a number you should pay
>>>>>> attention to.
>>>>>> If your code is imbalanced (i.e., has a big max/min ratio), then the
>>>>>> performance numbers are skewed and become misleading because some processes
>>>>>> are just waiting for others. In that case, besides -log_view, you can add
>>>>>> -log_sync, which adds an extra MPI_Barrier to each event so that all
>>>>>> processes start it at the same time. With that, the numbers are easier to
>>>>>> interpret.
>>>>>> src/vec/vscat/examples/ex4.c is a tiny example of VecScatter
>>>>>> logging.
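>>>>>> If it helps, here is a stripped-down sketch of that kind of logging (not
>>>>>> the actual ex4.c, just an illustration using VecScatterCreateToAll); run
>>>>>> it with, e.g., mpiexec -n 4 ./demo -log_view -log_sync and look at the
>>>>>> VecScatterBegin/End lines as above.
>>>>>>
>>>>>> #include <petscvec.h>
>>>>>> int main(int argc, char **argv)
>>>>>> {
>>>>>>   Vec            x, y;
>>>>>>   VecScatter     sct;
>>>>>>   PetscInt       n = 1000000;
>>>>>>   PetscErrorCode ierr;
>>>>>>
>>>>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>>>>>>   ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n, &x);CHKERRQ(ierr);
>>>>>>   ierr = VecSet(x, 1.0);CHKERRQ(ierr);
>>>>>>   /* gather the whole parallel vector onto every process */
>>>>>>   ierr = VecScatterCreateToAll(x, &sct, &y);CHKERRQ(ierr);
>>>>>>   ierr = VecScatterBegin(sct, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
>>>>>>   ierr = VecScatterEnd(sct, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
>>>>>>   ierr = VecScatterDestroy(&sct);CHKERRQ(ierr);
>>>>>>   ierr = VecDestroy(&y);CHKERRQ(ierr);
>>>>>>   ierr = VecDestroy(&x);CHKERRQ(ierr);
>>>>>>   ierr = PetscFinalize();
>>>>>>   return ierr;
>>>>>> }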
>>>>>>
>>>>>> --Junchao Zhang
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 20, 2019 at 2:58 PM Manuel Valera via petsc-users <
>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am working on timing my model, which we made MPI-scalable using
>>>>>>> PETSc DMDAs. I want to know more about the output log and how to calculate
>>>>>>> total communication time for my runs. So far I see we have "MPI
>>>>>>> Messages" and "MPI Message Lengths" in the log, along with the VecScatterEnd
>>>>>>> and VecScatterBegin reports.
>>>>>>>
>>>>>>> My question is: how do I interpret these numbers to get a rough
>>>>>>> estimate of how much overhead we have just from MPI communication time in
>>>>>>> my model runs?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>>