[petsc-users] log_summary time ratio and flops ratio
Xiangdong
epscodes at gmail.com
Fri Feb 5 11:32:48 CST 2016
On Fri, Feb 5, 2016 at 12:18 PM, Matthew Knepley <knepley at gmail.com> wrote:
> On Fri, Feb 5, 2016 at 11:12 AM, Xiangdong <epscodes at gmail.com> wrote:
>
>> Hello everyone,
>>
>> When I looked at the log_summary, I found that the flops ratio of VecDot
>> is 1.1, but the time ratio is 1.8. Similarly, the flops ratio of MatMult
>> is 1.2, but the time ratio is 1.6. I am using the simple SOR
>> preconditioner on 256 procs. Since each of these consumes about 20% of
>> the total time, is it worth looking into this imbalance? If so, how do I
>> know which procs are causing the issue, and what tools should I start with?
>>
>
> 1) Always send all the output
>
> 2) You have a large number of dot products, and I am guessing you have a
> slow reduce on this machine (we could see the data at the end of
> log_summary). The waiting time comes from slow collective communication,
> load imbalance, system noise, etc. You can see this by looking at the
> difference between VecDot() and VecMDot(). For the latter there is enough
> work to cover up the latency, and its imbalance is only 1.1, not 1.8. Thus
> I think it is clear that your problem is due to a slow communication
> network rather than load imbalance.
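To see where the waiting time goes, one option is to wrap the solve in its own
logging stage so that log_summary reports the time and ratios for that phase
separately from setup and I/O. Below is a minimal sketch, assuming an existing
KSP ksp and vectors b, x; the function and stage names are made up for
illustration.

#include <petscksp.h>

/* Hypothetical helper: run the solve inside its own logging stage so that
   -log_summary breaks out time/flop ratios for this phase separately. */
static PetscErrorCode SolveWithStage(KSP ksp, Vec b, Vec x)
{
  PetscLogStage  solve_stage;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscLogStageRegister("Solve", &solve_stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);  /* events here are reported under the "Solve" stage */
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  PetscFunctionReturn(0);
}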
>
I attached the full summary. At the end it reports:
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 8.3971e-05
Average time for zero size MPI_Send(): 7.16746e-06
Is this an indication of a slow network?
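For a direct check of reduction latency on this machine, the MPI_Barrier
number above can be compared against a small timing loop around a scalar
MPI_Allreduce. A minimal sketch, assuming PetscInitialize() has already been
called; the helper name and iteration count are arbitrary.

#include <petscsys.h>

/* Hypothetical helper: estimate the latency of a scalar reduction on comm. */
static PetscErrorCode TimeAllreduce(MPI_Comm comm)
{
  const PetscInt n = 1000;             /* number of reductions averaged over (arbitrary) */
  PetscScalar    local = 1.0, global;
  double         t0, t1;
  PetscInt       i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MPI_Barrier(comm);CHKERRQ(ierr);          /* line everyone up before timing */
  t0   = MPI_Wtime();
  for (i = 0; i < n; i++) {
    ierr = MPI_Allreduce(&local, &global, 1, MPIU_SCALAR, MPIU_SUM, comm);CHKERRQ(ierr);
  }
  t1   = MPI_Wtime();
  ierr = PetscPrintf(comm, "Average MPI_Allreduce time: %g s\n", (t1 - t0)/n);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}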
>
> 3) Thus you would be better served with a stronger preconditioner (more
> flops) that results in fewer iterations (less synchronization).
>
Yes, the preconditioner I am using is simple (just SOR). I am also trying
other, more advanced preconditioners.
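As one possible direction, the preconditioner can be switched at runtime
(e.g. -pc_type gamg) or in code. A minimal sketch, assuming an existing KSP;
whether algebraic multigrid is a good fit for this particular problem is an
assumption, and the helper name is made up.

#include <petscksp.h>

/* Hypothetical helper: replace the current (SOR) preconditioner with GAMG. */
static PetscErrorCode UseStrongerPC(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);   /* same effect as -pc_type gamg */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* still honor command-line overrides */
  PetscFunctionReturn(0);
}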
>
> Thanks,
>
> Matt
>
>
>> Thanks.
>>
>> Xiangdong
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
>> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecView 44 1.0 3.9026e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.8e+01 1 0 0 0 0 1 0 0 0 0 0
>> VecDot 817452 1.0 1.5972e+02 1.8 1.51e+10 1.1 0.0e+00 0.0e+00 8.2e+05 18 3 0 0 40 18 3 0 0 40 22969
>> VecMDot 613089 1.0 5.8918e+01 1.1 2.27e+10 1.1 0.0e+00 0.0e+00 6.1e+05 8 4 0 0 30 8 4 0 0 30 93397
>> VecNorm 618619 1.0 1.4426e+02 2.2 1.14e+10 1.1 0.0e+00 0.0e+00 6.2e+05 15 2 0 0 30 15 2 0 0 30 19145
>> VecScale 938 1.0 8.3907e-03 1.2 8.67e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 250859
>> VecCopy 8159 1.0 1.0394e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecSet 1884 1.0 3.9701e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAXPY 1021815 1.0 2.2148e+01 2.1 1.89e+10 1.1 0.0e+00 0.0e+00 0.0e+00 2 4 0 0 0 2 4 0 0 0 207057
>> VecAYPX 613089 1.0 6.5360e+00 1.9 1.13e+10 1.1 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 420336
>> VecAXPBYCZ 191 1.0 3.7365e-03 1.4 7.06e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 458830
>> VecWAXPY 938 1.0 3.8502e-02 1.4 8.67e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 54669
>> VecMAXPY 613089 1.0 1.3276e+01 2.2 2.27e+10 1.1 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 414499
>> VecLoad 5 1.0 2.0913e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecScatterBegin 820695 1.0 1.8959e+01 2.8 0.00e+00 0.0 1.1e+09 4.6e+03 0.0e+00 2 0100100 0 2 0100100 0 0
>> VecScatterEnd 820695 1.0 5.2002e+01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
>> VecReduceArith 2814 1.0 1.8517e-02 1.2 5.20e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681995
>> VecReduceComm 938 1.0 2.6041e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 9.4e+02 0 0 0 0 0 0 0 0 0 0 0
>> MatMult 817452 1.0 1.9379e+02 1.6 2.04e+11 1.2 1.0e+09 4.6e+03 0.0e+00 23 40100100 0 23 40100100 0 253083
>> MatSOR 818390 1.0 1.9608e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 22 40 0 0 0 22 40 0 0 0 247472
>> MatAssemblyBegin 939 1.0 7.9671e+00 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+03 1 0 0 0 0 1 0 0 0 0 0
>> MatAssemblyEnd 939 1.0 2.0570e-01 1.2 0.00e+00 0.0 2.6e+03 1.2e+03 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatZeroEntries 938 1.0 1.8202e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> SNESSolve 210 1.0 6.8585e+02 1.0 5.06e+11 1.2 1.1e+09 4.6e+03 2.1e+06 98100100100100 98100100100100 178411
>> SNESFunctionEval 2296 1.0 2.0512e+01 1.3 0.00e+00 0.0 2.9e+06 3.5e+03 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
>> SNESJacobianEval 938 1.0 2.7680e+01 1.0 0.00e+00 0.0 1.2e+06 3.5e+03 1.9e+03 4 0 0 0 0 4 0 0 0 0 0
>> SNESLineSearch 938 1.0 9.2491e+00 1.0 6.07e+07 1.1 1.2e+06 3.5e+03 9.4e+02 1 0 0 0 0 1 0 0 0 0 1593
>> KSPSetUp 938 1.0 2.7661e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 0
>> KSPSolve 938 1.0 6.3433e+02 1.0 5.06e+11 1.2 1.0e+09 4.6e+03 2.0e+06 91100100100 99 91100100100 99 192852
>> PCSetUp 938 1.0 2.1005e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> PCApply 818390 1.0 1.9675e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 22 40 0 0 0 22 40 0 0 0 246626
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
-------------- next part --------------
Using Petsc Release Version 3.6.3, unknown
Max Max/Min Avg Total
Time (sec): 6.981e+02 1.00002 6.981e+02
Objects: 2.223e+03 1.00000 2.223e+03
Flops: 5.061e+11 1.15624 4.780e+11 1.224e+14
Flops/sec: 7.250e+08 1.15625 6.847e+08 1.753e+11
MPI Messages: 4.924e+06 2.00000 4.104e+06 1.050e+09
MPI Message Lengths: 2.343e+10 2.16478 4.601e+03 4.833e+12
MPI Reductions: 2.065e+06 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 6.9805e+02 100.0% 1.2236e+14 100.0% 1.050e+09 100.0% 4.601e+03 100.0% 2.065e+06 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecView 44 1.0 3.9026e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.8e+01 1 0 0 0 0 1 0 0 0 0 0
VecDot 817452 1.0 1.5972e+02 1.8 1.51e+10 1.1 0.0e+00 0.0e+00 8.2e+05 18 3 0 0 40 18 3 0 0 40 22969
VecMDot 613089 1.0 5.8918e+01 1.1 2.27e+10 1.1 0.0e+00 0.0e+00 6.1e+05 8 4 0 0 30 8 4 0 0 30 93397
VecNorm 618619 1.0 1.4426e+02 2.2 1.14e+10 1.1 0.0e+00 0.0e+00 6.2e+05 15 2 0 0 30 15 2 0 0 30 19145
VecScale 938 1.0 8.3907e-03 1.2 8.67e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 250859
VecCopy 8159 1.0 1.0394e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1884 1.0 3.9701e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1021815 1.0 2.2148e+01 2.1 1.89e+10 1.1 0.0e+00 0.0e+00 0.0e+00 2 4 0 0 0 2 4 0 0 0 207057
VecAYPX 613089 1.0 6.5360e+00 1.9 1.13e+10 1.1 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 420336
VecAXPBYCZ 191 1.0 3.7365e-03 1.4 7.06e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 458830
VecWAXPY 938 1.0 3.8502e-02 1.4 8.67e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 54669
VecMAXPY 613089 1.0 1.3276e+01 2.2 2.27e+10 1.1 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 414499
VecLoad 5 1.0 2.0913e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 820695 1.0 1.8959e+01 2.8 0.00e+00 0.0 1.1e+09 4.6e+03 0.0e+00 2 0100100 0 2 0100100 0 0
VecScatterEnd 820695 1.0 5.2002e+01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
VecReduceArith 2814 1.0 1.8517e-02 1.2 5.20e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681995
VecReduceComm 938 1.0 2.6041e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 9.4e+02 0 0 0 0 0 0 0 0 0 0 0
MatMult 817452 1.0 1.9379e+02 1.6 2.04e+11 1.2 1.0e+09 4.6e+03 0.0e+00 23 40100100 0 23 40100100 0 253083
MatSOR 818390 1.0 1.9608e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 22 40 0 0 0 22 40 0 0 0 247472
MatAssemblyBegin 939 1.0 7.9671e+00 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+03 1 0 0 0 0 1 0 0 0 0 0
MatAssemblyEnd 939 1.0 2.0570e-01 1.2 0.00e+00 0.0 2.6e+03 1.2e+03 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 938 1.0 1.8202e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESSolve 210 1.0 6.8585e+02 1.0 5.06e+11 1.2 1.1e+09 4.6e+03 2.1e+06 98100100100100 98100100100100 178411
SNESFunctionEval 2296 1.0 2.0512e+01 1.3 0.00e+00 0.0 2.9e+06 3.5e+03 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
SNESJacobianEval 938 1.0 2.7680e+01 1.0 0.00e+00 0.0 1.2e+06 3.5e+03 1.9e+03 4 0 0 0 0 4 0 0 0 0 0
SNESLineSearch 938 1.0 9.2491e+00 1.0 6.07e+07 1.1 1.2e+06 3.5e+03 9.4e+02 1 0 0 0 0 1 0 0 0 0 1593
KSPSetUp 938 1.0 2.7661e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 938 1.0 6.3433e+02 1.0 5.06e+11 1.2 1.0e+09 4.6e+03 2.0e+06 91100100100 99 91100100100 99 192852
PCSetUp 938 1.0 2.1005e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 818390 1.0 1.9675e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 22 40 0 0 0 22 40 0 0 0 246626
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 2150 2150 161353280 0
Vector Scatter 3 2 39136 0
Matrix 3 3 1820156 0
Distributed Mesh 2 1 5040 0
Star Forest Bipartite Graph 4 2 1712 0
Discrete System 2 1 848 0
Index Set 6 6 55904 0
IS L to G Mapping 2 1 22896 0
Viewer 45 44 32032 0
SNES 1 1 1332 0
SNESLineSearch 1 1 864 0
DMSNES 1 1 664 0
Krylov Solver 1 1 1464 0
DMKSP interface 1 1 648 0
Preconditioner 1 1 1024 0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 8.3971e-05
Average time for zero size MPI_Send(): 7.16746e-06