[petsc-users] log_summary time ratio and flops ratio

Xiangdong epscodes at gmail.com
Fri Feb 5 11:32:48 CST 2016


On Fri, Feb 5, 2016 at 12:18 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Fri, Feb 5, 2016 at 11:12 AM, Xiangdong <epscodes at gmail.com> wrote:
>
>> Hello everyone,
>>
>> When I looked at the log_summary, I found that the flops ratio of VecDot
>> is 1.1, but the time ratio is 1.8. Also, the flops ratio of MatMult is
>> 1.2, but the time ratio is 1.6. I am using the simple SOR preconditioner
>> on 256 procs. Since each of them consumes about 20% of the total time, is
>> it worth looking into these imbalance issues? If yes, how do I know which
>> procs are causing the issue? What tools should I start with?
>>
>
> 1) Always send all the output
>
> 2) You have a large number of dot products, and I am guessing you have a
> slow reduce on this machine (we could see the data at the end of
> log_summary). The waiting time comes from slow collective communication,
> load imbalance, system noise, etc. You can see this by looking at the
> difference between VecDot() and VecMDot(). For the latter there is enough
> work to cover up the latency, and its imbalance is only 1.1, not 1.8. Thus
> I think it is clear that your problem is due to a slow communication
> network rather than load imbalance.
>
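
On my earlier question about which tools to start with: one thing I plan to
try is wrapping the distinct phases of the code in PETSc log stages, so that
-log_summary reports the times, ratios, and flops separately for each phase
instead of lumping everything into the main stage. A minimal sketch of what I
mean (the stage names and the placeholder allreduce are just for
illustration; in the real code the assembly and the SNESSolve() call would go
inside the stages):

/* stages.c - minimal use of log stages so that -log_summary (or -log_view on
   newer PETSc versions) reports timings and ratios separately per stage.    */
#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscLogStage  stage_assembly, stage_solve;
  PetscScalar    a = 1.0, b;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  ierr = PetscLogStageRegister("Assembly", &stage_assembly);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("Solve",    &stage_solve);CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_assembly);CHKERRQ(ierr);
  /* in the real code: residual/Jacobian assembly would go here */
  ierr = MPI_Allreduce(&a, &b, 1, MPIU_SCALAR, MPIU_SUM, PETSC_COMM_WORLD);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_solve);CHKERRQ(ierr);
  /* in the real code: the SNESSolve()/KSPSolve() calls would go here */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscFinalize();
  return ierr;
}

With the stages in place, the summary would show a separate "Event Stage"
section per phase, which should at least tell me whether the large time
ratios come from the solve itself or from somewhere else.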

I attached the full summary. At the end, it reports:

Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 8.3971e-05
Average time for zero size MPI_Send(): 7.16746e-06

Is this an indication of a slow network?
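
To double-check the reduce latency outside of PETSc, I could also run a tiny
MPI timing loop on the same 256 ranks, something like the sketch below (the
1000-iteration repeat count and the single-double payload are arbitrary
choices on my part, not anything PETSc prescribes):

/* reduce_lat.c - rough per-call cost of MPI_Allreduce and MPI_Barrier.
   Build with: mpicc reduce_lat.c -o reduce_lat
   Run with the same number of ranks as the PETSc job.                   */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int    rank, i, n = 1000;   /* repeat count, arbitrary */
  double x = 1.0, sum, t0, t_reduce, t_barrier;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* warm up, then synchronize before timing */
  MPI_Allreduce(&x, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  MPI_Barrier(MPI_COMM_WORLD);

  t0 = MPI_Wtime();
  for (i = 0; i < n; i++) MPI_Allreduce(&x, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  t_reduce = (MPI_Wtime() - t0) / n;

  MPI_Barrier(MPI_COMM_WORLD);
  t0 = MPI_Wtime();
  for (i = 0; i < n; i++) MPI_Barrier(MPI_COMM_WORLD);
  t_barrier = (MPI_Wtime() - t0) / n;

  if (rank == 0) printf("avg MPI_Allreduce: %g s, avg MPI_Barrier: %g s\n", t_reduce, t_barrier);
  MPI_Finalize();
  return 0;
}

If the per-call reduce cost is of the same order as the 8.4e-05 s barrier
above, then the ~8.2e+05 reductions from VecDot alone would already account
for roughly 817452 * 8.4e-05 ~ 69 seconds, which is the same order as the
VecDot time in the log.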

>
> 3) Thus you would be better served with a stronger preconditioner (more
> flops) which results in fewer iterations (less synchronization).
>

Yes, the preconditioner I am using is simple (just SOR). I am also trying
other, more advanced preconditioners (see the options sketched below).
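
For the record, the alternatives are easy to try purely through run-time
options, along the lines of the list below (whether any of them pays off is
of course problem dependent, and the hypre entry assumes PETSc was configured
with --download-hypre or an external hypre install):

# current setting: pointwise SOR
-pc_type sor

# overlapping additive Schwarz with ILU(0) on each subdomain
-pc_type asm -sub_pc_type ilu

# PETSc's native algebraic multigrid
-pc_type gamg

# hypre's BoomerAMG
-pc_type hypre -pc_hypre_type boomeramg

Fewer Krylov iterations would also directly cut the number of VecDot/VecNorm
reductions, which is where most of the waiting time seems to go.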

>
>   Thanks,
>
>      Matt
>
>
>> Thanks.
>>
>> Xiangdong
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecView               44 1.0 3.9026e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.8e+01  1  0  0  0  0   1  0  0  0  0     0
>> VecDot            817452 1.0 1.5972e+02 1.8 1.51e+10 1.1 0.0e+00 0.0e+00 8.2e+05 18  3  0  0 40  18  3  0  0 40 22969
>> VecMDot           613089 1.0 5.8918e+01 1.1 2.27e+10 1.1 0.0e+00 0.0e+00 6.1e+05  8  4  0  0 30   8  4  0  0 30 93397
>> VecNorm           618619 1.0 1.4426e+02 2.2 1.14e+10 1.1 0.0e+00 0.0e+00 6.2e+05 15  2  0  0 30  15  2  0  0 30 19145
>> VecScale             938 1.0 8.3907e-03 1.2 8.67e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 250859
>> VecCopy             8159 1.0 1.0394e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecSet              1884 1.0 3.9701e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAXPY          1021815 1.0 2.2148e+01 2.1 1.89e+10 1.1 0.0e+00 0.0e+00 0.0e+00  2  4  0  0  0   2  4  0  0  0 207057
>> VecAYPX           613089 1.0 6.5360e+00 1.9 1.13e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 420336
>> VecAXPBYCZ           191 1.0 3.7365e-03 1.4 7.06e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 458830
>> VecWAXPY             938 1.0 3.8502e-02 1.4 8.67e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 54669
>> VecMAXPY          613089 1.0 1.3276e+01 2.2 2.27e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0 414499
>> VecLoad                5 1.0 2.0913e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecScatterBegin   820695 1.0 1.8959e+01 2.8 0.00e+00 0.0 1.1e+09 4.6e+03 0.0e+00  2  0100100  0   2  0100100  0     0
>> VecScatterEnd     820695 1.0 5.2002e+01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>> VecReduceArith      2814 1.0 1.8517e-02 1.2 5.20e+07 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 681995
>> VecReduceComm        938 1.0 2.6041e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 9.4e+02  0  0  0  0  0   0  0  0  0  0     0
>> MatMult           817452 1.0 1.9379e+02 1.6 2.04e+11 1.2 1.0e+09 4.6e+03 0.0e+00 23 40100100  0  23 40100100  0 253083
>> MatSOR            818390 1.0 1.9608e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 22 40  0  0  0  22 40  0  0  0 247472
>> MatAssemblyBegin     939 1.0 7.9671e+00 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+03  1  0  0  0  0   1  0  0  0  0     0
>> MatAssemblyEnd       939 1.0 2.0570e-01 1.2 0.00e+00 0.0 2.6e+03 1.2e+03 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatZeroEntries       938 1.0 1.8202e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> SNESSolve            210 1.0 6.8585e+02 1.0 5.06e+11 1.2 1.1e+09 4.6e+03 2.1e+06 98100100100100  98100100100100 178411
>> SNESFunctionEval    2296 1.0 2.0512e+01 1.3 0.00e+00 0.0 2.9e+06 3.5e+03 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>> SNESJacobianEval     938 1.0 2.7680e+01 1.0 0.00e+00 0.0 1.2e+06 3.5e+03 1.9e+03  4  0  0  0  0   4  0  0  0  0     0
>> SNESLineSearch       938 1.0 9.2491e+00 1.0 6.07e+07 1.1 1.2e+06 3.5e+03 9.4e+02  1  0  0  0  0   1  0  0  0  0  1593
>> KSPSetUp             938 1.0 2.7661e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve             938 1.0 6.3433e+02 1.0 5.06e+11 1.2 1.0e+09 4.6e+03 2.0e+06 91100100100 99  91100100100 99 192852
>> PCSetUp              938 1.0 2.1005e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> PCApply           818390 1.0 1.9675e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 22 40  0  0  0  22 40  0  0  0 246626
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
-------------- next part --------------
Using Petsc Release Version 3.6.3, unknown 

                         Max       Max/Min        Avg      Total 
Time (sec):           6.981e+02      1.00002   6.981e+02
Objects:              2.223e+03      1.00000   2.223e+03
Flops:                5.061e+11      1.15624   4.780e+11  1.224e+14
Flops/sec:            7.250e+08      1.15625   6.847e+08  1.753e+11
MPI Messages:         4.924e+06      2.00000   4.104e+06  1.050e+09
MPI Message Lengths:  2.343e+10      2.16478   4.601e+03  4.833e+12
MPI Reductions:       2.065e+06      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 6.9805e+02 100.0%  1.2236e+14 100.0%  1.050e+09 100.0%  4.601e+03      100.0%  2.065e+06 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecView               44 1.0 3.9026e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.8e+01  1  0  0  0  0   1  0  0  0  0     0
VecDot            817452 1.0 1.5972e+02 1.8 1.51e+10 1.1 0.0e+00 0.0e+00 8.2e+05 18  3  0  0 40  18  3  0  0 40 22969
VecMDot           613089 1.0 5.8918e+01 1.1 2.27e+10 1.1 0.0e+00 0.0e+00 6.1e+05  8  4  0  0 30   8  4  0  0 30 93397
VecNorm           618619 1.0 1.4426e+02 2.2 1.14e+10 1.1 0.0e+00 0.0e+00 6.2e+05 15  2  0  0 30  15  2  0  0 30 19145
VecScale             938 1.0 8.3907e-03 1.2 8.67e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 250859
VecCopy             8159 1.0 1.0394e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1884 1.0 3.9701e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY          1021815 1.0 2.2148e+01 2.1 1.89e+10 1.1 0.0e+00 0.0e+00 0.0e+00  2  4  0  0  0   2  4  0  0  0 207057
VecAYPX           613089 1.0 6.5360e+00 1.9 1.13e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 420336
VecAXPBYCZ           191 1.0 3.7365e-03 1.4 7.06e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 458830
VecWAXPY             938 1.0 3.8502e-02 1.4 8.67e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 54669
VecMAXPY          613089 1.0 1.3276e+01 2.2 2.27e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0 414499
VecLoad                5 1.0 2.0913e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin   820695 1.0 1.8959e+01 2.8 0.00e+00 0.0 1.1e+09 4.6e+03 0.0e+00  2  0100100  0   2  0100100  0     0
VecScatterEnd     820695 1.0 5.2002e+01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
VecReduceArith      2814 1.0 1.8517e-02 1.2 5.20e+07 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 681995
VecReduceComm        938 1.0 2.6041e+00 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 9.4e+02  0  0  0  0  0   0  0  0  0  0     0
MatMult           817452 1.0 1.9379e+02 1.6 2.04e+11 1.2 1.0e+09 4.6e+03 0.0e+00 23 40100100  0  23 40100100  0 253083
MatSOR            818390 1.0 1.9608e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 22 40  0  0  0  22 40  0  0  0 247472
MatAssemblyBegin     939 1.0 7.9671e+00 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+03  1  0  0  0  0   1  0  0  0  0     0
MatAssemblyEnd       939 1.0 2.0570e-01 1.2 0.00e+00 0.0 2.6e+03 1.2e+03 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries       938 1.0 1.8202e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SNESSolve            210 1.0 6.8585e+02 1.0 5.06e+11 1.2 1.1e+09 4.6e+03 2.1e+06 98100100100100  98100100100100 178411
SNESFunctionEval    2296 1.0 2.0512e+01 1.3 0.00e+00 0.0 2.9e+06 3.5e+03 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
SNESJacobianEval     938 1.0 2.7680e+01 1.0 0.00e+00 0.0 1.2e+06 3.5e+03 1.9e+03  4  0  0  0  0   4  0  0  0  0     0
SNESLineSearch       938 1.0 9.2491e+00 1.0 6.07e+07 1.1 1.2e+06 3.5e+03 9.4e+02  1  0  0  0  0   1  0  0  0  0  1593
KSPSetUp             938 1.0 2.7661e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             938 1.0 6.3433e+02 1.0 5.06e+11 1.2 1.0e+09 4.6e+03 2.0e+06 91100100100 99  91100100100 99 192852
PCSetUp              938 1.0 2.1005e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply           818390 1.0 1.9675e+02 1.5 2.00e+11 1.1 0.0e+00 0.0e+00 0.0e+00 22 40  0  0  0  22 40  0  0  0 246626
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector  2150           2150    161353280     0
      Vector Scatter     3              2        39136     0
              Matrix     3              3      1820156     0
    Distributed Mesh     2              1         5040     0
Star Forest Bipartite Graph     4              2         1712     0
     Discrete System     2              1          848     0
           Index Set     6              6        55904     0
   IS L to G Mapping     2              1        22896     0
              Viewer    45             44        32032     0
                SNES     1              1         1332     0
      SNESLineSearch     1              1          864     0
              DMSNES     1              1          664     0
       Krylov Solver     1              1         1464     0
     DMKSP interface     1              1          648     0
      Preconditioner     1              1         1024     0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 8.3971e-05
Average time for zero size MPI_Send(): 7.16746e-06

