[petsc-users] analysing petsc log information and standard execution time for Stokes like equations

Matthew Knepley knepley at gmail.com
Wed Oct 29 11:35:14 CDT 2014


On Wed, Oct 29, 2014 at 11:18 AM, Bishesh Khanal <bisheshkh at gmail.com>
wrote:

> Dear all,
> The computer cluster I'm using to run my PETSc-based code (a Stokes-like
> equation solver) gives me a big variation in execution times for almost
> identical problems.
> When I look at -log_summary, and at the time taken and Mflop/s for the
> PETSc routines, it looks like the issue is with the cluster.
> I'd like to confirm this before looking into cluster-related issues.
> I've provided the relevant -log_summary outputs for the two problems
> below (after the questions); a file is also attached in case the email
> does not display the outputs nicely.
>
> The two problems P1 and P2 solve a Stokes-like equation of the same size
> with the same combination of KSP and PC on 64 processors (8 nodes with
> 8 processes/node).
> P1 solves the equation 6 times while P2 solves it only once.
> The operators differ slightly because of slightly different boundaries,
> but since the numbers of iterations from ksp_monitor were almost the
> same, I guess this is not an issue.
>
> The problem is that the P2 run was much slower than P1. In other
> experiments too, the execution is sometimes quite fast but slow most of
> the time.
>
> I can see in the -log_summary output that various PETSc routines run
> much slower for P2 and have smaller Mflop/s rates.
>
> My two questions:
> 1. Do these outputs confirm that the issue is with the cluster and not
> with my code? If yes, what kinds of things should I focus on or learn
> about when submitting jobs to the cluster? Any pointer would be helpful.
>

There are a lot of variables controlling performance on a cluster. You will
have to narrow this down little by little. However, there are enormous
differences below in the simplest operations, e.g., VecScale, even though
roughly the same number of flops were computed. Thus, something is going
very wrong. My first two guesses, and things you should eliminate:

  1) Someone else was running at the same time

  2) The mapping from processes to nodes was very different

You can usually control 2) with your submission system or your mpiexec call.
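
For example, something along these lines (just a minimal sketch assuming a
standard PETSc main(); it is not taken from your code) prints which node
each rank ended up on, so you can compare the mapping between a fast run
and a slow run:

#include <petscsys.h>
#include <stdio.h>

/* Minimal sketch: print the rank-to-node mapping so it can be compared
   between runs; error checking is omitted for brevity. */
int main(int argc, char **argv)
{
  char        node[MPI_MAX_PROCESSOR_NAME];
  int         len;
  PetscMPIInt rank, size;

  PetscInitialize(&argc, &argv, NULL, NULL);
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  MPI_Comm_size(PETSC_COMM_WORLD, &size);
  MPI_Get_processor_name(node, &len);
  /* Output order across ranks is not guaranteed, but each line carries
     its rank, so the mapping can still be reconstructed afterwards. */
  printf("rank %d of %d is on node %s\n", rank, size, node);
  PetscFinalize();
  return 0;
}

If the rank-to-node assignment differs a lot between the fast and the slow
runs, that points at the mapping (or at sharing nodes with other jobs)
rather than at your code.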

  Thanks,

     Matt


> 2. If I set up the equations for constant viscosity as:
> div(grad(u)) - grad(p) = f
> div(u) + kp = g
> where k=1 in some regions and 0 in most other regions, and f and g are
> spatially varying functions,
> and solve the system with 64 to 128 processors using the KSP and PC
> options:
> -pc_fieldsplit_type schur -pc_fieldsplit_schur_precondition self
> -pc_fieldsplit_dm_splits 0 -pc_fieldsplit_0_fields 0,1,2
> -pc_fieldsplit_1_fields 3 -fieldsplit_0_pc_type hypre
>
> what order of execution time for solving this system should I consider
> reasonable with, say, around 128 processors?
>
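
As an aside on profiling such runs, here is a minimal sketch (assuming a
KSP whose operators are already set; this is not the actual AdLemMain code)
of wrapping each solve in its own PETSc log stage, so that -log_summary
reports every solve separately and the variability can be localized. The
fieldsplit options above are then picked up from the command line by
KSPSetFromOptions():

#include <petscksp.h>

/* Minimal sketch, not the actual AdLemMain code: wrap solve number i in
   its own log stage so -log_summary reports each solve separately.  The
   KSP is assumed to already have its operators set; the fieldsplit
   options listed above are read from the command line. */
PetscErrorCode SolveOnce(KSP ksp, Vec b, Vec x, PetscInt i)
{
  PetscLogStage  stage;
  char           name[32];
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscSNPrintf(name, sizeof(name), "Solve %D", i);CHKERRQ(ierr);
  ierr = PetscLogStageRegister(name, &stage);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* -pc_fieldsplit_* etc. */
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The same options could also be hardwired with PetscOptionsSetValue()
before KSPSetFromOptions() if you prefer not to pass them on the command
line.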
> -log_summary output for P1
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> /epi/asclepios2/bkhanal/works/AdLemModel/build/src/AdLemMain on a
> arch-linux2-cxx-opt named nef017 with 64 processors, by bkhanal Tue Oct 28
> 05:52:24 2014
> Using Petsc Release Version 3.4.3, Oct, 15, 2013
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           4.221e+04      1.00780   4.201e+04
> Objects:              9.980e+02      1.00000   9.980e+02
> Flops:                2.159e+11      1.08499   2.106e+11  1.348e+13
> Flops/sec:            5.154e+06      1.08499   5.013e+06  3.208e+08
> MPI Messages:         1.316e+05      3.69736   7.413e+04  4.744e+06
> MPI Message Lengths:  1.986e+09      2.61387   1.581e+04  7.502e+10
> MPI Reductions:       8.128e+03      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N
> --> 2N flops
>                             and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
> ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts
> %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 4.2010e+04 100.0%  1.3477e+13 100.0%  4.744e+06
> 100.0%  1.581e+04      100.0%  8.127e+03 100.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
>       %T - percent time in this phase         %f - percent flops in this
> phase
>       %M - percent messages in this phase     %L - percent message lengths
> in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)
> Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecView                6 1.0 3.8704e+01424.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
> VecMDot             3228 1.0 1.0614e+01 1.7 5.93e+09 1.1 0.0e+00 0.0e+00
> 3.2e+03  0  3  0  0 40   0  3  0  0 40 35184
> VecNorm             4383 1.0 4.0579e+01 9.8 2.73e+09 1.1 0.0e+00 0.0e+00
> 4.4e+03  0  1  0  0 54   0  1  0  0 54  4239
> VecScale            4680 1.0 1.2393e+00 1.3 1.39e+09 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  1  0  0  0   0  1  0  0  0 70518
> VecCopy             1494 1.0 1.1592e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              5477 1.0 9.7614e+02288.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY             1113 1.0 7.9877e-01 2.3 6.17e+08 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 48645
> VecAYPX              375 1.0 7.9671e-02 1.4 4.99e+07 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 39491
> VecMAXPY            4317 1.0 7.3185e+00 2.1 8.89e+09 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  4  0  0  0   0  4  0  0  0 76569
> VecScatterBegin     5133 1.0 6.7937e+00 2.4 0.00e+00 0.0 4.7e+06 1.5e+04
> 1.2e+01  0  0100 96  0   0  0100 96  0     0
> VecScatterEnd       5121 1.0 2.5840e+02113.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecNormalize        3984 1.0 7.2473e+00 1.6 3.92e+09 1.1 0.0e+00 0.0e+00
> 4.0e+03  0  2  0  0 49   0  2  0  0 49 34118
> MatMult              910 1.0 7.5977e+03 1.0 2.11e+11 1.1 4.7e+06 1.5e+04
> 6.2e+03 18 98 99 92 76  18 98 99 92 76  1736
> MatMultAdd           702 1.0 5.5198e+01 6.3 4.98e+09 1.1 6.6e+05 6.2e+03
> 0.0e+00  0  2 14  5  0   0  2 14  5  0  5624
> MatConvert             6 1.0 3.3739e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> MatAssemblyBegin      55 1.0 5.7320e+0319.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 6.2e+01 11  0  0  0  1  11  0  0  0  1     0
> MatAssemblyEnd        55 1.0 1.6179e+00 1.3 0.00e+00 0.0 9.4e+03 3.6e+03
> 4.0e+01  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ           12 1.0 1.3590e-05 2.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries        20 1.0 6.9117e-01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatView               18 1.0 4.7298e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 3.6e+01  0  0  0  0  0   0  0  0  0  0     0
> KSPGMRESOrthog      3228 1.0 1.4727e+01 1.7 1.19e+10 1.1 0.0e+00 0.0e+00
> 3.2e+03  0  6  0  0 40   0  6  0  0 40 50718
> KSPSetUp              18 1.0 5.4130e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+01  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               6 1.0 3.2363e+04 1.0 2.15e+11 1.1 4.7e+06 1.5e+04
> 7.7e+03 77100 99 92 95  77100 99 92 95   415
> PCSetUp               18 1.0 2.5724e+04 1.0 0.00e+00 0.0 7.7e+03 3.3e+04
> 2.0e+02 61  0  0  0  2  61  0  0  0  2     0
> PCApply               18 1.0 3.2248e+04 1.0 2.12e+11 1.1 4.7e+06 1.5e+04
> 7.6e+03 77 98 99 91 93  77 98 99 91 93   411
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>               Vector   859            849   1446200816     0
>       Vector Scatter    26             16        16352     0
>               Matrix    20             20   1832984916     0
>     Distributed Mesh     3              3      5457592     0
>      Bipartite Graph     6              6         4848     0
>            Index Set    61             61      1882648     0
>    IS L to G Mapping     5              5      3961468     0
>        Krylov Solver     5              5        57888     0
>      DMKSP interface     1              1          656     0
>       Preconditioner     5              5         4440     0
>               Viewer     7              6         4272     0
>
> ========================================================================================================================
> Average time to get PetscTime(): 0
> Average time for MPI_Barrier(): 1.27792e-05
> Average time for zero size MPI_Send(): 5.20423e-06
> #PETSc Option Table entries:
>
>
>
>
> -log_summary output for P2:
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> /epi/asclepios2/bkhanal/works/AdLemModel/build/src/AdLemMain on a
> arch-linux2-cxx-opt named nef001 with 64 processors, by bkhanal Wed Oct 29
> 14:24:36 2014
> Using Petsc Release Version 3.4.3, Oct, 15, 2013
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           1.958e+04      1.00194   1.955e+04
> Objects:              3.190e+02      1.00000   3.190e+02
> Flops:                3.638e+10      1.08499   3.548e+10  2.271e+12
> Flops/sec:            1.861e+06      1.08676   1.815e+06  1.161e+08
> MPI Messages:         2.253e+04      3.68455   1.270e+04  8.131e+05
> MPI Message Lengths:  3.403e+08      2.51345   1.616e+04  1.314e+10
> MPI Reductions:       1.544e+03      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N
> --> 2N flops
>                             and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
> ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts
> %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 1.9554e+04 100.0%  2.2709e+12 100.0%  8.131e+05
> 100.0%  1.616e+04      100.0%  1.543e+03  99.9%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
>       %T - percent time in this phase         %f - percent flops in this
> phase
>       %M - percent messages in this phase     %L - percent message lengths
> in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)
> Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecView                1 1.0 4.4869e+02189.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.0e+00  1  0  0  0  0   1  0  0  0  0     0
> VecMDot              544 1.0 1.8271e+01 2.2 1.00e+09 1.1 0.0e+00 0.0e+00
> 5.4e+02  0  3  0  0 35   0  3  0  0 35  3456
> VecNorm              738 1.0 2.0433e+0218.1 4.60e+08 1.1 0.0e+00 0.0e+00
> 7.4e+02  1  1  0  0 48   1  1  0  0 48   142
> VecScale             788 1.0 4.1195e+00 9.0 2.34e+08 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  1  0  0  0   0  1  0  0  0  3573
> VecCopy              251 1.0 7.6140e+0046.9 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet               926 1.0 3.9087e+0141.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY              187 1.0 6.0848e+0032.3 1.04e+08 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  1073
> VecAYPX               63 1.0 4.6702e-0116.2 8.38e+06 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  1131
> VecMAXPY             727 1.0 1.0997e+01 4.9 1.50e+09 1.1 0.0e+00 0.0e+00
> 0.0e+00  0  4  0  0  0   0  4  0  0  0  8610
> VecScatterBegin      864 1.0 2.0978e+0234.1 0.00e+00 0.0 8.0e+05 1.5e+04
> 2.0e+00  0  0 98 92  0   0  0 98 92  0     0
> VecScatterEnd        862 1.0 5.4781e+02114.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> VecNormalize         671 1.0 1.6922e+01 2.2 6.61e+08 1.1 0.0e+00 0.0e+00
> 6.7e+02  0  2  0  0 43   0  2  0  0 43  2461
> MatMult              152 1.0 6.3271e+03 1.0 3.56e+10 1.1 7.9e+05 1.5e+04
> 1.0e+03 32 98 98 89 68  32 98 98 89 68   351
> MatMultAdd           118 1.0 4.5234e+02183.7 8.36e+08 1.1 1.1e+05 6.2e+03
> 0.0e+00  1  2 14  5  0   1  2 14  5  0   115
> MatConvert             1 1.0 3.6065e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> MatAssemblyBegin      10 1.0 1.0849e+03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.2e+01  5  0  0  0  1   5  0  0  0  1     0
> MatAssemblyEnd        10 1.0 1.3957e+01 1.1 0.00e+00 0.0 9.4e+03 3.6e+03
> 4.0e+01  0  0  1  0  3   0  0  1  0  3     0
> MatGetRowIJ            2 1.0 2.2221e-03582.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatView                3 1.0 3.7378e-01 9.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPGMRESOrthog       544 1.0 2.0370e+01 1.7 2.00e+09 1.1 0.0e+00 0.0e+00
> 5.4e+02  0  6  0  0 35   0  6  0  0 35  6200
> KSPSetUp               3 1.0 4.2598e+01 3.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+01  0  0  0  0  1   0  0  0  0  1     0
> KSPSolve               1 1.0 1.5113e+04 1.0 3.63e+10 1.1 7.9e+05 1.5e+04
> 1.3e+03 77100 98 89 84  77100 98 89 84   150
> PCSetUp                3 1.0 1.1794e+04 1.0 0.00e+00 0.0 7.7e+03 3.3e+04
> 1.3e+02 60  0  1  2  8  60  0  1  2  8     0
> PCApply                3 1.0 1.4940e+04 1.0 3.58e+10 1.1 7.9e+05 1.5e+04
> 1.3e+03 76 98 97 88 83  76 98 97 88 83   149
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>               Vector   215            215    743212688     0
>       Vector Scatter    16             16        16352     0
>               Matrix    20             20   1832984916     0
>     Distributed Mesh     3              3      5457592     0
>      Bipartite Graph     6              6         4848     0
>            Index Set    41             41      1867368     0
>    IS L to G Mapping     5              5      3961468     0
>        Krylov Solver     5              5        57888     0
>      DMKSP interface     1              1          656     0
>       Preconditioner     5              5         4440     0
>               Viewer     2              1          712     0
>
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> Average time for MPI_Barrier(): 0.000274992
> Average time for zero size MPI_Send(): 1.67042e-05
> #PETSc Option Table entries:
>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener