[petsc-users] large PetscCommDuplicate overhead

Barry Smith bsmith at mcs.anl.gov
Thu Oct 13 22:45:56 CDT 2016


  Matthew,

    Thanks for the additional information. This is all very weird, since the number of calls made to PetscCommDuplicate() is the same regardless of geometry, and the time per call shouldn't depend on the geometry.

    Would you be able to do another set of tests where you track the time in MPI_Comm_get_attr() and MPI_Barrier() instead of PetscCommDuplicate()? It could be that Cray did something "funny" in their build of PETSc.
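
   One way to get at those two calls without modifying PETSc is the
standard MPI profiling (PMPI) interface. The following is only a minimal
sketch of that idea (it is not part of PETSc or CrayPat, and
report_wrapper_times() is just a made-up helper name); the file would be
compiled and linked ahead of the MPI library:

#include <mpi.h>
#include <stdio.h>

static double attr_time = 0.0, barrier_time = 0.0;

/* Intercept communicator attribute lookups (what PetscCommDuplicate() relies on). */
int MPI_Comm_get_attr(MPI_Comm comm, int keyval, void *val, int *flag)
{
  double t0  = MPI_Wtime();
  int   ierr = PMPI_Comm_get_attr(comm, keyval, val, flag);
  attr_time += MPI_Wtime() - t0;
  return ierr;
}

/* Intercept barriers the same way. */
int MPI_Barrier(MPI_Comm comm)
{
  double t0     = MPI_Wtime();
  int   ierr    = PMPI_Barrier(comm);
  barrier_time += MPI_Wtime() - t0;
  return ierr;
}

/* Print the accumulated times on each rank just before MPI_Finalize(). */
void report_wrapper_times(void)
{
  printf("MPI_Comm_get_attr: %g s   MPI_Barrier: %g s\n", attr_time, barrier_time);
}

   If those two totals stay small while CrayPat still attributes 5-6% of
the run to PetscCommDuplicate(), that would point at the sampler's
attribution rather than the MPI calls themselves.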

   You could also try using the petsc/3.7.3 module instead of the cray-petsc module.

   Thanks

  Barry




> On Oct 12, 2016, at 10:48 AM, Matthew Overholt <overholt at capesim.com> wrote:
> 
> Jed,
> 
> I realize that the PetscCommDuplicate (PCD) overhead I am seeing must be
> only indirectly related to the problem size, etc., and I wouldn't be
> surprised if it were an artifact of some sort related to my specific
> algorithm, so you may not want to pursue this much further.  However, I did
> make three runs using the same Edison environment and code but different
> input geometry files.  Earlier I found a strong dependence on the number of
> processes, so I ran all of these tests on 1 node with 8 processes
> (N=1, n=8).  What I found was that the amount of PCD overhead was
> geometry-dependent, not size-dependent.  A moderately sized simple geometry
> (with relatively few ghosted vertices at its simple planar interfaces) had
> no PCD overhead, whereas both small and large complex geometries (with
> relatively more ghosted vertices at their more complex interfaces) had 5-6%
> PCD overhead.  The log files follow.
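
(Sketch: one way to localize this further, assuming PETSc 3.7's C logging
API and using made-up names for the "UserAssembly" event and the
TimeAssembly() routine, is to wrap the assembly/ghost-exchange phase in a
user-level log event so -log_view reports it separately from KSPSolve:)

#include <petscvec.h>

/* Sketch only: report the assembly/ghost-exchange phase as its own line
   in -log_view.  In real code the event would be registered once at startup. */
static PetscLogEvent USER_ASSEMBLY;

PetscErrorCode TimeAssembly(Vec b)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscLogEventRegister("UserAssembly", VEC_CLASSID, &USER_ASSEMBLY);CHKERRQ(ierr);
  ierr = PetscLogEventBegin(USER_ASSEMBLY, 0, 0, 0, 0);CHKERRQ(ierr);
  /* ... VecSetValues() calls for the ghosted vertices would go here ... */
  ierr = VecAssemblyBegin(b);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(b);CHKERRQ(ierr);
  ierr = PetscLogEventEnd(USER_ASSEMBLY, 0, 0, 0, 0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

(Comparing that event's time across the three geometries would show
whether the extra cost tracks the ghost exchange or something else.)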
> 
> Thanks,
> Matt Overholt
> 
> /////////////////////////////////////////////////////////////////////////////////
> // pllSs20 : Simple Pencil Shape with Small Planar Interfaces
> //           No PetscCommDuplicate() overhead in CrayPat sampling.
> /////////////////////////////////////////////////////////////////////////////////
> ...713097 vertices...646400 elements (moderate size)
> 
> Total run time was 47.853 seconds.
> 
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
> 
> Using Petsc Release Version 3.7.2, Jun, 05, 2016 
> 
>                         Max       Max/Min        Avg      Total 
> Time (sec):           4.822e+01      1.01022   4.780e+01
> Objects:              1.160e+02      1.00000   1.160e+02
> Flops:                1.425e+06      1.00000   1.425e+06  1.140e+07
> Flops/sec:            2.986e+04      1.01022   2.982e+04  2.385e+05
> MPI Messages:         1.140e+02      1.46154   9.500e+01  7.600e+02
> MPI Message Lengths:  7.202e+07      3.55575   2.888e+05  2.195e+08
> MPI Reductions:       2.600e+02      1.00000
> 
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
> 0:      Main Stage: 4.7802e+01 100.0%  1.1402e+07 100.0%  7.600e+02 100.0%  2.888e+05      100.0%  2.590e+02  99.6%
> 
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                              --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> 
> --- Event Stage 0: Main Stage
> 
> VecMax                 8 1.0 3.8538e-03 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  3   0  0  0  0  3     0
> VecNorm                8 1.0 1.8222e-03 1.0 7.13e+05 1.0 0.0e+00 0.0e+00 8.0e+00  0 50  0  0  3   0 50  0  0  3  3129
> VecCopy                8 1.0 5.5838e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                31 1.0 9.2564e-03 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAYPX                8 1.0 1.0111e-03 1.2 7.13e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0 50  0  0  0   0 50  0  0  0  5638
> VecAssemblyBegin      26 1.0 4.5311e-01 1.9 0.00e+00 0.0 2.1e+02 5.7e+05 7.8e+01  1  0 28 55 30   1  0 28 55 30     0
> VecAssemblyEnd        26 1.0 5.2852e-0211.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin       39 1.0 4.5546e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+01  0  0  0 29 15   0  0  0 29 15     0
> VecScatterEnd         19 1.0 2.4319e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatSolve               8 1.0 1.3143e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 3.2e+01  3  0 44 31 12   3  0 44 31 12     0
> MatLUFactorSym         1 1.0 1.1012e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 23  0  0  0  2  23  0  0  0  2     0
> MatLUFactorNum         8 1.0 3.1378e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 66  0  0  0  0  66  0  0  0  0     0
> MatAssemblyBegin       8 1.0 1.5738e-03 2.0 0.00e+00 0.0 1.7e+02 6.8e+04 1.6e+01  0  0 22  5  6   0  0 22  5  6     0
> MatAssemblyEnd         8 1.0 3.4761e-02 1.0 0.00e+00 0.0 2.8e+01 8.8e+02 2.3e+01  0  0  4  0  9   0  0  4  0  9     0
> MatGetRowIJ            1 1.0 2.4080e-05 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 3.5882e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries         8 1.0 3.6283e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetUp               1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               8 1.0 4.3712e+01 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 4.3e+01 91  0 44 31 17  91  0 44 31 17     0
> PCSetUp                8 1.0 4.2397e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 89  0  0  0  4  89  0  0  0  4     0
> PCApply                8 1.0 1.3144e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 3.2e+01  3  0 44 31 12   3  0 44 31 12     0
> ------------------------------------------------------------------------------------------------------------------------
> 
> Memory usage is given in bytes:
> 
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
> 
> --- Event Stage 0: Main Stage
> 
>              Vector    46             43     49344760     0.
>      Vector Scatter    23             22        18176     0.
>           Index Set    37             37      3279764     0.
>   IS L to G Mapping     1              0            0     0.
>              Matrix     6              6     43498680     0.
>       Krylov Solver     1              1         1160     0.
>      Preconditioner     1              1          992     0.
>              Viewer     1              0            0     0.
> ========================================================================================================================
> Average time to get PetscTime(): 5.00679e-07
> Average time for MPI_Barrier(): 1.81198e-06
> Average time for zero size MPI_Send(): 3.75509e-06
> 
> /////////////////////////////////////////////////////////////////////////////////
> // HBT-HP5 : Small Complex Shape with Complex Interfaces
> //           5.3% PetscCommDuplicate() overhead in CrayPat sampling.
> /////////////////////////////////////////////////////////////////////////////////
> ...50564 vertices...45420 elements  (small size)
> 
> Total run time was 4.863 seconds.
> 
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
> 
> Using Petsc Release Version 3.7.2, Jun, 05, 2016 
> 
>                         Max       Max/Min        Avg      Total 
> Time (sec):           4.930e+00      1.00841   4.894e+00
> Objects:              1.080e+02      1.00000   1.080e+02
> Flops:                7.539e+04      1.06240   7.267e+04  5.813e+05
> Flops/sec:            1.542e+04      1.06244   1.485e+04  1.188e+05
> MPI Messages:         2.070e+02      2.43529   1.628e+02  1.302e+03
> MPI Message Lengths:  7.824e+06      2.40897   2.965e+04  3.861e+07
> MPI Reductions:       2.320e+02      1.00000
> 
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
> 0:      Main Stage: 4.8945e+00 100.0%  5.8133e+05 100.0%  1.302e+03 100.0%  2.965e+04      100.0%  2.310e+02  99.6%
> 
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                              --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> 
> --- Event Stage 0: Main Stage
> 
> VecMax                 6 1.0 1.1725e-0321.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  3   0  0  0  0  3     0
> VecNorm                6 1.0 2.2173e-02 1.0 3.77e+04 1.1 0.0e+00 0.0e+00 6.0e+00  0 50  0  0  3   0 50  0  0  3    13
> VecCopy                6 1.0 3.1948e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                27 1.0 4.0102e-04 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAYPX                6 1.0 6.3896e-05 1.2 3.77e+04 1.1 0.0e+00 0.0e+00 0.0e+00  0 50  0  0  0   0 50  0  0  0  4549
> VecAssemblyBegin      24 1.0 5.7409e-02 1.7 0.00e+00 0.0 3.1e+02 3.7e+04 7.2e+01  1  0 24 30 31   1  0 24 30 31     0
> VecAssemblyEnd        24 1.0 4.1070e-0322.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin       33 1.0 6.2511e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.3e+01  0  0  0 13 14   0  0  0 13 14     0
> VecScatterEnd         15 1.0 2.7657e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatSolve               6 1.0 1.3485e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 2.4e+01  3  0 45  9 10   3  0 45  9 10     0
> MatLUFactorSym         1 1.0 6.4367e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 13  0  0  0  2  13  0  0  0  2     0
> MatLUFactorNum         6 1.0 3.5177e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 72  0  0  0  0  72  0  0  0  0     0
> MatAssemblyBegin       6 1.0 3.6111e-03 2.0 0.00e+00 0.0 3.1e+02 6.8e+04 1.2e+01  0  0 24 54  5   0  0 24 54  5     0
> MatAssemblyEnd         6 1.0 2.2925e-02 1.1 0.00e+00 0.0 6.8e+01 8.8e+02 1.9e+01  0  0  5  0  8   0  0  5  0  8     0
> MatGetRowIJ            1 1.0 4.0531e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 4.3869e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries         6 1.0 1.6685e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetUp               1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               6 1.0 4.3020e+00 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 3.5e+01 88  0 45  9 15  88  0 45  9 15     0
> PCSetUp                6 1.0 4.1670e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 85  0  0  0  5  85  0  0  0  5     0
> PCApply                6 1.0 1.3486e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 2.4e+01  3  0 45  9 10   3  0 45  9 10     0
> ------------------------------------------------------------------------------------------------------------------------
> 
> Memory usage is given in bytes:
> 
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
> 
> --- Event Stage 0: Main Stage
> 
>              Vector    44             41      3635912     0.
>      Vector Scatter    21             20        90672     0.
>           Index Set    33             33       286872     0.
>   IS L to G Mapping     1              0            0     0.
>              Matrix     6              6      2988180     0.
>       Krylov Solver     1              1         1160     0.
>      Preconditioner     1              1          992     0.
>              Viewer     1              0            0     0.
> ========================================================================================================================
> Average time to get PetscTime(): 5.00679e-07
> Average time for MPI_Barrier(): 2.00272e-06
> Average time for zero size MPI_Send(): 3.12924e-06
> 
> /////////////////////////////////////////////////////////////////////////////////
> // GaNSi13 : Large Complex Shape with Complex Interfaces
> //           6.4% PetscCommDuplicate() overhead in CrayPat sampling.
> /////////////////////////////////////////////////////////////////////////////////
> ...1642311 vertices...1497368 elements  (large size)
> 
> Total run time was 260.958 seconds.
> 
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
> 
> Using Petsc Release Version 3.7.2, Jun, 05, 2016 
> 
>                         Max       Max/Min        Avg      Total 
> Time (sec):           2.619e+02      1.00319   2.611e+02
> Objects:              1.040e+02      1.00000   1.040e+02
> Flops:                2.050e+06      1.07341   1.969e+06  1.575e+07
> Flops/sec:            7.853e+03      1.07354   7.541e+03  6.032e+04
> MPI Messages:         1.835e+02      1.47390   1.448e+02  1.158e+03
> MPI Message Lengths:  1.761e+08      3.47614   4.801e+05  5.560e+08
> MPI Reductions:       2.180e+02      1.00000
> 
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
> 0:      Main Stage: 2.6114e+02 100.0%  1.5754e+07 100.0%  1.158e+03 100.0%  4.801e+05      100.0%  2.170e+02  99.5%
> 
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                              --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> 
> --- Event Stage 0: Main Stage
> 
> VecMax                 5 1.0 5.5013e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  0  0  0  0  2   0  0  0  0  2     0
> VecNorm                5 1.0 1.8921e-03 1.0 1.02e+06 1.1 0.0e+00 0.0e+00 5.0e+00  0 50  0  0  2   0 50  0  0  2  4163
> VecCopy                5 1.0 9.0218e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                25 1.0 2.4175e-0210.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAYPX                5 1.0 1.2169e-03 1.1 1.02e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0 50  0  0  0   0 50  0  0  0  6473
> VecAssemblyBegin      23 1.0 2.6960e+00 1.6 0.00e+00 0.0 3.2e+02 9.2e+05 6.9e+01  1  0 28 54 32   1  0 28 54 32     0
> VecAssemblyEnd        23 1.0 1.2512e-0111.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin       30 1.0 1.3994e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01  0  0  0 19 14   0  0  0 19 14     0
> VecScatterEnd         13 1.0 4.6802e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatSolve               5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 2.0e+01  1  0 32 15  9   1  0 32 15  9     0
> MatLUFactorSym         1 1.0 2.8861e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 11  0  0  0  2  11  0  0  0  2     0
> MatLUFactorNum         5 1.0 1.9893e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 76  0  0  0  0  76  0  0  0  0     0
> MatAssemblyBegin       5 1.0 1.6689e-02 2.9 0.00e+00 0.0 3.3e+02 3.7e+05 1.0e+01  0  0 28 22  5   0  0 28 22  5     0
> MatAssemblyEnd         5 1.0 5.6672e+00 1.0 0.00e+00 0.0 9.2e+01 4.5e+03 1.7e+01  2  0  8  0  8   2  0  8  0  8     0
> MatGetRowIJ            1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 7.0381e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries         5 1.0 7.0859e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetUp               1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               5 1.0 2.3079e+02 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 3.1e+01 88  0 32 15 14  88  0 32 15 14     0
> PCSetUp                5 1.0 2.2781e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 87  0  0  0  5  87  0  0  0  5     0
> PCApply                5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 2.0e+01  1  0 32 15  9   1  0 32 15  9     0
> ------------------------------------------------------------------------------------------------------------------------
> 
> Memory usage is given in bytes:
> 
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
> 
> --- Event Stage 0: Main Stage
> 
>              Vector    43             40    114480760     0.
>      Vector Scatter    20             19       291760     0.
>           Index Set    31             31      5440808     0.
>   IS L to G Mapping     1              0            0     0.
>              Matrix     6              6     99689528     0.
>       Krylov Solver     1              1         1160     0.
>      Preconditioner     1              1          992     0.
>              Viewer     1              0            0     0.
> ========================================================================================================================
> Average time to get PetscTime(): 1.90735e-07
> Average time for MPI_Barrier(): 1.81198e-06
> Average time for zero size MPI_Send(): 3.75509e-06
> 
> -----Original Message-----
> From: Jed Brown [mailto:jed at jedbrown.org] 
> Sent: Tuesday, October 11, 2016 5:19 PM
> To: overholt at capesim.com; 'Barry Smith'
> Cc: 'PETSc'
> Subject: Re: [petsc-users] large PetscCommDuplicate overhead
> 
> Matthew Overholt <overholt at capesim.com> writes:
> 
>> Barry,
>> 
>> Subsequent tests with the same code and a problem (input) having a 
>> much smaller vertex (equation) count (i.e. a much smaller matrix to 
>> invert for the solution) have NOT had PetscCommDuplicate() account for 
>> any significant time, so I'm not surprised that your test didn't find any
>> problem.
> 
> Can you re-run the large and small configurations with the same
> code/environment and resend those logs?  PetscCommDuplicate has nothing to
> do with the problem size, so any difference in cost must be indirect, though
> attribute access should be simple and independent.
> 
> 
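
As background on Jed's last point: as I understand it, PetscCommDuplicate()
caches its inner communicator in an MPI attribute attached to the user's
communicator, so after the first call a duplicate request is essentially one
attribute lookup. A standalone sketch of that caching pattern in plain MPI
(get_inner_comm() is a made-up name, not the actual PETSc code) looks like
this:

#include <mpi.h>
#include <stdlib.h>

/* Sketch of communicator attribute caching: the first call duplicates and
   caches, every later call is an O(1) attribute lookup with no communication. */
static int inner_keyval = MPI_KEYVAL_INVALID;

MPI_Comm get_inner_comm(MPI_Comm user)
{
  MPI_Comm  inner;
  MPI_Comm *store;
  void     *val;
  int       found;

  if (inner_keyval == MPI_KEYVAL_INVALID)
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                           &inner_keyval, NULL);

  MPI_Comm_get_attr(user, inner_keyval, &val, &found);
  if (found) return *(MPI_Comm *)val;   /* cheap path: already cached */

  MPI_Comm_dup(user, &inner);           /* first call: the expensive part */
  store  = (MPI_Comm *)malloc(sizeof(MPI_Comm));
  *store = inner;                       /* a real version frees this in the
                                           keyval delete callback */
  MPI_Comm_set_attr(user, inner_keyval, store);
  return inner;
}

Since the lookup path touches no mesh data and sends no messages, a
geometry-dependent cost showing up under PetscCommDuplicate() in sampling is
surprising, which is why timing MPI_Comm_get_attr()/MPI_Barrier() directly
(or trying a non-Cray PETSc build) seems like the right next step.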


