[petsc-users] large PetscCommDuplicate overhead

Matthew Overholt overholt at capesim.com
Wed Oct 12 10:48:18 CDT 2016


Jed,

I realize that the PetscCommDuplicate (PCD) overhead I am seeing must be only
indirectly related to the problem size, and I wouldn't be surprised if it were
an artifact of my specific algorithm, so you may not want to pursue this much
further.  However, I did make three runs using the same Edison environment and
code but different input geometry files.  Earlier I had found a strong
dependence on the number of processes, so this time I ran all three cases on 1
node with 8 processes (N=1, n=8).  What I found is that the amount of PCD
overhead is geometry dependent, not size dependent: a moderately sized simple
geometry (with relatively few ghosted vertices at its simple planar
interfaces) showed no PCD overhead, whereas both a small and a large complex
geometry (with relatively more ghosted vertices at their more complex
interfaces) showed 5-6% PCD overhead.  A minimal sketch of the kind of
per-interface setup I have in mind follows, and then the log files.
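
(Illustrative only: PETSc 3.7-style C, not my production code; the local size
and ghost indices are made up.  The point is that every object created this
way passes through PetscCommDuplicate() internally, so a mesh whose interfaces
require more ghost setup makes more of these calls.)

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec            x;
  PetscInt       ghosts[2] = {0, 1};  /* made-up global indices of ghost vertices */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  /* VecCreateGhost() builds the ghost-update scatter; the Vec (and the
     scatter it holds) call PetscCommDuplicate() on PETSC_COMM_WORLD
     internally when they are created. */
  ierr = VecCreateGhost(PETSC_COMM_WORLD, 100, PETSC_DECIDE, 2, ghosts, &x);CHKERRQ(ierr);

  /* Exchange ghost values, as one would once per assembly/solve pass. */
  ierr = VecGhostUpdateBegin(x, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecGhostUpdateEnd(x, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}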

Thanks,
Matt Overholt
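
P.S. For reference, a summary like the ones below is produced by PETSc's
logging option; a typical invocation on a SLURM-based Cray system would look
like the following, where the executable and input names are placeholders:

srun -N 1 -n 8 ./xtsolve pllSs20.in -log_view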

/////////////////////////////////////////////////////////////////////////////////
// pllSs20 : Simple Pencil Shape with Small Planar Interfaces
//           No PetscCommDuplicate() overhead in CrayPat sampling.
/////////////////////////////////////////////////////////////////////////////////
...713097 vertices...646400 elements (moderate size)

Total run time was 47.853 seconds.

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

Using Petsc Release Version 3.7.2, Jun, 05, 2016 

                         Max       Max/Min        Avg      Total 
Time (sec):           4.822e+01      1.01022   4.780e+01
Objects:              1.160e+02      1.00000   1.160e+02
Flops:                1.425e+06      1.00000   1.425e+06  1.140e+07
Flops/sec:            2.986e+04      1.01022   2.982e+04  2.385e+05
MPI Messages:         1.140e+02      1.46154   9.500e+01  7.600e+02
MPI Message Lengths:  7.202e+07      3.55575   2.888e+05  2.195e+08
MPI Reductions:       2.600e+02      1.00000

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 4.7802e+01 100.0%  1.1402e+07 100.0%  7.600e+02 100.0%  2.888e+05      100.0%  2.590e+02  99.6%

------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMax                 8 1.0 3.8538e-03 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  3   0  0  0  0  3     0
VecNorm                8 1.0 1.8222e-03 1.0 7.13e+05 1.0 0.0e+00 0.0e+00 8.0e+00  0 50  0  0  3   0 50  0  0  3  3129
VecCopy                8 1.0 5.5838e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                31 1.0 9.2564e-03 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX                8 1.0 1.0111e-03 1.2 7.13e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0 50  0  0  0   0 50  0  0  0  5638
VecAssemblyBegin      26 1.0 4.5311e-01 1.9 0.00e+00 0.0 2.1e+02 5.7e+05 7.8e+01  1  0 28 55 30   1  0 28 55 30     0
VecAssemblyEnd        26 1.0 5.2852e-02 11.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       39 1.0 4.5546e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+01  0  0  0 29 15   0  0  0 29 15     0
VecScatterEnd         19 1.0 2.4319e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatSolve               8 1.0 1.3143e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 3.2e+01  3  0 44 31 12   3  0 44 31 12     0
MatLUFactorSym         1 1.0 1.1012e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 23  0  0  0  2  23  0  0  0  2     0
MatLUFactorNum         8 1.0 3.1378e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 66  0  0  0  0  66  0  0  0  0     0
MatAssemblyBegin       8 1.0 1.5738e-03 2.0 0.00e+00 0.0 1.7e+02 6.8e+04 1.6e+01  0  0 22  5  6   0  0 22  5  6     0
MatAssemblyEnd         8 1.0 3.4761e-02 1.0 0.00e+00 0.0 2.8e+01 8.8e+02 2.3e+01  0  0  4  0  9   0  0  4  0  9     0
MatGetRowIJ            1 1.0 2.4080e-05 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 3.5882e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         8 1.0 3.6283e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               8 1.0 4.3712e+01 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 4.3e+01 91  0 44 31 17  91  0 44 31 17     0
PCSetUp                8 1.0 4.2397e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 89  0  0  0  4  89  0  0  0  4     0
PCApply                8 1.0 1.3144e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 3.2e+01  3  0 44 31 12   3  0 44 31 12     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    46             43     49344760     0.
      Vector Scatter    23             22        18176     0.
           Index Set    37             37      3279764     0.
   IS L to G Mapping     1              0            0     0.
              Matrix     6              6     43498680     0.
       Krylov Solver     1              1         1160     0.
      Preconditioner     1              1          992     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 5.00679e-07
Average time for MPI_Barrier(): 1.81198e-06
Average time for zero size MPI_Send(): 3.75509e-06

/////////////////////////////////////////////////////////////////////////////////
// HBT-HP5 : Small Complex Shape with Complex Interfaces
//           5.3% PetscCommDuplicate() overhead in CrayPat sampling.
/////////////////////////////////////////////////////////////////////////////////
...50564 vertices...45420 elements  (small size)

Total run time was 4.863 seconds.

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

Using Petsc Release Version 3.7.2, Jun, 05, 2016 

                         Max       Max/Min        Avg      Total 
Time (sec):           4.930e+00      1.00841   4.894e+00
Objects:              1.080e+02      1.00000   1.080e+02
Flops:                7.539e+04      1.06240   7.267e+04  5.813e+05
Flops/sec:            1.542e+04      1.06244   1.485e+04  1.188e+05
MPI Messages:         2.070e+02      2.43529   1.628e+02  1.302e+03
MPI Message Lengths:  7.824e+06      2.40897   2.965e+04  3.861e+07
MPI Reductions:       2.320e+02      1.00000

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 4.8945e+00 100.0%  5.8133e+05 100.0%  1.302e+03 100.0%  2.965e+04      100.0%  2.310e+02  99.6%

------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMax                 6 1.0 1.1725e-03 21.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  3   0  0  0  0  3     0
VecNorm                6 1.0 2.2173e-02 1.0 3.77e+04 1.1 0.0e+00 0.0e+00 6.0e+00  0 50  0  0  3   0 50  0  0  3    13
VecCopy                6 1.0 3.1948e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                27 1.0 4.0102e-04 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX                6 1.0 6.3896e-05 1.2 3.77e+04 1.1 0.0e+00 0.0e+00 0.0e+00  0 50  0  0  0   0 50  0  0  0  4549
VecAssemblyBegin      24 1.0 5.7409e-02 1.7 0.00e+00 0.0 3.1e+02 3.7e+04 7.2e+01  1  0 24 30 31   1  0 24 30 31     0
VecAssemblyEnd        24 1.0 4.1070e-03 22.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       33 1.0 6.2511e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.3e+01  0  0  0 13 14   0  0  0 13 14     0
VecScatterEnd         15 1.0 2.7657e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatSolve               6 1.0 1.3485e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 2.4e+01  3  0 45  9 10   3  0 45  9 10     0
MatLUFactorSym         1 1.0 6.4367e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 13  0  0  0  2  13  0  0  0  2     0
MatLUFactorNum         6 1.0 3.5177e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 72  0  0  0  0  72  0  0  0  0     0
MatAssemblyBegin       6 1.0 3.6111e-03 2.0 0.00e+00 0.0 3.1e+02 6.8e+04 1.2e+01  0  0 24 54  5   0  0 24 54  5     0
MatAssemblyEnd         6 1.0 2.2925e-02 1.1 0.00e+00 0.0 6.8e+01 8.8e+02 1.9e+01  0  0  5  0  8   0  0  5  0  8     0
MatGetRowIJ            1 1.0 4.0531e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 4.3869e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         6 1.0 1.6685e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               6 1.0 4.3020e+00 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 3.5e+01 88  0 45  9 15  88  0 45  9 15     0
PCSetUp                6 1.0 4.1670e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 85  0  0  0  5  85  0  0  0  5     0
PCApply                6 1.0 1.3486e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 2.4e+01  3  0 45  9 10   3  0 45  9 10     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    44             41      3635912     0.
      Vector Scatter    21             20        90672     0.
           Index Set    33             33       286872     0.
   IS L to G Mapping     1              0            0     0.
              Matrix     6              6      2988180     0.
       Krylov Solver     1              1         1160     0.
      Preconditioner     1              1          992     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 5.00679e-07
Average time for MPI_Barrier(): 2.00272e-06
Average time for zero size MPI_Send(): 3.12924e-06

/////////////////////////////////////////////////////////////////////////////////
// GaNSi13 : Large Complex Shape with Complex Interfaces
//           6.4% PetscCommDuplicate() overhead in CrayPat sampling.
/////////////////////////////////////////////////////////////////////////////////
...1642311 vertices...1497368 elements  (large size)

Total run time was 260.958 seconds.

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

Using Petsc Release Version 3.7.2, Jun, 05, 2016 

                         Max       Max/Min        Avg      Total 
Time (sec):           2.619e+02      1.00319   2.611e+02
Objects:              1.040e+02      1.00000   1.040e+02
Flops:                2.050e+06      1.07341   1.969e+06  1.575e+07
Flops/sec:            7.853e+03      1.07354   7.541e+03  6.032e+04
MPI Messages:         1.835e+02      1.47390   1.448e+02  1.158e+03
MPI Message Lengths:  1.761e+08      3.47614   4.801e+05  5.560e+08
MPI Reductions:       2.180e+02      1.00000

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.6114e+02 100.0%  1.5754e+07 100.0%  1.158e+03 100.0%  4.801e+05      100.0%  2.170e+02  99.5%

------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMax                 5 1.0 5.5013e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  0  0  0  0  2   0  0  0  0  2     0
VecNorm                5 1.0 1.8921e-03 1.0 1.02e+06 1.1 0.0e+00 0.0e+00 5.0e+00  0 50  0  0  2   0 50  0  0  2  4163
VecCopy                5 1.0 9.0218e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                25 1.0 2.4175e-02 10.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX                5 1.0 1.2169e-03 1.1 1.02e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0 50  0  0  0   0 50  0  0  0  6473
VecAssemblyBegin      23 1.0 2.6960e+00 1.6 0.00e+00 0.0 3.2e+02 9.2e+05 6.9e+01  1  0 28 54 32   1  0 28 54 32     0
VecAssemblyEnd        23 1.0 1.2512e-01 11.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       30 1.0 1.3994e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01  0  0  0 19 14   0  0  0 19 14     0
VecScatterEnd         13 1.0 4.6802e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatSolve               5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 2.0e+01  1  0 32 15  9   1  0 32 15  9     0
MatLUFactorSym         1 1.0 2.8861e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 11  0  0  0  2  11  0  0  0  2     0
MatLUFactorNum         5 1.0 1.9893e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 76  0  0  0  0  76  0  0  0  0     0
MatAssemblyBegin       5 1.0 1.6689e-02 2.9 0.00e+00 0.0 3.3e+02 3.7e+05 1.0e+01  0  0 28 22  5   0  0 28 22  5     0
MatAssemblyEnd         5 1.0 5.6672e+00 1.0 0.00e+00 0.0 9.2e+01 4.5e+03 1.7e+01  2  0  8  0  8   2  0  8  0  8     0
MatGetRowIJ            1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 7.0381e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         5 1.0 7.0859e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               5 1.0 2.3079e+02 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 3.1e+01 88  0 32 15 14  88  0 32 15 14     0
PCSetUp                5 1.0 2.2781e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 87  0  0  0  5  87  0  0  0  5     0
PCApply                5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 2.0e+01  1  0 32 15  9   1  0 32 15  9     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    43             40    114480760     0.
      Vector Scatter    20             19       291760     0.
           Index Set    31             31      5440808     0.
   IS L to G Mapping     1              0            0     0.
              Matrix     6              6     99689528     0.
       Krylov Solver     1              1         1160     0.
      Preconditioner     1              1          992     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 1.81198e-06
Average time for zero size MPI_Send(): 3.75509e-06

-----Original Message-----
From: Jed Brown [mailto:jed at jedbrown.org] 
Sent: Tuesday, October 11, 2016 5:19 PM
To: overholt at capesim.com; 'Barry Smith'
Cc: 'PETSc'
Subject: Re: [petsc-users] large PetscCommDuplicate overhead

Matthew Overholt <overholt at capesim.com> writes:

> Barry,
>
> Subsequent tests with the same code and a problem (input) having a
> much smaller vertex (equation) count (i.e. a much smaller matrix to
> invert for the solution) have NOT had PetscCommDuplicate() account for
> any significant time, so I'm not surprised that your test didn't find
> any problem.

Can you re-run the large and small configurations with the same
code/environment and resend those logs?  PetscCommDuplicate has nothing to
do with the problem size, so any difference in cost must be indirect, though
attribute access should be simple and independent.
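
(Background on why repeat calls should be cheap: after the first duplication,
PetscCommDuplicate caches the inner communicator on the user's communicator as
an MPI attribute, so subsequent calls reduce to an attribute lookup.  Below is
a simplified sketch of that caching pattern in plain MPI; the keyval and
helper names are hypothetical, not PETSc's actual internals.)

#include <mpi.h>
#include <stdlib.h>

static int dup_keyval = MPI_KEYVAL_INVALID;

/* Return a cached duplicate of 'user'; duplicate and cache on first use. */
static void get_inner_comm(MPI_Comm user, MPI_Comm *inner)
{
  MPI_Comm *cached;
  int       found = 0;

  if (dup_keyval == MPI_KEYVAL_INVALID)
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                           &dup_keyval, NULL);

  MPI_Comm_get_attr(user, dup_keyval, &cached, &found);
  if (!found) {                      /* first call: duplicate and cache */
    cached = (MPI_Comm *)malloc(sizeof(MPI_Comm));
    MPI_Comm_dup(user, cached);
    MPI_Comm_set_attr(user, dup_keyval, cached);
  }
  *inner = *cached;                  /* repeat calls: attribute lookup only */
}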




