[petsc-users] large PetscCommDuplicate overhead
Matthew Overholt
overholt at capesim.com
Wed Oct 12 10:48:18 CDT 2016
Jed,
I realize that the PetscCommDuplicate (PCD) overhead I am seeing must be
only indirectly related to the problem size, and I wouldn't be surprised if
it is an artifact of my specific algorithm, so you may not want to pursue
this much further. However, I did make three runs using the same Edison
environment and code but different input geometry files. Earlier I found a
strong dependence on the number of processes, so this time I ran every case
on 1 node with 8 processes (N=1, n=8). What I found is that the amount of
PCD overhead is geometry-dependent, not size-dependent. A moderately sized
simple geometry (with relatively few ghosted vertices at its simple planar
interfaces) had no PCD overhead, whereas both a small and a large complex
geometry (with relatively more ghosted vertices at their more complex
interfaces) had 5-6% PCD overhead. The log files follow.
Thanks,
Matt Overholt
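P.S. For context, here is a minimal, purely illustrative sketch (in the style of our code, but with placeholder sizes, ghost indices, and loop structure -- it is not the actual application) of the per-solve setup that I believe exercises PetscCommDuplicate(). Every PETSc object created on PETSC_COMM_WORLD goes through PetscCommDuplicate() to fetch the cached inner communicator, so a geometry whose interfaces require more ghost-exchange objects (index sets and scatters) per step naturally accumulates more of that overhead.

#include <petscvec.h>

/* Illustrative only: rebuild ghost-exchange objects every step, as a
 * geometry with many ghosted interface vertices forces us to do.
 * Sizes and indices below are placeholders, not the real mesh data. */
int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscInt       nlocal = 1000, nghost = 50;   /* assumed local/ghost counts */

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
  for (PetscInt step = 0; step < 8; step++) {  /* one pass per assembled RHS/solve */
    Vec        u, ughost;
    IS         from, to;
    VecScatter scat;
    PetscInt   *idx, i;

    /* each Create call below invokes PetscCommDuplicate() internally */
    ierr = VecCreateMPI(PETSC_COMM_WORLD, nlocal, PETSC_DETERMINE, &u);CHKERRQ(ierr);
    ierr = VecCreateSeq(PETSC_COMM_SELF, nghost, &ughost);CHKERRQ(ierr);
    ierr = PetscMalloc1(nghost, &idx);CHKERRQ(ierr);
    for (i = 0; i < nghost; i++) idx[i] = i;   /* placeholder ghost indices */
    ierr = ISCreateGeneral(PETSC_COMM_WORLD, nghost, idx, PETSC_COPY_VALUES, &from);CHKERRQ(ierr);
    ierr = ISCreateStride(PETSC_COMM_SELF, nghost, 0, 1, &to);CHKERRQ(ierr);
    ierr = VecScatterCreate(u, from, ughost, to, &scat);CHKERRQ(ierr);

    /* ... assemble, scatter ghost values, solve ... */

    ierr = VecScatterDestroy(&scat);CHKERRQ(ierr);
    ierr = ISDestroy(&from);CHKERRQ(ierr);
    ierr = ISDestroy(&to);CHKERRQ(ierr);
    ierr = PetscFree(idx);CHKERRQ(ierr);
    ierr = VecDestroy(&ughost);CHKERRQ(ierr);
    ierr = VecDestroy(&u);CHKERRQ(ierr);
  }
  ierr = PetscFinalize();
  return ierr;
}

(PetscCommDuplicate() does not appear as an event in the -log_view tables below; the 5-6% figures come from the CrayPat sampling.)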
/////////////////////////////////////////////////////////////////////////////////
// pllSs20 : Simple Pencil Shape with Small Planar Interfaces
// No PetscCommDuplicate() overhead in CrayPat sampling.
/////////////////////////////////////////////////////////////////////////////////
...713097 vertices...646400 elements (moderate size)
Total run time was 47.853 seconds.
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
Using Petsc Release Version 3.7.2, Jun, 05, 2016
Max Max/Min Avg Total
Time (sec): 4.822e+01 1.01022 4.780e+01
Objects: 1.160e+02 1.00000 1.160e+02
Flops: 1.425e+06 1.00000 1.425e+06 1.140e+07
Flops/sec: 2.986e+04 1.01022 2.982e+04 2.385e+05
MPI Messages: 1.140e+02 1.46154 9.500e+01 7.600e+02
MPI Message Lengths: 7.202e+07 3.55575 2.888e+05 2.195e+08
MPI Reductions: 2.600e+02 1.00000
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
                   Avg %Total   Avg %Total   counts %Total   Avg %Total   counts %Total
0: Main Stage: 4.7802e+01 100.0% 1.1402e+07 100.0% 7.600e+02 100.0% 2.888e+05 100.0% 2.590e+02 99.6%
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
      Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMax 8 1.0 3.8538e-03 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 3 0 0 0 0 3 0
VecNorm 8 1.0 1.8222e-03 1.0 7.13e+05 1.0 0.0e+00 0.0e+00 8.0e+00 0 50 0 0 3 0 50 0 0 3 3129
VecCopy 8 1.0 5.5838e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 31 1.0 9.2564e-03 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 8 1.0 1.0111e-03 1.2 7.13e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 50 0 0 0 0 50 0 0 0 5638
VecAssemblyBegin 26 1.0 4.5311e-01 1.9 0.00e+00 0.0 2.1e+02 5.7e+05 7.8e+01 1 0 28 55 30 1 0 28 55 30 0
VecAssemblyEnd 26 1.0 5.2852e-02 11.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 39 1.0 4.5546e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+01 0 0 0 29 15 0 0 0 29 15 0
VecScatterEnd 19 1.0 2.4319e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatSolve 8 1.0 1.3143e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 3.2e+01 3 0 44 31 12 3 0 44 31 12 0
MatLUFactorSym 1 1.0 1.1012e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 23 0 0 0 2 23 0 0 0 2 0
MatLUFactorNum 8 1.0 3.1378e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 66 0 0 0 0 66 0 0 0 0 0
MatAssemblyBegin 8 1.0 1.5738e-03 2.0 0.00e+00 0.0 1.7e+02 6.8e+04 1.6e+01 0 0 22 5 6 0 0 22 5 6 0
MatAssemblyEnd 8 1.0 3.4761e-02 1.0 0.00e+00 0.0 2.8e+01 8.8e+02 2.3e+01 0 0 4 0 9 0 0 4 0 9 0
MatGetRowIJ 1 1.0 2.4080e-05 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 3.5882e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 8 1.0 3.6283e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 8 1.0 4.3712e+01 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 4.3e+01 91 0 44 31 17 91 0 44 31 17 0
PCSetUp 8 1.0 4.2397e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 89 0 0 0 4 89 0 0 0 4 0
PCApply 8 1.0 1.3144e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 3.2e+01 3 0 44 31 12 3 0 44 31 12 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 46 43 49344760 0.
Vector Scatter 23 22 18176 0.
Index Set 37 37 3279764 0.
IS L to G Mapping 1 0 0 0.
Matrix 6 6 43498680 0.
Krylov Solver 1 1 1160 0.
Preconditioner 1 1 992 0.
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 5.00679e-07
Average time for MPI_Barrier(): 1.81198e-06
Average time for zero size MPI_Send(): 3.75509e-06
/////////////////////////////////////////////////////////////////////////////////
// HBT-HP5 : Small Complex Shape with Complex Interfaces
// 5.3% PetscCommDuplicate() overhead in CrayPat sampling.
/////////////////////////////////////////////////////////////////////////////////
...50564 vertices...45420 elements (small size)
Total run time was 4.863 seconds.
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
Using Petsc Release Version 3.7.2, Jun, 05, 2016
Max Max/Min Avg Total
Time (sec): 4.930e+00 1.00841 4.894e+00
Objects: 1.080e+02 1.00000 1.080e+02
Flops: 7.539e+04 1.06240 7.267e+04 5.813e+05
Flops/sec: 1.542e+04 1.06244 1.485e+04 1.188e+05
MPI Messages: 2.070e+02 2.43529 1.628e+02 1.302e+03
MPI Message Lengths: 7.824e+06 2.40897 2.965e+04 3.861e+07
MPI Reductions: 2.320e+02 1.00000
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
                   Avg %Total   Avg %Total   counts %Total   Avg %Total   counts %Total
0: Main Stage: 4.8945e+00 100.0% 5.8133e+05 100.0% 1.302e+03 100.0% 2.965e+04 100.0% 2.310e+02 99.6%
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
      Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMax 6 1.0 1.1725e-03 21.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 3 0 0 0 0 3 0
VecNorm 6 1.0 2.2173e-02 1.0 3.77e+04 1.1 0.0e+00 0.0e+00 6.0e+00 0 50 0 0 3 0 50 0 0 3 13
VecCopy 6 1.0 3.1948e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 27 1.0 4.0102e-04 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 6 1.0 6.3896e-05 1.2 3.77e+04 1.1 0.0e+00 0.0e+00 0.0e+00 0 50 0 0 0 0 50 0 0 0 4549
VecAssemblyBegin 24 1.0 5.7409e-02 1.7 0.00e+00 0.0 3.1e+02 3.7e+04 7.2e+01 1 0 24 30 31 1 0 24 30 31 0
VecAssemblyEnd 24 1.0 4.1070e-03 22.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 33 1.0 6.2511e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.3e+01 0 0 0 13 14 0 0 0 13 14 0
VecScatterEnd 15 1.0 2.7657e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatSolve 6 1.0 1.3485e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 2.4e+01 3 0 45 9 10 3 0 45 9 10 0
MatLUFactorSym 1 1.0 6.4367e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 13 0 0 0 2 13 0 0 0 2 0
MatLUFactorNum 6 1.0 3.5177e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 72 0 0 0 0 72 0 0 0 0 0
MatAssemblyBegin 6 1.0 3.6111e-03 2.0 0.00e+00 0.0 3.1e+02 6.8e+04 1.2e+01 0 0 24 54 5 0 0 24 54 5 0
MatAssemblyEnd 6 1.0 2.2925e-02 1.1 0.00e+00 0.0 6.8e+01 8.8e+02 1.9e+01 0 0 5 0 8 0 0 5 0 8 0
MatGetRowIJ 1 1.0 4.0531e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 4.3869e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 6 1.0 1.6685e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 6 1.0 4.3020e+00 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 3.5e+01 88 0 45 9 15 88 0 45 9 15 0
PCSetUp 6 1.0 4.1670e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 85 0 0 0 5 85 0 0 0 5 0
PCApply 6 1.0 1.3486e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 2.4e+01 3 0 45 9 10 3 0 45 9 10 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 44 41 3635912 0.
Vector Scatter 21 20 90672 0.
Index Set 33 33 286872 0.
IS L to G Mapping 1 0 0 0.
Matrix 6 6 2988180 0.
Krylov Solver 1 1 1160 0.
Preconditioner 1 1 992 0.
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 5.00679e-07
Average time for MPI_Barrier(): 2.00272e-06
Average time for zero size MPI_Send(): 3.12924e-06
/////////////////////////////////////////////////////////////////////////////////
// GaNSi13 : Large Complex Shape with Complex Interfaces
// 6.4% PetscCommDuplicate() overhead in CrayPat sampling.
/////////////////////////////////////////////////////////////////////////////////
...1642311 vertices...1497368 elements (large size)
Total run time was 260.958 seconds.
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
Using Petsc Release Version 3.7.2, Jun, 05, 2016
Max Max/Min Avg Total
Time (sec): 2.619e+02 1.00319 2.611e+02
Objects: 1.040e+02 1.00000 1.040e+02
Flops: 2.050e+06 1.07341 1.969e+06 1.575e+07
Flops/sec: 7.853e+03 1.07354 7.541e+03 6.032e+04
MPI Messages: 1.835e+02 1.47390 1.448e+02 1.158e+03
MPI Message Lengths: 1.761e+08 3.47614 4.801e+05 5.560e+08
MPI Reductions: 2.180e+02 1.00000
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
                   Avg %Total   Avg %Total   counts %Total   Avg %Total   counts %Total
0: Main Stage: 2.6114e+02 100.0% 1.5754e+07 100.0% 1.158e+03 100.0% 4.801e+05 100.0% 2.170e+02 99.5%
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
      Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMax 5 1.0 5.5013e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 2 0 0 0 0 2 0
VecNorm 5 1.0 1.8921e-03 1.0 1.02e+06 1.1 0.0e+00 0.0e+00 5.0e+00 0 50 0 0 2 0 50 0 0 2 4163
VecCopy 5 1.0 9.0218e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 25 1.0 2.4175e-02 10.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 5 1.0 1.2169e-03 1.1 1.02e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 50 0 0 0 0 50 0 0 0 6473
VecAssemblyBegin 23 1.0 2.6960e+00 1.6 0.00e+00 0.0 3.2e+02 9.2e+05 6.9e+01 1 0 28 54 32 1 0 28 54 32 0
VecAssemblyEnd 23 1.0 1.2512e-01 11.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 30 1.0 1.3994e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 19 14 0 0 0 19 14 0
VecScatterEnd 13 1.0 4.6802e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatSolve 5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 2.0e+01 1 0 32 15 9 1 0 32 15 9 0
MatLUFactorSym 1 1.0 2.8861e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 11 0 0 0 2 11 0 0 0 2 0
MatLUFactorNum 5 1.0 1.9893e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 76 0 0 0 0 76 0 0 0 0 0
MatAssemblyBegin 5 1.0 1.6689e-02 2.9 0.00e+00 0.0 3.3e+02 3.7e+05 1.0e+01 0 0 28 22 5 0 0 28 22 5 0
MatAssemblyEnd 5 1.0 5.6672e+00 1.0 0.00e+00 0.0 9.2e+01 4.5e+03 1.7e+01 2 0 8 0 8 2 0 8 0 8 0
MatGetRowIJ 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 7.0381e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 5 1.0 7.0859e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 5 1.0 2.3079e+02 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 3.1e+01 88 0 32 15 14 88 0 32 15 14 0
PCSetUp 5 1.0 2.2781e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 87 0 0 0 5 87 0 0 0 5 0
PCApply 5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 2.0e+01 1 0 32 15 9 1 0 32 15 9 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 43 40 114480760 0.
Vector Scatter 20 19 291760 0.
Index Set 31 31 5440808 0.
IS L to G Mapping 1 0 0 0.
Matrix 6 6 99689528 0.
Krylov Solver 1 1 1160 0.
Preconditioner 1 1 992 0.
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 1.81198e-06
Average time for zero size MPI_Send(): 3.75509e-06
-----Original Message-----
From: Jed Brown [mailto:jed at jedbrown.org]
Sent: Tuesday, October 11, 2016 5:19 PM
To: overholt at capesim.com; 'Barry Smith'
Cc: 'PETSc'
Subject: Re: [petsc-users] large PetscCommDuplicate overhead
Matthew Overholt <overholt at capesim.com> writes:
> Barry,
>
> Subsequent tests with the same code and a problem (input) having a
> much smaller vertex (equation) count (i.e. a much smaller matrix to
> invert for the solution) have NOT had PetscCommDuplicate() account for
> any significant time, so I'm not surprised that your test didn't find any problem.
Can you re-run the large and small configurations with the same
code/environment and resend those logs? PetscCommDuplicate has nothing to
do with the problem size, so any difference in cost must be indirect, though
attribute access should be simple and independent.
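For reference, the attribute caching described above works conceptually like the sketch below. This is only an illustration of the MPI attribute mechanism (one cached duplicate per outer communicator, cleanup callback omitted); it is not PETSc's actual PetscCommDuplicate() implementation. The point is that after the first call, each subsequent request should cost only an MPI_Comm_get_attr() lookup.

#include <mpi.h>
#include <stdlib.h>

static int comm_keyval = MPI_KEYVAL_INVALID;

/* Return a cached duplicate of 'outer', duplicating it only on first use. */
static void get_inner_comm(MPI_Comm outer, MPI_Comm *inner)
{
  int      flag;
  MPI_Comm *cached;

  if (comm_keyval == MPI_KEYVAL_INVALID)
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                           &comm_keyval, NULL);

  MPI_Comm_get_attr(outer, comm_keyval, &cached, &flag);
  if (!flag) {                       /* first use: duplicate and stash in an attribute */
    cached = (MPI_Comm*)malloc(sizeof(MPI_Comm));
    MPI_Comm_dup(outer, cached);
    MPI_Comm_set_attr(outer, comm_keyval, cached);
  }
  *inner = *cached;                  /* later uses: attribute lookup only */
}

int main(int argc, char **argv)
{
  MPI_Comm inner;
  int      i;
  MPI_Init(&argc, &argv);
  for (i = 0; i < 1000; i++)         /* mimic repeated object creation on the same comm */
    get_inner_comm(MPI_COMM_WORLD, &inner);
  MPI_Finalize();
  return 0;
}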