[petsc-users] large PetscCommDuplicate overhead
Barry Smith
bsmith at mcs.anl.gov
Thu Oct 13 22:45:56 CDT 2016
Matthew,
Thanks for the additional information. This is all very weird, since the number of calls made to PetscCommDuplicate() is the same regardless of geometry, and the time per call shouldn't depend on the geometry.
Would you be able to do another set of tests where you track the time in MPI_Comm_get_attr() and MPI_Barrier() instead of PetscCommDuplicate()? It could be that Cray did something "funny" in their build of PETSc.
You could also try using the petsc/3.7.3 module instead of the cray-petsc module.
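For example, a tiny standalone check along these lines (just a rough sketch, untested on Edison; the loop count and the predefined MPI_TAG_UB key are stand-ins, not the keyval PETSc queries internally) would also give you the raw per-call cost of those two routines outside of CrayPat:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  double t0, tattr, tbarr;
  void  *val;
  int    flag, i, rank, n = 10000;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* average cost of one MPI attribute lookup (MPI_TAG_UB is just a stand-in key) */
  t0 = MPI_Wtime();
  for (i = 0; i < n; i++) MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &val, &flag);
  tattr = (MPI_Wtime() - t0) / n;

  /* average cost of one barrier */
  t0 = MPI_Wtime();
  for (i = 0; i < n; i++) MPI_Barrier(MPI_COMM_WORLD);
  tbarr = (MPI_Wtime() - t0) / n;

  if (!rank) printf("avg MPI_Comm_get_attr %g s, avg MPI_Barrier %g s\n", tattr, tbarr);
  MPI_Finalize();
  return 0;
}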
Thanks
Barry
> On Oct 12, 2016, at 10:48 AM, Matthew Overholt <overholt at capesim.com> wrote:
>
> Jed,
>
> I realize that the PetscCommDuplicate (PCD) overhead I am seeing must be
> only indirectly related to the problem size, etc., and I wouldn't be
> surprised if it was an artifact of some sort related to my specific
> algorithm. So you may not want to pursue this much further. However, I did
> make three runs using the same Edison environment and code but different
> input geometry files. Earlier I found a strong dependence on the number of
> processes, so for this test I ran all of the tests on 1 node with 8
> processes (N=1, n=8). What I found was that the amount of PCD overhead was
> geometry-dependent, not size-dependent. A moderately sized simple geometry
> (with relatively few ghosted vertices at its simple planar interfaces) had
> no PCD overhead, whereas both small and large complex geometries (with
> relatively more ghosted vertices at their more complex interfaces) had 5-6%
> PCD overhead. The log files follow.
>
> Thanks,
> Matt Overholt
>
> /////////////////////////////////////////////////////////////////////////////////
> // pllSs20 : Simple Pencil Shape with Small Planar Interfaces
> // No PetscCommDuplicate() overhead in CrayPat sampling.
> /////////////////////////////////////////////////////////////////////////////////
> ...713097 vertices...646400 elements (moderate size)
>
> Total run time was 47.853 seconds.
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> Using Petsc Release Version 3.7.2, Jun, 05, 2016
>
> Max Max/Min Avg Total
> Time (sec): 4.822e+01 1.01022 4.780e+01
> Objects: 1.160e+02 1.00000 1.160e+02
> Flops: 1.425e+06 1.00000 1.425e+06 1.140e+07
> Flops/sec: 2.986e+04 1.01022 2.982e+04 2.385e+05
> MPI Messages: 1.140e+02 1.46154 9.500e+01 7.600e+02
> MPI Message Lengths: 7.202e+07 3.55575 2.888e+05 2.195e+08
> MPI Reductions: 2.600e+02 1.00000
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.7802e+01 100.0% 1.1402e+07 100.0% 7.600e+02 100.0% 2.888e+05 100.0% 2.590e+02 99.6%
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMax 8 1.0 3.8538e-03 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 3 0 0 0 0 3 0
> VecNorm 8 1.0 1.8222e-03 1.0 7.13e+05 1.0 0.0e+00 0.0e+00 8.0e+00 0 50 0 0 3 0 50 0 0 3 3129
> VecCopy 8 1.0 5.5838e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 31 1.0 9.2564e-03 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAYPX 8 1.0 1.0111e-03 1.2 7.13e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 50 0 0 0 0 50 0 0 0 5638
> VecAssemblyBegin 26 1.0 4.5311e-01 1.9 0.00e+00 0.0 2.1e+02 5.7e+05 7.8e+01 1 0 28 55 30 1 0 28 55 30 0
> VecAssemblyEnd 26 1.0 5.2852e-0211.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecScatterBegin 39 1.0 4.5546e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+01 0 0 0 29 15 0 0 0 29 15 0
> VecScatterEnd 19 1.0 2.4319e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatSolve 8 1.0 1.3143e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 3.2e+01 3 0 44 31 12 3 0 44 31 12 0
> MatLUFactorSym 1 1.0 1.1012e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 23 0 0 0 2 23 0 0 0 2 0
> MatLUFactorNum 8 1.0 3.1378e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 66 0 0 0 0 66 0 0 0 0 0
> MatAssemblyBegin 8 1.0 1.5738e-03 2.0 0.00e+00 0.0 1.7e+02 6.8e+04 1.6e+01 0 0 22 5 6 0 0 22 5 6 0
> MatAssemblyEnd 8 1.0 3.4761e-02 1.0 0.00e+00 0.0 2.8e+01 8.8e+02 2.3e+01 0 0 4 0 9 0 0 4 0 9 0
> MatGetRowIJ 1 1.0 2.4080e-05 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 3.5882e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 8 1.0 3.6283e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 8 1.0 4.3712e+01 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 4.3e+01 91 0 44 31 17 91 0 44 31 17 0
> PCSetUp 8 1.0 4.2397e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 89 0 0 0 4 89 0 0 0 4 0
> PCApply 8 1.0 1.3144e+00 1.0 0.00e+00 0.0 3.4e+02 2.0e+05 3.2e+01 3 0 44 31 12 3 0 44 31 12 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Vector 46 43 49344760 0.
> Vector Scatter 23 22 18176 0.
> Index Set 37 37 3279764 0.
> IS L to G Mapping 1 0 0 0.
> Matrix 6 6 43498680 0.
> Krylov Solver 1 1 1160 0.
> Preconditioner 1 1 992 0.
> Viewer 1 0 0 0.
> ========================================================================================================================
> Average time to get PetscTime(): 5.00679e-07
> Average time for MPI_Barrier(): 1.81198e-06
> Average time for zero size MPI_Send(): 3.75509e-06
>
> /////////////////////////////////////////////////////////////////////////////////
> // HBT-HP5 : Small Complex Shape with Complex Interfaces
> // 5.3% PetscCommDuplicate() overhead in CrayPat sampling.
> /////////////////////////////////////////////////////////////////////////////////
> ...50564 vertices...45420 elements (small size)
>
> Total run time was 4.863 seconds.
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> Using Petsc Release Version 3.7.2, Jun, 05, 2016
>
> Max Max/Min Avg Total
> Time (sec): 4.930e+00 1.00841 4.894e+00
> Objects: 1.080e+02 1.00000 1.080e+02
> Flops: 7.539e+04 1.06240 7.267e+04 5.813e+05
> Flops/sec: 1.542e+04 1.06244 1.485e+04 1.188e+05
> MPI Messages: 2.070e+02 2.43529 1.628e+02 1.302e+03
> MPI Message Lengths: 7.824e+06 2.40897 2.965e+04 3.861e+07
> MPI Reductions: 2.320e+02 1.00000
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.8945e+00 100.0% 5.8133e+05 100.0% 1.302e+03 100.0% 2.965e+04 100.0% 2.310e+02 99.6%
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMax 6 1.0 1.1725e-0321.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 3 0 0 0 0 3 0
> VecNorm 6 1.0 2.2173e-02 1.0 3.77e+04 1.1 0.0e+00 0.0e+00 6.0e+00 0 50 0 0 3 0 50 0 0 3 13
> VecCopy 6 1.0 3.1948e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 27 1.0 4.0102e-04 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAYPX 6 1.0 6.3896e-05 1.2 3.77e+04 1.1 0.0e+00 0.0e+00 0.0e+00 0 50 0 0 0 0 50 0 0 0 4549
> VecAssemblyBegin 24 1.0 5.7409e-02 1.7 0.00e+00 0.0 3.1e+02 3.7e+04 7.2e+01 1 0 24 30 31 1 0 24 30 31 0
> VecAssemblyEnd 24 1.0 4.1070e-0322.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecScatterBegin 33 1.0 6.2511e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.3e+01 0 0 0 13 14 0 0 0 13 14 0
> VecScatterEnd 15 1.0 2.7657e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatSolve 6 1.0 1.3485e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 2.4e+01 3 0 45 9 10 3 0 45 9 10 0
> MatLUFactorSym 1 1.0 6.4367e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 13 0 0 0 2 13 0 0 0 2 0
> MatLUFactorNum 6 1.0 3.5177e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 72 0 0 0 0 72 0 0 0 0 0
> MatAssemblyBegin 6 1.0 3.6111e-03 2.0 0.00e+00 0.0 3.1e+02 6.8e+04 1.2e+01 0 0 24 54 5 0 0 24 54 5 0
> MatAssemblyEnd 6 1.0 2.2925e-02 1.1 0.00e+00 0.0 6.8e+01 8.8e+02 1.9e+01 0 0 5 0 8 0 0 5 0 8 0
> MatGetRowIJ 1 1.0 4.0531e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 4.3869e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 6 1.0 1.6685e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 6 1.0 4.3020e+00 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 3.5e+01 88 0 45 9 15 88 0 45 9 15 0
> PCSetUp 6 1.0 4.1670e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 85 0 0 0 5 85 0 0 0 5 0
> PCApply 6 1.0 1.3486e-01 1.0 0.00e+00 0.0 5.9e+02 5.8e+03 2.4e+01 3 0 45 9 10 3 0 45 9 10 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Vector 44 41 3635912 0.
> Vector Scatter 21 20 90672 0.
> Index Set 33 33 286872 0.
> IS L to G Mapping 1 0 0 0.
> Matrix 6 6 2988180 0.
> Krylov Solver 1 1 1160 0.
> Preconditioner 1 1 992 0.
> Viewer 1 0 0 0.
> ========================================================================================================================
> Average time to get PetscTime(): 5.00679e-07
> Average time for MPI_Barrier(): 2.00272e-06
> Average time for zero size MPI_Send(): 3.12924e-06
>
> /////////////////////////////////////////////////////////////////////////////////
> // GaNSi13 : Large Complex Shape with Complex Interfaces
> // 6.4% PetscCommDuplicate() overhead in CrayPat sampling.
> /////////////////////////////////////////////////////////////////////////////////
> ...1642311 vertices...1497368 elements (large size)
>
> Total run time was 260.958 seconds.
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> Using Petsc Release Version 3.7.2, Jun, 05, 2016
>
> Max Max/Min Avg Total
> Time (sec): 2.619e+02 1.00319 2.611e+02
> Objects: 1.040e+02 1.00000 1.040e+02
> Flops: 2.050e+06 1.07341 1.969e+06 1.575e+07
> Flops/sec: 7.853e+03 1.07354 7.541e+03 6.032e+04
> MPI Messages: 1.835e+02 1.47390 1.448e+02 1.158e+03
> MPI Message Lengths: 1.761e+08 3.47614 4.801e+05 5.560e+08
> MPI Reductions: 2.180e+02 1.00000
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 2.6114e+02 100.0% 1.5754e+07 100.0% 1.158e+03 100.0% 4.801e+05 100.0% 2.170e+02 99.5%
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMax 5 1.0 5.5013e-03 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 2 0 0 0 0 2 0
> VecNorm 5 1.0 1.8921e-03 1.0 1.02e+06 1.1 0.0e+00 0.0e+00 5.0e+00 0 50 0 0 2 0 50 0 0 2 4163
> VecCopy 5 1.0 9.0218e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 25 1.0 2.4175e-0210.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAYPX 5 1.0 1.2169e-03 1.1 1.02e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 50 0 0 0 0 50 0 0 0 6473
> VecAssemblyBegin 23 1.0 2.6960e+00 1.6 0.00e+00 0.0 3.2e+02 9.2e+05 6.9e+01 1 0 28 54 32 1 0 28 54 32 0
> VecAssemblyEnd 23 1.0 1.2512e-0111.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecScatterBegin 30 1.0 1.3994e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 19 14 0 0 0 19 14 0
> VecScatterEnd 13 1.0 4.6802e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatSolve 5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 2.0e+01 1 0 32 15 9 1 0 32 15 9 0
> MatLUFactorSym 1 1.0 2.8861e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 11 0 0 0 2 11 0 0 0 2 0
> MatLUFactorNum 5 1.0 1.9893e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 76 0 0 0 0 76 0 0 0 0 0
> MatAssemblyBegin 5 1.0 1.6689e-02 2.9 0.00e+00 0.0 3.3e+02 3.7e+05 1.0e+01 0 0 28 22 5 0 0 28 22 5 0
> MatAssemblyEnd 5 1.0 5.6672e+00 1.0 0.00e+00 0.0 9.2e+01 4.5e+03 1.7e+01 2 0 8 0 8 2 0 8 0 8 0
> MatGetRowIJ 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 7.0381e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 5 1.0 7.0859e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSetUp 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 5 1.0 2.3079e+02 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 3.1e+01 88 0 32 15 14 88 0 32 15 14 0
> PCSetUp 5 1.0 2.2781e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 87 0 0 0 5 87 0 0 0 5 0
> PCApply 5 1.0 2.9838e+00 1.0 0.00e+00 0.0 3.7e+02 2.3e+05 2.0e+01 1 0 32 15 9 1 0 32 15 9 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Vector 43 40 114480760 0.
> Vector Scatter 20 19 291760 0.
> Index Set 31 31 5440808 0.
> IS L to G Mapping 1 0 0 0.
> Matrix 6 6 99689528 0.
> Krylov Solver 1 1 1160 0.
> Preconditioner 1 1 992 0.
> Viewer 1 0 0 0.
> ========================================================================================================================
> Average time to get PetscTime(): 1.90735e-07
> Average time for MPI_Barrier(): 1.81198e-06
> Average time for zero size MPI_Send(): 3.75509e-06
>
> -----Original Message-----
> From: Jed Brown [mailto:jed at jedbrown.org]
> Sent: Tuesday, October 11, 2016 5:19 PM
> To: overholt at capesim.com; 'Barry Smith'
> Cc: 'PETSc'
> Subject: Re: [petsc-users] large PetscCommDuplicate overhead
>
> Matthew Overholt <overholt at capesim.com> writes:
>
>> Barry,
>>
>> Subsequent tests with the same code and a problem (input) having a
>> much smaller vertex (equation) count (i.e. a much smaller matrix to
>> invert for the solution) have NOT had PetscCommDuplicate() account for
>> any significant time, so I'm not surprised that your test didn't find any
>> problem.
>
> Can you re-run the large and small configurations with the same
> code/environment and resend those logs? PetscCommDuplicate has nothing to
> do with the problem size, so any difference in cost must be indirect, though
> attribute access should be simple and independent.
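> For context, the pattern PetscCommDuplicate() follows is roughly "duplicate once, then cache the inner communicator in an attribute", so after the first call on a given communicator the work is a single MPI_Comm_get_attr() lookup. A simplified schematic of that pattern (not the actual PETSc source; the keyval and function names here are made up for illustration):
>
> #include <mpi.h>
> #include <stdlib.h>
>
> static int cache_keyval = MPI_KEYVAL_INVALID;
>
> /* Return (creating on first use) an inner duplicate of user_comm.
>    First call on a communicator: MPI_Comm_dup plus attribute caching.
>    Every later call: one MPI_Comm_get_attr lookup. */
> static void get_inner_comm(MPI_Comm user_comm, MPI_Comm *inner)
> {
>   MPI_Comm *cached;
>   int       found;
>
>   if (cache_keyval == MPI_KEYVAL_INVALID)
>     MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
>                            &cache_keyval, NULL);
>
>   MPI_Comm_get_attr(user_comm, cache_keyval, &cached, &found);
>   if (found) {
>     *inner = *cached;                /* fast path: attribute lookup only */
>   } else {
>     cached = (MPI_Comm *)malloc(sizeof(MPI_Comm));
>     MPI_Comm_dup(user_comm, cached); /* slow path: first call only */
>     MPI_Comm_set_attr(user_comm, cache_keyval, cached);
>     *inner = *cached;
>   }
> }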
>
>