[petsc-users] Scaling/Preconditioners for Poisson equation
Matthew Knepley
knepley at gmail.com
Mon Sep 29 08:58:35 CDT 2014
On Mon, Sep 29, 2014 at 8:42 AM, Filippo Leonardi <filippo.leonardi at sam.math.ethz.ch> wrote:
> Hi,
>
> I am trying to solve a standard second-order central-differenced Poisson
> equation in parallel, in 3D, using a 3D structured DMDA (an extremely
> standard Laplacian matrix).
>
> I want to get some nice scaling (especially weak), but my results show that
> the Krylov method is not performing as expected. The problem (at least for
> CG + BJacobi) seems to lie in the number of iterations.
>
> In particular, the number of iterations with CG (the matrix is SPD) + BJacobi
> grows as the mesh is refined (probably due to the increasing condition number)
> and as the number of processors is increased (probably due to the BJacobi
> preconditioner). For instance, I tried the following setup:
> 1 proc to solve a 32^3 domain => 20 iterations
> 8 procs to solve a 64^3 domain => 60 iterations
> 64 procs to solve a 128^3 domain => 101 iterations
>
> Is there something pathological about my runs (maybe I am missing something)?
> Can somebody provide me with weak-scaling benchmarks for equivalent problems?
> (Maybe there is a better preconditioner for this problem.)
>
BJacobi is not a scalable preconditioner. As you note, the number of iterations
grows with the system size. You should always use MG here.
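For a DMDA-discretized Laplacian like this one, a minimal option set to start
from (a sketch, not a tuned configuration: it assumes the DMDA is attached to
the KSP via KSPSetDM() so PCMG can build the interpolation, the number of
levels is just an example to adjust with the grid, and exact smoother option
names can vary slightly between PETSc versions) would be

  -ksp_type cg
  -pc_type mg
  -pc_mg_levels 4
  -pc_mg_galerkin
  -mg_levels_ksp_type chebyshev
  -mg_levels_pc_type sor
  -ksp_monitor_true_residual

If building the geometric hierarchy is inconvenient, algebraic multigrid
(-pc_type gamg) is an alternative. Either way the iteration count should stay
essentially flat as you refine.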
> I am also aware that multigrid is even better for these problems, but the
> **scalability** of my runs seems to be as bad as with CG.
>
MG will weak-scale almost perfectly. Send the -log_summary output for each run
if this does not happen.
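Something like the following is enough to generate the three logs (a sketch:
./poisson stands for your application binary, mpiexec for whatever launcher
you use, and the -da_grid_* options only take effect if your DMDA is created
so the command line can override the grid size):

  # weak-scaling series: 32^3 of work per core held fixed
  mpiexec -n 1  ./poisson -da_grid_x 32  -da_grid_y 32  -da_grid_z 32  -ksp_type cg -pc_type mg -pc_mg_galerkin -log_summary log_032_n001
  mpiexec -n 8  ./poisson -da_grid_x 64  -da_grid_y 64  -da_grid_z 64  -ksp_type cg -pc_type mg -pc_mg_galerkin -log_summary log_064_n008
  mpiexec -n 64 ./poisson -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg -pc_type mg -pc_mg_galerkin -log_summary log_128_n064

Then compare the KSPSolve line (time, flops, and reductions) across the three
files.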
Thanks,
Matt
> -pc_mg_galerkin
> -pc_type mg
> (both with Richardson directly and as a preconditioner for CG)
>
> The following is the "-log_summary" of a 128^3 run; notice that I solve the
> system multiple times (hence the KSPSolve count of 128). Using CG + BJacobi.
>
> Let me know if I missed any details, and sorry for the length of the post.
>
> Thanks,
> Filippo
>
> Using Petsc Release Version 3.3.0, Patch 3, Wed Aug 29 11:26:24 CDT 2012
>
> Max Max/Min Avg Total
> Time (sec): 9.095e+01 1.00001 9.095e+01
> Objects: 1.875e+03 1.00000 1.875e+03
> Flops: 1.733e+10 1.00000 1.733e+10 1.109e+12
> Flops/sec: 1.905e+08 1.00001 1.905e+08 1.219e+10
> MPI Messages: 1.050e+05 1.00594 1.044e+05 6.679e+06
> MPI Message Lengths: 1.184e+09 1.37826 8.283e+03 5.532e+10
> MPI Reductions: 4.136e+04 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:  Main Stage: 1.1468e-01   0.1%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00   0.0%  0.000e+00   0.0%
>  1:   StepStage: 4.4170e-01   0.5%  7.2478e+09   0.7%  0.000e+00   0.0%  0.000e+00   0.0%  0.000e+00   0.0%
>  2:   ConvStage: 8.8333e+00   9.7%  3.7044e+10   3.3%  1.475e+06  22.1%  1.809e+03  21.8%  0.000e+00   0.0%
>  3:   ProjStage: 7.7169e+01  84.8%  1.0556e+12  95.2%  5.151e+06  77.1%  6.317e+03  76.3%  4.024e+04  97.3%
>  4:     IoStage: 2.4789e+00   2.7%  0.0000e+00   0.0%  3.564e+03   0.1%  1.017e+02   1.2%  5.000e+01   0.1%
>  5:   SolvAlloc: 7.0947e-01   0.8%  0.0000e+00   0.0%  5.632e+03   0.1%  9.587e-01   0.0%  3.330e+02   0.8%
>  6:   SolvSolve: 1.2044e+00   1.3%  9.1679e+09   0.8%  4.454e+04   0.7%  5.464e+01   0.7%  7.320e+02   1.8%
>  7:   SolvDeall: 7.5711e-04   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00   0.0%  0.000e+00   0.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %f - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops
> --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len
> Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
>
> --- Event Stage 1: StepStage
>
> VecAXPY 1536 1.0 4.6436e-01 1.1 1.13e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 1 0 0 0 99100 0 0 0 15608
>
> --- Event Stage 2: ConvStage
>
> VecCopy 2304 1.0 8.1658e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 9 0 0 0 0 0
> VecAXPY 2304 1.0 6.1324e-01 1.2 1.51e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 1 1 0 0 0 6 26 0 0 0 15758
> VecAXPBYCZ 2688 1.0 1.3029e+00 1.1 3.52e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 1 2 0 0 0 14 61 0 0 0 17306
> VecPointwiseMult 2304 1.0 7.2368e-01 1.0 7.55e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 8 13 0 0 0 6677
> VecScatterBegin 3840 1.0 1.8182e+00 1.3 0.00e+00 0.0 1.5e+06 8.2e+03
> 0.0e+00 2 0 22 22 0 18 0100100 0 0
> VecScatterEnd 3840 1.0 1.1972e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 10 0 0 0 0 0
>
> --- Event Stage 3: ProjStage
>
> VecTDot 25802 1.0 4.2552e+00 1.3 1.69e+09 1.0 0.0e+00 0.0e+00
> 2.6e+04 4 10 0 0 62 5 10 0 0 64 25433
> VecNorm 13029 1.0 3.0772e+00 3.3 8.54e+08 1.0 0.0e+00 0.0e+00
> 1.3e+04 2 5 0 0 32 2 5 0 0 32 17759
> VecCopy 640 1.0 2.4339e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 13157 1.0 7.0903e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecAXPY 26186 1.0 4.1462e+00 1.1 1.72e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 4 10 0 0 0 5 10 0 0 0 26490
> VecAYPX 12773 1.0 1.9135e+00 1.1 8.37e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 5 0 0 0 2 5 0 0 0 27997
> VecScatterBegin 13413 1.0 1.0689e+00 1.1 0.00e+00 0.0 5.2e+06 8.2e+03
> 0.0e+00 1 0 77 76 0 1 0100100 0 0
> VecScatterEnd 13413 1.0 2.7944e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 2 0 0 0 0 3 0 0 0 0 0
> MatMult 12901 1.0 3.2072e+01 1.0 5.92e+09 1.0 5.0e+06 8.2e+03
> 0.0e+00 35 34 74 73 0 41 36 96 96 0 11810
> MatSolve 13029 1.0 3.0851e+01 1.1 5.39e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 33 31 0 0 0 39 33 0 0 0 11182
> MatLUFactorNum 128 1.0 1.2922e+00 1.0 8.80e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00 1 1 0 0 0 2 1 0 0 0 4358
> MatILUFactorSym 128 1.0 7.5075e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.3e+02 1 0 0 0 0 1 0 0 0 0 0
> MatGetRowIJ 128 1.0 1.4782e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 128 1.0 5.7567e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.6e+02 0 0 0 0 1 0 0 0 0 1 0
> KSPSetUp 256 1.0 1.9913e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 7.7e+02 0 0 0 0 2 0 0 0 0 2 0
> KSPSolve 128 1.0 7.6381e+01 1.0 1.65e+10 1.0 5.0e+06 8.2e+03
> 4.0e+04 84 95 74 73 97 99100 96 96100 13800
> PCSetUp 256 1.0 2.1503e+00 1.0 8.80e+07 1.0 0.0e+00 0.0e+00
> 6.4e+02 2 1 0 0 2 3 1 0 0 2 2619
> PCSetUpOnBlocks 128 1.0 2.1232e+00 1.0 8.80e+07 1.0 0.0e+00 0.0e+00
> 3.8e+02 2 1 0 0 1 3 1 0 0 1 2652
> PCApply 13029 1.0 3.1812e+01 1.1 5.39e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 34 31 0 0 0 40 33 0 0 0 10844
>
> --- Event Stage 4: IoStage
>
> VecView 10 1.0 1.7523e+00282.9 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.0e+01 1 0 0 0 0 36 0 0 0 40 0
> VecCopy 10 1.0 2.2449e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecScatterBegin 6 1.0 2.3620e-03 2.4 0.00e+00 0.0 2.3e+03 8.2e+03
> 0.0e+00 0 0 0 0 0 0 0 65 3 0 0
> VecScatterEnd 6 1.0 4.4194e-01663.9 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 9 0 0 0 0 0
>
> --- Event Stage 5: SolvAlloc
>
> VecSet 50 1.0 1.3170e-01 5.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 13 0 0 0 0 0
> MatAssemblyBegin 4 1.0 3.9801e-0230.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 8.0e+00 0 0 0 0 0 3 0 0 0 2 0
> MatAssemblyEnd 4 1.0 2.2752e-02 1.0 0.00e+00 0.0 1.5e+03 2.0e+03
> 1.6e+01 0 0 0 0 0 3 0 27 49 5 0
>
> --- Event Stage 6: SolvSolve
>
> VecTDot 224 1.0 3.5454e-02 1.3 1.47e+07 1.0 0.0e+00 0.0e+00
> 2.2e+02 0 0 0 0 1 3 10 0 0 31 26499
> VecNorm 497 1.0 1.5268e-01 1.4 7.41e+06 1.0 0.0e+00 0.0e+00
> 5.0e+02 0 0 0 0 1 11 5 0 0 68 3104
> VecCopy 8 1.0 2.7523e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 114 1.0 5.9965e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 230 1.0 3.7198e-02 1.1 1.51e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 3 11 0 0 0 25934
> VecAYPX 111 1.0 1.7153e-02 1.1 7.27e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 1 5 0 0 0 27142
> VecScatterBegin 116 1.0 1.1888e-02 1.2 0.00e+00 0.0 4.5e+04 8.2e+03
> 0.0e+00 0 0 1 1 0 1 0100100 0 0
> VecScatterEnd 116 1.0 2.8105e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
> MatMult 112 1.0 2.8080e-01 1.0 5.14e+07 1.0 4.3e+04 8.2e+03
> 0.0e+00 0 0 1 1 0 23 36 97 97 0 11711
> MatSolve 113 1.0 2.6673e-01 1.1 4.67e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 22 33 0 0 0 11217
> MatLUFactorNum 1 1.0 1.0332e-02 1.0 6.87e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 1 0 0 0 0 4259
> MatILUFactorSym 1 1.0 3.1291e-02 4.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00 0 0 0 0 0 2 0 0 0 0 0
> MatGetRowIJ 1 1.0 4.0531e-06 4.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 3.4251e-03 5.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSetUp 2 1.0 3.6959e-0210.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 6.0e+00 0 0 0 0 0 1 0 0 0 1 0
> KSPSolve 1 1.0 6.9956e-01 1.0 1.43e+08 1.0 4.3e+04 8.2e+03
> 3.5e+02 1 1 1 1 1 58100 97 97 48 13069
> PCSetUp 2 1.0 4.4161e-02 2.3 6.87e+05 1.0 0.0e+00 0.0e+00
> 5.0e+00 0 0 0 0 0 3 0 0 0 1 996
> PCSetUpOnBlocks 1 1.0 4.3894e-02 2.4 6.87e+05 1.0 0.0e+00 0.0e+00
> 3.0e+00 0 0 0 0 0 3 0 0 0 0 1002
> PCApply 113 1.0 2.7507e-01 1.1 4.67e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 22 33 0 0 0 10877
>
> --- Event Stage 7: SolvDeall
>
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Viewer 1 0 0 0
>
> --- Event Stage 1: StepStage
>
>
> --- Event Stage 2: ConvStage
>
>
> --- Event Stage 3: ProjStage
>
> Vector 640 640 101604352 0
> Matrix 128 128 410327040 0
> Index Set 384 384 17062912 0
> Krylov Solver 256 256 282624 0
> Preconditioner 256 256 228352 0
>
> --- Event Stage 4: IoStage
>
> Vector 10 10 2636400 0
> Viewer 10 10 6880 0
>
> --- Event Stage 5: SolvAlloc
>
> Vector 140 6 8848 0
> Vector Scatter 6 0 0 0
> Matrix 6 0 0 0
> Distributed Mesh 2 0 0 0
> Bipartite Graph 4 0 0 0
> Index Set 14 14 372400 0
> IS L to G Mapping 3 0 0 0
> Krylov Solver 1 0 0 0
> Preconditioner 1 0 0 0
>
> --- Event Stage 6: SolvSolve
>
> Vector 5 0 0 0
> Matrix 1 0 0 0
> Index Set 3 0 0 0
> Krylov Solver 2 1 1136 0
> Preconditioner 2 1 824 0
>
> --- Event Stage 7: SolvDeall
>
> Vector 0 133 36676728 0
> Vector Scatter 0 1 1036 0
> Matrix 0 4 7038924 0
> Index Set 0 3 133304 0
> Krylov Solver 0 2 2208 0
> Preconditioner 0 2 1784 0
>
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> Average time for MPI_Barrier(): 1.12057e-05
> Average time for zero size MPI_Send(): 1.3113e-06
> #PETSc Option Table entries:
> -ksp_type cg
> -log_summary
> -pc_type bjacobi
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at:
> Configure options:
> Application 9457215 resources: utime ~5920s, stime ~58s
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener