hypre preconditioners
Barry Smith
bsmith at mcs.anl.gov
Wed Jul 15 15:26:17 CDT 2009
On Jul 15, 2009, at 11:23 AM, Lisandro Dalcin wrote:
> Did you try Block-Jacobi for the velocity problem?
You can try -pc_type sor; in parallel it runs block Jacobi with one
symmetric sweep of SOR on each block per iteration. This may be faster
than your plain Jacobi.
> If the matrix of
> your pressure problem changes in each solve (is this your case?) could
> you try to use ML? In my limited experience, ML leads to lower setup
> times, but higher iteration counts (let's say twice); perhaps it will be
> faster than BoomerAMG for your use case.
ML is worth trying.
Also you might try "playing" with the various BoomerAMG options. I
don't know them in detail so cannot make suggestions, but the choice
of coarsening scheme affects how long the setup takes.
Finally, if the matrix is not changing much from one solve to the next,
you can use the same BoomerAMG preconditioner for several linear solves.
Just pass SAME_PRECONDITIONER as the flag argument to KSPSetOperators()
and it will not build a new preconditioner until you call
KSPSetOperators() with SAME_NONZERO_PATTERN. I think this might work
very well for you.
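
To make the reuse idea concrete, here is a rough sketch of the control
flow in plain Python. It is not PETSc code: the class name, the
rebuild_every parameter and the toy jacobi_setup() are made up for
illustration, and the "setup" merely inverts a diagonal as a stand-in for
an expensive setup such as BoomerAMG's. The only point is that the setup
runs once per group of solves instead of once per solve.

```python
# Hypothetical sketch, not PETSc code: keep a costly preconditioner for
# several solves and rebuild it only every `rebuild_every` solves.

def jacobi_setup(diagonal):
    """Stand-in for an expensive setup; here it only inverts the diagonal."""
    return [1.0 / d for d in diagonal]


class LaggedPreconditioner:
    def __init__(self, setup, rebuild_every):
        self.setup = setup                  # function that builds the preconditioner
        self.rebuild_every = rebuild_every  # number of solves sharing one setup
        self.solves_since_setup = 0
        self.state = None
        self.setup_count = 0

    def get(self, matrix_data):
        # Build on the first solve and then every `rebuild_every` solves;
        # in between, hand back the older preconditioner unchanged.
        if self.state is None or self.solves_since_setup >= self.rebuild_every:
            self.state = self.setup(matrix_data)
            self.setup_count += 1
            self.solves_since_setup = 0
        self.solves_since_setup += 1
        return self.state


def count_setups(num_solves, rebuild_every):
    """Run `num_solves` fake solves and return how many setups were done."""
    pc = LaggedPreconditioner(jacobi_setup, rebuild_every)
    for step in range(num_solves):
        diagonal = [4.0 + 0.01 * step, 4.0, 4.0]  # slowly changing matrix
        pc.get(diagonal)
    return pc.setup_count


if __name__ == "__main__":
    print(count_setups(100, 1))  # a fresh setup for every solve
    print(count_setups(100, 5))  # one setup shared by five solves
```

Whether the reused preconditioner is still good enough depends on how
fast the matrix changes; if the iteration counts start to climb, rebuild
more often.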
Barry
>
>
> On Wed, Jul 15, 2009 at 5:58 AM, Klaij, Christiaan<C.Klaij at marin.nl>
> wrote:
>> Barry,
>>
>> Thanks for your reply! Below is the information from KSPView and
>> -log_summary for the three cases. Indeed PCSetUp takes much more
>> time with the hypre preconditioners.
>>
>> Chris
>>
>> -----------------------------
>> --- Jacobi preconditioner ---
>> -----------------------------
>>
>> KSP Object:
>> type: cg
>> maximum iterations=500
>> tolerances: relative=0.05, absolute=1e-50, divergence=10000
>> left preconditioning
>> PC Object:
>> type: jacobi
>> linear system matrix = precond matrix:
>> Matrix Object:
>> type=mpiaij, rows=256576, cols=256576
>> total: nonzeros=1769552, allocated nonzeros=1769552
>> not using I-node (on process 0) routines
>>
>> ************************************************************************************************************************
>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript
>> -r -fCourier9' to print this document ***
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance
>> Summary: ----------------------------------------------
>>
>> ./fresco on a linux_32_ named lin0077 with 2 processors, by cklaij
>> Wed Jul 15 10:22:04 2009
>> Using Petsc Release Version 2.3.3, Patch 13, Thu May 15 17:29:26
>> CDT 2008 HG revision: 4466c6289a0922df26e20626fd4a0b4dd03c8124
>>
>> Max Max/Min Avg Total
>> Time (sec): 6.037e+02 1.00000 6.037e+02
>> Objects: 9.270e+02 1.00000 9.270e+02
>> Flops: 5.671e+10 1.00065 5.669e+10 1.134e+11
>> Flops/sec: 9.393e+07 1.00065 9.390e+07 1.878e+08
>> MPI Messages: 1.780e+04 1.00000 1.780e+04 3.561e+04
>> MPI Message Lengths: 5.239e+08 1.00000 2.943e+04 1.048e+09
>> MPI Reductions: 2.651e+04 1.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type
>> (multiply/divide/add/subtract)
>> e.g., VecAXPY() for real vectors of
>> length N --> 2N flops
>> and VecAXPY() for complex vectors of
>> length N --> 8N flops
>>
>> Summary of Stages: ----- Time ------ ----- Flops ----- ---
>> Messages --- -- Message Lengths -- -- Reductions --
>> Avg %Total Avg %Total counts
>> %Total Avg %Total counts %Total
>> 0: Main Stage: 6.0374e+02 100.0% 1.1338e+11 100.0% 3.561e
>> +04 100.0% 2.943e+04 100.0% 5.302e+04 100.0%
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on
>> interpreting output.
>> Phase summary info:
>> Count: number of times phase was executed
>> Time and Flops/sec: Max - maximum over all processors
>> Ratio - ratio of maximum to minimum over all
>> processors
>> Mess: number of messages sent
>> Avg. len: average message length
>> Reduct: number of global reductions
>> Global: entire computation
>> Stage: stages of a computation. Set stages with
>> PetscLogStagePush() and PetscLogStagePop().
>> %T - percent time in this phase %F - percent flops in
>> this phase
>> %M - percent messages in this phase %L - percent message
>> lengths in this phase
>> %R - percent reductions in this phase
>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max
>> time over all processors)
>> ------------------------------------------------------------------------------------------------------------------------
>>
>>
>> ##########################################################
>> # #
>> # WARNING!!! #
>> # #
>> # This code was run without the PreLoadBegin() #
>> # macros. To get timing results we always recommend #
>> # preloading. otherwise timing numbers may be #
>> # meaningless. #
>> ##########################################################
>>
>>
>> Event Count Time (sec) Flops/
>> sec --- Global --- --- Stage --- Total
>> Max Ratio Max Ratio Max Ratio Mess Avg
>> len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecDot 31370 1.0 1.2887e+01 1.0 6.28e+08 1.0 0.0e+00
>> 0.0e+00 3.1e+04 2 14 0 0 59 2 14 0 0 59 1249
>> VecNorm 16235 1.0 2.3343e+00 1.0 1.79e+09 1.0 0.0e+00
>> 0.0e+00 1.6e+04 0 7 0 0 31 0 7 0 0 31 3569
>> VecCopy 1600 1.0 9.4822e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecSet 3732 1.0 8.7824e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAXPY 32836 1.0 1.9510e+01 1.0 4.34e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 3 15 0 0 0 3 15 0 0 0 864
>> VecAYPX 16701 1.0 7.4898e+00 1.0 5.73e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 8 0 0 0 1 8 0 0 0 1144
>> VecAssemblyBegin 1200 1.0 3.3916e-01 2.2 0.00e+00 0.0 0.0e+00
>> 0.0e+00 3.6e+03 0 0 0 0 7 0 0 0 0 7 0
>> VecAssemblyEnd 1200 1.0 1.6778e-03 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecPointwiseMult 18301 1.0 1.4524e+01 1.0 1.62e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 2 4 0 0 0 2 4 0 0 0 323
>> VecScatterBegin 17801 1.0 5.8999e-01 1.0 0.00e+00 0.0 3.6e+04
>> 2.9e+04 0.0e+00 0 0100100 0 0 0100100 0 0
>> VecScatterEnd 17801 1.0 3.3189e+00 2.2 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSetup 600 1.0 6.7541e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSolve 600 1.0 1.6520e+02 1.0 3.43e+08 1.0 3.6e+04
>> 2.9e+04 4.8e+04 27100100100 90 27100100100 90 686
>> PCSetUp 600 1.0 4.4189e+00 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>> PCApply 18301 1.0 1.4579e+01 1.0 1.62e+08 1.0 0.0e+00
>> 0.0e+00 1.0e+00 2 4 0 0 0 2 4 0 0 0 322
>> MatMult 16235 1.0 9.3444e+01 1.0 2.86e+08 1.0 3.2e+04
>> 2.9e+04 0.0e+00 15 47 91 91 0 15 47 91 91 0 570
>> MatMultTranspose 1566 1.0 8.8825e+00 1.0 3.12e+08 1.0 3.1e+03
>> 2.9e+04 0.0e+00 1 5 9 9 0 1 5 9 9 0 624
>> MatAssemblyBegin 600 1.0 6.0139e-0125.2 0.00e+00 0.0 0.0e+00
>> 0.0e+00 1.2e+03 0 0 0 0 2 0 0 0 0 2 0
>> MatAssemblyEnd 600 1.0 2.5127e+00 1.0 0.00e+00 0.0 4.0e+00
>> 1.5e+04 6.1e+02 0 0 0 0 1 0 0 0 0 1 0
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type Creations Destructions Memory
>> Descendants' Mem.
>>
>> --- Event Stage 0: Main Stage
>>
>> Index Set 4 4 30272 0
>> Vec 913 902 926180816 0
>> Vec Scatter 2 0 0 0
>> Krylov Solver 1 0 0 0
>> Preconditioner 1 0 0 0
>> Matrix 6 0 0 0
>> ========================================================================================================================
>> Average time to get PetscTime(): 2.14577e-07
>> Average time for MPI_Barrier(): 8.10623e-07
>> Average time for zero size MPI_Send(): 2.0504e-05
>>
>>
>>
>> -----------------------------------
>> --- Hypre Euclid preconditioner ---
>> -----------------------------------
>>
>> KSP Object:
>> type: cg
>> maximum iterations=500
>> tolerances: relative=0.05, absolute=1e-50, divergence=10000
>> left preconditioning
>> PC Object:
>> type: hypre
>> HYPRE Euclid preconditioning
>> HYPRE Euclid: number of levels 1
>> linear system matrix = precond matrix:
>> Matrix Object:
>> type=mpiaij, rows=256576, cols=256576
>> total: nonzeros=1769552, allocated nonzeros=1769552
>> not using I-node (on process 0) routines
>>
>> ************************************************************************************************************************
>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript
>> -r -fCourier9' to print this document ***
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance
>> Summary: ----------------------------------------------
>>
>> ./fresco on a linux_32_ named lin0077 with 2 processors, by cklaij
>> Wed Jul 15 10:10:05 2009
>> Using Petsc Release Version 2.3.3, Patch 13, Thu May 15 17:29:26
>> CDT 2008 HG revision: 4466c6289a0922df26e20626fd4a0b4dd03c8124
>>
>> Max Max/Min Avg Total
>> Time (sec): 6.961e+02 1.00000 6.961e+02
>> Objects: 1.227e+03 1.00000 1.227e+03
>> Flops: 1.340e+10 1.00073 1.340e+10 2.679e+10
>> Flops/sec: 1.925e+07 1.00073 1.924e+07 3.848e+07
>> MPI Messages: 4.748e+03 1.00000 4.748e+03 9.496e+03
>> MPI Message Lengths: 1.397e+08 1.00000 2.943e+04 2.794e+08
>> MPI Reductions: 7.192e+03 1.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type
>> (multiply/divide/add/subtract)
>> e.g., VecAXPY() for real vectors of
>> length N --> 2N flops
>> and VecAXPY() for complex vectors of
>> length N --> 8N flops
>>
>> Summary of Stages: ----- Time ------ ----- Flops ----- ---
>> Messages --- -- Message Lengths -- -- Reductions --
>> Avg %Total Avg %Total counts
>> %Total Avg %Total counts %Total
>> 0: Main Stage: 6.9614e+02 100.0% 2.6790e+10 100.0% 9.496e
>> +03 100.0% 2.943e+04 100.0% 1.438e+04 100.0%
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on
>> interpreting output.
>> Phase summary info:
>> Count: number of times phase was executed
>> Time and Flops/sec: Max - maximum over all processors
>> Ratio - ratio of maximum to minimum over all
>> processors
>> Mess: number of messages sent
>> Avg. len: average message length
>> Reduct: number of global reductions
>> Global: entire computation
>> Stage: stages of a computation. Set stages with
>> PetscLogStagePush() and PetscLogStagePop().
>> %T - percent time in this phase %F - percent flops in
>> this phase
>> %M - percent messages in this phase %L - percent message
>> lengths in this phase
>> %R - percent reductions in this phase
>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max
>> time over all processors)
>> ------------------------------------------------------------------------------------------------------------------------
>>
>>
>> ##########################################################
>> # #
>> # WARNING!!! #
>> # #
>> # This code was run without the PreLoadBegin() #
>> # macros. To get timing results we always recommend #
>> # preloading. otherwise timing numbers may be #
>> # meaningless. #
>> ##########################################################
>>
>>
>> Event Count Time (sec) Flops/
>> sec --- Global --- --- Stage --- Total
>> Max Ratio Max Ratio Max Ratio Mess Avg
>> len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecDot 5410 1.0 1.1865e+01 4.5 5.26e+08 4.5 0.0e+00
>> 0.0e+00 5.4e+03 1 10 0 0 38 1 10 0 0 38 234
>> VecNorm 3255 1.0 7.8095e-01 1.0 1.07e+09 1.0 0.0e+00
>> 0.0e+00 3.3e+03 0 6 0 0 23 0 6 0 0 23 2139
>> VecCopy 1600 1.0 9.5096e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecSet 4746 1.0 8.9868e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAXPY 6801 1.0 4.8778e+00 1.0 3.59e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 13 0 0 0 1 13 0 0 0 715
>> VecAYPX 3646 1.0 2.2348e+00 1.0 4.19e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 837
>> VecAssemblyBegin 1200 1.0 2.7152e-01 2.5 0.00e+00 0.0 0.0e+00
>> 0.0e+00 3.6e+03 0 0 0 0 25 0 0 0 0 25 0
>> VecAssemblyEnd 1200 1.0 1.7414e-03 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecPointwiseMult 3982 1.0 4.0871e+00 1.0 1.26e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 250
>> VecScatterBegin 4746 1.0 1.8000e-01 1.0 0.00e+00 0.0 9.5e+03
>> 2.9e+04 0.0e+00 0 0100100 0 0 0100100 0 0
>> VecScatterEnd 4746 1.0 4.6870e+00 5.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSetup 600 1.0 6.8991e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSolve 600 1.0 2.5931e+02 1.0 5.17e+07 1.0 9.5e+03
>> 2.9e+04 9.0e+03 37100100100 62 37100100100 62 103
>> PCSetUp 600 1.0 1.8337e+02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 2.0e+02 26 0 0 0 1 26 0 0 0 1 0
>> PCApply 5246 1.0 3.6440e+01 1.3 1.88e+07 1.3 0.0e+00
>> 0.0e+00 1.0e+02 5 4 0 0 1 5 4 0 0 1 28
>> MatMult 3255 1.0 2.3031e+01 1.2 2.85e+08 1.2 6.5e+03
>> 2.9e+04 0.0e+00 3 40 69 69 0 3 40 69 69 0 464
>> MatMultTranspose 1491 1.0 8.4907e+00 1.0 3.11e+08 1.0 3.0e+03
>> 2.9e+04 0.0e+00 1 20 31 31 0 1 20 31 31 0 621
>> MatConvert 100 1.0 1.2686e+01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
>> MatAssemblyBegin 600 1.0 2.3702e+0042.6 0.00e+00 0.0 0.0e+00
>> 0.0e+00 1.2e+03 0 0 0 0 8 0 0 0 0 8 0
>> MatAssemblyEnd 600 1.0 2.5303e+00 1.0 0.00e+00 0.0 4.0e+00
>> 1.5e+04 6.1e+02 0 0 0 0 4 0 0 0 0 4 0
>> MatGetRow 12828800 1.0 5.2074e+00 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>> MatGetRowIJ 200 1.0 1.6284e-04 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type Creations Destructions Memory
>> Descendants' Mem.
>>
>> --- Event Stage 0: Main Stage
>>
>> Index Set 4 4 30272 0
>> Vec 1213 1202 1234223216 0
>> Vec Scatter 2 0 0 0
>> Krylov Solver 1 0 0 0
>> Preconditioner 1 0 0 0
>> Matrix 6 0 0 0
>> ========================================================================================================================
>> Average time to get PetscTime(): 2.14577e-07
>> Average time for MPI_Barrier(): 3.8147e-07
>> Average time for zero size MPI_Send(): 1.39475e-05
>>
>>
>>
>>
>> --------------------------------------
>> --- Hypre BoomerAMG preconditioner ---
>> --------------------------------------
>>
>> KSP Object:
>> type: cg
>> maximum iterations=500
>> tolerances: relative=0.05, absolute=1e-50, divergence=10000
>> left preconditioning
>> PC Object:
>> type: hypre
>> HYPRE BoomerAMG preconditioning
>> HYPRE BoomerAMG: Cycle type V
>> HYPRE BoomerAMG: Maximum number of levels 25
>> HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>> HYPRE BoomerAMG: Threshold for strong coupling 0.25
>> HYPRE BoomerAMG: Interpolation truncation factor 0
>> HYPRE BoomerAMG: Interpolation: max elements per row 0
>> HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>> HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>> HYPRE BoomerAMG: Maximum row sums 0.9
>> HYPRE BoomerAMG: Sweeps down 1
>> HYPRE BoomerAMG: Sweeps up 1
>> HYPRE BoomerAMG: Sweeps on coarse 1
>> HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>> HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>> HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>> HYPRE BoomerAMG: Relax weight (all) 1
>> HYPRE BoomerAMG: Outer relax weight (all) 1
>> HYPRE BoomerAMG: Using CF-relaxation
>> HYPRE BoomerAMG: Measure type local
>> HYPRE BoomerAMG: Coarsen type Falgout
>> HYPRE BoomerAMG: Interpolation type classical
>> linear system matrix = precond matrix:
>> Matrix Object:
>> type=mpiaij, rows=256576, cols=256576
>> total: nonzeros=1769552, allocated nonzeros=1769552
>> not using I-node (on process 0) routines
>>
>> ************************************************************************************************************************
>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript
>> -r -fCourier9' to print this document ***
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance
>> Summary: ----------------------------------------------
>>
>> ./fresco on a linux_32_ named lin0077 with 2 processors, by cklaij
>> Wed Jul 15 09:53:07 2009
>> Using Petsc Release Version 2.3.3, Patch 13, Thu May 15 17:29:26
>> CDT 2008 HG revision: 4466c6289a0922df26e20626fd4a0b4dd03c8124
>>
>> Max Max/Min Avg Total
>> Time (sec): 7.080e+02 1.00000 7.080e+02
>> Objects: 1.227e+03 1.00000 1.227e+03
>> Flops: 1.054e+10 1.00076 1.054e+10 2.107e+10
>> Flops/sec: 1.489e+07 1.00076 1.488e+07 2.977e+07
>> MPI Messages: 3.857e+03 1.00000 3.857e+03 7.714e+03
>> MPI Message Lengths: 1.135e+08 1.00000 2.942e+04 2.270e+08
>> MPI Reductions: 5.800e+03 1.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type
>> (multiply/divide/add/subtract)
>> e.g., VecAXPY() for real vectors of
>> length N --> 2N flops
>> and VecAXPY() for complex vectors of
>> length N --> 8N flops
>>
>> Summary of Stages: ----- Time ------ ----- Flops ----- ---
>> Messages --- -- Message Lengths -- -- Reductions --
>> Avg %Total Avg %Total counts
>> %Total Avg %Total counts %Total
>> 0: Main Stage: 7.0799e+02 100.0% 2.1075e+10 100.0% 7.714e
>> +03 100.0% 2.942e+04 100.0% 1.160e+04 100.0%
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on
>> interpreting output.
>> Phase summary info:
>> Count: number of times phase was executed
>> Time and Flops/sec: Max - maximum over all processors
>> Ratio - ratio of maximum to minimum over all
>> processors
>> Mess: number of messages sent
>> Avg. len: average message length
>> Reduct: number of global reductions
>> Global: entire computation
>> Stage: stages of a computation. Set stages with
>> PetscLogStagePush() and PetscLogStagePop().
>> %T - percent time in this phase %F - percent flops in
>> this phase
>> %M - percent messages in this phase %L - percent message
>> lengths in this phase
>> %R - percent reductions in this phase
>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max
>> time over all processors)
>> ------------------------------------------------------------------------------------------------------------------------
>>
>>
>> ##########################################################
>> # #
>> # WARNING!!! #
>> # #
>> # This code was run without the PreLoadBegin() #
>> # macros. To get timing results we always recommend #
>> # preloading. otherwise timing numbers may be #
>> # meaningless. #
>> ##########################################################
>>
>>
>> Event Count Time (sec) Flops/
>> sec --- Global --- --- Stage --- Total
>> Max Ratio Max Ratio Max Ratio Mess Avg
>> len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecDot 3554 1.0 1.8220e+00 1.0 5.03e+08 1.0 0.0e+00
>> 0.0e+00 3.6e+03 0 9 0 0 31 0 9 0 0 31 1001
>> VecNorm 2327 1.0 6.7031e-01 1.0 9.34e+08 1.0 0.0e+00
>> 0.0e+00 2.3e+03 0 6 0 0 20 0 6 0 0 20 1781
>> VecCopy 1600 1.0 9.4440e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecSet 3855 1.0 8.0550e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAXPY 4982 1.0 3.7953e+00 1.0 3.39e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 12 0 0 0 1 12 0 0 0 674
>> VecAYPX 2755 1.0 1.8270e+00 1.0 3.89e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 774
>> VecAssemblyBegin 1200 1.0 1.8679e-01 1.8 0.00e+00 0.0 0.0e+00
>> 0.0e+00 3.6e+03 0 0 0 0 31 0 0 0 0 31 0
>> VecAssemblyEnd 1200 1.0 1.7717e-03 1.1 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecPointwiseMult 4056 1.0 4.1344e+00 1.0 1.26e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 252
>> VecScatterBegin 3855 1.0 1.5116e-01 1.0 0.00e+00 0.0 7.7e+03
>> 2.9e+04 0.0e+00 0 0100100 0 0 0100100 0 0
>> VecScatterEnd 3855 1.0 7.3828e-01 2.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSetup 600 1.0 5.1192e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSolve 600 1.0 2.7194e+02 1.0 3.88e+07 1.0 7.7e+03
>> 2.9e+04 6.2e+03 38100100100 53 38100100100 53 77
>> PCSetUp 600 1.0 1.6630e+02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 2.0e+02 23 0 0 0 2 23 0 0 0 2 0
>> PCApply 4355 1.0 7.3735e+01 1.0 7.06e+06 1.0 0.0e+00
>> 0.0e+00 1.0e+02 10 5 0 0 1 10 5 0 0 1 14
>> MatMult 2327 1.0 1.3706e+01 1.0 2.79e+08 1.0 4.7e+03
>> 2.9e+04 0.0e+00 2 36 60 60 0 2 36 60 60 0 557
>> MatMultTranspose 1528 1.0 8.6412e+00 1.0 3.13e+08 1.0 3.1e+03
>> 2.9e+04 0.0e+00 1 26 40 40 0 1 26 40 40 0 626
>> MatConvert 100 1.0 1.2962e+01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
>> MatAssemblyBegin 600 1.0 2.4579e+0096.9 0.00e+00 0.0 0.0e+00
>> 0.0e+00 1.2e+03 0 0 0 0 10 0 0 0 0 10 0
>> MatAssemblyEnd 600 1.0 2.5257e+00 1.0 0.00e+00 0.0 4.0e+00
>> 1.5e+04 6.1e+02 0 0 0 0 5 0 0 0 0 5 0
>> MatGetRow 12828800 1.0 5.2907e+00 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>> MatGetRowIJ 200 1.0 1.7476e-04 1.1 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type Creations Destructions Memory
>> Descendants' Mem.
>>
>> --- Event Stage 0: Main Stage
>>
>> Index Set 4 4 30272 0
>> Vec 1213 1202 1234223216 0
>> Vec Scatter 2 0 0 0
>> Krylov Solver 1 0 0 0
>> Preconditioner 1 0 0 0
>> Matrix 6 0 0 0
>> ========================================================================================================================
>> Average time to get PetscTime(): 1.90735e-07
>> Average time for MPI_Barrier(): 8.10623e-07
>> Average time for zero size MPI_Send(): 1.95503e-05
>> OptionTable: -log_summary
>>
>>
>>
>>
>> -----Original Message-----
>> Date: Tue, 14 Jul 2009 10:42:58 -0500
>> From: Barry Smith <bsmith at mcs.anl.gov>
>> Subject: Re: hypre preconditioners
>> To: PETSc users list <petsc-users at mcs.anl.gov>
>> Message-ID: <DC1E3E8F-1D2D-4256-A1EE-14BA81EAEC67 at mcs.anl.gov>
>> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>>
>>
>> First run the three cases with -log_summary (also -ksp_view to see
>> exact solver options that are being used) and send those files. This
>> will tell us where the time is being spent; without this information
>> any comments are pure speculation. (For example, the "copy" time to
>> hypre format is trivial compared to the time to build a hypre
>> preconditioner and not the problem).
>>
>>
>> What you report is not uncommon; the setup and per-iteration costs
>> of the hypre preconditioners will be much larger than those of the
>> simpler Jacobi preconditioner.
>>
>> Barry
>>
>> On Jul 14, 2009, at 3:36 AM, Klaij, Christiaan wrote:
>>
>>>
>>> I'm solving the steady incompressible Navier-Stokes equations
>>> (discretized with FV on unstructured grids) using the SIMPLE
>>> Pressure Correction method. I'm using Picard linearization and solve
>>> the system for the momentum equations with BICG and for the pressure
>>> equation with CG. Currently, for parallel runs, I'm using JACOBI as
>>> a preconditioner. My grids typically have a few million cells and I
>>> use between 4 and 16 cores (1 to 4 quadcore CPUs on a linux
>>> cluster). A significant portion of the CPU time goes into solving
>>> the pressure equation. To reach the relative tolerance I need, CG
>>> with JACOBI takes about 100 iterations per outer loop for these
>>> problems.
>>>
>>> In order to reduce CPU time, I've compiled PETSc with support for
>>> Hypre and I'm looking at BoomerAMG and Euclid to replace JACOBI as a
>>> preconditioner for the pressure equation. With default settings,
>>> both BoomerAMG and Euclid greatly reduce the number of iterations:
>>> with BoomerAMG 1 or 2 iterations are enough, with Euclid about 10.
>>> However, I do not get any reduction in CPU time. With Euclid, CPU
>>> time is similar to JACOBI and with BoomerAMG it is approximately
>>> doubled.
>>>
>>> Is this what one can expect? Are BoomerAMG and Euclid meant for much
>>> larger problems? I understand Hypre uses a different matrix storage
>>> format, is CPU time 'lost in translation' between PETSc and Hypre
>>> for these small problems? Are there maybe any settings I should
>>> change?
>>>
>>> Chris
>>>
>>> dr. ir. Christiaan Klaij
>>> CFD Researcher
>>> Research & Development
>>> MARIN
>>> 2, Haagsteeg
>>> c.klaij at marin.nl
>>> P.O. Box 28
>>> T +31 317 49 39 11
>>> 6700 AA Wageningen
>>> F +31 317 49 32 45
>>> T +31 317 49 33 44
>>> The Netherlands
>>> I www.marin.nl
>>>
>>
>
>
>
> --
> Lisandro Dalcín
> ---------------
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594