hypre preconditioners

Klaij, Christiaan C.Klaij at marin.nl
Thu Jul 16 09:20:17 CDT 2009


Barry,

Thanks for your suggestions, I especially like the idea of keeping the same preconditioner for several solves; that's definitely worth a try.

Chris


-----Original Message-----
Date: Wed, 15 Jul 2009 15:26:17 -0500
From: Barry Smith <bsmith at mcs.anl.gov>
Subject: Re: hypre preconditioners
To: PETSc users list <petsc-users at mcs.anl.gov>


On Jul 15, 2009, at 11:23 AM, Lisandro Dalcin wrote:

> Did you try Block-Jacobi for the velocity problem?

    You can try -pc_type sor and it will run block Jacobi with one  
symmetric sweep of SOR for each iteration. This may be faster than  
your plain Jacobi.

> If the matrix of
> your pressure problem changes in each solve (is this your case?) could
> you try to use ML? In my limited experience, ML leads to lower setup
> times but higher iteration counts (let's say twice as many); perhaps it
> will be faster than BoomerAMG for your use case.

    ML is worth trying.

    Also you might try "playing" with the various BoomerAMG options. I
don't know them in detail so cannot make suggestions, but the choice of
coarsening scheme controls how long the setup takes.

   Finally, if the matrix is not changing much from one solve to the next,
you can use the same BoomerAMG preconditioner for several linear solves.
Just pass SAME_PRECONDITIONER as the flag argument to KSPSetOperators() and
it will not build a new preconditioner until you call it again with
SAME_NONZERO_PATTERN. I think this might work very well for you.
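
As a rough illustration of why this could pay off, the sketch below takes the total run times and PCSetUp times from the -log_summary output quoted further down in this thread and estimates the run time if the preconditioner were rebuilt only every k-th time. The function name, the dictionary layout and the scaling model are assumptions made for this illustration; they are not part of PETSc or of the measurements. In particular, the model assumes that the whole PCSetUp time shrinks in proportion to the number of rebuilds and that iteration counts stay the same, which a stale preconditioner may well violate.

```python
# Timings in seconds, copied from the -log_summary output quoted in this thread
# ("Time (sec)" in the header and the "PCSetUp" row of each event table).
RUNS = {
    "jacobi":    {"total": 603.7, "pc_setup": 4.4189},
    "euclid":    {"total": 696.1, "pc_setup": 183.37},
    "boomeramg": {"total": 708.0, "pc_setup": 166.30},
}


def estimated_total(run, rebuild_every):
    """Estimate the run time if the preconditioner is rebuilt every k-th time.

    Assumption (hypothetical model, not from the thread): setup time scales
    as 1/rebuild_every and all other costs, including the number of Krylov
    iterations, are unchanged.
    """
    if rebuild_every < 1:
        raise ValueError("rebuild_every must be >= 1")
    t = RUNS[run]
    return t["total"] - t["pc_setup"] + t["pc_setup"] / rebuild_every


for name, t in RUNS.items():
    share = 100.0 * t["pc_setup"] / t["total"]
    print(f"{name:10s} PCSetUp share {share:5.1f}%   "
          f"rebuilt every 5th time: {estimated_total(name, 5):6.1f} s")
```

With rebuild_every set to 1 the function returns the measured total, so the estimate can be checked against the logs. Any real gain has to be measured, since extra iterations caused by an outdated preconditioner are not modelled here.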

    Barry


>
>
> On Wed, Jul 15, 2009 at 5:58 AM, Klaij, Christiaan<C.Klaij at marin.nl>  
> wrote:
>> Barry,
>>
>> Thanks for your reply! Below is the information from KSPView and
>> -log_summary for the three cases. Indeed, PCSetUp takes much more
>> time with the hypre preconditioners.
>>
>> Chris
>>
>> -----------------------------
>> --- Jacobi preconditioner ---
>> -----------------------------
>>
>> KSP Object:
>>  type: cg
>>  maximum iterations=500
>>  tolerances:  relative=0.05, absolute=1e-50, divergence=10000
>>  left preconditioning
>> PC Object:
>>  type: jacobi
>>  linear system matrix = precond matrix:
>>  Matrix Object:
>>    type=mpiaij, rows=256576, cols=256576
>>    total: nonzeros=1769552, allocated nonzeros=1769552
>>      not using I-node (on process 0) routines
>>
>> ************************************************************************************************************************
>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript  
>> -r -fCourier9' to print this document            ***
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance  
>> Summary: ----------------------------------------------
>>
>> ./fresco on a linux_32_ named lin0077 with 2 processors, by cklaij  
>> Wed Jul 15 10:22:04 2009
>> Using Petsc Release Version 2.3.3, Patch 13, Thu May 15 17:29:26  
>> CDT 2008 HG revision: 4466c6289a0922df26e20626fd4a0b4dd03c8124
>>
>>                         Max       Max/Min        Avg      Total
>> Time (sec):           6.037e+02      1.00000   6.037e+02
>> Objects:              9.270e+02      1.00000   9.270e+02
>> Flops:                5.671e+10      1.00065   5.669e+10  1.134e+11
>> Flops/sec:            9.393e+07      1.00065   9.390e+07  1.878e+08
>> MPI Messages:         1.780e+04      1.00000   1.780e+04  3.561e+04
>> MPI Message Lengths:  5.239e+08      1.00000   2.943e+04  1.048e+09
>> MPI Reductions:       2.651e+04      1.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type  
>> (multiply/divide/add/subtract)
>>                            e.g., VecAXPY() for real vectors of  
>> length N --> 2N flops
>>                            and VecAXPY() for complex vectors of  
>> length N --> 8N flops
>>
>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>>  0:      Main Stage: 6.0374e+02 100.0%  1.1338e+11 100.0%  3.561e+04 100.0%  2.943e+04      100.0%  5.302e+04 100.0%
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on  
>> interpreting output.
>> Phase summary info:
>>   Count: number of times phase was executed
>>   Time and Flops/sec: Max - maximum over all processors
>>                       Ratio - ratio of maximum to minimum over all  
>> processors
>>   Mess: number of messages sent
>>   Avg. len: average message length
>>   Reduct: number of global reductions
>>   Global: entire computation
>>   Stage: stages of a computation. Set stages with  
>> PetscLogStagePush() and PetscLogStagePop().
>>      %T - percent time in this phase         %F - percent flops in  
>> this phase
>>      %M - percent messages in this phase     %L - percent message  
>> lengths in this phase
>>      %R - percent reductions in this phase
>>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max  
>> time over all processors)
>> ------------------------------------------------------------------------------------------------------------------------
>>
>>
>>      ##########################################################
>>      #                                                        #
>>      #                          WARNING!!!                    #
>>      #                                                        #
>>      #   This code was run without the PreLoadBegin()         #
>>      #   macros. To get timing results we always recommend    #
>>      #   preloading. otherwise timing numbers may be          #
>>      #   meaningless.                                         #
>>      ##########################################################
>>
>>
>> Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
>>                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecDot             31370 1.0 1.2887e+01 1.0 6.28e+08 1.0 0.0e+00 0.0e+00 3.1e+04  2 14  0  0 59   2 14  0  0 59  1249
>> VecNorm            16235 1.0 2.3343e+00 1.0 1.79e+09 1.0 0.0e+00 0.0e+00 1.6e+04  0  7  0  0 31   0  7  0  0 31  3569
>> VecCopy             1600 1.0 9.4822e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecSet              3732 1.0 8.7824e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAXPY            32836 1.0 1.9510e+01 1.0 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3 15  0  0  0   3 15  0  0  0   864
>> VecAYPX            16701 1.0 7.4898e+00 1.0 5.73e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  8  0  0  0   1  8  0  0  0  1144
>> VecAssemblyBegin    1200 1.0 3.3916e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+03  0  0  0  0  7   0  0  0  0  7     0
>> VecAssemblyEnd      1200 1.0 1.6778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecPointwiseMult   18301 1.0 1.4524e+01 1.0 1.62e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  4  0  0  0   2  4  0  0  0   323
>> VecScatterBegin    17801 1.0 5.8999e-01 1.0 0.00e+00 0.0 3.6e+04 2.9e+04 0.0e+00  0  0100100  0   0  0100100  0     0
>> VecScatterEnd      17801 1.0 3.3189e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSetup             600 1.0 6.7541e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve             600 1.0 1.6520e+02 1.0 3.43e+08 1.0 3.6e+04 2.9e+04 4.8e+04 27100100100 90  27100100100 90   686
>> PCSetUp              600 1.0 4.4189e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> PCApply            18301 1.0 1.4579e+01 1.0 1.62e+08 1.0 0.0e+00 0.0e+00 1.0e+00  2  4  0  0  0   2  4  0  0  0   322
>> MatMult            16235 1.0 9.3444e+01 1.0 2.86e+08 1.0 3.2e+04 2.9e+04 0.0e+00 15 47 91 91  0  15 47 91 91  0   570
>> MatMultTranspose    1566 1.0 8.8825e+00 1.0 3.12e+08 1.0 3.1e+03 2.9e+04 0.0e+00  1  5  9  9  0   1  5  9  9  0   624
>> MatAssemblyBegin     600 1.0 6.0139e-0125.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03  0  0  0  0  2   0  0  0  0  2     0
>> MatAssemblyEnd       600 1.0 2.5127e+00 1.0 0.00e+00 0.0 4.0e+00 1.5e+04 6.1e+02  0  0  0  0  1   0  0  0  0  1     0
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type          Creations   Destructions   Memory   
>> Descendants' Mem.
>>
>> --- Event Stage 0: Main Stage
>>
>>           Index Set     4              4      30272     0
>>                 Vec   913            902  926180816     0
>>         Vec Scatter     2              0          0     0
>>       Krylov Solver     1              0          0     0
>>      Preconditioner     1              0          0     0
>>              Matrix     6              0          0     0
>> ========================================================================================================================
>> Average time to get PetscTime(): 2.14577e-07
>> Average time for MPI_Barrier(): 8.10623e-07
>> Average time for zero size MPI_Send(): 2.0504e-05
>>
>>
>>
>> -----------------------------------
>> --- Hypre Euclid preconditioner ---
>> -----------------------------------
>>
>> KSP Object:
>>  type: cg
>>  maximum iterations=500
>>  tolerances:  relative=0.05, absolute=1e-50, divergence=10000
>>  left preconditioning
>> PC Object:
>>  type: hypre
>>    HYPRE Euclid preconditioning
>>    HYPRE Euclid: number of levels 1
>>  linear system matrix = precond matrix:
>>  Matrix Object:
>>    type=mpiaij, rows=256576, cols=256576
>>    total: nonzeros=1769552, allocated nonzeros=1769552
>>      not using I-node (on process 0) routines
>>
>> ************************************************************************************************************************
>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript  
>> -r -fCourier9' to print this document            ***
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance  
>> Summary: ----------------------------------------------
>>
>> ./fresco on a linux_32_ named lin0077 with 2 processors, by cklaij  
>> Wed Jul 15 10:10:05 2009
>> Using Petsc Release Version 2.3.3, Patch 13, Thu May 15 17:29:26  
>> CDT 2008 HG revision: 4466c6289a0922df26e20626fd4a0b4dd03c8124
>>
>>                         Max       Max/Min        Avg      Total
>> Time (sec):           6.961e+02      1.00000   6.961e+02
>> Objects:              1.227e+03      1.00000   1.227e+03
>> Flops:                1.340e+10      1.00073   1.340e+10  2.679e+10
>> Flops/sec:            1.925e+07      1.00073   1.924e+07  3.848e+07
>> MPI Messages:         4.748e+03      1.00000   4.748e+03  9.496e+03
>> MPI Message Lengths:  1.397e+08      1.00000   2.943e+04  2.794e+08
>> MPI Reductions:       7.192e+03      1.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type  
>> (multiply/divide/add/subtract)
>>                            e.g., VecAXPY() for real vectors of  
>> length N --> 2N flops
>>                            and VecAXPY() for complex vectors of  
>> length N --> 8N flops
>>
>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>>  0:      Main Stage: 6.9614e+02 100.0%  2.6790e+10 100.0%  9.496e+03 100.0%  2.943e+04      100.0%  1.438e+04 100.0%
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on  
>> interpreting output.
>> Phase summary info:
>>   Count: number of times phase was executed
>>   Time and Flops/sec: Max - maximum over all processors
>>                       Ratio - ratio of maximum to minimum over all  
>> processors
>>   Mess: number of messages sent
>>   Avg. len: average message length
>>   Reduct: number of global reductions
>>   Global: entire computation
>>   Stage: stages of a computation. Set stages with  
>> PetscLogStagePush() and PetscLogStagePop().
>>      %T - percent time in this phase         %F - percent flops in  
>> this phase
>>      %M - percent messages in this phase     %L - percent message  
>> lengths in this phase
>>      %R - percent reductions in this phase
>>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max  
>> time over all processors)
>> ------------------------------------------------------------------------------------------------------------------------
>>
>>
>>      ##########################################################
>>      #                                                        #
>>      #                          WARNING!!!                    #
>>      #                                                        #
>>      #   This code was run without the PreLoadBegin()         #
>>      #   macros. To get timing results we always recommend    #
>>      #   preloading. otherwise timing numbers may be          #
>>      #   meaningless.                                         #
>>      ##########################################################
>>
>>
>> Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
>>                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecDot              5410 1.0 1.1865e+01 4.5 5.26e+08 4.5 0.0e+00 0.0e+00 5.4e+03  1 10  0  0 38   1 10  0  0 38   234
>> VecNorm             3255 1.0 7.8095e-01 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 3.3e+03  0  6  0  0 23   0  6  0  0 23  2139
>> VecCopy             1600 1.0 9.5096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecSet              4746 1.0 8.9868e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAXPY             6801 1.0 4.8778e+00 1.0 3.59e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 13  0  0  0   1 13  0  0  0   715
>> VecAYPX             3646 1.0 2.2348e+00 1.0 4.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0   837
>> VecAssemblyBegin    1200 1.0 2.7152e-01 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+03  0  0  0  0 25   0  0  0  0 25     0
>> VecAssemblyEnd      1200 1.0 1.7414e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecPointwiseMult    3982 1.0 4.0871e+00 1.0 1.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0   250
>> VecScatterBegin     4746 1.0 1.8000e-01 1.0 0.00e+00 0.0 9.5e+03 2.9e+04 0.0e+00  0  0100100  0   0  0100100  0     0
>> VecScatterEnd       4746 1.0 4.6870e+00 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSetup             600 1.0 6.8991e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve             600 1.0 2.5931e+02 1.0 5.17e+07 1.0 9.5e+03 2.9e+04 9.0e+03 37100100100 62  37100100100 62   103
>> PCSetUp              600 1.0 1.8337e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 26  0  0  0  1  26  0  0  0  1     0
>> PCApply             5246 1.0 3.6440e+01 1.3 1.88e+07 1.3 0.0e+00 0.0e+00 1.0e+02  5  4  0  0  1   5  4  0  0  1    28
>> MatMult             3255 1.0 2.3031e+01 1.2 2.85e+08 1.2 6.5e+03 2.9e+04 0.0e+00  3 40 69 69  0   3 40 69 69  0   464
>> MatMultTranspose    1491 1.0 8.4907e+00 1.0 3.11e+08 1.0 3.0e+03 2.9e+04 0.0e+00  1 20 31 31  0   1 20 31 31  0   621
>> MatConvert           100 1.0 1.2686e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>> MatAssemblyBegin     600 1.0 2.3702e+0042.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03  0  0  0  0  8   0  0  0  0  8     0
>> MatAssemblyEnd       600 1.0 2.5303e+00 1.0 0.00e+00 0.0 4.0e+00 1.5e+04 6.1e+02  0  0  0  0  4   0  0  0  0  4     0
>> MatGetRow        12828800 1.0 5.2074e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatGetRowIJ          200 1.0 1.6284e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type          Creations   Destructions   Memory   
>> Descendants' Mem.
>>
>> --- Event Stage 0: Main Stage
>>
>>           Index Set     4              4      30272     0
>>                 Vec  1213           1202  1234223216     0
>>         Vec Scatter     2              0          0     0
>>       Krylov Solver     1              0          0     0
>>      Preconditioner     1              0          0     0
>>              Matrix     6              0          0     0
>> ========================================================================================================================
>> Average time to get PetscTime(): 2.14577e-07
>> Average time for MPI_Barrier(): 3.8147e-07
>> Average time for zero size MPI_Send(): 1.39475e-05
>>
>>
>>
>>
>> --------------------------------------
>> --- Hypre BoomerAMG preconditioner ---
>> --------------------------------------
>>
>> KSP Object:
>>  type: cg
>>  maximum iterations=500
>>  tolerances:  relative=0.05, absolute=1e-50, divergence=10000
>>  left preconditioning
>> PC Object:
>>  type: hypre
>>    HYPRE BoomerAMG preconditioning
>>    HYPRE BoomerAMG: Cycle type V
>>    HYPRE BoomerAMG: Maximum number of levels 25
>>    HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>    HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>    HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>    HYPRE BoomerAMG: Interpolation truncation factor 0
>>    HYPRE BoomerAMG: Interpolation: max elements per row 0
>>    HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>    HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>    HYPRE BoomerAMG: Maximum row sums 0.9
>>    HYPRE BoomerAMG: Sweeps down         1
>>    HYPRE BoomerAMG: Sweeps up           1
>>    HYPRE BoomerAMG: Sweeps on coarse    1
>>    HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>>    HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>>    HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>>    HYPRE BoomerAMG: Relax weight  (all)      1
>>    HYPRE BoomerAMG: Outer relax weight (all) 1
>>    HYPRE BoomerAMG: Using CF-relaxation
>>    HYPRE BoomerAMG: Measure type        local
>>    HYPRE BoomerAMG: Coarsen type        Falgout
>>    HYPRE BoomerAMG: Interpolation type  classical
>>  linear system matrix = precond matrix:
>>  Matrix Object:
>>    type=mpiaij, rows=256576, cols=256576
>>    total: nonzeros=1769552, allocated nonzeros=1769552
>>      not using I-node (on process 0) routines
>>
>> ************************************************************************************************************************
>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript  
>> -r -fCourier9' to print this document            ***
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance  
>> Summary: ----------------------------------------------
>>
>> ./fresco on a linux_32_ named lin0077 with 2 processors, by cklaij  
>> Wed Jul 15 09:53:07 2009
>> Using Petsc Release Version 2.3.3, Patch 13, Thu May 15 17:29:26  
>> CDT 2008 HG revision: 4466c6289a0922df26e20626fd4a0b4dd03c8124
>>
>>                         Max       Max/Min        Avg      Total
>> Time (sec):           7.080e+02      1.00000   7.080e+02
>> Objects:              1.227e+03      1.00000   1.227e+03
>> Flops:                1.054e+10      1.00076   1.054e+10  2.107e+10
>> Flops/sec:            1.489e+07      1.00076   1.488e+07  2.977e+07
>> MPI Messages:         3.857e+03      1.00000   3.857e+03  7.714e+03
>> MPI Message Lengths:  1.135e+08      1.00000   2.942e+04  2.270e+08
>> MPI Reductions:       5.800e+03      1.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type  
>> (multiply/divide/add/subtract)
>>                            e.g., VecAXPY() for real vectors of  
>> length N --> 2N flops
>>                            and VecAXPY() for complex vectors of  
>> length N --> 8N flops
>>
>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>>  0:      Main Stage: 7.0799e+02 100.0%  2.1075e+10 100.0%  7.714e+03 100.0%  2.942e+04      100.0%  1.160e+04 100.0%
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on  
>> interpreting output.
>> Phase summary info:
>>   Count: number of times phase was executed
>>   Time and Flops/sec: Max - maximum over all processors
>>                       Ratio - ratio of maximum to minimum over all  
>> processors
>>   Mess: number of messages sent
>>   Avg. len: average message length
>>   Reduct: number of global reductions
>>   Global: entire computation
>>   Stage: stages of a computation. Set stages with  
>> PetscLogStagePush() and PetscLogStagePop().
>>      %T - percent time in this phase         %F - percent flops in  
>> this phase
>>      %M - percent messages in this phase     %L - percent message  
>> lengths in this phase
>>      %R - percent reductions in this phase
>>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max  
>> time over all processors)
>> ------------------------------------------------------------------------------------------------------------------------
>>
>>
>>      ##########################################################
>>      #                                                        #
>>      #                          WARNING!!!                    #
>>      #                                                        #
>>      #   This code was run without the PreLoadBegin()         #
>>      #   macros. To get timing results we always recommend    #
>>      #   preloading. otherwise timing numbers may be          #
>>      #   meaningless.                                         #
>>      ##########################################################
>>
>>
>> Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
>>                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> VecDot              3554 1.0 1.8220e+00 1.0 5.03e+08 1.0 0.0e+00 0.0e+00 3.6e+03  0  9  0  0 31   0  9  0  0 31  1001
>> VecNorm             2327 1.0 6.7031e-01 1.0 9.34e+08 1.0 0.0e+00 0.0e+00 2.3e+03  0  6  0  0 20   0  6  0  0 20  1781
>> VecCopy             1600 1.0 9.4440e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecSet              3855 1.0 8.0550e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAXPY             4982 1.0 3.7953e+00 1.0 3.39e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 12  0  0  0   1 12  0  0  0   674
>> VecAYPX             2755 1.0 1.8270e+00 1.0 3.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0   774
>> VecAssemblyBegin    1200 1.0 1.8679e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+03  0  0  0  0 31   0  0  0  0 31     0
>> VecAssemblyEnd      1200 1.0 1.7717e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecPointwiseMult    4056 1.0 4.1344e+00 1.0 1.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0   252
>> VecScatterBegin     3855 1.0 1.5116e-01 1.0 0.00e+00 0.0 7.7e+03 2.9e+04 0.0e+00  0  0100100  0   0  0100100  0     0
>> VecScatterEnd       3855 1.0 7.3828e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSetup             600 1.0 5.1192e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve             600 1.0 2.7194e+02 1.0 3.88e+07 1.0 7.7e+03 2.9e+04 6.2e+03 38100100100 53  38100100100 53    77
>> PCSetUp              600 1.0 1.6630e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 23  0  0  0  2  23  0  0  0  2     0
>> PCApply             4355 1.0 7.3735e+01 1.0 7.06e+06 1.0 0.0e+00 0.0e+00 1.0e+02 10  5  0  0  1  10  5  0  0  1    14
>> MatMult             2327 1.0 1.3706e+01 1.0 2.79e+08 1.0 4.7e+03 2.9e+04 0.0e+00  2 36 60 60  0   2 36 60 60  0   557
>> MatMultTranspose    1528 1.0 8.6412e+00 1.0 3.13e+08 1.0 3.1e+03 2.9e+04 0.0e+00  1 26 40 40  0   1 26 40 40  0   626
>> MatConvert           100 1.0 1.2962e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>> MatAssemblyBegin     600 1.0 2.4579e+0096.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03  0  0  0  0 10   0  0  0  0 10     0
>> MatAssemblyEnd       600 1.0 2.5257e+00 1.0 0.00e+00 0.0 4.0e+00 1.5e+04 6.1e+02  0  0  0  0  5   0  0  0  0  5     0
>> MatGetRow        12828800 1.0 5.2907e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatGetRowIJ          200 1.0 1.7476e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type          Creations   Destructions   Memory   
>> Descendants' Mem.
>>
>> --- Event Stage 0: Main Stage
>>
>>           Index Set     4              4      30272     0
>>                 Vec  1213           1202  1234223216     0
>>         Vec Scatter     2              0          0     0
>>       Krylov Solver     1              0          0     0
>>      Preconditioner     1              0          0     0
>>              Matrix     6              0          0     0
>> ========================================================================================================================
>> Average time to get PetscTime(): 1.90735e-07
>> Average time for MPI_Barrier(): 8.10623e-07
>> Average time for zero size MPI_Send(): 1.95503e-05
>> OptionTable: -log_summary
>>
>>
>>
>>
>> -----Original Message-----
>> Date: Tue, 14 Jul 2009 10:42:58 -0500
>> From: Barry Smith <bsmith at mcs.anl.gov>
>> Subject: Re: hypre preconditioners
>> To: PETSc users list <petsc-users at mcs.anl.gov>
>>
>>
>>    First run the three cases with -log_summary (also -ksp_view to see
>> exact solver options that are being used) and send those files. This
>> will tell us where the time is being spent; without this information
>> any comments are pure speculation. (For example, the "copy" time to
>> hypre format is trivial compared to the time to build a hypre
>> preconditioner and not the problem).
>>
>>
>>    What you report is not uncommon; the setup and per iteration cost
>> of the hypre preconditioners will be much larger than the simpler
>> Jacobi preconditioner.
>>
>>    Barry
>>
>> On Jul 14, 2009, at 3:36 AM, Klaij, Christiaan wrote:
>>
>>>
>>> I'm solving the steady incompressible Navier-Stokes equations
>>> (discretized with FV on unstructured grids) using the SIMPLE
>>> Pressure Correction method. I'm using Picard linearization and solve
>>> the system for the momentum equations with BICG and for the pressure
>>> equation with CG. Currently, for parallel runs, I'm using JACOBI as
>>> a preconditioner. My grids typically have a few million cells and I
>>> use between 4 and 16 cores (1 to 4 quadcore CPUs on a linux
>>> cluster). A significant portion of the CPU time goes into solving
>>> the pressure equation. To reach the relative tolerance I need, CG
>>> with JACOBI takes about 100 iterations per outer loop for these
>>> problems.
>>>
>>> In order to reduce CPU time, I've compiled PETSc with support for
>>> Hypre and I'm looking at BoomerAMG and Euclid to replace JACOBI as a
>>> preconditioner for the pressure equation. With default settings,
>>> both BoomerAMG and Euclid greatly reduce the number of iterations:
>>> with BoomerAMG 1 or 2 iterations are enough, with Euclid about 10.
>>> However, I do not get any reduction in CPU time. With Euclid, CPU
>>> time is similar to JACOBI and with BoomerAMG it is approximately
>>> doubled.
>>>
>>> Is this what one can expect? Are BoomerAMG and Euclid meant for much
>>> larger problems? I understand Hypre uses a different matrix storage
>>> format, is CPU time 'lost in translation' between PETSc and Hypre
>>> for these small problems? Are there maybe any settings I should
>>> change?
>>>
>>> Chris
>>>
>>> dr. ir. Christiaan Klaij
>>> CFD Researcher, Research & Development
>>> MARIN
>>> 2, Haagsteeg, P.O. Box 28, 6700 AA  Wageningen, The Netherlands
>>> T +31 317 49 39 11, T +31 317 49 33 44, F +31 317 49 32 45
>>> c.klaij at marin.nl, www.marin.nl
>>