hypre preconditioners

Klaij, Christiaan C.Klaij at marin.nl
Wed Jul 15 03:58:36 CDT 2009


Barry,

Thanks for your reply! Below is the information from KSPView and -log_summary for the three cases. Indeed, PCSetUp takes much more time with the hypre preconditioners: roughly 4.4 s in total with Jacobi versus 183 s with Euclid and 166 s with BoomerAMG.
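
For reference, the three cases correspond to the standard PETSc
preconditioner selections, set either on the command line as below or
programmatically with PCSetType()/PCHYPRESetType() in the code:

   -ksp_type cg -pc_type jacobi                          (Jacobi)
   -ksp_type cg -pc_type hypre -pc_hypre_type euclid     (hypre Euclid)
   -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg  (hypre BoomerAMG)

with -ksp_view -log_summary added to each run to produce the output below.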

Chris

-----------------------------
--- Jacobi preconditioner ---
-----------------------------

KSP Object:
  type: cg
  maximum iterations=500
  tolerances:  relative=0.05, absolute=1e-50, divergence=10000
  left preconditioning
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpiaij, rows=256576, cols=256576
    total: nonzeros=1769552, allocated nonzeros=1769552
      not using I-node (on process 0) routines

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./fresco on a linux_32_ named lin0077 with 2 processors, by cklaij Wed Jul 15 10:22:04 2009
Using Petsc Release Version 2.3.3, Patch 13, Thu May 15 17:29:26 CDT 2008 HG revision: 4466c6289a0922df26e20626fd4a0b4dd03c8124

                         Max       Max/Min        Avg      Total 
Time (sec):           6.037e+02      1.00000   6.037e+02
Objects:              9.270e+02      1.00000   9.270e+02
Flops:                5.671e+10      1.00065   5.669e+10  1.134e+11
Flops/sec:            9.393e+07      1.00065   9.390e+07  1.878e+08
MPI Messages:         1.780e+04      1.00000   1.780e+04  3.561e+04
MPI Message Lengths:  5.239e+08      1.00000   2.943e+04  1.048e+09
MPI Reductions:       2.651e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 6.0374e+02 100.0%  1.1338e+11 100.0%  3.561e+04 100.0%  2.943e+04      100.0%  5.302e+04 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################


Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot             31370 1.0 1.2887e+01 1.0 6.28e+08 1.0 0.0e+00 0.0e+00 3.1e+04  2 14  0  0 59   2 14  0  0 59  1249
VecNorm            16235 1.0 2.3343e+00 1.0 1.79e+09 1.0 0.0e+00 0.0e+00 1.6e+04  0  7  0  0 31   0  7  0  0 31  3569
VecCopy             1600 1.0 9.4822e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              3732 1.0 8.7824e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY            32836 1.0 1.9510e+01 1.0 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3 15  0  0  0   3 15  0  0  0   864
VecAYPX            16701 1.0 7.4898e+00 1.0 5.73e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  8  0  0  0   1  8  0  0  0  1144
VecAssemblyBegin    1200 1.0 3.3916e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+03  0  0  0  0  7   0  0  0  0  7     0
VecAssemblyEnd      1200 1.0 1.6778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult   18301 1.0 1.4524e+01 1.0 1.62e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  4  0  0  0   2  4  0  0  0   323
VecScatterBegin    17801 1.0 5.8999e-01 1.0 0.00e+00 0.0 3.6e+04 2.9e+04 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd      17801 1.0 3.3189e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetup             600 1.0 6.7541e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             600 1.0 1.6520e+02 1.0 3.43e+08 1.0 3.6e+04 2.9e+04 4.8e+04 27100100100 90  27100100100 90   686
PCSetUp              600 1.0 4.4189e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
PCApply            18301 1.0 1.4579e+01 1.0 1.62e+08 1.0 0.0e+00 0.0e+00 1.0e+00  2  4  0  0  0   2  4  0  0  0   322
MatMult            16235 1.0 9.3444e+01 1.0 2.86e+08 1.0 3.2e+04 2.9e+04 0.0e+00 15 47 91 91  0  15 47 91 91  0   570
MatMultTranspose    1566 1.0 8.8825e+00 1.0 3.12e+08 1.0 3.1e+03 2.9e+04 0.0e+00  1  5  9  9  0   1  5  9  9  0   624
MatAssemblyBegin     600 1.0 6.0139e-01 25.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03  0  0  0  0  2   0  0  0  0  2     0
MatAssemblyEnd       600 1.0 2.5127e+00 1.0 0.00e+00 0.0 4.0e+00 1.5e+04 6.1e+02  0  0  0  0  1   0  0  0  0  1     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

           Index Set     4              4      30272     0
                 Vec   913            902  926180816     0
         Vec Scatter     2              0          0     0
       Krylov Solver     1              0          0     0
      Preconditioner     1              0          0     0
              Matrix     6              0          0     0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 8.10623e-07
Average time for zero size MPI_Send(): 2.0504e-05



-----------------------------------
--- Hypre Euclid preconditioner ---
-----------------------------------

KSP Object:
  type: cg
  maximum iterations=500
  tolerances:  relative=0.05, absolute=1e-50, divergence=10000
  left preconditioning
PC Object:
  type: hypre
    HYPRE Euclid preconditioning
    HYPRE Euclid: number of levels 1
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpiaij, rows=256576, cols=256576
    total: nonzeros=1769552, allocated nonzeros=1769552
      not using I-node (on process 0) routines
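
(The fill level shown above is the Euclid default; it can be changed at
run time through the PETSc hypre interface, for example

   -pc_hypre_euclid_levels 2

-- the exact option name is best double-checked with -help for this
PETSc version.)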

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./fresco on a linux_32_ named lin0077 with 2 processors, by cklaij Wed Jul 15 10:10:05 2009
Using Petsc Release Version 2.3.3, Patch 13, Thu May 15 17:29:26 CDT 2008 HG revision: 4466c6289a0922df26e20626fd4a0b4dd03c8124

                         Max       Max/Min        Avg      Total 
Time (sec):           6.961e+02      1.00000   6.961e+02
Objects:              1.227e+03      1.00000   1.227e+03
Flops:                1.340e+10      1.00073   1.340e+10  2.679e+10
Flops/sec:            1.925e+07      1.00073   1.924e+07  3.848e+07
MPI Messages:         4.748e+03      1.00000   4.748e+03  9.496e+03
MPI Message Lengths:  1.397e+08      1.00000   2.943e+04  2.794e+08
MPI Reductions:       7.192e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 6.9614e+02 100.0%  2.6790e+10 100.0%  9.496e+03 100.0%  2.943e+04      100.0%  1.438e+04 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################


Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot              5410 1.0 1.1865e+01 4.5 5.26e+08 4.5 0.0e+00 0.0e+00 5.4e+03  1 10  0  0 38   1 10  0  0 38   234
VecNorm             3255 1.0 7.8095e-01 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 3.3e+03  0  6  0  0 23   0  6  0  0 23  2139
VecCopy             1600 1.0 9.5096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              4746 1.0 8.9868e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             6801 1.0 4.8778e+00 1.0 3.59e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 13  0  0  0   1 13  0  0  0   715
VecAYPX             3646 1.0 2.2348e+00 1.0 4.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0   837
VecAssemblyBegin    1200 1.0 2.7152e-01 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+03  0  0  0  0 25   0  0  0  0 25     0
VecAssemblyEnd      1200 1.0 1.7414e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    3982 1.0 4.0871e+00 1.0 1.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0   250
VecScatterBegin     4746 1.0 1.8000e-01 1.0 0.00e+00 0.0 9.5e+03 2.9e+04 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       4746 1.0 4.6870e+00 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetup             600 1.0 6.8991e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             600 1.0 2.5931e+02 1.0 5.17e+07 1.0 9.5e+03 2.9e+04 9.0e+03 37100100100 62  37100100100 62   103
PCSetUp              600 1.0 1.8337e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 26  0  0  0  1  26  0  0  0  1     0
PCApply             5246 1.0 3.6440e+01 1.3 1.88e+07 1.3 0.0e+00 0.0e+00 1.0e+02  5  4  0  0  1   5  4  0  0  1    28
MatMult             3255 1.0 2.3031e+01 1.2 2.85e+08 1.2 6.5e+03 2.9e+04 0.0e+00  3 40 69 69  0   3 40 69 69  0   464
MatMultTranspose    1491 1.0 8.4907e+00 1.0 3.11e+08 1.0 3.0e+03 2.9e+04 0.0e+00  1 20 31 31  0   1 20 31 31  0   621
MatConvert           100 1.0 1.2686e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatAssemblyBegin     600 1.0 2.3702e+00 42.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03  0  0  0  0  8   0  0  0  0  8     0
MatAssemblyEnd       600 1.0 2.5303e+00 1.0 0.00e+00 0.0 4.0e+00 1.5e+04 6.1e+02  0  0  0  0  4   0  0  0  0  4     0
MatGetRow        12828800 1.0 5.2074e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatGetRowIJ          200 1.0 1.6284e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

           Index Set     4              4      30272     0
                 Vec  1213           1202  1234223216     0
         Vec Scatter     2              0          0     0
       Krylov Solver     1              0          0     0
      Preconditioner     1              0          0     0
              Matrix     6              0          0     0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 3.8147e-07
Average time for zero size MPI_Send(): 1.39475e-05




--------------------------------------
--- Hypre BoomerAMG preconditioner ---
--------------------------------------

KSP Object:
  type: cg
  maximum iterations=500
  tolerances:  relative=0.05, absolute=1e-50, divergence=10000
  left preconditioning
PC Object:
  type: hypre
    HYPRE BoomerAMG preconditioning
    HYPRE BoomerAMG: Cycle type V
    HYPRE BoomerAMG: Maximum number of levels 25
    HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
    HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
    HYPRE BoomerAMG: Threshold for strong coupling 0.25
    HYPRE BoomerAMG: Interpolation truncation factor 0
    HYPRE BoomerAMG: Interpolation: max elements per row 0
    HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
    HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
    HYPRE BoomerAMG: Maximum row sums 0.9
    HYPRE BoomerAMG: Sweeps down         1
    HYPRE BoomerAMG: Sweeps up           1
    HYPRE BoomerAMG: Sweeps on coarse    1
    HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
    HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
    HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
    HYPRE BoomerAMG: Relax weight  (all)      1
    HYPRE BoomerAMG: Outer relax weight (all) 1
    HYPRE BoomerAMG: Using CF-relaxation
    HYPRE BoomerAMG: Measure type        local
    HYPRE BoomerAMG: Coarsen type        Falgout
    HYPRE BoomerAMG: Interpolation type  classical
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpiaij, rows=256576, cols=256576
    total: nonzeros=1769552, allocated nonzeros=1769552
      not using I-node (on process 0) routines
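
(The BoomerAMG settings listed above are all defaults; each can be
changed at run time through the corresponding option of the PETSc hypre
interface, for example

   -pc_hypre_boomeramg_strong_threshold 0.5
   -pc_hypre_boomeramg_coarsen_type PMIS
   -pc_hypre_boomeramg_agg_nl 1
   -pc_hypre_boomeramg_relax_type_all SOR/Jacobi

-- the values here are only examples, and -help prints the full list of
option names for this PETSc version.)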

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./fresco on a linux_32_ named lin0077 with 2 processors, by cklaij Wed Jul 15 09:53:07 2009
Using Petsc Release Version 2.3.3, Patch 13, Thu May 15 17:29:26 CDT 2008 HG revision: 4466c6289a0922df26e20626fd4a0b4dd03c8124

                         Max       Max/Min        Avg      Total 
Time (sec):           7.080e+02      1.00000   7.080e+02
Objects:              1.227e+03      1.00000   1.227e+03
Flops:                1.054e+10      1.00076   1.054e+10  2.107e+10
Flops/sec:            1.489e+07      1.00076   1.488e+07  2.977e+07
MPI Messages:         3.857e+03      1.00000   3.857e+03  7.714e+03
MPI Message Lengths:  1.135e+08      1.00000   2.942e+04  2.270e+08
MPI Reductions:       5.800e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 7.0799e+02 100.0%  2.1075e+10 100.0%  7.714e+03 100.0%  2.942e+04      100.0%  1.160e+04 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################


Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot              3554 1.0 1.8220e+00 1.0 5.03e+08 1.0 0.0e+00 0.0e+00 3.6e+03  0  9  0  0 31   0  9  0  0 31  1001
VecNorm             2327 1.0 6.7031e-01 1.0 9.34e+08 1.0 0.0e+00 0.0e+00 2.3e+03  0  6  0  0 20   0  6  0  0 20  1781
VecCopy             1600 1.0 9.4440e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              3855 1.0 8.0550e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             4982 1.0 3.7953e+00 1.0 3.39e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 12  0  0  0   1 12  0  0  0   674
VecAYPX             2755 1.0 1.8270e+00 1.0 3.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0   774
VecAssemblyBegin    1200 1.0 1.8679e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+03  0  0  0  0 31   0  0  0  0 31     0
VecAssemblyEnd      1200 1.0 1.7717e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    4056 1.0 4.1344e+00 1.0 1.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0   252
VecScatterBegin     3855 1.0 1.5116e-01 1.0 0.00e+00 0.0 7.7e+03 2.9e+04 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       3855 1.0 7.3828e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetup             600 1.0 5.1192e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             600 1.0 2.7194e+02 1.0 3.88e+07 1.0 7.7e+03 2.9e+04 6.2e+03 38100100100 53  38100100100 53    77
PCSetUp              600 1.0 1.6630e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 23  0  0  0  2  23  0  0  0  2     0
PCApply             4355 1.0 7.3735e+01 1.0 7.06e+06 1.0 0.0e+00 0.0e+00 1.0e+02 10  5  0  0  1  10  5  0  0  1    14
MatMult             2327 1.0 1.3706e+01 1.0 2.79e+08 1.0 4.7e+03 2.9e+04 0.0e+00  2 36 60 60  0   2 36 60 60  0   557
MatMultTranspose    1528 1.0 8.6412e+00 1.0 3.13e+08 1.0 3.1e+03 2.9e+04 0.0e+00  1 26 40 40  0   1 26 40 40  0   626
MatConvert           100 1.0 1.2962e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatAssemblyBegin     600 1.0 2.4579e+00 96.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03  0  0  0  0 10   0  0  0  0 10     0
MatAssemblyEnd       600 1.0 2.5257e+00 1.0 0.00e+00 0.0 4.0e+00 1.5e+04 6.1e+02  0  0  0  0  5   0  0  0  0  5     0
MatGetRow        12828800 1.0 5.2907e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatGetRowIJ          200 1.0 1.7476e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

           Index Set     4              4      30272     0
                 Vec  1213           1202  1234223216     0
         Vec Scatter     2              0          0     0
       Krylov Solver     1              0          0     0
      Preconditioner     1              0          0     0
              Matrix     6              0          0     0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 8.10623e-07
Average time for zero size MPI_Send(): 1.95503e-05
OptionTable: -log_summary




-----Original Message-----
Date: Tue, 14 Jul 2009 10:42:58 -0500
From: Barry Smith <bsmith at mcs.anl.gov>
Subject: Re: hypre preconditioners
To: PETSc users list <petsc-users at mcs.anl.gov>
Message-ID: <DC1E3E8F-1D2D-4256-A1EE-14BA81EAEC67 at mcs.anl.gov>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes


    First run the three cases with -log_summary (also -ksp_view to see
the exact solver options that are being used) and send those files. This
will tell us where the time is being spent; without this information
any comments are pure speculation. (For example, the "copy" time to the
hypre format is trivial compared to the time to build a hypre
preconditioner and is not the problem.)
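
    A minimal sketch of how separate log stages could be registered so
that the momentum and pressure solves show up as their own sections in
the -log_summary output (the routine and variable names below are just
placeholders, error checking is omitted, and the argument order of
PetscLogStageRegister() differs between PETSc versions):

#include <petscksp.h>

/* Illustration only: time the momentum (BiCG) and pressure (CG) solves
   in separate log stages so that -log_summary reports them apart from
   the rest of the computation. */
static PetscErrorCode SolveOuterLoop(KSP ksp_u, KSP ksp_p,
                                     Vec rhs_u, Vec u,
                                     Vec rhs_p, Vec p, PetscInt n_outer)
{
  PetscLogStage momentum_stage, pressure_stage;
  PetscInt      k;

  PetscLogStageRegister("MomentumSolve", &momentum_stage);
  PetscLogStageRegister("PressureSolve", &pressure_stage);

  for (k = 0; k < n_outer; k++) {
    PetscLogStagePush(momentum_stage);
    KSPSolve(ksp_u, rhs_u, u);      /* momentum equations, BiCG */
    PetscLogStagePop();

    PetscLogStagePush(pressure_stage);
    KSPSolve(ksp_p, rhs_p, p);      /* pressure equation, CG + PC under test */
    PetscLogStagePop();
  }
  return 0;
}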


    What you report is not uncommon; the setup and per-iteration cost
of the hypre preconditioners will be much larger than that of the
simpler Jacobi preconditioner.

    Barry

On Jul 14, 2009, at 3:36 AM, Klaij, Christiaan wrote:

>
> I'm solving the steady incompressible Navier-Stokes equations  
> (discretized with FV on unstructured grids) using the SIMPLE  
> Pressure Correction method. I'm using Picard linearization and solve  
> the system for the momentum equations with BICG and for the pressure  
> equation with CG. Currently, for parallel runs, I'm using JACOBI as  
> a preconditioner. My grids typically have a few million cells and I  
> use between 4 and 16 cores (1 to 4 quadcore CPUs on a linux  
> cluster). A significant portion of the CPU time goes into solving  
> the pressure equation. To reach the relative tolerance I need, CG  
> with JACOBI takes about 100 iterations per outer loop for these  
> problems.
>
> In order to reduce CPU time, I've compiled PETSc with support for  
> Hypre and I'm looking at BoomerAMG and Euclid to replace JACOBI as a  
> preconditioner for the pressure equation. With default settings,  
> both BoomerAMG and Euclid greatly reduce the number of iterations:  
> with BoomerAMG 1 or 2 iterations are enough, with Euclid about 10.  
> However, I do not get any reduction in CPU time. With Euclid, CPU  
> time is similar to JACOBI and with BoomerAMG it is approximately  
> doubled.
>
> Is this what one can expect? Are BoomerAMG and Euclid meant for much  
> larger problems? I understand Hypre uses a different matrix storage  
> format, is CPU time 'lost in translation' between PETSc and Hypre  
> for these small problems? Are there maybe any settings I should  
> change?
>
> Chris
>
>
>
>
>
>
>
>
> dr. ir. Christiaan Klaij
> CFD Researcher
> Research & Development
> MARIN
> 2, Haagsteeg
> P.O. Box 28
> 6700 AA  Wageningen
> The Netherlands
> c.klaij at marin.nl
> T +31 317 49 39 11
> T +31 317 49 33 44
> F +31 317 49 32 45
> I www.marin.nl
>