[petsc-users] Question about memory usage in Multigrid preconditioner

frank hengjiew at uci.edu
Wed Jul 6 16:19:15 CDT 2016


Hi Barry,

Thank you for your advice.
I tried three tests. In the 1st test, the grid is 3072*256*768 and the 
process mesh is 96*8*24.
The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is 
used as the preconditioner on the coarse mesh.
The system gives me the "Out of Memory" error before the linear system 
is completely solved.
The info from '-ksp_view_pre' is attached. It seems to me that the error 
occurs when it reaches the coarse mesh.

The 2nd test uses a grid of 1536*128*384 and a process mesh of 96*8*24. 
The 3rd test uses the same grid but a different process mesh, 48*4*12.
The linear solver and petsc options in the 2nd and 3rd tests are the same 
as in the 1st test. The linear solver works fine in both tests.
I attached the memory usage of the 2nd and 3rd tests. The memory info is 
from the option '-log_summary'. I tried to use '-memory_info' as you 
suggested, but in my case petsc treated it as an unused option and 
output nothing about the memory. Do I need to add something to my code so 
I can use '-memory_info'?
In both tests the memory usage is not large.
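For reference, a minimal sketch of querying the memory directly from the 
application code, assuming the standard PETSc calls 
PetscMemoryGetCurrentUsage and PetscMallocGetCurrentUsage (the placement 
of the calls and the printed message are only illustrative):

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscLogDouble rss, mal;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* ... set up and solve the linear system here ... */

  /* Resident set size of this process, in bytes */
  ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr);
  /* Memory currently obtained through PetscMalloc, in bytes */
  ierr = PetscMallocGetCurrentUsage(&mal);CHKERRQ(ierr);
  /* PetscPrintf reports only rank 0, so these are per-process numbers */
  ierr = PetscPrintf(PETSC_COMM_WORLD,
                     "process memory %g bytes, PetscMalloc %g bytes\n",
                     rss, mal);CHKERRQ(ierr);

  ierr = PetscFinalize();
  return ierr;
}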

It seems to me that it might be the 'telescope' preconditioner that 
allocated a lot of memory and caused the error in the 1st test.
Is there a way to show how much memory it allocated?
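If it would help, one way to attribute the memory is to wrap KSPSetUp and 
KSPSolve in separate log stages, so that '-log_summary' reports object 
creations and memory per stage. A sketch (the function and stage names 
below are only placeholders; PetscLogStageRegister/Push/Pop are the 
standard PETSc logging calls):

#include <petscksp.h>

/* Sketch: separate logging stages so -log_summary breaks down object
 * creations and memory by stage; the preconditioner (mg + telescope)
 * hierarchy is built inside KSPSetUp. */
PetscErrorCode SolveWithStages(KSP ksp, Vec b, Vec x)
{
  PetscErrorCode ierr;
  PetscLogStage  stageSetup, stageSolve;

  PetscFunctionBeginUser;
  ierr = PetscLogStageRegister("PC setup", &stageSetup);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("KSP solve", &stageSolve);CHKERRQ(ierr);

  ierr = PetscLogStagePush(stageSetup);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);   /* preconditioner setup happens here */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscLogStagePush(stageSolve);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The "PC setup" stage should then show how much memory is created while 
the 'mg'/'telescope' hierarchy is built.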

Frank

On 07/05/2016 03:37 PM, Barry Smith wrote:
>    Frank,
>
>      You can run with -ksp_view_pre to have it "view" the KSP before the solve so hopefully it gets that far.
>
>       Please run the problem that does fit with -memory_info; when the problem completes it will show the "high water mark" for PETSc allocated memory and total memory used. We first want to look at these numbers to see if it is using more memory than you expect. You could also run with, say, half the grid spacing to see how the memory usage scales with the increase in grid points. Make the runs also with -log_view and send all the output from these options.
>
>     Barry
>
>> On Jul 5, 2016, at 5:23 PM, frank <hengjiew at uci.edu> wrote:
>>
>> Hi,
>>
>> I am using the CG ksp solver and Multigrid preconditioner to solve a linear system in parallel.
>> I chose to use 'Telescope' as the preconditioner on the coarse mesh for its good performance.
>> The petsc options file is attached.
>>
>> The domain is a 3d box.
>> It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of the grid and keep the same process mesh and petsc options, I get an "out of memory" error from the super-cluster I am using.
>> Each process has access to at least 8G of memory, which should be more than enough for my application. I am sure that all the other parts of my code (except the linear solver) do not use much memory. So I doubt if there is something wrong with the linear solver.
>> The error occurs before the linear system is completely solved, so I don't have the info from ksp view. I am not able to reproduce the error with a smaller problem either.
>> In addition, I tried to use block Jacobi as the preconditioner with the same grid and the same decomposition. The linear solver runs extremely slowly but there is no memory error.
>>
>> How can I diagnose what exactly causes the error?
>> Thank you so much.
>>
>> Frank
>> <petsc_options.txt>

-------------- next part --------------
KSP Object: 18432 MPI processes
  type: cg
  maximum iterations=10000
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 18432 MPI processes
  type: mg
  PC has not been set up so information may be incomplete
    MG: type is MULTIPLICATIVE, levels=4 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     18432 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using DEFAULT norm type for convergence test
    PC Object:    (mg_coarse_)     18432 MPI processes
      type: redundant
      PC has not been set up so information may be incomplete
        Redundant preconditioner: Not yet setup
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     18432 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0., max = 0.
      maximum iterations=2, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     18432 MPI processes
      type: sor
      PC has not been set up so information may be incomplete
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     18432 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0., max = 0.
      maximum iterations=2, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     18432 MPI processes
      type: sor
      PC has not been set up so information may be incomplete
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     18432 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0., max = 0.
      maximum iterations=2, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     18432 MPI processes
      type: sor
      PC has not been set up so information may be incomplete
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   18432 MPI processes
    type: mpiaij
    rows=603979776, cols=603979776
    total: nonzeros=4223139840, allocated nonzeros=4223139840
    total number of mallocs used during MatSetValues calls =0
      has attached null space
[NID 03157] 2016-07-05 18:53:01 Apid 45102172: initiated application termination
[NID 00773] 2016-07-05 18:53:02 Apid 45102172: OOM killer terminated this process.
[NID 09993] 2016-07-05 18:53:02 Apid 45102172: OOM killer terminated this process.
-------------- next part --------------
Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     5              4         3328     0.
              Vector   384            383      8193712     0.
      Vector Scatter    27             23        61776     0.
              Matrix   103            103     11508688     0.
   Matrix Null Space     1              1          592     0.
    Distributed Mesh     8              4        20288     0.
Star Forest Bipartite Graph    16              8         6784     0.
     Discrete System     8              4         3456     0.
           Index Set    55             55       277240     0.
   IS L to G Mapping     8              4        27136     0.
       Krylov Solver    10             10        12392     0.
     DMKSP interface     6              3         1944     0.
      Preconditioner    10             10         9952     0.
-------------- next part --------------
Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     5              4         3328     0.
              Vector   384            383      1590520     0.
      Vector Scatter    27             23        28568     0.
              Matrix   103            103      3508664     0.
   Matrix Null Space     1              1          592     0.
    Distributed Mesh     8              4        20288     0.
Star Forest Bipartite Graph    16              8         6784     0.
     Discrete System     8              4         3456     0.
           Index Set    55             55        80868     0.
   IS L to G Mapping     8              4         7080     0.
       Krylov Solver    10             10        12392     0.
     DMKSP interface     6              3         1944     0.
      Preconditioner    10             10         9952     0.
-------------- next part --------------
-ksp_type        cg 
-ksp_norm_type   unpreconditioned
-ksp_lag_norm
-ksp_rtol        1e-7
-ksp_initial_guess_nonzero  yes
-ksp_converged_reason 
-ppe_max_iter 50
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 4
-mg_levels_ksp_type richardson 
-mg_levels_ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-options_left
-log_summary

# options for telescope
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 4
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type svd
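
# For the next run, the diagnostic options suggested in the thread could be
# appended here (hypothetical addition; only options mentioned elsewhere in
# this exchange)
-memory_info
-log_view
-ksp_view_pre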


