[petsc-users] Question about memory usage in Multigrid preconditioner
frank
hengjiew at uci.edu
Mon Jul 11 12:14:12 CDT 2016
Hi Dave,
I re-ran the test using bjacobi as the preconditioner on the coarse mesh
of telescope. The grid is 3072*256*768 and the process mesh is 96*8*24. The
PETSc options file is attached.
I still got the "Out Of Memory" error. The error occurred before the
linear solver finished one step, so I don't have the full info from
ksp_view. The info from ksp_view_pre is attached.
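(One way I could still capture the full solver configuration when the solve
itself dies might be to set the KSP up and view it explicitly before calling
KSPSolve. This is only a minimal sketch, not taken from my actual code; the
helper name SolveWithEarlyView is made up for illustration, and it only helps
if the failure happens inside KSPSolve rather than during the setup itself.)

#include <petscksp.h>

/* Sketch only: set the solver up and view it before the solve, so the
   full -ksp_view information is printed even if KSPSolve later dies
   with an out-of-memory error.  Assumes ksp, A, b, x exist already. */
PetscErrorCode SolveWithEarlyView(KSP ksp, Mat A, Vec b, Vec x)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);                          /* builds the MG / telescope hierarchy */
  ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); /* same information as -ksp_view */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
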
It seems to me that the error occurred when the decomposition was about
to be changed.
I ran another test with a grid of 1536*128*384 and the same process mesh
as above. There was no error; the ksp_view info is attached for comparison.
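(For reference, the per-process subdomain sizes implied by these two grids on
the 96*8*24 process mesh; a small stand-alone sketch assuming the grid divides
evenly, with the helper name 'subdomain' made up for illustration.)

#include <stdio.h>

/* Per-process subdomain size for a uniform block decomposition. */
static void subdomain(int Nx, int Ny, int Nz, int px, int py, int pz)
{
  printf("%4dx%3dx%3d grid on %2dx%dx%2d procs -> %2dx%2dx%2d = %5d points/process\n",
         Nx, Ny, Nz, px, py, pz, Nx/px, Ny/py, Nz/pz, (Nx/px)*(Ny/py)*(Nz/pz));
}

int main(void)
{
  subdomain(3072, 256, 768, 96, 8, 24);  /* failing run:  32x32x32 = 32768 points/process */
  subdomain(1536, 128, 384, 96, 8, 24);  /* working run:  16x16x16 =  4096 points/process */
  return 0;
}
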
Thank you.
Frank
On 07/08/2016 10:38 PM, Dave May wrote:
>
>
> On Saturday, 9 July 2016, frank <hengjiew at uci.edu
> <javascript:_e(%7B%7D,'cvml','hengjiew at uci.edu');>> wrote:
>
> Hi Barry and Dave,
>
> Thank both of you for the advice.
>
> @Barry
> I made a mistake in the file names in the last email. I attached the
> correct files this time.
> For all three tests, 'Telescope' is used as the coarse
> preconditioner.
>
> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
> Part of the memory usage:  Vector   125   124   3971904   0.
>                            Matrix   101   101   9462372   0
>
> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
> Part of the memory usage:  Vector   125   124    681672   0.
>                            Matrix   101   101   1462180   0.
>
> In theory, the memory usage in Test1 should be 8 times that of Test2.
> In my case, it is about 6 times.
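(From the numbers above: 3971904/681672 is about 5.8 for the vectors and
9462372/1462180 is about 6.5 for the matrices.)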
>
> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain
> per process: 32*32*32
> Here I get the out of memory error.
>
> I tried to use -mg_coarse jacobi. In this way, I don't need to set
> -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?
> The linear solver didn't work in this case. PETSc output some errors.
>
> @Dave
> In test3, I used only one instance of 'Telescope'. On the coarse
> mesh of 'Telescope', I used LU as the preconditioner instead of SVD.
> If I set the levels correctly, then on the last coarse mesh of MG,
> where it calls 'Telescope', the sub-domain per process is 2*2*2.
> On the last coarse mesh of 'Telescope', there is only one grid
> point per process.
> I still got the OOM error. The detailed PETSc options file is attached.
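(To cross-check that level count, a small stand-alone sketch of the same
arithmetic; it assumes uniform factor-of-two coarsening, the 96*8*24 process
mesh, and the telescope reduction factor of 64 from the attached options file,
and is not part of the actual code.)

#include <stdio.h>

/* Walk the coarsening chain of test 3: outer MG (5 levels), telescope
   (reduction factor 64), inner MG (4 levels).  'edge' is the number of
   points per edge of the cubic per-process subdomain. */
int main(void)
{
  int ranks = 96*8*24;   /* 18432 */
  int edge  = 32;        /* 32x32x32 points per process on the finest level */
  int i;

  for (i = 1; i < 5; i++) edge /= 2;               /* outer MG: 5 levels */
  printf("outer MG coarse level : %dx%dx%d per rank on %d ranks\n", edge, edge, edge, ranks);

  ranks /= 64;                                     /* telescope gathers 4x4x4 ranks into one */
  edge  *= 4;
  printf("after telescope       : %dx%dx%d per rank on %d ranks\n", edge, edge, edge, ranks);

  for (i = 1; i < 4; i++) edge /= 2;               /* inner MG: 4 levels */
  printf("inner MG coarse level : %dx%dx%d per rank on %d ranks (%d x %d coarse matrix)\n",
         edge, edge, edge, ranks, ranks*edge*edge*edge, ranks*edge*edge*edge);
  return 0;
}

(The resulting 288 x 288 coarse operator is the same size as the
mg_coarse_telescope_mg_coarse_ matrix shown in the ksp_view attached below for
the smaller comparison run.)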
>
>
> Do you understand the expected memory usage for the
> particular parallel LU implementation you are using? I don't
> (seriously). Replace LU with bjacobi and re-run this test. My point
> about solver debugging is still valid.
>
> And please send the result of KSPView so we can see what is actually
> used in the computations.
>
> Thanks
> Dave
>
>
>
> Thank you so much.
>
> Frank
>
>
>
> On 07/06/2016 02:51 PM, Barry Smith wrote:
>
> On Jul 6, 2016, at 4:19 PM, frank <hengjiew at uci.edu> wrote:
>
> Hi Barry,
>
> Thank you for your advice.
> I tried three tests. In the 1st test, the grid is
> 3072*256*768 and the process mesh is 96*8*24.
> The linear solver is 'cg', the preconditioner is 'mg', and
> 'telescope' is used as the preconditioner at the coarse mesh.
> The system gives me the "Out of Memory" error before the
> linear system is completely solved.
> The info from '-ksp_view_pre' is attached. It seems to me
> that the error occurs when it reaches the coarse mesh.
>
> The 2nd test uses a grid of 1536*128*384 and a process mesh
> of 96*8*24. The 3rd test uses the same grid but a
> different process mesh, 48*4*12.
>
> Are you sure this is right? The total matrix and vector
> memory usage goes from 2nd test
> Vector 384 383 8,193,712 0.
> Matrix 103 103 11,508,688 0.
> to 3rd test
> Vector 384 383 1,590,520 0.
> Matrix 103 103 3,508,664 0.
> that is, the memory usage got smaller, but if you have only
> 1/8th the processes and the same grid it should have gotten
> about 8 times bigger. Did you maybe cut the grid by a factor
> of 8 also? If so, that still doesn't explain it, because the
> memory usage changed by a factor of 5 or so for the
> vectors and 3 or so for the matrices.
>
>
> The linear solver and PETSc options in the 2nd and 3rd tests
> are the same as in the 1st test. The linear solver works fine in
> both tests.
> I attached the memory usage of the 2nd and 3rd tests. The
> memory info is from the option '-log_summary'. I tried to
> use '-memory_info' as you suggested, but in my case PETSc
> treated it as an unused option. It output nothing about
> the memory. Do I need to add something to my code so I can use
> '-memory_info'?
>
> Sorry, my mistake; the option is -memory_view.
>
> Can you run the one case with -memory_view and -mg_coarse
> jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to
> see how much memory is used without the telescope? Also run
> case 2 the same way.
>
> Barry
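(In addition to -memory_view, a programmatic check around the solve could look
like the sketch below. It uses PETSc's PetscMemoryGetCurrentUsage and
PetscMallocGetCurrentUsage, which report the process resident set size and the
bytes currently obtained through PetscMalloc; the PetscMalloc numbers are only
meaningful when PETSc's logging malloc is active, e.g. in a debug build or with
-malloc. The function name SolveAndReportMemory is made up for illustration,
not part of the actual code.)

#include <petscksp.h>

/* Sketch only: report memory before and after the solve.
   PetscPrintf reports rank 0's numbers only. */
PetscErrorCode SolveAndReportMemory(KSP ksp, Vec b, Vec x)
{
  PetscLogDouble rss0, rss1, mal0, mal1;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscMemoryGetCurrentUsage(&rss0);CHKERRQ(ierr);  /* resident set size, bytes */
  ierr = PetscMallocGetCurrentUsage(&mal0);CHKERRQ(ierr);  /* PetscMalloc'd bytes       */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = PetscMemoryGetCurrentUsage(&rss1);CHKERRQ(ierr);
  ierr = PetscMallocGetCurrentUsage(&mal1);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,
                     "rank 0: RSS %.1f -> %.1f MB, PetscMalloc %.1f -> %.1f MB\n",
                     rss0/(1024.0*1024.0), rss1/(1024.0*1024.0),
                     mal0/(1024.0*1024.0), mal1/(1024.0*1024.0));CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
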
>
>
>
> In both tests the memory usage is not large.
>
> It seems to me that it might be the 'telescope'
> preconditioner that allocated a lot of memory and caused
> the error in the 1st test.
> Is there a way to show how much memory it allocated?
>
> Frank
>
> On 07/05/2016 03:37 PM, Barry Smith wrote:
>
> Frank,
>
> You can run with -ksp_view_pre to have it "view"
> the KSP before the solve so hopefully it gets that far.
>
> Please run the problem that does fit with
> -memory_info; when the problem completes it will show
> the "high water mark" for PETSc allocated memory and
> total memory used. We first want to look at these
> numbers to see if it is using more memory than you
> expect. You could also run with, say, half the grid
> spacing to see how the memory usage scales with the
> increase in grid points. Make the runs also with
> -log_view and send all the output from these options.
>
> Barry
>
> On Jul 5, 2016, at 5:23 PM, frank
> <hengjiew at uci.edu> wrote:
>
> Hi,
>
> I am using the CG KSP solver and a multigrid
> preconditioner to solve a linear system in parallel.
> I chose 'Telescope' as the
> preconditioner on the coarse mesh for its good
> performance.
> The PETSc options file is attached.
>
> The domain is a 3D box.
> It works well when the grid is 1536*128*384 and
> the process mesh is 96*8*24. When I double the grid
> in each direction and keep the same process mesh and
> PETSc options, I get an "out of memory" error from
> the super-cluster I am using.
> Each process has access to at least 8 GB of memory,
> which should be more than enough for my
> application. I am sure that all the other parts of
> my code (except the linear solver) do not use
> much memory. So I suspect there is something
> wrong with the linear solver.
> The error occurs before the linear system is
> completely solved, so I don't have the info from
> ksp_view. I am not able to reproduce the error
> with a smaller problem either.
> In addition, I tried block Jacobi as
> the preconditioner with the same grid and the same
> decomposition. The linear solver runs extremely
> slowly, but there is no memory error.
>
> How can I diagnose what exactly causes the error?
> Thank you so much.
>
> Frank
> <petsc_options.txt>
>
> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
>
>
-------------- next part --------------
KSP Object: 18432 MPI processes
type: cg
maximum iterations=1
tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 18432 MPI processes
type: mg
PC has not been set up so information may be incomplete
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 18432 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using DEFAULT norm type for convergence test
PC Object: (mg_coarse_) 18432 MPI processes
type: redundant
PC has not been set up so information may be incomplete
Redundant preconditioner: Not yet setup
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=603979776, cols=603979776
total: nonzeros=4223139840, allocated nonzeros=4223139840
total number of mallocs used during MatSetValues calls =0
has attached null space
[NID 00631] 2016-07-10 06:22:58 Apid 45768056: initiated application termination
[NID 06277] 2016-07-10 06:23:00 Apid 45768056: OOM killer terminated this process.
[NID 06235] 2016-07-10 06:23:00 Apid 45768056: OOM killer terminated this process.
-------------- next part --------------
-ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_lag_norm
-ksp_rtol 1e-7
-ksp_initial_guess_nonzero yes
-ksp_converged_reason
-ppe_max_iter 50
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 5
-mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 1
-ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-options_left 1
-log_view
-memory_view
-ksp_view_pre
# Setting dmdarepart on subcomm
-mg_coarse_telescope_repart_da_processors_x 24
-mg_coarse_telescope_repart_da_processors_y 2
-mg_coarse_telescope_repart_da_processors_z 6
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 4
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type bjacobi
-------------- next part --------------
KSP Object: 18432 MPI processes
type: cg
maximum iterations=1
tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 18432 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 18432 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 18432 MPI processes
type: telescope
Telescope: parent comm size reduction factor = 64
Telescope: comm_size = 18432 , subcomm_size = 288
Telescope: DMDA detected
DMDA Object: (mg_coarse_telescope_repart_) 288 MPI processes
M 192 N 16 P 48 m 24 n 2 p 6 dof 1 overlap 1
KSP Object: (mg_coarse_telescope_) 288 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_) 288 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_telescope_mg_coarse_) 288 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_coarse_) 288 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0., needed 0.
Factored matrix follows:
Mat Object: 288 MPI processes
type: mpiaij
rows=288, cols=288
package used to perform factorization: superlu_dist
total: nonzeros=0, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
SuperLU_DIST run parameters:
Process grid nprow 18 x npcol 16
Equilibrate matrix TRUE
Matrix input mode 1
Replace tiny pivots FALSE
Use iterative refinement FALSE
Processors in row 18 col partition 16
Row permutation LargeDiag
Column permutation METIS_AT_PLUS_A
Parallel symbolic factorization FALSE
Repeated factorization SamePattern
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=288, cols=288
total: nonzeros=1728, allocated nonzeros=1728
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_1_) 288 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_1_) 288 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=2304, cols=2304
total: nonzeros=14976, allocated nonzeros=14976
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_2_) 288 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_2_) 288 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=18432, cols=18432
total: nonzeros=124416, allocated nonzeros=124416
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_3_) 288 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_3_) 288 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=147456, cols=147456
total: nonzeros=1013760, allocated nonzeros=1013760
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=147456, cols=147456
total: nonzeros=1013760, allocated nonzeros=1013760
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=147456, cols=147456
total: nonzeros=1013760, allocated nonzeros=1013760
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 18432 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 18432 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=1179648, cols=1179648
total: nonzeros=8183808, allocated nonzeros=8183808
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 18432 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 18432 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=9437184, cols=9437184
total: nonzeros=65765376, allocated nonzeros=65765376
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 18432 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 18432 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=75497472, cols=75497472
total: nonzeros=527302656, allocated nonzeros=527302656
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=75497472, cols=75497472
total: nonzeros=527302656, allocated nonzeros=527302656
total number of mallocs used during MatSetValues calls =0
has attached null space