[petsc-users] Question about memory usage in Multigrid preconditioner

frank hengjiew at uci.edu
Wed Sep 14 17:24:02 CDT 2016


Hi,

I wrote a simple code to reproduce the error. I hope this can help to 
diagnose the problem.
The code just solves a 3D Poisson equation.
I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. 
That's when I reproduced the OOM error. Each core has about 2G memory.
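(As a sanity check on the operator size: a 7-point stencil on a 1024^3
grid gives the 7516192768 nonzeros reported in the attached ksp_view_pre
output, i.e. roughly 7516192768 * 12 bytes / 32768 ranks ~ 2.7 MB per
rank for the values plus 32-bit column indices, so the fine-grid matrix
itself is nowhere near the 2G limit.)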
I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The KSP 
solver works fine.
I attached the code, ksp_view_pre's output and my PETSc options file.
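
In case the Fortran attachment doesn't come through: below is a minimal
C sketch of the same kind of solve (7-point Poisson on a DMDA, modeled
on PETSc's KSP tutorial ex45). This is an illustration, not the attached
test_ksp.f90; error checking and the null-space setup visible in the
ksp_view output are omitted.

    #include <petscdmda.h>
    #include <petscksp.h>

    /* Constant RHS; real code would fill this from the PDE data. */
    static PetscErrorCode ComputeRHS(KSP ksp, Vec b, void *ctx)
    {
      VecSet(b, 1.0);
      return 0;
    }

    /* 7-point Laplacian; boundary rows are simply pinned here. */
    static PetscErrorCode ComputeMatrix(KSP ksp, Mat A, Mat P, void *ctx)
    {
      DM          da;
      PetscInt    i, j, k, mx, my, mz, xs, ys, zs, xm, ym, zm, n;
      MatStencil  row, col[7];
      PetscScalar v[7];

      KSPGetDM(ksp, &da);
      DMDAGetInfo(da, NULL, &mx, &my, &mz, NULL, NULL, NULL, NULL, NULL,
                  NULL, NULL, NULL, NULL);
      DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);
      for (k = zs; k < zs + zm; k++) {
        for (j = ys; j < ys + ym; j++) {
          for (i = xs; i < xs + xm; i++) {
            row.i = i; row.j = j; row.k = k;
            if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
              v[0] = 6.0;                       /* pinned boundary row */
              MatSetValuesStencil(P, 1, &row, 1, &row, v, INSERT_VALUES);
            } else {
              n = 0;
              col[n] = row;                   v[n++] =  6.0;
              col[n] = row; col[n].i = i - 1; v[n++] = -1.0;
              col[n] = row; col[n].i = i + 1; v[n++] = -1.0;
              col[n] = row; col[n].j = j - 1; v[n++] = -1.0;
              col[n] = row; col[n].j = j + 1; v[n++] = -1.0;
              col[n] = row; col[n].k = k - 1; v[n++] = -1.0;
              col[n] = row; col[n].k = k + 1; v[n++] = -1.0;
              MatSetValuesStencil(P, 1, &row, 7, col, v, INSERT_VALUES);
            }
          }
        }
      }
      MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);
      return 0;
    }

    int main(int argc, char **argv)
    {
      KSP ksp;
      DM  da;

      PetscInitialize(&argc, &argv, NULL, NULL);
      /* Grid size and process partition can be overridden on the command
         line, e.g. -da_grid_x 1024 ... -da_processors_x 32 ... */
      DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                   DM_BOUNDARY_NONE, DMDA_STENCIL_STAR, 128, 128, 128,
                   PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, 1,
                   NULL, NULL, NULL, &da);
      DMSetFromOptions(da);
      DMSetUp(da);
      KSPCreate(PETSC_COMM_WORLD, &ksp);
      KSPSetDM(ksp, da);
      KSPSetComputeRHS(ksp, ComputeRHS, NULL);
      KSPSetComputeOperators(ksp, ComputeMatrix, NULL);
      KSPSetFromOptions(ksp);   /* picks up -ksp_type, -pc_type mg, telescope options */
      KSPSolve(ksp, NULL, NULL);
      KSPDestroy(&ksp);
      DMDestroy(&da);
      PetscFinalize();
      return 0;
    }

With the attached options file, the 1024^3 case would then be launched
roughly as follows (binary name and grid flags illustrative):

    mpiexec -n 32768 ./test_ksp -da_grid_x 1024 -da_grid_y 1024 -da_grid_z 1024 -options_file petsc_options.txt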

Thank you.
Frank

On 09/09/2016 06:38 PM, Hengjie Wang wrote:
> Hi Barry,
>
> I checked. On the supercomputer, I had the option "-ksp_view_pre" but 
> it is not in the file I sent you. I am sorry for the confusion.
>
> Regards,
> Frank
>
> On Friday, September 9, 2016, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>
>     > On Sep 9, 2016, at 3:11 PM, frank <hengjiew at uci.edu> wrote:
>     >
>     > Hi Barry,
>     >
>     > I think the first KSP view output is from -ksp_view_pre. Before
>     I submitted the test, I was not sure whether there would be an OOM
>     error or not, so I added both -ksp_view_pre and -ksp_view.
>
>       But the options file you sent specifically does NOT list the
>     -ksp_view_pre so how could it be from that?
>
>        Sorry to be pedantic but I've spent too much time in the past
>     trying to debug from incorrect information and want to make sure
>     that the information I have is correct before thinking. Please
>     recheck exactly what happened. Rerun with the exact input file you
>     emailed if that is needed.
>
>        Barry
>
>     >
>     > Frank
>     >
>     >
>     > On 09/09/2016 12:38 PM, Barry Smith wrote:
>     >>   Why does ksp_view2.txt have two KSP views in it while
>     ksp_view1.txt has only one KSP view in it? Did you run two
>     different solves in the second case but not the first?
>     >>
>     >>   Barry
>     >>
>     >>
>     >>
>     >>> On Sep 9, 2016, at 10:56 AM, frank <hengjiew at uci.edu> wrote:
>     >>>
>     >>> Hi,
>     >>>
>     >>> I want to continue digging into the memory problem here.
>     >>> I did find a workaround in the past, which is to use fewer
>     cores per node so that each core has 8G memory. However, this is
>     inefficient and expensive. I hope to locate the place that uses the
>     most memory.
>     >>>
>     >>> Here is a brief summary of the tests I did in the past:
>     >>>
>     >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12
>     >>> Maximum (over computational time) process memory:         total 7.0727e+08
>     >>> Current process memory:                                   total 7.0727e+08
>     >>> Maximum (over computational time) space PetscMalloc()ed:  total 6.3908e+11
>     >>> Current space PetscMalloc()ed:                            total 1.8275e+09
>     >>>
>     >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24
>     >>> Maximum (over computational time) process memory:         total 5.9431e+09
>     >>> Current process memory:                                   total 5.9431e+09
>     >>> Maximum (over computational time) space PetscMalloc()ed:  total 5.3202e+12
>     >>> Current space PetscMalloc()ed:                            total 5.4844e+09
>     >>>
>     >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24
>     >>> The OOM (Out Of Memory) killer of the supercomputer terminated
>     the job during "KSPSolve".
>     >>>
>     >>> I attached the output of ksp_view (the third test's output is
>     from ksp_view_pre), memory_view and also the PETSc options.
>     >>>
>     >>> In all the tests, each core can access about 2G memory. In
>     test3, there are 4223139840 non-zeros in the matrix. These will
>     consume about 1.74 MB per process, using double precision. Even
>     with some extra memory used to store the integer indices, 2G
>     memory should still be more than enough.
>     >>>
>     >>> Is there a way to find out which part of KSPSolve uses the
>     most memory?
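>     >>> (One crude way I could try myself, if I read the man pages
>     right, is to bracket the solve with PetscMemoryGetCurrentUsage()
>     and PetscMallocGetCurrentUsage(); a C sketch, with ksp, b and x
>     standing in for my own objects:
>     >>>
>     >>>   PetscLogDouble rss0, rss1, mal0, mal1;
>     >>>   PetscMemoryGetCurrentUsage(&rss0);  /* resident set size */
>     >>>   PetscMallocGetCurrentUsage(&mal0);  /* PetscMalloc'd bytes */
>     >>>   KSPSolve(ksp, b, x);
>     >>>   PetscMemoryGetCurrentUsage(&rss1);
>     >>>   PetscMallocGetCurrentUsage(&mal1);
>     >>>   PetscPrintf(PETSC_COMM_WORLD, "KSPSolve: rss +%g, malloc'd +%g bytes\n",
>     >>>               rss1 - rss0, mal1 - mal0);
>     >>>
>     >>> The Fortran bindings are analogous.)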
>     >>> Thank you so much.
>     >>>
>     >>> BTW, there are 4 options that remain unused and I don't
>     understand why they are ignored:
>     >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>     >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>     >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>     >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>     >>>
>     >>>
>     >>> Regards,
>     >>> Frank
>     >>>
>     >>> On 07/13/2016 05:47 PM, Dave May wrote:
>     >>>>
>     >>>> On 14 July 2016 at 01:07, frank <hengjiew at uci.edu> wrote:
>     >>>> Hi Dave,
>     >>>>
>     >>>> Sorry for the late reply.
>     >>>> Thank you so much for your detailed reply.
>     >>>>
>     >>>> I have a question about the estimation of the memory usage.
>     There are 4223139840 allocated non-zeros and 18432 MPI processes.
>     Double precision is used. So the memory per process is:
>     >>>>   4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74 MB ?
>     >>>> Did I do something wrong here? Because this seems too small.
>     >>>>
>     >>>> No - I totally f***ed it up. You are correct. That'll teach
>     me for fumbling around with my iPhone calculator and not using my
>     brain. (Note that to convert to MB you just divide by 1e6, not
>     1024^2 - although I apparently cannot convert between units
>     correctly....)
>     >>>>
>     >>>> From the PETSc objects associated with the solver, it looks
>     like it _should_ run with 2GB per MPI rank. Sorry for my mistake.
>     Possibilities are: somewhere in your usage of PETSc you've
>     introduced a memory leak; PETSc is doing a huge over-allocation
>     (e.g. as per our discussion of MatPtAP); or there are other objects
>     in your application code whose memory you have forgotten to log.
>     >>>>
>     >>>>
>     >>>>
>     >>>> I am running this job on Blue Waters.
>     >>>> I am using the 7-point FD stencil in 3D.
>     >>>>
>     >>>> I thought so on both counts.
>     >>>>
>     >>>> I apologize for the stupid mistake I made in computing the
>     memory per core. My settings meant each core could access only 2G
>     memory on average, instead of the 8G I mentioned in a previous
>     email. I re-ran the job with 8G memory per core on average and
>     there was no "Out Of Memory" error. I will do more tests to see if
>     there is still a memory issue.
>     >>>>
>     >>>> Ok. I'd still like to know where the memory was being used
>     since my estimates were off.
>     >>>>
>     >>>>
>     >>>> Thanks,
>     >>>>   Dave
>     >>>>
>     >>>> Regards,
>     >>>> Frank
>     >>>>
>     >>>>
>     >>>>
>     >>>> On 07/11/2016 01:18 PM, Dave May wrote:
>     >>>>> Hi Frank,
>     >>>>>
>     >>>>>
>     >>>>> On 11 July 2016 at 19:14, frank <hengjiew at uci.edu> wrote:
>     >>>>> Hi Dave,
>     >>>>>
>     >>>>> I re-ran the test using bjacobi as the preconditioner on the
>     coarse mesh of telescope. The grid is 3072*256*768 and the process
>     mesh is 96*8*24. The PETSc options file is attached.
>     >>>>> I still got the "Out Of Memory" error. The error occurred
>     before the linear solver finished one step, so I don't have the
>     full info from ksp_view. The info from ksp_view_pre is attached.
>     >>>>>
>     >>>>> Okay - that is essentially useless (sorry)
>     >>>>>
>     >>>>> It seems to me that the error occurred when the
>     decomposition was going to be changed.
>     >>>>>
>     >>>>> Based on what information?
>     >>>>> Running with -info would give us more clues, but will create
>     a ton of output.
>     >>>>> Please try running the case which failed with -info.
>     >>>>>  I had another test with a grid of 1536*128*384 and the same
>     process mesh as above. There was no error. The ksp_view info is
>     attached for comparison.
>     >>>>> Thank you.
>     >>>>>
>     >>>>>
>     >>>>> [3] Here is my crude estimate of your memory usage.
>     >>>>> I'll target only the biggest memory hogs to get an
>     order-of-magnitude estimate.
>     >>>>>
>     >>>>> * The Fine grid operator contains 4223139840 non-zeros -->
>     1.8 GB per MPI rank assuming double precision.
>     >>>>> The indices for the AIJ could amount to another 0.3 GB
>     (assuming 32 bit integers)
>     >>>>>
>     >>>>> * You use 5 levels of coarsening, so the other operators
>     should represent (collectively)
>     >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4  ~ 300 MB per MPI rank
>     on the communicator with 18432 ranks.
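>     >>>>> (Expanding that series: 2.1/8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4
>     = 0.2625 + 0.0328 + 0.0041 + 0.0005 ~ 0.30 GB.)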
>     >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the
>     communicator with 18432 ranks.
>     >>>>>
>     >>>>> * You use a reduction factor of 64, making the new
>     communicator with 288 MPI ranks.
>     >>>>> PCTelescope will first gather a temporary matrix associated
>     with your coarse level operator assuming a comm size of 288 living
>     on the comm with size 18432.
>     >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per
>     core on the 288 ranks.
>     >>>>> This matrix is then used to form a new MPIAIJ matrix on the
>     subcomm, thus requiring another 32 MB per rank.
>     >>>>> The temporary matrix is now destroyed.
>     >>>>>
>     >>>>> * Because a DMDA is detected, a permutation matrix is assembled.
>     >>>>> This requires 2 doubles per point in the DMDA.
>     >>>>> Your coarse DMDA contains 192 x 16 x 48 points.
>     >>>>> Thus the permutation matrix will require < 1 MB per MPI rank
>     on the sub-comm.
>     >>>>>
>     >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but
>     the resulting operator will have the same memory footprint as the
>     unpermuted matrix (32 MB). At any stage in PCTelescope, only 2
>     operators of size 32 MB are held in memory when the DMDA is provided.
>     >>>>>
>     >>>>> From my rough estimates, the worst-case memory footprint
>     for any given core, given your options, is approximately
>     >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB.
>     >>>>> This is way below 8 GB.
>     >>>>>
>     >>>>> Note this estimate completely ignores:
>     >>>>> (1) the memory required for the restriction operator,
>     >>>>> (2) the potential growth in the number of non-zeros per row
>     due to Galerkin coarsening (I wish -ksp_view_pre reported the
>     output from MatView so we could see the number of non-zeros
>     required by the coarse level operators)
>     >>>>> (3) all temporary vectors required by the CG solver, and
>     those required by the smoothers.
>     >>>>> (4) internal memory allocated by MatPtAP
>     >>>>> (5) memory associated with IS's used within PCTelescope
>     >>>>>
>     >>>>> So either I am completely off in my estimates, or you have
>     not carefully estimated the memory usage of your application code.
>     Hopefully others might examine/correct my rough estimates.
>     >>>>>
>     >>>>> Since I don't have your code I cannot assess the latter.
>     >>>>> Since I don't have access to the same machine you are
>     running on, I think we need to take a step back.
>     >>>>>
>     >>>>> [1] What machine are you running on? Send me a URL if it's
>     available.
>     >>>>>
>     >>>>> [2] What discretization are you using? (I am guessing a
>     scalar 7-point FD stencil)
>     >>>>> If it's a 7-point FD stencil, we should be able to examine
>     the memory usage of your solver configuration using a standard,
>     lightweight existing PETSc example, run on your machine at the
>     same scale.
>     >>>>> This would hopefully enable us to correctly evaluate the
>     actual memory usage required by the solver configuration you are
>     using.
>     >>>>>
>     >>>>> Thanks,
>     >>>>>   Dave
>     >>>>>
>     >>>>>
>     >>>>> Frank
>     >>>>>
>     >>>>>
>     >>>>>
>     >>>>>
>     >>>>> On 07/08/2016 10:38 PM, Dave May wrote:
>     >>>>>>
>     >>>>>> On Saturday, 9 July 2016, frank <hengjiew at uci.edu> wrote:
>     >>>>>> Hi Barry and Dave,
>     >>>>>>
>     >>>>>> Thank both of you for the advice.
>     >>>>>>
>     >>>>>> @Barry
>     >>>>>> I made a mistake in the file names in the last email. I
>     attached the correct files this time.
>     >>>>>> For all three tests, 'Telescope' is used as the coarse
>     preconditioner.
>     >>>>>>
>     >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
>     >>>>>> Part of the memory usage:
>     >>>>>>     Vector   125   124   3971904   0.
>     >>>>>>     Matrix   101   101   9462372   0.
>     >>>>>>
>     >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
>     >>>>>> Part of the memory usage:
>     >>>>>>     Vector   125   124   681672    0.
>     >>>>>>     Matrix   101   101   1462180   0.
>     >>>>>>
>     >>>>>> In theory, the per-process memory usage in Test1 should be
>     8 times that of Test2. In my case, it is about 6 times.
>     >>>>>>
>     >>>>>> == Test3: Grid: 3072*256*768,  Process Mesh: 96*8*24.
>     Sub-domain per process: 32*32*32
>     >>>>>> Here I get the out of memory error.
>     >>>>>>
>     >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need
>     to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?
>     >>>>>> The linear solver didn't work in this case; PETSc output
>     some errors.
>     >>>>>>
>     >>>>>> @Dave
>     >>>>>> In test3, I use only one instance of 'Telescope'. On the
>     coarse mesh of 'Telescope', I used LU as the preconditioner
>     instead of SVD.
>     >>>>>> If I set the levels correctly, then on the last coarse
>     mesh of MG, where it calls 'Telescope', the sub-domain per process
>     is 2*2*2.
>     >>>>>> On the last coarse mesh of 'Telescope', there is only one
>     grid point per process.
>     >>>>>> I still got the OOM error. The detailed PETSc options file
>     is attached.
>     >>>>>>
>     >>>>>> Do you understand the expected memory usage for the
>     particular parallel LU implementation you are using? I don't
>     (seriously). Replace LU with bjacobi and re-run this test. My
>     point about solver debugging is still valid.
>     >>>>>>
>     >>>>>> And please send the result of KSPView so we can see what is
>     actually used in the computations.
>     >>>>>>
>     >>>>>> Thanks
>     >>>>>>   Dave
>     >>>>>>
>     >>>>>>
>     >>>>>> Thank you so much.
>     >>>>>>
>     >>>>>> Frank
>     >>>>>>
>     >>>>>>
>     >>>>>>
>     >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote:
>     >>>>>> On Jul 6, 2016, at 4:19 PM, frank <hengjiew at uci.edu> wrote:
>     >>>>>>
>     >>>>>> Hi Barry,
>     >>>>>>
>     >>>>>> Thank you for your advice.
>     >>>>>> I tried three tests. In the 1st test, the grid is
>     3072*256*768 and the process mesh is 96*8*24.
>     >>>>>> The linear solver is 'cg', the preconditioner is 'mg' and
>     'telescope' is used as the preconditioner on the coarse mesh.
>     >>>>>> The system gives me the "Out of Memory" error before the
>     linear system is completely solved.
>     >>>>>> The info from '-ksp_view_pre' is attached. It seems to me
>     that the error occurs when it reaches the coarse mesh.
>     >>>>>>
>     >>>>>> The 2nd test uses a grid of 1536*128*384 and the process
>     mesh is 96*8*24. The 3rd test uses the same grid but a different
>     process mesh, 48*4*12.
>     >>>>>>     Are you sure this is right? The total matrix and vector
>     memory usage goes from the 2nd test
>     >>>>>>                Vector   384   383      8,193,712     0.
>     >>>>>>                Matrix   103   103     11,508,688     0.
>     >>>>>> to the 3rd test
>     >>>>>>                Vector   384   383      1,590,520     0.
>     >>>>>>                Matrix   103   103      3,508,664     0.
>     >>>>>> that is, the memory usage got smaller, but if you have only
>     1/8th the processes and the same grid it should have gotten about
>     8 times bigger. Did you maybe cut the grid by a factor of 8 also?
>     If so, that still doesn't explain it because the memory usage
>     changed by a factor of 5-something for the vectors and 3-something
>     for the matrices.
>     >>>>>>
>     >>>>>>
>     >>>>>> The linear solver and PETSc options in the 2nd and 3rd tests
>     are the same as in the 1st test. The linear solver works fine in
>     both tests.
>     >>>>>> I attached the memory usage of the 2nd and 3rd tests. The
>     memory info is from the option '-log_summary'. I tried to use
>     '-memory_info' as you suggested, but in my case PETSc treated it
>     as an unused option. It output nothing about the memory. Do I need
>     to add something to my code so I can use '-memory_info'?
>     >>>>>>     Sorry, my mistake: the option is -memory_view.
>     >>>>>>
>     >>>>>>    Can you run the one case with -memory_view and
>     -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't iterate
>     forever) to see how much memory is used without the telescope?
>     Also run case 2 the same way.
>     >>>>>>
>     >>>>>>    Barry
>     >>>>>>
>     >>>>>>
>     >>>>>>
>     >>>>>> In both tests the memory usage is not large.
>     >>>>>>
>     >>>>>> It seems to me that it might be the 'telescope'
>     preconditioner that allocated a lot of memory and caused the error
>     in the 1st test.
>     >>>>>> Is there a way to show how much memory it allocated?
>     >>>>>>
>     >>>>>> Frank
>     >>>>>>
>     >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote:
>     >>>>>>    Frank,
>     >>>>>>
>     >>>>>>      You can run with -ksp_view_pre to have it "view" the
>     KSP before the solve, so hopefully it gets that far.
>     >>>>>>
>     >>>>>>       Please run the problem that does fit with
>     -memory_info; when the problem completes it will show the "high
>     water mark" for PETSc allocated memory and total memory used. We
>     first want to look at these numbers to see if it is using more
>     memory than you expect. You could also run with, say, half the grid
>     spacing to see how the memory usage scales with the increase in
>     grid points. Make the runs also with -log_view and send all the
>     output from these options.
>     >>>>>>
>     >>>>>>     Barry
>     >>>>>>
>     >>>>>> On Jul 5, 2016, at 5:23 PM, frank <hengjiew at uci.edu> wrote:
>     >>>>>>
>     >>>>>> Hi,
>     >>>>>>
>     >>>>>> I am using the CG KSP solver and a multigrid preconditioner
>     to solve a linear system in parallel.
>     >>>>>> I chose 'Telescope' as the preconditioner on the coarse
>     mesh for its good performance.
>     >>>>>> The PETSc options file is attached.
>     >>>>>>
>     >>>>>> The domain is a 3D box.
>     >>>>>> It works well when the grid is 1536*128*384 and the process
>     mesh is 96*8*24. When I double the grid in each dimension and keep
>     the same process mesh and PETSc options, I get an "out of memory"
>     error from the super-cluster I am using.
>     >>>>>> Each process has access to at least 8G memory, which should
>     be more than enough for my application. I am sure that all the
>     other parts of my code (except the linear solver) do not use much
>     memory, so I suspect something is wrong with the linear solver.
>     >>>>>> The error occurs before the linear system is completely
>     solved, so I don't have the info from ksp_view. I am not able to
>     reproduce the error with a smaller problem either.
>     >>>>>> In addition, I tried to use block Jacobi as the
>     preconditioner with the same grid and the same decomposition. The
>     linear solver runs extremely slowly, but there is no memory error.
>     >>>>>>
>     >>>>>> How can I diagnose what exactly causes the error?
>     >>>>>> Thank you so much.
>     >>>>>>
>     >>>>>> Frank
>     >>>>>> <petsc_options.txt>
>     >>>>>>
>     <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
>     >>>>>>
>     >>>>>
>     >>>>
>     >>>
>     <ksp_view1.txt><ksp_view2.txt><ksp_view3.txt><memory1.txt><memory2.txt><petsc_options1.txt><petsc_options2.txt><petsc_options3.txt>
>     >
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_ksp.f90
Type: text/x-fortran
Size: 5742 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160914/485d77ec/attachment-0001.bin>
-------------- next part --------------
KSP Object: 32768 MPI processes
  type: cg
  maximum iterations=10000
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 32768 MPI processes
  type: mg
  PC has not been set up so information may be incomplete
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 32768 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using DEFAULT norm type for convergence test
    PC Object: (mg_coarse_) 32768 MPI processes
      type: redundant
      PC has not been set up so information may be incomplete
        Redundant preconditioner: Not yet setup
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 32768 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0., max = 0.
      maximum iterations=2, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 32768 MPI processes
      type: sor
      PC has not been set up so information may be incomplete
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object: (mg_levels_2_) 32768 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0., max = 0.
      maximum iterations=2, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_2_) 32768 MPI processes
      type: sor
      PC has not been set up so information may be incomplete
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object: (mg_levels_3_) 32768 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0., max = 0.
      maximum iterations=2, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_3_) 32768 MPI processes
      type: sor
      PC has not been set up so information may be incomplete
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object: (mg_levels_4_) 32768 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0., max = 0.
      maximum iterations=2, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_4_) 32768 MPI processes
      type: sor
      PC has not been set up so information may be incomplete
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: 32768 MPI processes
    type: mpiaij
    rows=1073741824, cols=1073741824
    total: nonzeros=7516192768, allocated nonzeros=7516192768
    total number of mallocs used during MatSetValues calls =0
      has attached null space
-------------- next part --------------
-ksp_type        cg  
-ksp_norm_type   unpreconditioned
-ksp_lag_norm
-ksp_rtol        1e-7
-ksp_initial_guess_nonzero  yes 
-ksp_converged_reason 
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 5
-mg_levels_ksp_type richardson 
-mg_levels_ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-options_left 1
-log_summary
-ksp_view_pre

#setting dmdarepart on subcomm
-mg_coarse_telescope_repart_da_processors_x 8
-mg_coarse_telescope_repart_da_processors_y 8
-mg_coarse_telescope_repart_da_processors_z 8
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 3
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type redundant
-mg_coarse_telescope_mg_coarse_redundant_pc_type bjacobi
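
# Reading the options above (my arithmetic, not part of the original file):
# the solve runs on 32768 = 32^3 ranks; -pc_mg_levels 5 coarsens the
# 1024^3 grid down to 64^3; -mg_coarse_pc_telescope_reduction_factor 64
# gathers that coarse problem onto 32768 / 64 = 512 ranks, matching the
# 8 * 8 * 8 repartitioned DMDA requested above; the inner 3-level MG then
# coarsens 64^3 to 16^3, which is solved with redundant + bjacobi.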

