[petsc-users] Question about memory usage in Multigrid preconditioner
frank
hengjiew at uci.edu
Fri Jul 8 20:05:45 CDT 2016
Hi Barry and Dave,
Thank you both for the advice.
@Barry
I made a mistake in the file names in the last email. The correct files are
attached this time.
For all three tests, 'Telescope' is used as the coarse-level preconditioner.
== Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
Part of the memory usage:
   Vector   125   124   3971904   0.
   Matrix   101   101   9462372   0.
== Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
Part of the memory usage:
   Vector   125   124   681672    0.
   Matrix   101   101   1462180   0.
In theory, the per-process memory usage in Test1 should be 8 times that of
Test2, since Test2 solves the same grid with 8 times as many processes. In my
case, it is only about 6 times.
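As a quick check from the numbers above: 3971904 / 681672 ≈ 5.8 for the
vectors and 9462372 / 1462180 ≈ 6.5 for the matrices.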
== Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain per
process: 32*32*32
Here I get the out-of-memory error.
I tried to use -mg_coarse jacobi as you suggested. In this way, I don't need
to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?
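My understanding of that shorthand (an assumption on my part, please correct
me if wrong) is that it stands for the coarse-level settings
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type jacobi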
The linear solver didn't work in this case; PETSc output some errors.
@Dave
In Test3, I use only one instance of 'Telescope'. On the coarse mesh of
'Telescope', I use LU as the preconditioner instead of SVD.
If I set the levels correctly, then on the last coarse mesh of the outer MG,
where 'Telescope' is called, the sub-domain per process is 2*2*2.
On the last coarse mesh of 'Telescope', there is only one grid point per
process.
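To spell out the arithmetic (assuming each MG level coarsens the grid by 2 in
every direction): the outer MG has 5 levels, so the 32*32*32 sub-domain per
process becomes 2*2*2 on its coarsest level. 'Telescope' with reduction
factor 64 then gathers the 96*8*24 = 18432 processes onto 24*2*6 = 288,
giving an 8*8*8 sub-domain per process, and the 4 levels of the inner MG
reduce that to 1*1*1.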
I still got the OOM error. The detailed PETSc options file is attached.
Thank you so much.
Frank
On 07/06/2016 02:51 PM, Barry Smith wrote:
>> On Jul 6, 2016, at 4:19 PM, frank <hengjiew at uci.edu> wrote:
>>
>> Hi Barry,
>>
>> Thank you for your advice.
>> I tried three tests. In the 1st test, the grid is 3072*256*768 and the process mesh is 96*8*24.
>> The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is used as the preconditioner on the coarse mesh.
>> The system gives me the "Out of Memory" error before the linear system is completely solved.
>> The info from '-ksp_view_pre' is attached. It seems to me that the error occurs when it reaches the coarse mesh.
>>
>> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24. The 3rd test uses the same grid but a different process mesh 48*4*12.
> Are you sure this is right? The total matrix and vector memory usage goes from 2nd test
> Vector 384 383 8,193,712 0.
> Matrix 103 103 11,508,688 0.
> to 3rd test
> Vector 384 383 1,590,520 0.
> Matrix 103 103 3,508,664 0.
> that is, the memory usage got smaller, but if you have only 1/8th the processes and the same grid it should have gotten about 8 times bigger. Did you maybe cut the grid by a factor of 8 as well? If so, that still doesn't explain it, because the memory usage changed by a factor of 5-something for the vectors and 3-something for the matrices.
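> (Explicitly: 8,193,712 / 1,590,520 ≈ 5.2 for the vectors and 11,508,688 / 3,508,664 ≈ 3.3 for the matrices, where a clean factor of 8 would be expected.)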
>
>
>> The linear solver and PETSc options in the 2nd and 3rd tests are the same as in the 1st test. The linear solver works fine in both tests.
>> I attached the memory usage of the 2nd and 3rd tests. The memory info is from the option '-log_summary'. I tried to use '-memory_info' as you suggested, but in my case PETSc treated it as an unused option. It output nothing about the memory. Do I need to add something to my code so I can use '-memory_info'?
> Sorry, my mistake: the option is -memory_view.
>
> Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory is used without the telescope? Also run case 2 the same way.
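>
> Something like the following sketch (the executable name and rank count are placeholders for your setup; the shorthand above corresponds to the actual option -mg_coarse_pc_type):
>
>    mpiexec -n <nranks> ./your_app <your usual options> -memory_view -mg_coarse_pc_type jacobi -ksp_max_it 1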
>
> Barry
>
>
>
>> In both tests the memory usage is not large.
>>
>> It seems to me that it might be the 'telescope' preconditioner that allocated a lot of memory and caused the error in the 1st test.
>> Is there a way to show how much memory it allocated?
>>
>> Frank
>>
>> On 07/05/2016 03:37 PM, Barry Smith wrote:
>>> Frank,
>>>
>>> You can run with -ksp_view_pre to have it "view" the KSP before the solve so hopefully it gets that far.
>>>
>>>     Please run the problem that does fit with -memory_info; when the problem completes it will show the "high water mark" for PETSc-allocated memory and the total memory used. We first want to look at these numbers to see if it is using more memory than you expect. You could also run with, say, half the grid spacing to see how the memory usage scales with the increase in grid points. Make the runs also with -log_view and send all the output from these options.
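>>>     (For instance, halving the grid spacing in each direction gives 8 times as many grid points, so the PETSc-allocated memory should grow by roughly a factor of 8 if the solver's memory scales linearly with the number of unknowns.)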
>>>
>>> Barry
>>>
>>>> On Jul 5, 2016, at 5:23 PM, frank <hengjiew at uci.edu> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am using the CG ksp solver and Multigrid preconditioner to solve a linear system in parallel.
>>>> I chose to use the 'Telescope' as the preconditioner on the coarse mesh for its good performance.
>>>> The petsc options file is attached.
>>>>
>>>> The domain is a 3d box.
>>>> It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of the grid and keep the same process mesh and PETSc options, I get an "out of memory" error from the super-cluster I am using.
>>>> Each process has access to at least 8GB of memory, which should be more than enough for my application. I am sure that all the other parts of my code (except the linear solver) do not use much memory. So I suspect there is something wrong with the linear solver.
>>>> The error occurs before the linear system is completely solved, so I don't have the info from KSP view. I am not able to reproduce the error with a smaller problem either.
>>>> In addition, I tried using block Jacobi as the preconditioner with the same grid and the same decomposition. The linear solver runs extremely slowly, but there is no memory error.
>>>>
>>>> How can I diagnose what exactly causes the error?
>>>> Thank you so much.
>>>>
>>>> Frank
>>>> <petsc_options.txt>
>> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
-------------- next part --------------
Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 7.2576e+08 max 3.8216e+05 min 3.1394e+05
Current process memory: total 7.2576e+08 max 3.8216e+05 min 3.1394e+05
Maximum (over computational time) space PetscMalloc()ed: total 6.3903e+11 max 2.7842e+08 min 2.7724e+08
Current space PetscMalloc()ed: total 1.8043e+09 max 8.0275e+05 min 7.6352e+05
========================================================================================================================
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 5 4 3328 0.
Vector 125 124 3971904 0.
Vector Scatter 25 21 60464 0.
Matrix 101 101 9462372 0.
Matrix Null Space 1 1 592 0.
Distributed Mesh 8 4 20288 0.
Star Forest Bipartite Graph 16 8 6784 0.
Discrete System 8 4 3456 0.
Index Set 55 55 277272 0.
IS L to G Mapping 8 4 27136 0.
Krylov Solver 10 10 12392 0.
DMKSP interface 6 3 1944 0.
Preconditioner 10 10 10024 0.
========================================================================================================================
-------------- next part --------------
Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 5.7481e+09 max 4.5144e+05 min 3.0404e+05
Current process memory: total 5.7481e+09 max 4.5144e+05 min 3.0404e+05
Maximum (over computational time) space PetscMalloc()ed: total 4.9405e+12 max 2.6821e+08 min 2.6800e+08
Current space PetscMalloc()ed: total 5.5180e+09 max 3.0192e+05 min 2.9173e+05
========================================================================================================================
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 5 4 3328 0.
Vector 125 124 681672 0.
Vector Scatter 25 21 27256 0.
Matrix 101 101 1462180 0.
Matrix Null Space 1 1 592 0.
Distributed Mesh 8 4 20288 0.
Star Forest Bipartite Graph 16 8 6784 0.
Discrete System 8 4 3456 0.
Index Set 55 55 80872 0.
IS L to G Mapping 8 4 7080 0.
Krylov Solver 10 10 12392 0.
DMKSP interface 6 3 1944 0.
Preconditioner 10 10 10024 0.
========================================================================================================================
-------------- next part --------------
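# Presumably the options for Test1: process mesh 48*4*12 = 2304 ranks;
# telescope reduces by 64 to 12*1*3 = 36 ranks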
-ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_lag_norm
-ksp_rtol 1e-7
-ksp_initial_guess_nonzero yes
-ksp_converged_reason
-ppe_max_iter 50
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 4
-mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 1
-ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-options_left 1
-log_view
-memory_view
# Setting dmdarepart on subcomm
-mg_coarse_telescope_repart_da_processors_x 12
-mg_coarse_telescope_repart_da_processors_y 1
-mg_coarse_telescope_repart_da_processors_z 3
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 4
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type lu
-------------- next part --------------
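# Presumably the options for Test2: process mesh 96*8*24 = 18432 ranks;
# telescope reduces by 64 to 24*2*6 = 288 ranks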
-ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_lag_norm
-ksp_rtol 1e-7
-ksp_initial_guess_nonzero yes
-ksp_converged_reason
-ppe_max_iter 50
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 4
-mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 1
-ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-options_left
-log_view
-memory_view
# Setting dmdarepart on subcomm
-mg_coarse_telescope_repart_da_processors_x 24
-mg_coarse_telescope_repart_da_processors_y 2
-mg_coarse_telescope_repart_da_processors_z 6
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 4
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type lu
-------------- next part --------------
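# Presumably the options for Test3 (the OOM case): grid 3072*256*768,
# 5 outer MG levels, -ksp_view_pre enabled; telescope reduces 18432 ranks
# to 24*2*6 = 288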
-ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_lag_norm
-ksp_rtol 1e-7
-ksp_initial_guess_nonzero yes
-ksp_converged_reason
-ppe_max_iter 50
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 5
-mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 1
-ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-options_left 1
-log_view
-memory_view
-ksp_view_pre
# Setting dmdarepart on subcomm
-mg_coarse_telescope_repart_da_processors_x 24
-mg_coarse_telescope_repart_da_processors_y 2
-mg_coarse_telescope_repart_da_processors_z 6
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 4
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type lu