[petsc-users] Question about memory usage in Multigrid preconditioner
frank
hengjiew at uci.edu
Mon Jul 11 12:14:12 CDT 2016
Hi Dave,
I re-ran the test using bjacobi as the preconditioner on the coarse mesh
of telescope. The grid is 3072*256*768 and the process mesh is 96*8*24. The
PETSc options file is attached.
I still got the "Out Of Memory" error. The error occurred before the
linear solver finished one step, so I don't have the full info from
ksp_view. The info from ksp_view_pre is attached.
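(One way I could still capture the full solver configuration when the solve
itself dies might be to set the KSP up and view it explicitly before calling
KSPSolve. This is only a minimal sketch, not taken from my actual code; the
helper name SolveWithEarlyView is made up for illustration, and it only helps
if the failure happens inside KSPSolve rather than during the setup itself.)

#include <petscksp.h>

/* Sketch only: set the solver up and view it before the solve, so the
   full -ksp_view information is printed even if KSPSolve later dies
   with an out-of-memory error.  Assumes ksp, A, b, x exist already. */
PetscErrorCode SolveWithEarlyView(KSP ksp, Mat A, Vec b, Vec x)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);                          /* builds the MG / telescope hierarchy */
  ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); /* same information as -ksp_view */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
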
It seems to me that the error occurred when the decomposition was about
to be changed.
I ran another test with a grid of 1536*128*384 and the same process mesh
as above. There was no error; the ksp_view info is attached for comparison.
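(For reference, the per-process subdomain sizes implied by these two grids on
the 96*8*24 process mesh; a small stand-alone sketch assuming the grid divides
evenly, with the helper name 'subdomain' made up for illustration.)

#include <stdio.h>

/* Per-process subdomain size for a uniform block decomposition. */
static void subdomain(int Nx, int Ny, int Nz, int px, int py, int pz)
{
  printf("%4dx%3dx%3d grid on %2dx%dx%2d procs -> %2dx%2dx%2d = %5d points/process\n",
         Nx, Ny, Nz, px, py, pz, Nx/px, Ny/py, Nz/pz, (Nx/px)*(Ny/py)*(Nz/pz));
}

int main(void)
{
  subdomain(3072, 256, 768, 96, 8, 24);  /* failing run:  32x32x32 = 32768 points/process */
  subdomain(1536, 128, 384, 96, 8, 24);  /* working run:  16x16x16 =  4096 points/process */
  return 0;
}
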
Thank you.
Frank
On 07/08/2016 10:38 PM, Dave May wrote:
>
>
> On Saturday, 9 July 2016, frank <hengjiew at uci.edu
> <javascript:_e(%7B%7D,'cvml','hengjiew at uci.edu');>> wrote:
>
> Hi Barry and Dave,
>
> Thank both of you for the advice.
>
> @Barry
> I made a mistake in the file names in the last email. I attached the
> correct files this time.
> For all three tests, 'Telescope' is used as the coarse
> preconditioner.
>
> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
> Part of the memory usage:  Vector   125   124   3971904   0.
>                            Matrix   101   101   9462372   0
>
> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
> Part of the memory usage:  Vector   125   124    681672   0.
>                            Matrix   101   101   1462180   0.
>
> In theory, the memory usage in Test1 should be 8 times that of Test2.
> In my case, it is about 6 times.
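(From the numbers above: 3971904/681672 is about 5.8 for the vectors and
9462372/1462180 is about 6.5 for the matrices.)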
>
> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain
> per process: 32*32*32
> Here I get the out of memory error.
>
> I tried to use -mg_coarse jacobi. In this way, I don't need to set
> -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?
> The linear solver didn't work in this case. PETSc output some errors.
>
> @Dave
> In test3, I used only one instance of 'Telescope'. On the coarse
> mesh of 'Telescope', I used LU as the preconditioner instead of SVD.
> If I set the levels correctly, then on the last coarse mesh of MG,
> where it calls 'Telescope', the sub-domain per process is 2*2*2.
> On the last coarse mesh of 'Telescope', there is only one grid
> point per process.
> I still got the OOM error. The detailed PETSc options file is attached.
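(To cross-check that level count, a small stand-alone sketch of the same
arithmetic; it assumes uniform factor-of-two coarsening, the 96*8*24 process
mesh, and the telescope reduction factor of 64 from the attached options file,
and is not part of the actual code.)

#include <stdio.h>

/* Walk the coarsening chain of test 3: outer MG (5 levels), telescope
   (reduction factor 64), inner MG (4 levels).  'edge' is the number of
   points per edge of the cubic per-process subdomain. */
int main(void)
{
  int ranks = 96*8*24;   /* 18432 */
  int edge  = 32;        /* 32x32x32 points per process on the finest level */
  int i;

  for (i = 1; i < 5; i++) edge /= 2;               /* outer MG: 5 levels */
  printf("outer MG coarse level : %dx%dx%d per rank on %d ranks\n", edge, edge, edge, ranks);

  ranks /= 64;                                     /* telescope gathers 4x4x4 ranks into one */
  edge  *= 4;
  printf("after telescope       : %dx%dx%d per rank on %d ranks\n", edge, edge, edge, ranks);

  for (i = 1; i < 4; i++) edge /= 2;               /* inner MG: 4 levels */
  printf("inner MG coarse level : %dx%dx%d per rank on %d ranks (%d x %d coarse matrix)\n",
         edge, edge, edge, ranks, ranks*edge*edge*edge, ranks*edge*edge*edge);
  return 0;
}

(The resulting 288 x 288 coarse operator is the same size as the
mg_coarse_telescope_mg_coarse_ matrix shown in the ksp_view attached below for
the smaller comparison run.)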
>
>
> Do you understand the expected memory usage for the
> particular parallel LU implementation you are using? I don't
> (seriously). Replace LU with bjacobi and re-run this test. My point
> about solver debugging is still valid.
>
> And please send the result of KSPView so we can see what is actually
> used in the computations.
>
> Thanks
> Dave
>
>
>
> Thank you so much.
>
> Frank
>
>
>
> On 07/06/2016 02:51 PM, Barry Smith wrote:
>
> On Jul 6, 2016, at 4:19 PM, frank <hengjiew at uci.edu> wrote:
>
> Hi Barry,
>
> Thank you for your advice.
> I tried three tests. In the 1st test, the grid is
> 3072*256*768 and the process mesh is 96*8*24.
> The linear solver is 'cg', the preconditioner is 'mg', and
> 'telescope' is used as the preconditioner at the coarse mesh.
> The system gives me the "Out of Memory" error before the
> linear system is completely solved.
> The info from '-ksp_view_pre' is attached. It seems to me
> that the error occurs when it reaches the coarse mesh.
>
> The 2nd test uses a grid of 1536*128*384 and a process mesh
> of 96*8*24. The 3rd test uses the same grid but a
> different process mesh, 48*4*12.
>
> Are you sure this is right? The total matrix and vector
> memory usage goes from 2nd test
> Vector 384 383 8,193,712 0.
> Matrix 103 103 11,508,688 0.
> to 3rd test
> Vector 384 383 1,590,520 0.
> Matrix 103 103 3,508,664 0.
> that is, the memory usage got smaller, but if you have only
> 1/8th the processes and the same grid it should have gotten
> about 8 times bigger. Did you maybe cut the grid by a factor
> of 8 also? If so, that still doesn't explain it, because the
> memory usage changed by a factor of 5 or so for the
> vectors and 3 or so for the matrices.
>
>
> The linear solver and PETSc options in the 2nd and 3rd tests
> are the same as in the 1st test. The linear solver works fine in
> both tests.
> I attached the memory usage of the 2nd and 3rd tests. The
> memory info is from the option '-log_summary'. I tried to
> use '-memory_info' as you suggested, but in my case PETSc
> treated it as an unused option. It output nothing about
> the memory. Do I need to add something to my code so I can use
> '-memory_info'?
>
> Sorry, my mistake; the option is -memory_view.
>
> Can you run the one case with -memory_view and -mg_coarse
> jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to
> see how much memory is used without the telescope? Also run
> case 2 the same way.
>
> Barry
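(In addition to -memory_view, a programmatic check around the solve could look
like the sketch below. It uses PETSc's PetscMemoryGetCurrentUsage and
PetscMallocGetCurrentUsage, which report the process resident set size and the
bytes currently obtained through PetscMalloc; the PetscMalloc numbers are only
meaningful when PETSc's logging malloc is active, e.g. in a debug build or with
-malloc. The function name SolveAndReportMemory is made up for illustration,
not part of the actual code.)

#include <petscksp.h>

/* Sketch only: report memory before and after the solve.
   PetscPrintf reports rank 0's numbers only. */
PetscErrorCode SolveAndReportMemory(KSP ksp, Vec b, Vec x)
{
  PetscLogDouble rss0, rss1, mal0, mal1;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscMemoryGetCurrentUsage(&rss0);CHKERRQ(ierr);  /* resident set size, bytes */
  ierr = PetscMallocGetCurrentUsage(&mal0);CHKERRQ(ierr);  /* PetscMalloc'd bytes       */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = PetscMemoryGetCurrentUsage(&rss1);CHKERRQ(ierr);
  ierr = PetscMallocGetCurrentUsage(&mal1);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,
                     "rank 0: RSS %.1f -> %.1f MB, PetscMalloc %.1f -> %.1f MB\n",
                     rss0/(1024.0*1024.0), rss1/(1024.0*1024.0),
                     mal0/(1024.0*1024.0), mal1/(1024.0*1024.0));CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
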
>
>
>
> In both tests the memory usage is not large.
>
> It seems to me that it might be the 'telescope'
> preconditioner that allocated a lot of memory and caused
> the error in the 1st test.
> Is there a way to show how much memory it allocated?
>
> Frank
>
> On 07/05/2016 03:37 PM, Barry Smith wrote:
>
> Frank,
>
> You can run with -ksp_view_pre to have it "view"
> the KSP before the solve so hopefully it gets that far.
>
> Please run the problem that does fit with
> -memory_info; when the problem completes it will show
> the "high water mark" for PETSc allocated memory and
> total memory used. We first want to look at these
> numbers to see if it is using more memory than you
> expect. You could also run with, say, half the grid
> spacing to see how the memory usage scales with the
> increase in grid points. Make the runs also with
> -log_view and send all the output from these options.
>
> Barry
>
> On Jul 5, 2016, at 5:23 PM, frank
> <hengjiew at uci.edu> wrote:
>
> Hi,
>
> I am using the CG KSP solver and a multigrid
> preconditioner to solve a linear system in parallel.
> I chose 'Telescope' as the
> preconditioner on the coarse mesh for its good
> performance.
> The PETSc options file is attached.
>
> The domain is a 3D box.
> It works well when the grid is 1536*128*384 and
> the process mesh is 96*8*24. When I double the grid
> in each direction and keep the same process mesh and
> PETSc options, I get an "out of memory" error from
> the super-cluster I am using.
> Each process has access to at least 8 GB of memory,
> which should be more than enough for my
> application. I am sure that all the other parts of
> my code (except the linear solver) do not use
> much memory. So I suspect there is something
> wrong with the linear solver.
> The error occurs before the linear system is
> completely solved, so I don't have the info from
> ksp_view. I am not able to reproduce the error
> with a smaller problem either.
> In addition, I tried block Jacobi as
> the preconditioner with the same grid and the same
> decomposition. The linear solver runs extremely
> slowly, but there is no memory error.
>
> How can I diagnose what exactly causes the error?
> Thank you so much.
>
> Frank
> <petsc_options.txt>
>
> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
>
>
-------------- next part --------------
KSP Object: 18432 MPI processes
type: cg
maximum iterations=1
tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 18432 MPI processes
type: mg
PC has not been set up so information may be incomplete
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 18432 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using DEFAULT norm type for convergence test
PC Object: (mg_coarse_) 18432 MPI processes
type: redundant
PC has not been set up so information may be incomplete
Redundant preconditioner: Not yet setup
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 18432 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0., max = 0.
maximum iterations=2, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 18432 MPI processes
type: sor
PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=603979776, cols=603979776
total: nonzeros=4223139840, allocated nonzeros=4223139840
total number of mallocs used during MatSetValues calls =0
has attached null space
[NID 00631] 2016-07-10 06:22:58 Apid 45768056: initiated application termination
[NID 06277] 2016-07-10 06:23:00 Apid 45768056: OOM killer terminated this process.
[NID 06235] 2016-07-10 06:23:00 Apid 45768056: OOM killer terminated this process.
-------------- next part --------------
-ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_lag_norm
-ksp_rtol 1e-7
-ksp_initial_guess_nonzero yes
-ksp_converged_reason
-ppe_max_iter 50
-pc_type mg
-pc_mg_galerkin
-pc_mg_levels 5
-mg_levels_ksp_type richardson
-mg_levels_ksp_max_it 1
-ksp_max_it 1
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type telescope
-mg_coarse_pc_telescope_reduction_factor 64
-options_left 1
-log_view
-memory_view
-ksp_view_pre
# Setting dmdarepart on subcomm
-mg_coarse_telescope_repart_da_processors_x 24
-mg_coarse_telescope_repart_da_processors_y 2
-mg_coarse_telescope_repart_da_processors_z 6
-mg_coarse_telescope_ksp_type preonly
-mg_coarse_telescope_pc_type mg
-mg_coarse_telescope_pc_mg_galerkin
-mg_coarse_telescope_pc_mg_levels 4
-mg_coarse_telescope_mg_levels_ksp_max_it 1
-mg_coarse_telescope_mg_levels_ksp_type richardson
-mg_coarse_telescope_mg_coarse_ksp_type preonly
-mg_coarse_telescope_mg_coarse_pc_type bjacobi
-------------- next part --------------
KSP Object: 18432 MPI processes
type: cg
maximum iterations=1
tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 18432 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 18432 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 18432 MPI processes
type: telescope
Telescope: parent comm size reduction factor = 64
Telescope: comm_size = 18432 , subcomm_size = 288
Telescope: DMDA detected
DMDA Object: (mg_coarse_telescope_repart_) 288 MPI processes
M 192 N 16 P 48 m 24 n 2 p 6 dof 1 overlap 1
KSP Object: (mg_coarse_telescope_) 288 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_) 288 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_telescope_mg_coarse_) 288 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_coarse_) 288 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0., needed 0.
Factored matrix follows:
Mat Object: 288 MPI processes
type: mpiaij
rows=288, cols=288
package used to perform factorization: superlu_dist
total: nonzeros=0, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
SuperLU_DIST run parameters:
Process grid nprow 18 x npcol 16
Equilibrate matrix TRUE
Matrix input mode 1
Replace tiny pivots FALSE
Use iterative refinement FALSE
Processors in row 18 col partition 16
Row permutation LargeDiag
Column permutation METIS_AT_PLUS_A
Parallel symbolic factorization FALSE
Repeated factorization SamePattern
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=288, cols=288
total: nonzeros=1728, allocated nonzeros=1728
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_1_) 288 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_1_) 288 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=2304, cols=2304
total: nonzeros=14976, allocated nonzeros=14976
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_2_) 288 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_2_) 288 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=18432, cols=18432
total: nonzeros=124416, allocated nonzeros=124416
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_coarse_telescope_mg_levels_3_) 288 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_telescope_mg_levels_3_) 288 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=147456, cols=147456
total: nonzeros=1013760, allocated nonzeros=1013760
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 288 MPI processes
type: mpiaij
rows=147456, cols=147456
total: nonzeros=1013760, allocated nonzeros=1013760
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=147456, cols=147456
total: nonzeros=1013760, allocated nonzeros=1013760
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 18432 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 18432 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=1179648, cols=1179648
total: nonzeros=8183808, allocated nonzeros=8183808
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 18432 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 18432 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=9437184, cols=9437184
total: nonzeros=65765376, allocated nonzeros=65765376
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 18432 MPI processes
type: richardson
Richardson: damping factor=1.
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 18432 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=75497472, cols=75497472
total: nonzeros=527302656, allocated nonzeros=527302656
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 18432 MPI processes
type: mpiaij
rows=75497472, cols=75497472
total: nonzeros=527302656, allocated nonzeros=527302656
total number of mallocs used during MatSetValues calls =0
has attached null space