[petsc-dev] EXTERNAL: Re: Do you have any suggestion in order to reduce my memory footprint?
Barry Smith
bsmith at mcs.anl.gov
Wed Oct 22 18:32:45 CDT 2014
> On Oct 22, 2014, at 2:58 PM, Alletto, John M <john.m.alletto at lmco.com> wrote:
>
> Hi Barry,
>
> When I get my code back up and running and execute the -ksp_view how will I recognize it?
You’ll know it when you see it.
>
> I am new to PETSc, somehow my code broke, working on it.
> I am building off of example ./ex45
> I do use a stw = 2 and call DMDACreate3d( ....dof,stw, 0, 0, 0, &da
Box or star stencil? The box stencil requires a HUGE amount more memory.
So a stencil width of 2, hence a higher-order finite difference scheme? This will make the matrices have many more nonzeros in them than with a stencil width of one, and thus use a lot more memory. Plus, for the best multigrid convergence you may need more than the default piecewise linear interpolation between levels. Or you might consider using the 4th order scheme on the finest mesh and 2nd order operators on all the other levels.
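For what it's worth, here is a minimal sketch of creating the DMDA with a width-2 STAR stencil (this is not your code: the 7x7x7 base grid, dof = 1, and the boundary arguments are just the ex45-style defaults and may differ in your setup):

  #include <petscdmda.h>

  int main(int argc, char **argv)
  {
    DM             da;
    PetscErrorCode ierr;

    PetscInitialize(&argc, &argv, NULL, NULL);
    /* Same style of call as ex45, but with stencil width 2.  DMDA_STENCIL_STAR keeps only
       the ghost points along the coordinate axes; DMDA_STENCIL_BOX would keep the full
       (2*2+1)^3 = 125 point neighborhood and cost far more memory. */
    ierr = DMDACreate3d(PETSC_COMM_WORLD,
                        DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                        DMDA_STENCIL_STAR,                        /* star, not box                              */
                        -7, -7, -7,                               /* negative: overridable with -da_grid_x etc. */
                        PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, /* let PETSc choose the process decomposition */
                        1,                                        /* dof                                        */
                        2,                                        /* stencil width stw = 2                      */
                        NULL, NULL, NULL, &da);CHKERRQ(ierr);
    ierr = DMDestroy(&da);CHKERRQ(ierr);
    PetscFinalize();
    return 0;
  }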
>
> I set options as follows
> -da_refine 3
> -pc_type mg
> -pc_mg_type full
> KSPSOLVER is set to KSPLMGRES
Do you mean KSPLGMRES? You could skip the Krylov solver completely and use -ksp_type richardson to save the storage of the GMRES vectors (this may slow the convergence a little bit).
By default the coarse grid is solved with a direct solver, so the smaller the coarse grid, the less “extra” memory it uses.
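If you set the solver type in source code rather than on the command line, the change is a single call; a sketch, assuming your KSP object is called ksp and you already have an ierr:

  ierr = KSPSetType(ksp, KSPRICHARDSON);CHKERRQ(ierr);  /* instead of KSPLGMRES; no GMRES basis vectors to store */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);          /* so -ksp_type on the command line can still override   */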
>
>>
>> I have a 64 bit linux machine with 256 GByte of memory and 30 processors.
Depending on the machine's details you will likely get the best performance using about half the cores. In the PETSc root directory run
make streams NPMAX=32
and send us the results.
>> I am solving Laplace equations and the current solver I am using is taking every bit of memory I have available.
>>
>> Do you have any suggestions (such as changing precision or changing solvers or something else) in order to reduce my memory footprint?
What else is in the simulation? Surely you are not __just__ solving a Laplace equation. Is the rest of the simulation done explicitly, so there are no matrices for it? Note that in this case the errors from the operator splitting may be so large as to remove any benefit of using a 4th order discretization for the Laplace operator.
I made some runs to indicate how large a problem I can solve on my laptop with 4 gigabytes of memory (note that each refinement of the mesh requires roughly 8 times more memory).
~/Src/petsc/src/ksp/ksp/examples/tutorials (maint=)
$ ./ex45 -da_refine 3 -pc_type mg -pc_mg_type full -ksp_monitor -ksp_type richardson -memory_info
0 KSP Residual norm 3.264230720534e+02
1 KSP Residual norm 2.731900801719e+01
2 KSP Residual norm 3.524392472835e-01
3 KSP Residual norm 5.847297338171e-03
4 KSP Residual norm 1.932667627422e-04
Residual norm 1.18508e-05
Summary of Memory Usage in PETSc
[0]Current space PetscMalloc()ed 48240, max space PetscMalloced() 5.20534e+07
[0]Current process memory 6.18824e+07 max process memory 6.18824e+07
~/Src/petsc/src/ksp/ksp/examples/tutorials (maint=)
$ ./ex45 -da_refine 3 -pc_type mg -pc_mg_type full -ksp_monitor -ksp_type gmres -memory_info
0 KSP Residual norm 3.264230720534e+02
1 KSP Residual norm 1.667893748550e+01
2 KSP Residual norm 3.851411754724e-02
3 KSP Residual norm 7.837312289688e-04
Residual norm 4.7335e-05
Summary of Memory Usage in PETSc
[0]Current space PetscMalloc()ed 48240, max space PetscMalloced() 6.43328e+07
[0]Current process memory 7.44161e+07 max process memory 7.44161e+07
~/Src/petsc/src/ksp/ksp/examples/tutorials (maint=)
$ ./ex45 -da_refine 5 -pc_type mg -pc_mg_type full -ksp_monitor -ksp_type gmres -memory_info
0 KSP Residual norm 2.461787435384e+03
1 KSP Residual norm 9.812293053603e+01
2 KSP Residual norm 2.213037728680e-01
3 KSP Residual norm 4.320723869852e-03
Residual norm 5.13901e-05
Summary of Memory Usage in PETSc
[0]Current space PetscMalloc()ed 49840, max space PetscMalloced() 3.89334e+09
[0]Current process memory 3.91453e+09 max process memory 3.91453e+09
~/Src/petsc/src/ksp/ksp/examples/tutorials (maint=)
$ ./ex45 -da_refine 5 -pc_type mg -pc_mg_type full -ksp_monitor -ksp_type richardson -memory_info
0 KSP Residual norm 2.461787435384e+03
1 KSP Residual norm 2.819553940156e+02
2 KSP Residual norm 3.664748917714e+00
3 KSP Residual norm 5.791285265094e-02
4 KSP Residual norm 1.333000775504e-03
Residual norm 7.3202e-06
Summary of Memory Usage in PETSc
[0]Current space PetscMalloc()ed 49840, max space PetscMalloced() 3.14564e+09
[0]Current process memory 3.16655e+09 max process memory 3.16655e+09
~/Src/petsc/src/ksp/ksp/examples/tutorials (maint=)
$ ./ex45 -da_refine 5 -pc_type mg -pc_mg_type full -ksp_monitor -ksp_type richardson -memory_info -ksp_view
0 KSP Residual norm 2.461787435384e+03
1 KSP Residual norm 2.819553940156e+02
2 KSP Residual norm 3.664748917714e+00
3 KSP Residual norm 5.791285265094e-02
4 KSP Residual norm 1.333000775504e-03
KSP Object: 1 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=10000
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: mg
MG: type is FULL, levels=6 cycles=v
Not using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 1 MPI processes
type: preonly
maximum iterations=1
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5, needed 6.29283
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=343, cols=343
package used to perform factorization: petsc
total: nonzeros=13259, allocated nonzeros=13259
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=343, cols=343
total: nonzeros=2107, allocated nonzeros=2107
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 1 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.0999994, max = 1.09999
Chebyshev: estimated using: [0 0.1; 0 1.1]
KSP Object: (mg_levels_1_est_) 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=2197, cols=2197
total: nonzeros=14365, allocated nonzeros=14365
total number of mallocs used during MatSetValues calls =0
not using I-node routines
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=2197, cols=2197
total: nonzeros=14365, allocated nonzeros=14365
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 1 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.0999989, max = 1.09999
Chebyshev: estimated using: [0 0.1; 0 1.1]
KSP Object: (mg_levels_2_est_) 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=15625, cols=15625
total: nonzeros=105625, allocated nonzeros=105625
total number of mallocs used during MatSetValues calls =0
not using I-node routines
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=15625, cols=15625
total: nonzeros=105625, allocated nonzeros=105625
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 1 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.0999994, max = 1.09999
Chebyshev: estimated using: [0 0.1; 0 1.1]
KSP Object: (mg_levels_3_est_) 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=117649, cols=117649
total: nonzeros=809137, allocated nonzeros=809137
total number of mallocs used during MatSetValues calls =0
not using I-node routines
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=117649, cols=117649
total: nonzeros=809137, allocated nonzeros=809137
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 1 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.0999997, max = 1.1
Chebyshev: estimated using: [0 0.1; 0 1.1]
KSP Object: (mg_levels_4_est_) 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=912673, cols=912673
total: nonzeros=6.33226e+06, allocated nonzeros=6.33226e+06
total number of mallocs used during MatSetValues calls =0
not using I-node routines
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=912673, cols=912673
total: nonzeros=6.33226e+06, allocated nonzeros=6.33226e+06
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 5 -------------------------------
KSP Object: (mg_levels_5_) 1 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.0999998, max = 1.1
Chebyshev: estimated using: [0 0.1; 0 1.1]
KSP Object: (mg_levels_5_est_) 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_5_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=7189057, cols=7189057
total: nonzeros=5.00999e+07, allocated nonzeros=5.00999e+07
total number of mallocs used during MatSetValues calls =0
not using I-node routines
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_5_) 1 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=7189057, cols=7189057
total: nonzeros=5.00999e+07, allocated nonzeros=5.00999e+07
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=7189057, cols=7189057
total: nonzeros=5.00999e+07, allocated nonzeros=5.00999e+07
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Residual norm 7.3202e-06
Note that running this way explicitly stores the sparse matrix on each level and the interpolation matrix between levels.
With the 7-point stencil the matrix has 7 nonzeros per row. The compressed sparse row storage format requires an integer for each nonzero in addition to the value, so storing the matrix takes roughly the same memory as 10.5 vectors, while the interpolation matrix likely requires about the space of 3 vectors. Each level requires a matrix plus an interpolation (except the coarsest level, which needs no interpolation), so let's say 14 vectors' worth per level; if the multigrid algorithm requires 3 vectors per level (for the right-hand side, solution, and residual computation), then the matrices are taking roughly four to five times the memory of the vectors.
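As a back-of-the-envelope check of that estimate, here is a little arithmetic sketch (it assumes the PETSc defaults of 8-byte scalars and 4-byte indices and uses the 193^3 finest grid from the runs above):

  #include <stdio.h>

  int main(void)
  {
    double rows = 193.0*193.0*193.0;                  /* 7,189,057 unknowns on the finest level   */
    double nnz  = 7.0*rows;                           /* ~7 nonzeros per row for the 7-point star */
    double mat  = nnz*(8.0 + 4.0) + 4.0*(rows + 1.0); /* CSR: values + column indices + row ptrs  */
    double vec  = 8.0*rows;                           /* one vector of doubles                    */
    printf("matrix ~ %.2f GB, one vector ~ %.3f GB, matrix ~ %.1f vectors\n",
           mat/1e9, vec/1e9, mat/vec);                /* ~0.63 GB, ~0.058 GB, ~11 vectors         */
    return 0;
  }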
If it is a simple constant coefficient operator then one can, in theory, not store the matrices at all, seemingly freeing up a lot of memory (but at the cost of writing or borrowing a custom code). Sounds great, but now ask how many more levels you could solve if you did not store the matrices. If you refine one more time, the new vectors will be roughly 8 times the size of the previous ones. This swamps the memory previously used for the matrices, so my conclusion is that going matrix-free will allow at most one more level of refinement. So ask yourself: is it worth spending a ton of time writing a matrix-free multigrid solver that only gets you one more level of refinement? Note that my estimates are for the 7-point stencil; if you are using a double-wide stencil and 4th order differencing in your matrix, the matrices will take a lot more memory, so maybe you can get a couple more levels of refinement.
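The same kind of arithmetic, again just a sketch under the same assumptions, shows why matrix-free buys only about one level: refining 193^3 once gives 385^3, about 8 times more unknowns, and the new level's work vectors alone already exceed all of the matrix memory you would free:

  #include <stdio.h>

  int main(void)
  {
    double n_now  = 193.0*193.0*193.0;  /* current finest level                      */
    double n_next = 385.0*385.0*385.0;  /* after one more refinement (2*193-1 = 385) */
    double freed  = 14.0*8.0*n_now;     /* ~14 "vectors" worth of matrix plus        */
                                        /* interpolation, dominated by finest level  */
    double needed = 3.0*8.0*n_next;     /* 3 work vectors on the new finest level    */
    printf("matrix memory freed ~ %.2f GB, new level's vectors ~ %.2f GB\n",
           freed/1e9, needed/1e9);      /* ~0.81 GB vs ~1.37 GB                      */
    return 0;
  }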
Barry
>>
>> Many Thanks
>> John
>