[petsc-users] Threefold increase of allocated memory during KSPSolve call (GMRES preconditioned by ASM)

Smith, Barry F. bsmith at mcs.anl.gov
Wed Feb 5 14:16:32 CST 2020



> On Feb 5, 2020, at 9:03 AM, Дмитрий Мельничук <dmitry.melnichuk at geosteertech.com> wrote:
> 
> Barry, appreciate your response, as always.
>  
> - You are saying that I am using ASM + ILU(0). However, I use PETSc only with "ASM" as the input parameter for the preconditioner. Does that mean ILU(0) is the default sub-preconditioner for ASM?

   Yes

> Can I change it using the option "-sub_pc_type"?

   Yes, for example -sub_pc_type sor; then it will use SOR on each block instead of ILU, which saves a matrix.
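
   For illustration, a minimal, untested Fortran sketch of requesting SOR on the subdomain blocks from the code rather than the command line (the Krylov handle and KSPSetFromOptions call are taken from the snippet quoted further down this thread; PETSC_NULL_OPTIONS refers to the global options database):

      ! keep PCASM as the outer preconditioner, ask for SOR on each subdomain block
      call PetscOptionsSetValue(PETSC_NULL_OPTIONS,"-sub_pc_type","sor",ierr)
      ! the sub-KSPs pick this option up when they are created during PCSetUp/KSPSolve
      call KSPSetFromOptions(Krylov,ierr)

   Passing -sub_pc_type sor on the command line or in the existing options string should be equivalent.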

> Does it make sense to you within the scope of my general goal, which is to decrease memory consumption? Could it be useful to vary the "-sub_ksp_type" option?

   Probably not.
>  
> - I have run the computation for the same initial matrix with the "-sub_pc_factor_in_place" option, PC = ASM. Now the process consumed ~400 MB compared to 550 MB without this option.

   This is what I expected, good.

> I used "-ksp_view" for this computation, two logs for this computation are attached:
> "ksp_view.txt"  - ksp_view option only
> "full_log_ASM_factor_in_place.txt" - full log without ksp_view option
>  
> - Then I changed the primary preconditioner from ASM to ILU(0) and ran the computation: memory consumption was again about ~400 MB, regardless of whether I use the "-sub_pc_factor_in_place" option.
>  
> - Then I tried to run the computation with ILU(0) and "-pc_factor_in_place", just in case: the computation did not start, I got an error message, the log is attached: "Error_ilu_pc_factor.txt"

   Since that matrix is also used for the MatMult, you cannot do the factorization in place: it would replace the original matrix entries with the entries of the factorization.


>  
> - Then I ran the computation with SOR as a preconditioner. PETSc gave me an error message, the log is attached: "Error_gmres_sor.txt"

   This is because our SOR cannot handle zeros on the diagonal.

>  
> - As for the kind of PDEs: I am solving the standard poroelasticity problem, the formulation can be found in the attached paper (Zheng_poroelasticity.pdf), pages 2-3.
> The file PDE.jpg is a snapshot of PDEs from this paper.
>  
>  
> So, if you can give me any further advice on how to decrease the amount of memory consumed to approximately the matrix size (~200 MB in this case), that would be great. Do I need to focus on searching for a proper preconditioner? BTW, the single ILU(0) did not give me any memory advantage compared to ASM with "-sub_pc_factor_in_place".

   Yes, because in both cases you need two copies of the matrix: one for the matrix-vector multiply and one for the ILU factors. What you want is a preconditioner that doesn't require any new matrices, and that is difficult: you are asking for an efficient preconditioner that requires essentially no additional memory.

   You could use -ksp_type gmres (or bcgs) with -pc_type jacobi (SOR won't work because of the zero diagonals), but that will not be a good preconditioner. Are you sure you cannot afford additional memory for the preconditioner? A good preconditioner might require up to 5 to 6 times the memory of the original matrix.
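
   For illustration, a minimal, untested Fortran sketch of that low-memory combination, reusing the Krylov and PreCon handles from the code quoted further down:

      ! BiCGStab with point Jacobi: no extra matrix copies, only a few work vectors,
      ! but expect much slower (or no) convergence on this problem
      call KSPSetType(Krylov,KSPBCGS,ierr)
      call KSPGetPC(Krylov,PreCon,ierr)
      call PCSetType(PreCon,PCJACOBI,ierr)
      call KSPSetFromOptions(Krylov,ierr)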


>  
> Have a pleasant day!
>  
> Kind regards,
> Dmitry
>  
>  
>  
> 04.02.2020, 19:04, "Smith, Barry F." <bsmith at mcs.anl.gov>:
> 
>    Please run with the option -ksp_view so we know the exact solver options you are using.
> 
>    From the lines
> 
> MatCreateSubMats 1 1.0 1.9397e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 1.1066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatIncreaseOvrlp 1 1.0 3.0324e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> 
>    and the fact that you have three matrices, I would guess you are using the additive Schwarz preconditioner (ASM) with ILU(0) on the blocks (which converges the same as ILU on one process but uses much more memory).
> 
>    Note: your code is still built with 32 bit integers.
> 
>    I would guess the basic matrix formed plus the vectors in this example could take ~200 MB. It is the two matrices in the additive Schwarz that are taking the additional memory.
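
   As a rough, back-of-the-envelope illustration using the numbers reported in this thread (and assuming each extra ASM matrix is at least the size of the original):

      ~176 MB (matrix + RHS before KSPSolve)  +  2 x ~176 MB (ASM subdomain copy + its ILU(0) factor)  ~  530 MB

   which is in the ballpark of the observed ~543-550 MB; overlap-2 fill and the GMRES work vectors account for much of the remainder.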
> 
>    What kind of PDEs are you solving and what kind of formulation?
> 
>    ASM plus ILU is the "workman's" type of preconditioner, relatively robust but not particularly fast to converge. Depending on your problem you might be able to do much better convergence-wise by using a PCFIELDSPLIT with a PCGAMG on one of the splits. In your own run you can see the ILU is chugging along rather slowly toward the solution.
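
   For context only, a minimal, untested Fortran sketch of what such a PCFIELDSPLIT setup could look like; the index sets is_disp and is_pres (displacement and pressure unknowns) are hypothetical and would have to be built by the application:

      IS :: is_disp, is_pres                               ! hypothetical, built by the application
      call KSPGetPC(Krylov,PreCon,ierr)
      call PCSetType(PreCon,PCFIELDSPLIT,ierr)
      call PCFieldSplitSetIS(PreCon,"disp",is_disp,ierr)   ! displacement dofs
      call PCFieldSplitSetIS(PreCon,"pres",is_pres,ierr)   ! pressure dofs
      ! e.g. algebraic multigrid on the displacement block, set via the options database
      call PetscOptionsSetValue(PETSC_NULL_OPTIONS,"-fieldsplit_disp_pc_type","gamg",ierr)
      call KSPSetFromOptions(Krylov,ierr)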
> 
>    With your current solvers you can use the option -sub_pc_factor_in_place, which will shave off one of those matrices' memory. Please try that.
> 
>    By avoiding ASM you can avoid both extra matrices, but at the cost of even slower convergence. Use, for example, -pc_type sor
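
   For illustration, the two variants above written as untested Fortran calls against the options database (equivalent to passing the corresponding options on the command line):

      ! (a) keep ASM + ILU(0) but factor each subdomain matrix in place
      call PetscOptionsSetValue(PETSC_NULL_OPTIONS,"-sub_pc_factor_in_place","true",ierr)
      ! (b) or skip ASM entirely and precondition with SOR
      ! call PCSetType(PreCon,PCSOR,ierr)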
> 
> 
>     The petroleum industry also has a variety of "custom" preconditioners/solvers for particular models and formulations that can beat the convergence of general-purpose solvers and require less memory. Some of these can be implemented or simulated with PETSc. Some are implemented in commercial petroleum simulation codes, and it can be difficult to get a handle on exactly what they do because of proprietary issues. I think I have an old text on these approaches in my office; there may be modern books that discuss them.
> 
> 
>    Barry
> 
> 
>  
> 
>  On Feb 4, 2020, at 6:04 AM, Дмитрий Мельничук <dmitry.melnichuk at geosteertech.com> wrote:
> 
>  Hello again!
>  Thank you very much for your replies!
>  Log is attached.
> 
>  1. The main problem now is the following. To solve the system with the matrix attached to my previous e-mail, PETSc consumes ~550 MB.
>  I know for certain that there are commercial software packages in the petroleum industry (e.g., Schlumberger Petrel) that solve the same initial problem consuming only ~200 MB.
>  Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year ago, it also consumed ~200 MB for this matrix.
> 
>  So, my question is: do you have any advice on how to decrease the amount of RAM consumed for such a matrix from 550 MB to 200 MB? Maybe some specific preconditioner or other approaches?
> 
>  I will be very grateful for any thoughts!
> 
>  2. The second problem is more specific.
>  According to the resource manager in Windows 10, the Fortran solver based on PETSc consumes 548 MB of RAM while solving the system of linear equations.
>  As I understand from the logs, 459 MB and 52 MB are required for matrix and vector storage, respectively. After summing all objects for which memory is allocated, we get only 517 MB.
> 
>  Thank you again for your time! Have a nice day.
> 
>  Kind regards,
>  Dmitry
> 
> 
>  03.02.2020, 19:55, "Smith, Barry F." <bsmith at mcs.anl.gov>:
> 
>     GMRES can also, by default, require about 35 work vectors if it reaches the full restart. You can set a smaller restart with, for example, -ksp_gmres_restart 15, but this can also hurt the convergence of GMRES dramatically. People sometimes use the KSPBCGS algorithm since it does not require all the restart vectors, but it can also converge more slowly.
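
   For illustration, a minimal, untested Fortran sketch of the restart trade-off described above (Krylov is the KSP handle from the code further down the thread):

      call KSPSetType(Krylov,KSPGMRES,ierr)
      ! a smaller restart stores fewer basis vectors but may slow convergence
      call KSPGMRESSetRestart(Krylov,15,ierr)
      ! or switch to BiCGStab, which keeps only a handful of work vectors:
      ! call KSPSetType(Krylov,KSPBCGS,ierr)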
> 
>      Depending on how much memory the sparse matrices use relative to the vectors, the vector memory may or may not matter.
> 
>     If you are using a recent version of PETSc you can run with -log_view -log_view_memory and it will show on the right side of the columns how much memory is being allocated for each of the operations in various ways.
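
   As a small, untested Fortran companion to those options, the process-wide memory use can also be queried directly, in addition to the PetscMallocGetCurrentUsage call already in the code below (mem_rss is an illustrative variable name):

      PetscLogDouble :: mem_rss
      ! resident set size of the process, in bytes
      call PetscMemoryGetCurrentUsage(mem_rss,ierr)
      print *,"Resident memory (bytes): ",mem_rss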
> 
>     Barry
> 
> 
> 
>   On Feb 3, 2020, at 10:34 AM, Matthew Knepley <knepley at gmail.com> wrote:
> 
>   On Mon, Feb 3, 2020 at 10:38 AM Дмитрий Мельничук <dmitry.melnichuk at geosteertech.com> wrote:
>   Hello all!
> 
>   Now I am faced with a problem associated with memory allocation when KSPSolve is called.
> 
>   GMRES preconditioned by ASM was chosen for solving the linear algebraic system (obtained by finite element spatial discretisation of the Biot poroelasticity model).
>   According to the output value of the PetscMallocGetCurrentUsage subroutine, 176 MB is required for matrix and RHS vector storage (before the KSPSolve call).
>   But 543 MB of RAM is required while the linear system is being solved (during the KSPSolve call).
>   Thus, the amount of allocated memory increases threefold after the preconditioning stage. This kind of behaviour is critical for 3D models with several million cells.
> 
>   1) In order to know anything, we have to see the output of -ksp_view, although I see you used an overlap of 2
> 
>   2) The overlap increases the size of submatrices beyond that of the original matrix. Suppose that you used LU for the sub-preconditioner.
>       You would need at least 2x memory (with ILU(0)) since the matrix dominates memory usage. Moreover, you have overlap
>       and you might have fill-in depending on the solver.
> 
>   3) The massif tool from valgrind is a good fine-grained way to look at memory allocation
> 
>     Thanks,
> 
>        Matt
> 
>   Is there a way to decrease the amount of allocated memory?
>   Is that expected behaviour for the GMRES-ASM combination?
> 
>   As I remember, using a previous version of PETSc did not demonstrate such a significant memory increase.
> 
>   ...
>   Vec :: Vec_F, Vec_U
>   Mat :: Mat_K
>   ...
>   ...
>   call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr)
>   call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr)
>   ....
>   call VecAssemblyBegin(Vec_F_mod,ierr)
>   call VecAssemblyEnd(Vec_F_mod,ierr)
>   ...
>   ...
>   call PetscMallocGetCurrentUsage(mem, ierr)
>   print *,"Memory used: ",mem
>   ...
>   ...
>   call KSPSetType(Krylov,KSPGMRES,ierr)
>   call KSPGetPC(Krylov,PreCon,ierr)
>   call PCSetType(PreCon,PCASM,ierr)
>   call KSPSetFromOptions(Krylov,ierr)
>   ...
>   call KSPSolve(Krylov,Vec_F,Vec_U,ierr)
>   ...
>   ...
>   options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor -ksp_converged_reason"
> 
> 
>   Kind regards,
>   Dmitry Melnichuk
>   Matrix.dat (265288024)
> 
> 
>   --
>   What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>   -- Norbert Wiener
> 
>   https://www.cse.buffalo.edu/~knepley/
> 
> 
>  <Logs_26K_GMRES-ASM-log_view-log_view_memory-malloc_dump_32bit>
>  
> <ksp_view.txt><PDE.JPG><Zheng_poroelasticity.pdf><full_log_ASM_factor_in_place.txt><Error_gmres_sor.txt><Error_ilu_pc_factor.txt>


