[petsc-users] How to measure the memory usage of the application built on the Petsc?

Barry Smith bsmith at mcs.anl.gov
Tue May 28 11:21:38 CDT 2013


First find the big items in the allocations with

/Downloads$ grep bytes PetscMallocDump | sed "s?\[ 0\]??g" | sort -n -r | more

111132656 bytes MatILUFactorSymbolic_SeqAIJ() line 1867 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
111132656 bytes MatILUFactorSymbolic_SeqAIJ() line 1840 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
10980656 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
10980656 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
10980656 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
10980656 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
2239488 bytes SpmcsDMMeshSymmetrize() line 983 in spmcsdmmesh.cpp
2239488 bytes DMSetUp_SpmcsDMMesh() line 139 in spmcsdmmeshcreate.cpp
2239488 bytes DMSetUp_SpmcsDMMesh() line 138 in spmcsdmmeshcreate.cpp
1493504 bytes MatILUFactorSymbolic_SeqAIJ() line 1867 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
1493504 bytes MatILUFactorSymbolic_SeqAIJ() line 1840 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
1031520 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
1031520 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
1031520 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
1031520 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
661408 bytes SpmcsDMMeshPreallocateSieveLabel() line 1935 in spmcsdmmesh.cpp
493856 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
493856 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
493856 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
493856 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c

By far the biggest hogs are the two MatILUFactorSymbolic_SeqAIJ() allocations at the top, followed by the original matrix storage. Then find the large one in the file.
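For example, a grep such as the following (a sketch, assuming the dump was saved to the same PetscMallocDump file) pulls out the call stack of the biggest allocation:

/Downloads$ grep -A 14 "111132656 bytes MatILUFactorSymbolic_SeqAIJ" PetscMallocDump

which gives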

[ 0]111132656 bytes MatILUFactorSymbolic_SeqAIJ() line 1840 in /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
      [0]  MatILUFactorSymbolic() line 6114 in /home/fdkong/math/petsc-3.3-p7/src/mat/interface/matrix.c
      [0]  PCSetUp_ILU() line 173 in /home/fdkong/math/petsc-3.3-p7/src/ksp/pc/impls/factor/ilu/ilu.c
      [0]  PCSetUp() line 810 in /home/fdkong/math/petsc-3.3-p7/src/ksp/pc/interface/precon.c
      [0]  KSPSetUp() line 182 in /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/interface/itfunc.c
      [0]  PCSetUpOnBlocks_ASM() line 416 in /home/fdkong/math/petsc-3.3-p7/src/ksp/pc/impls/asm/asm.c
      [0]  PCSetUpOnBlocks() line 861 in /home/fdkong/math/petsc-3.3-p7/src/ksp/pc/interface/precon.c
      [0]  KSPSetUpOnBlocks() line 151 in /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/interface/itfunc.c
      [0]  KSPSolve() line 351 in /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/interface/itfunc.c
      [0]  PCGMGMCycle_Private() line 20 in gmg.cpp
      [0]  PCApply_GMG() line 329 in gmg.cpp
      [0]  PCApply() line 373 in /home/fdkong/math/petsc-3.3-p7/src/ksp/pc/interface/precon.c
      [0]  KSPFGMRESCycle() line 114 in /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
      [0]  KSPSolve_FGMRES() line 277 in /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
      [0]  KSPSolve() line 351 in /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/interface/itfunc.c

So it is the ILU on each block (process) that is taking all the space. Since the ILU is taking much more memory than the matrix itself, I'm guessing you are running ILU(k > 0), which requires a lot of memory. You can try ILU(0), the default, which should require much less memory. Or you can use something like -sub_pc_type sor so that it does not need to allocate any factored matrices at all.
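For example, with the option prefix from your run script (a sketch; adjust the prefix to your solver configuration), either of

   -mg_levels_1_sub_pc_factor_levels 0      (ILU(0) instead of ILU(4))
   -mg_levels_1_sub_pc_type sor             (no factored matrices allocated at all)

should cut that memory substantially.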

   Barry


On May 28, 2013, at 7:42 AM, Fande Kong <fande.kong at colorado.edu> wrote:

> Hi Matthew,
> 
> Thanks,
> 
> I added a call to PetscMallocDump() to the code after calling KSPSolve():
> 
>   (6) after calling KSPSolve()
> 
>   ierr =  PetscMallocGetCurrentUsage(&space);CHKERRQ(ierr);
>   ierr =  PetscPrintf(comm,"Current space PetscMalloc()ed %G M\n", space/(1024*1024));CHKERRQ(ierr);
>   ierr =  PetscMallocGetMaximumUsage(&space);CHKERRQ(ierr);
>   ierr =  PetscPrintf(comm,"Max space PetscMalloced() %G M\n", space/(1024*1024));CHKERRQ(ierr);
>   ierr =  PetscMemoryGetCurrentUsage(&space);CHKERRQ(ierr);
>   ierr =  PetscPrintf(comm,"Current process memory %G M\n", space/(1024*1024));CHKERRQ(ierr);
>   ierr =  PetscMemoryGetMaximumUsage(&space);CHKERRQ(ierr);
>   ierr =  PetscPrintf(comm,"Max process memory %G M\n", space/(1024*1024));CHKERRQ(ierr);
>   ierr =  PetscMallocDump(PETSC_NULL);CHKERRQ(ierr);
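> 
> To produce a file like the "PetscMallocDump" file grepped above, the dump can also be written to a file instead of stdout. A minimal sketch (the filename is illustrative):
> 
>   FILE *fd;
>   /* open a per-process file and dump the malloc log into it */
>   ierr = PetscFOpen(PETSC_COMM_SELF,"PetscMallocDump","w",&fd);CHKERRQ(ierr);
>   ierr = PetscMallocDump(fd);CHKERRQ(ierr);
>   ierr = PetscFClose(PETSC_COMM_SELF,fd);CHKERRQ(ierr);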
> 
>  Current space PetscMalloc()ed 290.952 M
>  Max space PetscMalloced() 593.367 M
>  Current process memory 306.852 M
>  Max process memory 301.441 M
> 
> The detailed PetscMalloc information that was printed is attached. The output has too many lines for me to understand. How should I interpret this information?
> 
> 
> 
> On Tue, May 28, 2013 at 6:05 PM, Matthew Knepley <knepley at gmail.com> wrote:
> On Tue, May 28, 2013 at 5:54 AM, Fande Kong <Fande.Kong at colorado.edu> wrote:
> Hi Smith,
> 
> Thank you very much. Following your suggestions and information, I added these functions to my code to measure the memory usage. Now I am confused, since this small problem needs a large amount of memory.
> 
> I added a call to PetscMemorySetGetMaximumUsage() immediately after PetscInitialize(), and then inserted the following code at several points in the program (before & after setting up the unstructured mesh, before & after KSPSetUp(), before & after KSPSolve(), and after destroying all objects):
> 
>   PetscLogDouble space = 0;
>   /* memory allocated through PetscMalloc(); needs the -malloc option */
>   ierr =  PetscMallocGetCurrentUsage(&space);CHKERRQ(ierr);
>   ierr =  PetscPrintf(comm,"Current space PetscMalloc()ed %G M\n", space/(1024*1024));CHKERRQ(ierr);
>   ierr =  PetscMallocGetMaximumUsage(&space);CHKERRQ(ierr);
>   ierr =  PetscPrintf(comm,"Max space PetscMalloced() %G M\n", space/(1024*1024));CHKERRQ(ierr);
>   /* total memory used by the process */
>   ierr =  PetscMemoryGetCurrentUsage(&space);CHKERRQ(ierr);
>   ierr =  PetscPrintf(comm,"Current process memory %G M\n", space/(1024*1024));CHKERRQ(ierr);
>   /* needs PetscMemorySetGetMaximumUsage() called right after PetscInitialize() */
>   ierr =  PetscMemoryGetMaximumUsage(&space);CHKERRQ(ierr);
>   ierr =  PetscPrintf(comm,"Max process memory %G M\n", space/(1024*1024));CHKERRQ(ierr);
> 
> 
> In order to measure the memory usage, I used only one core (mpirun -n 1 ./program) to solve a small problem with 12691 mesh nodes (about 12691*3 = 4*10^4 degrees of freedom). I solve the linear elasticity problem using FGMRES preconditioned by the multigrid method (PCMG). I use standard PETSc routines throughout, except that I construct the coarse matrix and the interpolation matrix myself. I used the following run script to set up the solver and preconditioner:
> 
> mpirun -n 1 ./linearElasticity  -ksp_type fgmres -pc_type mg  -pc_mg_levels 2 -pc_mg_cycle_type v -pc_mg_type multiplicative -mg_levels_1_ksp_type richardson -mg_levels_1_ksp_max_it 1 -mg_levels_1_pc_type asm -mg_levels_1_sub_ksp_type preonly -mg_levels_1_sub_pc_type ilu -mg_levels_1_sub_pc_factor_levels 4 -mg_levels_1_sub_pc_factor_mat_ordering_type rcm -mg_coarse_ksp_type cg -mg_coarse_ksp_rtol 0.1  -mg_coarse_ksp_max_it 10 -mg_coarse_pc_type asm -mg_coarse_sub_ksp_type preonly -mg_coarse_sub_pc_type ilu -mg_coarse_sub_pc_factor_levels 2 -mg_coarse_sub_pc_factor_mat_ordering_type rcm -ksp_view    -log_summary  -pc_mg_log
>     
> 
>  I got the following results:
> 
> (1) before setting up mesh,
> 
> Current space PetscMalloc()ed 0.075882 M
> Max space PetscMalloced() 0.119675 M
> Current process memory 7.83203 M
> Max process memory 0 M
> 
> (2) after setting up mesh,
> 
> Current space PetscMalloc()ed 16.8411 M
> Max space PetscMalloced() 22.1353 M
> Current process memory 28.4336 M
> Max process memory 33.0547 M
> 
> (3) before calling KSPSetUp()
> 
> Current space PetscMalloc()ed 16.868 M
> Max space PetscMalloced() 22.1353 M
> Current process memory 28.6914 M
> Max process memory 33.0547 M
> 
> 
> (4) after calling KSPSetUp()
> 
> Current space PetscMalloc()ed 74.3354 M
> Max space PetscMalloced() 74.3355 M
> 
> This makes sense. It is 20M for your mesh, 20M
> for the Krylov space on the fine level, and I am guessing
> 35M for the Jacobian and the ILU factors.
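> (A rough check on the Krylov figure, assuming the default restart of 30:
> FGMRES keeps both the Krylov basis and the preconditioned vectors, so
> roughly 2 x 30 x 38073 dofs x 8 bytes is about 18 MB, consistent with
> the 20M estimate.)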
>  
> Current process memory 85.6953 M
> Max process memory 84.9258 M
> 
> (5) before calling KSPSolve()
> 
> Current space PetscMalloc()ed 74.3354 M
> Max space PetscMalloced() 74.3355 M
> Current process memory 85.8711 M
> Max process memory 84.9258 M
>  
> (6) after calling KSPSolve()
> 
> The question is what was malloc'd here. There is no way we could
> tell without seeing the code and probably running it. I suggest
> using http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscMallocDump.html
> to see what was allocated. The solvers tend not to allocate during
> the solve, as that is slow. So I would be inclined to check user code first.
> 
>    Matt
>  
> Current space PetscMalloc()ed 290.952 M
> Max space PetscMalloced() 593.367 M
> Current process memory 306.852 M
> Max process memory 301.441 M
> 
> (7) after destroying all objects
> 
> Current space PetscMalloc()ed 0.331482 M
> Max space PetscMalloced() 593.367 M
> Current process memory 67.2539 M
> Max process memory 309.137 M
> 
> 
> So my question is why I need so much memory (306.852 M) for such a small problem (4*10^4 degrees of freedom). Is this a normal case? Or is the run script I used to set up the solver unreasonable?
> 
> 
> Regards,
> 
> Fande Kong,
> 
> Department of Computer Science
> University of Colorado Boulder
> 
> On Mon, May 27, 2013 at 9:48 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>    There are several ways to monitor the memory usage. You can divide them into two categories: those that monitor how much memory has been malloced specifically by PETSc, and those that monitor how much memory is used in total by the process.
> 
> PetscMallocGetCurrentUsage() and PetscMallocGetMaximumUsage(), which only work with the command line option -malloc, report how much PETSc has malloced.
> 
> PetscMemoryGetCurrentUsage() and PetscMemoryGetMaximumUsage() (call PetscMemorySetGetMaximumUsage() immediately after PetscInitialize() for this one to work) provide total memory usage.
> 
> These are called on each process, so use an MPI_Reduce() to gather the total memory across all processes onto process 0 to print it out. I suggest calling them after the mesh has been set up, then again immediately before the XXXSolve() is called, and then after the XXXSolve() is called.
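> 
> A minimal sketch of that reduction (assuming PetscLogDouble is a plain double, as in petsc-3.3; the variable names are illustrative):
> 
>   PetscLogDouble local = 0, total = 0;
>   /* per-process total memory usage */
>   ierr = PetscMemoryGetCurrentUsage(&local);CHKERRQ(ierr);
>   /* sum the per-process values onto process 0 of comm */
>   ierr = MPI_Reduce(&local,&total,1,MPI_DOUBLE,MPI_SUM,0,comm);CHKERRQ(ierr);
>   /* PetscPrintf() prints only on process 0 of comm */
>   ierr = PetscPrintf(comm,"Total memory across all processes %G M\n",total/(1024*1024));CHKERRQ(ierr);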
> 
>    Please let us know if you have any difficulties.
> 
>     As always we recommend you upgrade to PETSc 3.4
> 
>     Barry
> 
> 
> 
> On May 27, 2013, at 10:22 PM, Fande Kong <fande.kong at colorado.edu> wrote:
> 
> > Hi all,
> >
> > How can I measure the memory usage of an application built on PETSc?  I am solving linear elasticity equations with FGMRES preconditioned by a two-level method, that is, a multigrid method where the additive Schwarz method is adopted on each level.  More than 1000 cores are used to solve this problem on the supercomputer. When the total number of degrees of freedom is about 60M, the application runs correctly and produces correct results. But when it increases to 600M, the application aborts and says there is not enough memory (the system administrator of the supercomputer told me that my application ran out of memory).
> >
> > Thus, I want to monitor the memory usage dynamically while the application is running. Are there any functions or strategies that could be used for this purpose?
> >
> > The error information is attached.
> >
> > Regards,
> > --
> > Fande Kong
> > Department of Computer Science
> > University of Colorado at Boulder
> > <solid3dcube2.o1603352><configure and make log.zip>
> 
> 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> 
> 
> -- 
> Fande Kong
> Department of Computer Science
> University of Colorado at Boulder
> <PetscMallocDum_printed.zip>


