[petsc-users] How to measure the memory usage of the application built on the Petsc?

Fande Kong fande.kong at colorado.edu
Wed May 29 07:45:10 CDT 2013


Hi Barry,

Thanks.

Now I know where it takes most of memory. But I have another question. I
found that we need many MatSeqAIJSetPreallocations in code. I do not know
why. Then I switch to test a standard example built in petsc:
petsc-3.3-p7/src/ksp/ksp/examples/tutorials/ex29.c. First, I added the
following code in to ex29.c after calling KSPSolve:


PetscLogDouble space =0;
    ierr =  PetscMallocGetCurrentUsage(&space);CHKERRQ(ierr);
    ierr =  PetscPrintf(comm,"Current space PetscMalloc()ed %G M\n",
space/(1024*1024));CHKERRQ(ierr);
    ierr =  PetscMallocGetMaximumUsage(&space);CHKERRQ(ierr);
    ierr =  PetscPrintf(comm,"Max space PetscMalloced() %G M\n",
space/(1024*1024));CHKERRQ(ierr);
    ierr =  PetscMemoryGetCurrentUsage(&space);CHKERRQ(ierr);
    ierr =  PetscPrintf(comm,"Current process memory %G M\n",
space/(1024*1024));CHKERRQ(ierr);
    ierr =  PetscMemoryGetMaximumUsage(&space);CHKERRQ(ierr);
    ierr =  PetscPrintf(comm,"Max process memory %G M\n",
space/(1024*1024));CHKERRQ(ierr);
    ierr =  PetscMallocDump(PETSC_NULL);CHKERRQ(ierr);


Second, ran the following script:

mpirun -n 1 ./ex29 -pc_type mg -pc_mg_type full -ksp_type fgmres
-ksp_monitor_short   -da_refine 1 >printresult

Last, I got:


5279056 bytes MatLUFactorSymbolic_SeqAIJ() line 380 in
/home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
5279040 bytes MatLUFactorSymbolic_SeqAIJ() line 392 in
/home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
2592848 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
/home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
2592848 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
/home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
1167392 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
/home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
1167392 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
/home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
651264 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
/home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
651264 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
/home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
520208 bytes VecCreate_Seq() line 40 in
/home/fdkong/math/petsc-3.3-p7/src/vec/vec/impls/seq/bvec3.c
520208 bytes VecCreate_Seq() line 40 in
/home/fdkong/math/petsc-3.3-p7/src/vec/vec/impls/seq/bvec3.c
520208 bytes VecCreate_Seq() line 40 in
/home/fdkong/math/petsc-3.3-p7/src/vec/vec/impls/seq/bvec3.c
520208 bytes VecCreate_Seq() line 40 in
/home/fdkong/math/petsc-3.3-p7/src/vec/vec/impls/seq/bvec3.c
520208 bytes VecCreate_Seq() line 40 in
/home/fdkong/math/petsc-3.3-p7/src/vec/vec/impls/seq/bvec3.c
520208 bytes VecCreate_Seq() line 40 in
/home/fdkong/math/petsc-3.3-p7/src/vec/vec/impls/seq/bvec3.c
520208 bytes VecCreate_Seq() line 40 in
/home/fdkong/math/petsc-3.3-p7/src/vec/vec/impls/seq/bvec3.c
520208 bytes VecCreate_Seq() line 40 in
/home/fdkong/math/petsc-3.3-p7/src/vec/vec/impls/seq/bvec3.c


Now, my question is why we call MatSeqAIJSetPreallocation so many tines.
Here we  use two-level preconditioner, then we only need three
MatSeqAIJSetPreallocations, one for fine matrix, one for coarse matrix and
one for interpolation matrix (restriction is the transpose of the
interpolation).  But here we have more than three
MatSeqAIJSetPreallocations. It is also similar for MatLUFactorSymbolic.




On Tue, May 28, 2013 at 10:21 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> First find the big items in the allocations with
>
> /Downloads$ grep bytes PetscMallocDump | sed "s?\[ 0\]??g" | sort -n -r |
> more
>
> 111132656 bytes MatILUFactorSymbolic_SeqAIJ() line 1867 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
> 111132656 bytes MatILUFactorSymbolic_SeqAIJ() line 1840 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
> 10980656 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
> 10980656 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
> 10980656 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
> 10980656 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
> 2239488 bytes SpmcsDMMeshSymmetrize() line 983 in spmcsdmmesh.cpp
> 2239488 bytes DMSetUp_SpmcsDMMesh() line 139 in spmcsdmmeshcreate.cpp
> 2239488 bytes DMSetUp_SpmcsDMMesh() line 138 in spmcsdmmeshcreate.cpp
> 1493504 bytes MatILUFactorSymbolic_SeqAIJ() line 1867 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
> 1493504 bytes MatILUFactorSymbolic_SeqAIJ() line 1840 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
> 1031520 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
> 1031520 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
> 1031520 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
> 1031520 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsSectionSetChart() line 183 in spmcssectionimpl.cpp
> 661408 bytes SpmcsDMMeshPreallocateSieveLabel() line 1935 in
> spmcsdmmesh.cpp
> 493856 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
> 493856 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
> 493856 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
> 493856 bytes MatSeqAIJSetPreallocation_SeqAIJ() line 3439 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aij.c
>
> by far the biggest hogs are the first two MatILUFactorSymbolic_SeqAIJ
> followed by the original matrix storage. Then find the large on in the file
>
> [ 0]111132656 bytes MatILUFactorSymbolic_SeqAIJ() line 1840 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/impls/aij/seq/aijfact.c
>       [0]  MatILUFactorSymbolic() line 6114 in
> /home/fdkong/math/petsc-3.3-p7/src/mat/interface/matrix.c
>       [0]  PCSetUp_ILU() line 173 in
> /home/fdkong/math/petsc-3.3-p7/src/ksp/pc/impls/factor/ilu/ilu.c
>       [0]  PCSetUp() line 810 in
> /home/fdkong/math/petsc-3.3-p7/src/ksp/pc/interface/precon.c
>       [0]  KSPSetUp() line 182 in
> /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/interface/itfunc.c
>       [0]  PCSetUpOnBlocks_ASM() line 416 in
> /home/fdkong/math/petsc-3.3-p7/src/ksp/pc/impls/asm/asm.c
>       [0]  PCSetUpOnBlocks() line 861 in
> /home/fdkong/math/petsc-3.3-p7/src/ksp/pc/interface/precon.c
>       [0]  KSPSetUpOnBlocks() line 151 in
> /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/interface/itfunc.c
>       [0]  KSPSolve() line 351 in
> /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/interface/itfunc.c
>       [0]  PCGMGMCycle_Private() line 20 in gmg.cpp
>       [0]  PCApply_GMG() line 329 in gmg.cpp
>       [0]  PCApply() line 373 in
> /home/fdkong/math/petsc-3.3-p7/src/ksp/pc/interface/precon.c
>       [0]  KSPFGMRESCycle() line 114 in
> /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
>       [0]  KSPSolve_FGMRES() line 277 in
> /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
>       [0]  KSPSolve() line 351 in
> /home/fdkong/math/petsc-3.3-p7/src/ksp/ksp/interface/itfunc.c
>
> so it is doing the ILU on each block (process) that is taking all the
> space. Since the ILU is taking much more than the matrix I'm guessing you
> are running ILU(k > 0) which will require a lot of memory.  You can try
> ILU(0), the default, and it should require much less memory. Or you can use
> something like -sub_pc_type sor so that it does not need to allocate any
> factored matrices at all.
>
>    Barry
>
>
> On May 28, 2013, at 7:42 AM, Fande Kong <fande.kong at colorado.edu> wrote:
>
> > Hi Matthew,
> >
> > Thanks,
> >
> > I added the function PetscMallocDump() into the code after calling
> KSPSolve():
> >
> >   (6) after calling KSPSolve()
> >
> >   ierr =  PetscMallocGetCurrentUsage(&space);CHKERRQ(ierr);
> >   ierr =  PetscPrintf(comm,"Current space PetscMalloc()ed %G M\n",
> space/(1024*1024));CHKERRQ(ierr);
> >   ierr =  PetscMallocGetMaximumUsage(&space);CHKERRQ(ierr);
> >   ierr =  PetscPrintf(comm,"Max space PetscMalloced() %G M\n",
> space/(1024*1024));CHKERRQ(ierr);
> >   ierr =  PetscMemoryGetCurrentUsage(&space);CHKERRQ(ierr);
> >   ierr =  PetscPrintf(comm,"Current process memory %G M\n",
> space/(1024*1024));CHKERRQ(ierr);
> >   ierr =  PetscMemoryGetMaximumUsage(&space);CHKERRQ(ierr);
> >   ierr =  PetscPrintf(comm,"Max process memory %G M\n",
> space/(1024*1024));CHKERRQ(ierr);
> >   ierr =  PetscMallocDump(PETSC_NULL);CHKERRQ(ierr);
> >
> >  Current space PetscMalloc()ed 290.952 M
> >  Max space PetscMalloced() 593.367 M
> >  Current process memory 306.852 M
> >  Max process memory 301.441 M
> >
> > The printed detailed petscmalloc information is attached. The output
> seems too many lines to understand.  How to understand this information?
> >
> >
> >
> > On Tue, May 28, 2013 at 6:05 PM, Matthew Knepley <knepley at gmail.com>
> wrote:
> > On Tue, May 28, 2013 at 5:54 AM, Fande Kong <Fande.Kong at colorado.edu>
> wrote:
> > Hi Smith,
> >
> > Thank you very much. According to your suggestions and information, I
> added these functions into my code to measure the memory usage. Now I am
> confused, since the small problem needs large memory.
> >
> > I added the function PetscMemorySetGetMaximumUsage()  immediately after
> PetscInitialize(). And then I added the following code into several
> positions in the code (before & after setting up unstructured mesh, before
> & after KSPSetUp(), before & after KSPSolve(), and Destroy all stuffs):
> >
> >    PetscLogDouble space =0;
> >   ierr =  PetscMallocGetCurrentUsage(&space);CHKERRQ(ierr);
> >   ierr =  PetscPrintf(comm,"Current space PetscMalloc()ed %G M\n",
> space/(1024*1024));CHKERRQ(ierr);
> >   ierr =  PetscMallocGetMaximumUsage(&space);CHKERRQ(ierr);
> >   ierr =  PetscPrintf(comm,"Max space PetscMalloced() %G M\n",
> space/(1024*1024));CHKERRQ(ierr);
> >   ierr =  PetscMemoryGetCurrentUsage(&space);CHKERRQ(ierr);
> >   ierr =  PetscPrintf(comm,"Current process memory %G M\n",
> space/(1024*1024));CHKERRQ(ierr);
> >   ierr =  PetscMemoryGetMaximumUsage(&space);CHKERRQ(ierr);
> >   ierr =  PetscPrintf(comm,"Max process memory %G M\n",
> space/(1024*1024));CHKERRQ(ierr);
> >
> >
> > In order to measure the memory usage, I just used only one core (mpirun
> -n 1 ./program ) to solve a small problem with 12691 mesh nodes (the
> freedom is about 12691*3= 4 *10^4 ). I solve the linear elasticity problem
> by using FGMRES preconditioned by multigrid method (PCMG). I use all petsc
> standard routines except that I construct coarse matrix and interpolation
> matrix by myself. I used the following run script to set up solver and
> preconditioner:
> >
> > mpirun -n 1 ./linearElasticity  -ksp_type fgmres -pc_type mg
>  -pc_mg_levels 2 -pc_mg_cycle_type v -pc_mg_type multiplicative
> -mg_levels_1_ksp_type richardson -mg_levels_1_ksp_max_it 1
> -mg_levels_1_pc_type asm -mg_levels_1_sub_ksp_type preonly
> -mg_levels_1_sub_pc_type ilu -mg_levels_1_sub_pc_factor_levels 4
> -mg_levels_1_sub_pc_factor_mat_ordering_type rcm -mg_coarse_ksp_type cg
> -mg_coarse_ksp_rtol 0.1  -mg_coarse_ksp_max_it 10 -mg_coarse_pc_type asm
> -mg_coarse_sub_ksp_type preonly -mg_coarse_sub_pc_type ilu
> -mg_coarse_sub_pc_factor_levels 2
> -mg_coarse_sub_pc_factor_mat_ordering_type rcm -ksp_view    -log_summary
>  -pc_mg_log
> >
> >
> >  I got the following results:
> >
> > (1) before setting up mesh,
> >
> > Current space PetscMalloc()ed 0.075882 M
> > Max space PetscMalloced() 0.119675 M
> > Current process memory 7.83203 M
> > Max process memory 0 M
> >
> > (2) after setting up mesh,
> >
> > Current space PetscMalloc()ed 16.8411 M
> > Max space PetscMalloced() 22.1353 M
> > Current process memory 28.4336 M
> > Max process memory 33.0547 M
> >
> > (3) before calling KSPSetUp()
> >
> > Current space PetscMalloc()ed 16.868 M
> > Max space PetscMalloced() 22.1353 M
> > Current process memory 28.6914 M
> > Max process memory 33.0547 M
> >
> >
> > (4) after calling KSPSetUp()
> >
> > Current space PetscMalloc()ed 74.3354 M
> > Max space PetscMalloced() 74.3355 M
> >
> > This makes sense. It is 20M for your mesh, 20M
> > for the Krylov space on the fine level, and I am guessing
> > 35M for the Jacobian and the ILU factors.
> >
> > Current process memory 85.6953 M
> > Max process memory 84.9258 M
> >
> > (5) before calling KSPSolve()
> >
> > Current space PetscMalloc()ed 74.3354 M
> > Max space PetscMalloced() 74.3355 M
> > Current process memory 85.8711 M
> > Max process memory 84.9258 M
> >
> > (6) after calling KSPSolve()
> >
> > The question is what was malloc'd here. There is no way we could
> > tell without seeing the code and probably running it. I suggest
> > using
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscMallocDump.html
> > to see what was allocated. The solvers tend not to allocated during
> > the solve, as that is slow. So I would be inclined to check user code
> first.
> >
> >    Matt
> >
> > Current space PetscMalloc()ed 290.952 M
> > Max space PetscMalloced() 593.367 M
> > Current process memory 306.852 M
> > Max process memory 301.441 M
> >
> > (7) After destroying all stuffs
> >
> > Current space PetscMalloc()ed 0.331482 M
> > Max space PetscMalloced() 593.367 M
> > Current process memory 67.2539 M
> > Max process memory 309.137 M
> >
> >
> > So my question is why/if I need so much memory (306.852 M) for so small
> problem (freedom: 4*10^4). Or is it normal case? Or my run script used to
> set up solver is not reasonable?
> >
> >
> > Regards,
> >
> > Fande Kong,
> >
> > Department of Computer Science
> > University of Colorado Boulder
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Mon, May 27, 2013 at 9:48 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >    There are several ways to monitor the memory usage. You can divide
> them into two categories: those that monitor how much memory has been
> malloced specifically by PETSc and how much is used totally be the process.
> >
> > PetscMallocGetCurrentUsage() and PetscMallocGetMaximumUsage() which only
> work with the command line option -malloc provide how much PETSc has
> malloced.
> >
> > PetscMemoryGetCurrentUsage() and PetscMemoryGetMaximumUsage() (call
> PetscMemorySetGetMaximumUsage() immediately after PetscInitialize() for
> this one to work) provide total memory usage.
> >
> > These are called on each process so use a MPI_Reduce() to gather the
> total memory across all processes to process 0 to print it out. Suggest
> calling it after the mesh as been set up, then call again immediately
> before the XXXSolve() is called and then after the XXXSolve() is called.
> >
> >    Please let us know if you have any difficulties.
> >
> >     As always we recommend you upgrade to PETSc 3.4
> >
> >     Barry
> >
> >
> >
> > On May 27, 2013, at 10:22 PM, Fande Kong <fande.kong at colorado.edu>
> wrote:
> >
> > > Hi all,
> > >
> > > How to measure the memory usage of the application built on the Petsc?
>  I am now solving linear elasticity equations with fgmres preconditioned by
> two-level method, that is, preconditioned by multigrid method where on each
> level the additive Schwarz method is adopted.  More than 1000 cores are
> adopted to solve this problem on the supercomputer. When the total freedom
> of the problem is about 60M, the application correctly run and produce
> correct results. But when the total freedom increases to 600M, the
> application abort and say there is not enough memory (  the system
> administrator of the supercomputer told me that my application run out
> memory).
> > >
> > > Thus, I want to monitor the memory usage dynamically when the
> application running. Are there any functions or strategies that could be
> used for this purpose?
> > >
> > > The error information is attached.
> > >
> > > Regards,
> > > --
> > > Fande Kong
> > > Department of Computer Science
> > > University of Colorado at Boulder
> > > <solid3dcube2.o1603352><configure and make log.zip>
> >
> >
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> >
> >
> > --
> > Fande Kong
> > Department of Computer Science
> > University of Colorado at Boulder
> > <PetscMallocDum_printed.zip>
>
>
>


-- 
Fande Kong
Department of Computer Science
University of Colorado at Boulder
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20130529/52740b86/attachment-0001.html>


More information about the petsc-users mailing list