[petsc-users] How to measure the memory usage of an application built on PETSc?
Matthew Knepley
knepley at gmail.com
Tue May 28 11:01:04 CDT 2013
On Tue, May 28, 2013 at 8:42 AM, Fande Kong <fande.kong at colorado.edu> wrote:
> Hi Matthew,
>
> Thanks,
>
> I added a call to PetscMallocDump() after KSPSolve():
>
> (6) after calling KSPSolve()
>
> ierr = PetscMallocGetCurrentUsage(&space);CHKERRQ(ierr);
> ierr = PetscPrintf(comm,"Current space PetscMalloc()ed %G M\n",space/(1024*1024));CHKERRQ(ierr);
> ierr = PetscMallocGetMaximumUsage(&space);CHKERRQ(ierr);
> ierr = PetscPrintf(comm,"Max space PetscMalloced() %G M\n",space/(1024*1024));CHKERRQ(ierr);
> ierr = PetscMemoryGetCurrentUsage(&space);CHKERRQ(ierr);
> ierr = PetscPrintf(comm,"Current process memory %G M\n",space/(1024*1024));CHKERRQ(ierr);
> ierr = PetscMemoryGetMaximumUsage(&space);CHKERRQ(ierr);
> ierr = PetscPrintf(comm,"Max process memory %G M\n",space/(1024*1024));CHKERRQ(ierr);
> ierr = PetscMallocDump(PETSC_NULL);CHKERRQ(ierr);
>
> Current space PetscMalloc()ed 290.952 M
> Max space PetscMalloced() 593.367 M
> Current process memory 306.852 M
> Max process memory 301.441 M
>
> The detailed PetscMallocDump output is attached. It seems to have
> too many lines to understand. How should I interpret this information?
>
1) Why would you ever start with a complex, parallel example for debugging?
This is crazy, and not how anyone would ever start a scientific investigation.
You simplify the problem until you understand everything, and then slowly add
complexity.

2) The idea is to dump once before and once after the solve, and diff the two
outputs; a sketch follows.
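For example, a minimal sketch of that workflow (the file names and the
ksp/b/x variables are illustrative placeholders, and the program must be run
with -malloc so allocations are logged):

  FILE *fp;
  fp = fopen("malloc_before.txt","w");
  ierr = PetscMallocDump(fp);CHKERRQ(ierr); /* every live PetscMalloc()ed block */
  fclose(fp);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  fp = fopen("malloc_after.txt","w");
  ierr = PetscMallocDump(fp);CHKERRQ(ierr);
  fclose(fp);

Then "diff malloc_before.txt malloc_after.txt" shows exactly which
allocations appeared during the solve.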
Matt
> On Tue, May 28, 2013 at 6:05 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Tue, May 28, 2013 at 5:54 AM, Fande Kong <Fande.Kong at colorado.edu> wrote:
>>
>>> Hi Smith,
>>>
>>> Thank you very much. Following your suggestions, I
>>> added these functions to my code to measure the memory usage. Now I am
>>> confused, because even this small problem needs a large amount of memory.
>>>
>>> I added a call to PetscMemorySetGetMaximumUsage() immediately after
>>> PetscInitialize(). Then I added the following code at several
>>> positions in the code (before & after setting up the unstructured mesh, before
>>> & after KSPSetUp(), before & after KSPSolve(), and after destroying everything):
>>>
>>> PetscLogDouble space = 0;
>>> ierr = PetscMallocGetCurrentUsage(&space);CHKERRQ(ierr);
>>> ierr = PetscPrintf(comm,"Current space PetscMalloc()ed %G M\n",space/(1024*1024));CHKERRQ(ierr);
>>> ierr = PetscMallocGetMaximumUsage(&space);CHKERRQ(ierr);
>>> ierr = PetscPrintf(comm,"Max space PetscMalloced() %G M\n",space/(1024*1024));CHKERRQ(ierr);
>>> ierr = PetscMemoryGetCurrentUsage(&space);CHKERRQ(ierr);
>>> ierr = PetscPrintf(comm,"Current process memory %G M\n",space/(1024*1024));CHKERRQ(ierr);
>>> ierr = PetscMemoryGetMaximumUsage(&space);CHKERRQ(ierr);
>>> ierr = PetscPrintf(comm,"Max process memory %G M\n",space/(1024*1024));CHKERRQ(ierr);
>>>
>>>
>>> In order to measure the memory usage, I used only one core (mpirun
>>> -n 1 ./program) to solve a small problem with 12691 mesh nodes
>>> (12691*3 = 38073, about 4*10^4 degrees of freedom). I solve the linear
>>> elasticity problem using FGMRES preconditioned by the multigrid method (PCMG).
>>> I use standard PETSc routines everywhere except that I construct the coarse
>>> matrix and the interpolation matrix myself. I used the following run script
>>> to set up the solver and preconditioner:
>>>
>>> mpirun -n 1 ./linearElasticity -ksp_type fgmres -pc_type mg
>>> -pc_mg_levels 2 -pc_mg_cycle_type v -pc_mg_type multiplicative
>>> -mg_levels_1_ksp_type richardson -mg_levels_1_ksp_max_it 1
>>> -mg_levels_1_pc_type asm -mg_levels_1_sub_ksp_type preonly
>>> -mg_levels_1_sub_pc_type ilu -mg_levels_1_sub_pc_factor_levels 4
>>> -mg_levels_1_sub_pc_factor_mat_ordering_type rcm -mg_coarse_ksp_type cg
>>> -mg_coarse_ksp_rtol 0.1 -mg_coarse_ksp_max_it 10 -mg_coarse_pc_type asm
>>> -mg_coarse_sub_ksp_type preonly -mg_coarse_sub_pc_type ilu
>>> -mg_coarse_sub_pc_factor_levels 2
>>> -mg_coarse_sub_pc_factor_mat_ordering_type rcm -ksp_view -log_summary
>>> -pc_mg_log
>>>
>>>
>>> I got the following results:
>>>
>>> (1) before setting up mesh,
>>>
>>> Current space PetscMalloc()ed 0.075882 M
>>> Max space PetscMalloced() 0.119675 M
>>> Current process memory 7.83203 M
>>> Max process memory 0 M
>>>
>>> (2) after setting up mesh,
>>>
>>> Current space PetscMalloc()ed 16.8411 M
>>> Max space PetscMalloced() 22.1353 M
>>> Current process memory 28.4336 M
>>> Max process memory 33.0547 M
>>>
>>> (3) before calling KSPSetUp()
>>>
>>> Current space PetscMalloc()ed 16.868 M
>>> Max space PetscMalloced() 22.1353 M
>>> Current process memory 28.6914 M
>>> Max process memory 33.0547 M
>>>
>>>
>>> (4) after calling KSPSetUp()
>>>
>>> Current space PetscMalloc()ed 74.3354 M
>>> Max space PetscMalloced() 74.3355 M
>>>
>>
>> This makes sense. It is 20M for your mesh, 20M
>> for the Krylov space on the fine level, and I am guessing
>> 35M for the Jacobian and the ILU factors.
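>>
>> (As a rough editorial check, assuming FGMRES with its default restart of 30,
>> which stores two sets of basis vectors: 2 * 30 vectors * 38,073 unknowns * 8
>> bytes is about 18 MB, consistent with the ~20M estimate for the Krylov space.)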
>>
>>
>>> Current process memory 85.6953 M
>>> Max process memory 84.9258 M
>>>
>>> (5) before calling KSPSolve()
>>>
>>> Current space PetscMalloc()ed 74.3354 M
>>> Max space PetscMalloced() 74.3355 M
>>> Current process memory 85.8711 M
>>> Max process memory 84.9258 M
>>>
>>> (6) after calling KSPSolve()
>>>
>>
>> The question is what was malloc'd here. There is no way we could
>> tell without seeing the code and probably running it. I suggest
>> using
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscMallocDump.html
>> to see what was allocated. The solvers tend not to allocate during
>> the solve, as that is slow. So I would be inclined to check user code
>> first.
>>
>> Matt
>>
>>
>>> Current space PetscMalloc()ed 290.952 M
>>> Max space PetscMalloced() 593.367 M
>>> Current process memory 306.852 M
>>> Max process memory 301.441 M
>>>
>>> (7) After destroying all stuffs
>>>
>>> Current space PetscMalloc()ed 0.331482 M
>>> Max space PetscMalloced() 593.367 M
>>> Current process memory 67.2539 M
>>> Max process memory 309.137 M
>>>
>>>
>>> So my question is: why do I need so much memory (306.852 M) for such a small
>>> problem (about 4*10^4 degrees of freedom)? Is this a normal case, or is the
>>> run script I use to set up the solver unreasonable?
>>>
>>>
>>> Regards,
>>>
>>> Fande Kong,
>>>
>>> Department of Computer Science
>>> University of Colorado Boulder
>>>
>>> On Mon, May 27, 2013 at 9:48 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>>>
>>>> There are several ways to monitor the memory usage. They fall
>>>> into two categories: those that report how much memory has been
>>>> malloced specifically by PETSc, and those that report how much is used in
>>>> total by the process.
>>>>
>>>> PetscMallocGetCurrentUsage() and PetscMallocGetMaximumUsage(), which
>>>> only work with the command line option -malloc, report how much PETSc has
>>>> malloced.
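>>>>
>>>> For example (an illustrative command, reusing the executable name from
>>>> elsewhere in this thread):
>>>>
>>>> mpirun -n 1 ./linearElasticity -malloc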
>>>>
>>>> PetscMemoryGetCurrentUsage() and PetscMemoryGetMaximumUsage() (call
>>>> PetscMemorySetGetMaximumUsage() immediately after PetscInitialize() for
>>>> this one to work) provide total memory usage.
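>>>>
>>>> A minimal sketch of that initialization (argc/args/help as in a standard
>>>> PETSc main):
>>>>
>>>> ierr = PetscInitialize(&argc,&args,(char*)0,help);if (ierr) return ierr;
>>>> ierr = PetscMemorySetGetMaximumUsage();CHKERRQ(ierr);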
>>>>
>>>> These are called on each process, so use an MPI_Reduce() to gather the
>>>> total memory across all processes onto process 0 and print it out. I suggest
>>>> calling them after the mesh has been set up, then again immediately
>>>> before XXXSolve() is called, and then after XXXSolve() is called; a minimal
>>>> sketch of the reduction is below.
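>>>>
>>>> (A sketch assuming comm is the communicator the solver uses; PetscLogDouble
>>>> is a double, so MPI_DOUBLE is the matching MPI datatype.)
>>>>
>>>> PetscLogDouble local = 0, total = 0;
>>>> ierr = PetscMemoryGetCurrentUsage(&local);CHKERRQ(ierr);
>>>> ierr = MPI_Reduce(&local,&total,1,MPI_DOUBLE,MPI_SUM,0,comm);CHKERRQ(ierr);
>>>> ierr = PetscPrintf(comm,"Total memory across all processes %G M\n",total/(1024*1024));CHKERRQ(ierr);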
>>>>
>>>> Please let us know if you have any difficulties.
>>>>
>>>> As always, we recommend you upgrade to PETSc 3.4.
>>>>
>>>> Barry
>>>>
>>>>
>>>>
>>>> On May 27, 2013, at 10:22 PM, Fande Kong <fande.kong at colorado.edu>
>>>> wrote:
>>>>
>>>> > Hi all,
>>>> >
>>>> > How can I measure the memory usage of an application built on
>>>> > PETSc? I am now solving linear elasticity equations with FGMRES
>>>> > preconditioned by a two-level method, that is, by a multigrid
>>>> > method where the additive Schwarz method is adopted on each level. More
>>>> > than 1000 cores are used to solve this problem on the supercomputer.
>>>> > When the total number of degrees of freedom is about 60M, the application
>>>> > runs correctly and produces correct results. But when the total number of
>>>> > degrees of freedom increases to 600M, the application aborts because there
>>>> > is not enough memory (the system administrator of the supercomputer told me
>>>> > that my application ran out of memory).
>>>> >
>>>> > Thus, I want to monitor the memory usage dynamically while the
>>>> > application is running. Are there any functions or strategies that could be
>>>> > used for this purpose?
>>>> >
>>>> > The error information is attached.
>>>> >
>>>> > Regards,
>>>> > --
>>>> > Fande Kong
>>>> > Department of Computer Science
>>>> > University of Colorado at Boulder
>>>> > <solid3dcube2.o1603352><configure and make log.zip>
>>>>
>>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>
>
> --
> Fande Kong
> Department of Computer Science
> University of Colorado at Boulder
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener