[petsc-users] DMPlex memory problem in scaling test

Danyang Su danyang.su at gmail.com
Thu Oct 10 18:53:10 CDT 2019


On 2019-10-10 4:28 p.m., Matthew Knepley wrote:
> On Thu, Oct 10, 2019 at 4:26 PM Danyang Su <danyang.su at gmail.com> wrote:
>
>     Hi All,
>
>     Your guess is right. The memory problem occurs after
>     DMPlexCreateFromCellList and DMPlexDistribute: the mesh-related
>     memory on the master processor is not released afterwards.
>
>     The pseudo code I use is:
>
>         if (rank == 0) then
>            ! only the master process reads the mesh file and creates the cell list
>            call DMPlexCreateFromCellList(Petsc_Comm_World, ndim, num_cells,  &
>                    num_nodes, num_nodes_per_cell,                            &
>                    Petsc_False,           &  ! use Petsc_True to create intermediate mesh
>                                              ! entities (faces, edges); that does not work
>                                              ! for prisms in the current 3.8 version
>                    dmplex_cells, ndim, dmplex_verts, dmda_flow%da, ierr)
>            CHKERRQ(ierr)
>         else
>            ! slave processes pass zero cells and zero vertices
>            call DMPlexCreateFromCellList(Petsc_Comm_World, ndim, 0, 0,       &
>                    num_nodes_per_cell, Petsc_False,                          &
>                    dmplex_cells, ndim, dmplex_verts, dmda_flow%da, ierr)
>            CHKERRQ(ierr)
>         end if
>
>         ! distribute the mesh (overlap 0 here; no migration SF requested)
>         call DMPlexDistribute(dmda_flow%da, 0, PETSC_NULL_SF, distributedMesh, ierr)
>         CHKERRQ(ierr)
>
>         ! destroy the serial mesh and set the distributed mesh as the global mesh
>         call DMDestroy(dmda_flow%da, ierr)
>         CHKERRQ(ierr)
>         dmda_flow%da = distributedMesh
>
>
>     After calling the above functions, the memory usage for the test
>     case (953,433 points, 160 processes) is shown below:
>
>         rank   0: PETSc memory current 1610.39 MB, maximum 1690.42 MB
>         rank 151: PETSc memory current  105.00 MB, maximum  104.94 MB
>         rank  98: PETSc memory current  106.02 MB, maximum  105.95 MB
>         rank  18: PETSc memory current  106.17 MB, maximum  106.17 MB
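>
>     For reference, a sketch of how such per-rank figures can be gathered
>     (note that PetscMemoryGetMaximumUsage only reports a meaningful maximum
>     if PetscMemorySetGetMaximumUsage was called earlier, e.g. right after
>     PetscInitialize; variable names here are illustrative):
>
>         PetscLogDouble mem_cur, mem_max
>         call PetscMemoryGetCurrentUsage(mem_cur, ierr)
>         CHKERRQ(ierr)
>         call PetscMemoryGetMaximumUsage(mem_max, ierr)
>         CHKERRQ(ierr)
>         ! usage is reported in bytes; convert to MB for printing
>         write(*,'(a,i6,2(a,f10.2))') 'rank ', rank,              &
>               ' PETSc memory current MB ', mem_cur/1048576.d0,   &
>               ' maximum MB ', mem_max/1048576.d0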
>
>     Is there any function available in the master version that can
>     release this memory?
>
> DMDestroy() releases this memory, UNLESS you are holding other objects 
> that refer to it, like a vector from that DM.
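>
> For illustration, a minimal sketch of that pitfall (object names are
> hypothetical):
>
>     ! vec holds a reference to dmSerial, so DMDestroy() alone does not
>     ! free the serial mesh
>     call DMCreateGlobalVector(dmSerial, vec, ierr)
>     CHKERRQ(ierr)
>     call DMPlexDistribute(dmSerial, 0, PETSC_NULL_SF, dmDist, ierr)
>     CHKERRQ(ierr)
>     call DMDestroy(dmSerial, ierr)   ! drops one reference only
>     CHKERRQ(ierr)
>     call VecDestroy(vec, ierr)       ! last reference gone; serial mesh freed
>     CHKERRQ(ierr)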

Well, I have some labels set before distribution. After distribution, 
the label values are collected but not destroyed. I will try this to 
see if it makes a big difference.

Thanks,

danyang

>
>   Thanks,
>
>      Matt
>
>     Thanks,
>
>     Danyang
>
>     On 2019-10-10 11:09 a.m., Mark Adams via petsc-users wrote:
>>     Now that I think about it, the partitioning and distribution can
>>     be done with the existing API, I would assume, as is done with
>>     matrices.
>>
>>     I'm still wondering what the H5 format is. I assume that it is
>>     not built for a hardwired number of processes to read in parallel,
>>     and that the parallel read is somewhat scalable.
>>
>>     On Thu, Oct 10, 2019 at 12:13 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>         A related question: what is the state of having something
>>         like a distributed DMPlexCreateFromCellList method? Maybe
>>         your H5 efforts would cover this. My bone modeling code is
>>         old and a pain, but the app's specialized serial mesh
>>         generator could write an H5 file instead of the current FEAP
>>         file. Then your reader, SNES, and a large-deformation
>>         plasticity element in PetscFE could replace my code in the
>>         future.
>>
>>         How does your H5 thing work? Is it basically a flat file (not
>>         partitioned) that is read in parallel by slicing the cell
>>         lists, etc., using file seek or something equivalent, then
>>         reconstructing a local graph on each processor to hand to,
>>         say, ParMetis, and finally completing the distribution with
>>         that reasonable partitioning? (This is what our current code
>>         does.)
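>>
>>         For concreteness, a minimal sketch of the slicing arithmetic
>>         just described (variable names are illustrative):
>>
>>             ! split numGlobalCells contiguously and near-evenly over nprocs ranks
>>             chunk  = numGlobalCells / nprocs
>>             rem    = mod(numGlobalCells, nprocs)
>>             cStart = rank*chunk + min(rank, rem)   ! first cell owned by this rank
>>             cEnd   = cStart + chunk                ! one past the last owned cell
>>             if (rank < rem) cEnd = cEnd + 1
>>             ! seek to cell cStart in the flat connectivity array and read
>>             ! (cEnd - cStart)*numCornersPerCell integers on this rank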
>>
>>         Thanks,
>>         Mark
>>
>>         On Thu, Oct 10, 2019 at 9:30 AM Dave May via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>
>>
>>
>>             On Thu 10. Oct 2019 at 15:15, Matthew Knepley <knepley at gmail.com> wrote:
>>
>>                 On Thu, Oct 10, 2019 at 9:10 AM Dave May <dave.mayhem23 at gmail.com> wrote:
>>
>>                     On Thu 10. Oct 2019 at 15:04, Matthew Knepley <knepley at gmail.com> wrote:
>>
>>                         On Thu, Oct 10, 2019 at 8:41 AM Dave May <dave.mayhem23 at gmail.com> wrote:
>>
>>                             On Thu 10. Oct 2019 at 14:34, Matthew Knepley <knepley at gmail.com> wrote:
>>
>>                                 On Thu, Oct 10, 2019 at 8:31 AM Dave May <dave.mayhem23 at gmail.com> wrote:
>>
>>                                     On Thu, 10 Oct 2019 at 13:21, Matthew Knepley via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>
>>                                         On Wed, Oct 9, 2019 at 5:10 PM Danyang Su via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>
>>                                             Dear All,
>>
>>                                             I have a question regarding the maximum memory usage in a
>>                                             scaling test. My code is written in Fortran with support for
>>                                             both structured grids (DMDA) and unstructured grids (DMPlex).
>>                                             Memory consumption is much larger when DMPlex is used, and it
>>                                             finally causes an out_of_memory problem.
>>
>>                                             Below are some tests using both structured and unstructured
>>                                             grids. The memory consumption of the code is estimated from
>>                                             all allocated arrays, and the PETSc memory consumption is
>>                                             estimated with PetscMemoryGetMaximumUsage.
>>
>>                                             I just wonder why the PETSc memory consumption does not
>>                                             decrease when the number of processors increases. For the
>>                                             structured-grid cases (scenarios 7-9), the memory consumption
>>                                             decreases as the number of processors increases. However, for
>>                                             the unstructured-grid cases (scenarios 14-16), the memory for
>>                                             the PETSc part remains unchanged. When I run a larger case,
>>                                             the code crashes because it runs out of memory. The same case
>>                                             works on another cluster with 480 GB of memory per node. Does
>>                                             this make sense?
>>
>>                                         We would need a finer breakdown of where memory is being used.
>>                                         I did this for a paper:
>>
>>                                         https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/jgrb.50217
>>
>>                                         If the subdomains are small, the halo sizes can overwhelm the
>>                                         basic storage. It looks like the subdomains are big here, but
>>                                         things are not totally clear to me. It would be helpful to send
>>                                         the output of -log_view for each case, since PETSc tries to
>>                                         keep track of allocated memory.
>>
>>
>>                                     Matt - I'd guess that there is a sequential (non-partitioned)
>>                                     mesh hanging around in memory. Is it possible that he has
>>                                     created the Plex object sequentially (stored and retained in
>>                                     memory and never released), and then afterwards distributed
>>                                     it? This can never happen with the DMDA, and the table
>>                                     verifies this. If his codes using the DMDA and DMPlex are as
>>                                     identical as possible (apart from the DM used), then a
>>                                     sequential mesh held in memory seems the likely cause.
>>
>>
>>                                 Dang it, Dave is always right.
>>
>>                                 How to prevent this?
>>
>>
>>                             I thought you/Lawrence/Vaclav/others had developed and provided
>>                             support for a parallel DMPlex load via a suitably defined,
>>                             Plex-specific H5 mesh file.
>>
>>
>>                         We have, but these tests looked like
>>                         generated meshes.
>>
>>
>>                     Great.
>>
>>                     So would a solution to the problem be to have the user modify
>>                     their code in the following way:
>>                     * they move the mesh generation stage into a separate
>>                       executable, which they call offline (on a fat node with lots
>>                       of memory), and dump the appropriate file
>>                     * they change their existing application to simply load that
>>                       file in parallel
>>
>>
>>                 Yes.
>>
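>>                 For the record, a rough sketch of that offline-dump /
>>                 parallel-load workflow (assuming a PETSc build with HDF5; the
>>                 exact format and options for a scalable parallel load are still
>>                 evolving, so this is illustrative only):
>>
>>                     ! offline tool: generate the mesh serially, then dump it to HDF5
>>                     call PetscViewerHDF5Open(PETSC_COMM_WORLD, 'mesh.h5',   &
>>                                              FILE_MODE_WRITE, viewer, ierr)
>>                     CHKERRQ(ierr)
>>                     call DMView(dm, viewer, ierr)
>>                     CHKERRQ(ierr)
>>                     call PetscViewerDestroy(viewer, ierr)
>>                     CHKERRQ(ierr)
>>
>>                     ! application: create an empty Plex and load the mesh from HDF5
>>                     call DMCreate(PETSC_COMM_WORLD, dm, ierr)
>>                     CHKERRQ(ierr)
>>                     call DMSetType(dm, DMPLEX, ierr)
>>                     CHKERRQ(ierr)
>>                     call PetscViewerHDF5Open(PETSC_COMM_WORLD, 'mesh.h5',   &
>>                                              FILE_MODE_READ, viewer, ierr)
>>                     CHKERRQ(ierr)
>>                     call DMLoad(dm, viewer, ierr)
>>                     CHKERRQ(ierr)
>>                     call PetscViewerDestroy(viewer, ierr)
>>                     CHKERRQ(ierr)
>>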
>>                     If there were examples illustrating how to create the file
>>                     which can be loaded in parallel, I think it would be very
>>                     helpful for the user (and many others).
>>
>>
>>                 I think Vaclav is going to add his examples as soon as we fix
>>                 this parallel interpolation bug. I am praying for time in the
>>                 latter part of October to do this.
>>
>>
>>
>>             Excellent news - thanks for the update and info.
>>
>>             Cheers
>>             Dave
>>
>>
>>
>>                   Thanks,
>>
>>                     Matt
>>
>>                     Cheers
>>                     Dave
>>
>>
>>                           Thanks,
>>
>>                             Matt
>>
>>                                 Since it looks like you are okay with fairly
>>                                 regular meshes, I would construct the coarsest
>>                                 mesh you can, and then use
>>
>>                                     -dm_refine <k>
>>
>>                                 which is activated by DMSetFromOptions(). Make
>>                                 sure to call it after DMPlexDistribute(). It will
>>                                 refine regularly in parallel and should show good
>>                                 memory scaling, as Dave says.
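>>
>>                                 A minimal sketch of that sequence (the overlap
>>                                 value and variable names are illustrative):
>>
>>                                     call DMPlexDistribute(dm, 0, PETSC_NULL_SF, dmDist, ierr)
>>                                     CHKERRQ(ierr)
>>                                     call DMDestroy(dm, ierr)
>>                                     CHKERRQ(ierr)
>>                                     dm = dmDist
>>                                     ! picks up -dm_refine <k> from the options database and
>>                                     ! refines the distributed mesh regularly in parallel
>>                                     call DMSetFromOptions(dm, ierr)
>>                                     CHKERRQ(ierr)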
>>
>>                                   Thanks,
>>
>>                                  Matt
>>
>>
>>                                           Thanks,
>>
>>                                              Matt
>>
>>     All cases use GMRES with the Hypre preconditioner; runtime was only
>>     recorded for the scaling runs (scenarios 7-9 and 14-16).
>>
>>     scenario  no. points  cell type  DMPlex  nprocs     no. nodes  mem/node (GB)  rank 0 mem (MB)  rank 0 PETSc mem (MB)  runtime (s)
>>     1             2121    rectangle  no      40         1          200                 0.21             41.6
>>     2             8241    rectangle  no      40         1          200                 0.59             51.84
>>     3            32481    rectangle  no      40         1          200                 1.95             59.1
>>     4           128961    rectangle  no      40         1          200                 7.05             89.71
>>     5           513921    rectangle  no      40         1          200                26.76            110.58
>>     6          2051841    rectangle  no      40         1          200               104.21            232.05
>>     7          8199681    rectangle  no      40         1          200               411.26            703.27             140.29
>>     8          8199681    rectangle  no      80         2          200               206.6             387.25              62.04
>>     9          8199681    rectangle  no      160        4          200               104.28            245.3               32.76
>>     10            2121    triangle   yes     40         1          200                 0.49             61.78
>>     11           15090    triangle   yes     40         1          200                 2.32             96.61
>>     12           59847    triangle   yes     40         1          200                 8.28            176.14
>>     13          238568    triangle   yes     40         1          200                31.89            573.73
>>     14          953433    triangle   yes     40         1          200               119.23           2102.54              44.11
>>     15          953433    triangle   yes     80         2          200                72.99           2123.8               24.36
>>     16          953433    triangle   yes     160        4          200                48.65           2076.25              14.87
>>     17           55770    prism      yes     40         1          200                18.46            219.39
>>     18          749814    prism      yes     40         1          200               149.86           2412.39
>>     19         7000050    prism      yes     40 to 640  1 to 16    200                    -      out_of_memory
>>     20         7000050    prism      yes     64         2          480               890.92          17214.41
>>
>>     The error information of scenario 19 is shown below:
>>
>>     kernel messages produced during job executions:
>>     [Oct 9 10:41] mpiexec.hydra invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
>>     [  +0.010274] mpiexec.hydra cpuset=/ mems_allowed=0-1
>>     [  +0.006680] CPU: 2 PID: 144904 Comm: mpiexec.hydra Tainted: G OE ------------ 3.10.0-862.14.4.el7.x86_64 #1
>>     [  +0.013365] Hardware name: Lenovo ThinkSystem SD530 -[7X21CTO1WW]-/-[7X21CTO1WW]-, BIOS -[TEE124N-1.40]- 06/12/2018
>>     [  +0.012866] Call Trace:
>>     [  +0.003945] [<ffffffffb3313754>] dump_stack+0x19/0x1b
>>     [  +0.006995] [<ffffffffb330e91f>] dump_header+0x90/0x229
>>     [  +0.007121] [<ffffffffb2cfa982>] ? ktime_get_ts64+0x52/0xf0
>>     [  +0.007451] [<ffffffffb2d5141f>] ? delayacct_end+0x8f/0xb0
>>     [  +0.007393] [<ffffffffb2d9ac94>] oom_kill_process+0x254/0x3d0
>>     [  +0.007592] [<ffffffffb2d9a73d>] ? oom_unkillable_task+0xcd/0x120
>>     [  +0.007978] [<ffffffffb2d9a7e6>] ? find_lock_task_mm+0x56/0xc0
>>     [  +0.007729] [<ffffffffb2d9b4d6>] out_of_memory+0x4b6/0x4f0
>>     [  +0.007358] [<ffffffffb330f423>] __alloc_pages_slowpath+0x5d6/0x724
>>     [  +0.008190] [<ffffffffb2da18b5>] __alloc_pages_nodemask+0x405/0x420
>>
>>                                             Thanks,
>>
>>                                             Danyang
>>
>>
>>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/