[petsc-users] DMPlex memory problem in scaling test

Matthew Knepley knepley at gmail.com
Thu Oct 10 08:14:56 CDT 2019


On Thu, Oct 10, 2019 at 9:10 AM Dave May <dave.mayhem23 at gmail.com> wrote:

> On Thu 10. Oct 2019 at 15:04, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Thu, Oct 10, 2019 at 8:41 AM Dave May <dave.mayhem23 at gmail.com> wrote:
>>
>>> On Thu 10. Oct 2019 at 14:34, Matthew Knepley <knepley at gmail.com> wrote:
>>>
>>>> On Thu, Oct 10, 2019 at 8:31 AM Dave May <dave.mayhem23 at gmail.com>
>>>> wrote:
>>>>
>>>>> On Thu, 10 Oct 2019 at 13:21, Matthew Knepley via petsc-users <
>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>
>>>>>> On Wed, Oct 9, 2019 at 5:10 PM Danyang Su via petsc-users <
>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>
>>>>>>> Dear All,
>>>>>>>
>>>>>>> I have a question regarding the maximum memory usage for the scaling
>>>>>>> test. My code is written in Fortran with support for both structured grid
>>>>>>> (DM) and unstructured grid (DMPlex). It looks like memory consumption is
>>>>>>> much larger when DMPlex is used and eventually causes an out_of_memory problem.
>>>>>>>
>>>>>>> Below are some tests using both structured and unstructured grids.
>>>>>>> The memory consumption by the code is estimated based on all
>>>>>>> allocated arrays and PETSc memory consumption is estimated based on
>>>>>>> PetscMemoryGetMaximumUsage.
>>>>>>>
>>>>>>> I just wonder why the PETSc memory consumption does not decrease when
>>>>>>> the number of processors increases. For the structured grid (scenarios
>>>>>>> 7-9), the memory consumption decreases as the number of processors
>>>>>>> increases. However, for the unstructured grid case (scenarios 14-16),
>>>>>>> the PETSc memory remains unchanged. When I run a larger case, the code
>>>>>>> crashes because it runs out of memory. The same case works on another
>>>>>>> cluster with 480GB memory per node. Does this make sense?
>>>>>>>
>>>>>> We would need a finer breakdown of where memory is being used. I did
>>>>>> this for a paper:
>>>>>>
>>>>>>   https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/jgrb.50217
>>>>>>
>>>>>> If the subdomains are small, the halo sizes can overwhelm the basic storage.
>>>>>> It looks like the subdomains are big here,
>>>>>> but things are not totally clear to me. It would be helpful to send
>>>>>> the output of -log_view for each case since
>>>>>> PETSc tries to keep track of allocated memory.
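>>>>>>
>>>>>> For reference, a minimal sketch (in C, not from Danyang's code) of how such
>>>>>> per-rank numbers can be collected with the standard PETSc counters; the
>>>>>> surrounding program and the placement of the calls are assumptions:
>>>>>>
>>>>>> #include <petscsys.h>
>>>>>>
>>>>>> int main(int argc, char **argv)
>>>>>> {
>>>>>>   PetscLogDouble maxRss, maxMalloc;
>>>>>>   PetscErrorCode ierr;
>>>>>>
>>>>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>>>>>>   ierr = PetscMemorySetGetMaximumUsage();CHKERRQ(ierr); /* enable high-water tracking */
>>>>>>   /* ... mesh setup and solve ... */
>>>>>>   ierr = PetscMemoryGetMaximumUsage(&maxRss);CHKERRQ(ierr);    /* max resident set size */
>>>>>>   ierr = PetscMallocGetMaximumUsage(&maxMalloc);CHKERRQ(ierr); /* max PetscMalloc'd bytes */
>>>>>>   ierr = PetscPrintf(PETSC_COMM_WORLD, "rank 0: max RSS %g MB, max malloc %g MB\n",
>>>>>>                      maxRss/1048576.0, maxMalloc/1048576.0);CHKERRQ(ierr);
>>>>>>   ierr = PetscFinalize();
>>>>>>   return ierr;
>>>>>> }
>>>>>>
>>>>>> Running with -log_view (and -memory_view) then gives the breakdown of PETSc's
>>>>>> own allocations per stage.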
>>>>>>
>>>>>
>>>>> Matt - I'd guess that there is a sequential (non-partitioned) mesh
>>>>> hanging around in memory.
>>>>> Is it possible that he creates the PLEX object by loading it
>>>>> sequentially (stored and retained in memory and never released), and only
>>>>> afterwards distributes it?
>>>>> This can never happen with the DMDA, and the table verifies this.
>>>>> If his code paths using the DMDA and DMPLEX are as identical as possible
>>>>> (apart from the DM used), then a sequential mesh held in memory seems the
>>>>> likely cause.
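>>>>>
>>>>> For illustration, a hedged C sketch of that pattern (Danyang's code is
>>>>> Fortran; the file name is a placeholder): the full mesh is built on a single
>>>>> rank, so the pre-distribution footprint does not shrink with more ranks.
>>>>>
>>>>> #include <petscdmplex.h>
>>>>>
>>>>> static PetscErrorCode LoadAndDistribute(MPI_Comm comm, DM *dm)
>>>>> {
>>>>>   DM             dmSerial, dmDist = NULL;
>>>>>   PetscErrorCode ierr;
>>>>>
>>>>>   PetscFunctionBeginUser;
>>>>>   /* the whole mesh is read sequentially before any partitioning */
>>>>>   ierr = DMPlexCreateFromFile(comm, "mesh.exo", PETSC_TRUE, &dmSerial);CHKERRQ(ierr);
>>>>>   ierr = DMPlexDistribute(dmSerial, 0, NULL, &dmDist);CHKERRQ(ierr);
>>>>>   if (dmDist) {
>>>>>     /* if this destroy is skipped, the sequential mesh is retained forever */
>>>>>     ierr = DMDestroy(&dmSerial);CHKERRQ(ierr);
>>>>>     *dm  = dmDist;
>>>>>   } else *dm = dmSerial;
>>>>>   PetscFunctionReturn(0);
>>>>> }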
>>>>>
>>>>
>>>> Dang it, Dave is always right.
>>>>
>>>> How to prevent this?
>>>>
>>>
>>> I thought you/Lawrence/Vaclav/others... had developed and provided
>>> support for a parallel DMPLEX load via a suitably defined Plex-specific HDF5
>>> mesh file.
>>>
>>
>> We have, but these tests looked like generated meshes.
>>
>
> Great.
>
> So would a solution to the problem be to have the user modify their code
> in the following way:
> * they move the mesh generation stage into a separate executable which they
> call offline (on a fat node with lots of memory), and dump the appropriate file
> * they change their existing application to simply load that file in
> parallel.
>

Yes.
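
As a rough sketch of the offline tool in that workflow (hedged: the file names
are placeholders, the HDF5 route requires a PETSc build with HDF5, and the exact
parallel-load path is what the upcoming examples will demonstrate):

#include <petscdmplex.h>
#include <petscviewerhdf5.h>

int main(int argc, char **argv)
{
  DM             dm, dmDist = NULL;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* read or generate the full mesh once, on a machine with enough memory */
  ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "mesh.exo", PETSC_TRUE, &dm);CHKERRQ(ierr);
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }
  /* dump the Plex in PETSc's HDF5 format for the application to load */
  ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "mesh.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = DMView(dm, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

The application side would then open the same file with FILE_MODE_READ and use
DMCreate()/DMSetType(dm, DMPLEX)/DMLoad(dm, viewer) instead of generating the
mesh itself.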


> If there were examples illustrating how to create the file which can be
> loaded in parallel, I think it would be very helpful for the user (and many
> others).
>

I think Vaclav is going to add his examples as soon as we fix this parallel
interpolation bug. I am praying for time in the latter
part of October to do this.

  Thanks,

    Matt


> Cheers
> Dave
>
>
>>   Thanks,
>>
>>     Matt
>>
>>
>>> Since it looks like you are okay with fairly regular meshes, I would
>>>> construct the
>>>> coarsest mesh you can, and then use
>>>>
>>>>   -dm_refine <k>
>>>>
>>>> which is activated by DMSetFromOptions(). Make sure to call it after
>>>> DMPlexDistribute(). It will regularly
>>>> refine in parallel and should show good memory scaling as Dave says.
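>>>>
>>>> As a minimal sketch of that ordering (in C; the box-mesh call and its
>>>> arguments are only illustrative, not taken from Danyang's code):
>>>>
>>>> #include <petscdmplex.h>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>   DM             dm, dmDist = NULL;
>>>>   PetscErrorCode ierr;
>>>>
>>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>>>>   /* small coarse mesh: cheap to hold even before distribution */
>>>>   ierr = DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 3, PETSC_FALSE, NULL, NULL, NULL,
>>>>                              NULL, PETSC_TRUE, &dm);CHKERRQ(ierr);
>>>>   /* distribute the coarse mesh first ... */
>>>>   ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
>>>>   if (dmDist) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }
>>>>   /* ... then -dm_refine <k> takes effect here, refining in parallel */
>>>>   ierr = DMSetFromOptions(dm);CHKERRQ(ierr);
>>>>   ierr = DMViewFromOptions(dm, NULL, "-dm_view");CHKERRQ(ierr);
>>>>   ierr = DMDestroy(&dm);CHKERRQ(ierr);
>>>>   ierr = PetscFinalize();
>>>>   return ierr;
>>>> }
>>>>
>>>> Run with, e.g., -dm_refine 4 -dm_view to check the distributed mesh sizes
>>>> after refinement.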
>>>>
>>>>   Thanks,
>>>>
>>>>      Matt
>>>>
>>>>
>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>      Matt
>>>>>>
>>>>>>> All scenarios use GMRES with the Hypre preconditioner; "rank 0 mem" is the
>>>>>>> code's own allocations and "rank 0 PETSc mem" is from PetscMemoryGetMaximumUsage.
>>>>>>>
>>>>>>> scenario  no. points  cell type  DMPlex  nprocs     no. nodes  mem/node (GB)  rank 0 mem (MB)  rank 0 PETSc mem (MB)  runtime (s)
>>>>>>> 1         2121        rectangle  no      40         1          200            0.21             41.6
>>>>>>> 2         8241        rectangle  no      40         1          200            0.59             51.84
>>>>>>> 3         32481       rectangle  no      40         1          200            1.95             59.1
>>>>>>> 4         128961      rectangle  no      40         1          200            7.05             89.71
>>>>>>> 5         513921      rectangle  no      40         1          200            26.76            110.58
>>>>>>> 6         2051841     rectangle  no      40         1          200            104.21           232.05
>>>>>>> 7         8199681     rectangle  no      40         1          200            411.26           703.27                 140.29
>>>>>>> 8         8199681     rectangle  no      80         2          200            206.6            387.25                 62.04
>>>>>>> 9         8199681     rectangle  no      160        4          200            104.28           245.3                  32.76
>>>>>>>
>>>>>>> 10        2121        triangle   yes     40         1          200            0.49             61.78
>>>>>>> 11        15090       triangle   yes     40         1          200            2.32             96.61
>>>>>>> 12        59847       triangle   yes     40         1          200            8.28             176.14
>>>>>>> 13        238568      triangle   yes     40         1          200            31.89            573.73
>>>>>>> 14        953433      triangle   yes     40         1          200            119.23           2102.54                44.11
>>>>>>> 15        953433      triangle   yes     80         2          200            72.99            2123.8                 24.36
>>>>>>> 16        953433      triangle   yes     160        4          200            48.65            2076.25                14.87
>>>>>>>
>>>>>>> 17        55770       prism      yes     40         1          200            18.46            219.39
>>>>>>> 18        749814      prism      yes     40         1          200            149.86           2412.39
>>>>>>> 19        7000050     prism      yes     40 to 640  1 to 16    200            out_of_memory
>>>>>>> 20        7000050     prism      yes     64         2          480            890.92           17214.41
>>>>>>>
>>>>>>> The error information of scenario 19 is shown below:
>>>>>>>
>>>>>>> kernel messages produced during job execution:
>>>>>>> [Oct 9 10:41] mpiexec.hydra invoked oom-killer: gfp_mask=0x200da,
>>>>>>> order=0, oom_score_adj=0
>>>>>>> [  +0.010274] mpiexec.hydra cpuset=/ mems_allowed=0-1
>>>>>>> [  +0.006680] CPU: 2 PID: 144904 Comm: mpiexec.hydra Tainted:
>>>>>>> G           OE  ------------   3.10.0-862.14.4.el7.x86_64 #1
>>>>>>> [  +0.013365] Hardware name: Lenovo ThinkSystem SD530
>>>>>>> -[7X21CTO1WW]-/-[7X21CTO1WW]-, BIOS -[TEE124N-1.40]- 06/12/2018
>>>>>>> [  +0.012866] Call Trace:
>>>>>>> [  +0.003945]  [<ffffffffb3313754>] dump_stack+0x19/0x1b
>>>>>>> [  +0.006995]  [<ffffffffb330e91f>] dump_header+0x90/0x229
>>>>>>> [  +0.007121]  [<ffffffffb2cfa982>] ? ktime_get_ts64+0x52/0xf0
>>>>>>> [  +0.007451]  [<ffffffffb2d5141f>] ? delayacct_end+0x8f/0xb0
>>>>>>> [  +0.007393]  [<ffffffffb2d9ac94>] oom_kill_process+0x254/0x3d0
>>>>>>> [  +0.007592]  [<ffffffffb2d9a73d>] ? oom_unkillable_task+0xcd/0x120
>>>>>>> [  +0.007978]  [<ffffffffb2d9a7e6>] ? find_lock_task_mm+0x56/0xc0
>>>>>>> [  +0.007729]  [<ffffffffb2d9b4d6>] out_of_memory+0x4b6/0x4f0
>>>>>>> [  +0.007358]  [<ffffffffb330f423>]
>>>>>>> __alloc_pages_slowpath+0x5d6/0x724
>>>>>>> [  +0.008190]  [<ffffffffb2da18b5>]
>>>>>>> __alloc_pages_nodemask+0x405/0x420
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Danyang
>>>>>>>
>>>>>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/