[petsc-users] DMPlex memory problem in scaling test
Danyang Su
danyang.su at gmail.com
Wed Oct 9 19:16:40 CDT 2019
I tested a small case with the number of processors ranging from 1 to
40 (scenarios 17-22 below). The memory consumption by PETSc decreases
a little as the number of processors increases, but not as
significantly as in the structured-grid code. Is there any way to
further reduce the memory consumption of DMPlex?
no.  points    cell type  DMPlex  nprocs     nodes    mem/node  rank 0    rank 0 PETSc  runtime
                                                      (GB)      mem (MB)  mem (MB)      (sec)
---  --------  ---------  ------  ---------  -------  --------  --------  ------------  -------
1    2121      rectangle  no      40         1        200       0.21      41.6
2    8241      rectangle  no      40         1        200       0.59      51.84
3    32481     rectangle  no      40         1        200       1.95      59.1
4    128961    rectangle  no      40         1        200       7.05      89.71
5    513921    rectangle  no      40         1        200       26.76     110.58
6    2051841   rectangle  no      40         1        200       104.21    232.05
7    8199681   rectangle  no      40         1        200       411.26    703.27        140.29
8    8199681   rectangle  no      80         2        200       206.6     387.25        62.04
9    8199681   rectangle  no      160        4        200       104.28    245.3         32.76
10   2121      triangle   yes     40         1        200       0.49      61.78
11   15090     triangle   yes     40         1        200       2.32      96.61
12   59847     triangle   yes     40         1        200       8.28      176.14
13   238568    triangle   yes     40         1        200       31.89     573.73
14   953433    triangle   yes     40         1        200       119.23    2102.54       44.11
15   953433    triangle   yes     80         2        200       72.99     2123.8        24.36
16   953433    triangle   yes     160        4        200       48.65     2076.25       14.87
17   55770     prism      yes     1          1        200       340.98    545.33
18   55770     prism      yes     4          1        200       107.72    339.78
19   55770     prism      yes     8          1        200       61.39     272.03
20   55770     prism      yes     16         1        200       34.64     236.58
21   55770     prism      yes     32         1        200       23.41     225.66
22   55770     prism      yes     40         1        200       18.46     219.39
23   749814    prism      yes     40         1        200       149.86    2412.39
24   7000050   prism      yes     40 to 640  1 to 16  200       out_of_memory
25   7000050   prism      yes     32         1        480       1655.1    17380.12      2712.11
26   7000050   prism      yes     64         2        480       890.92    17214.41      1497.55
27   7000050   prism      yes     128        4        480       451.23    17159.34      941.52
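For reference, the "rank 0 PETSc mem" column above is the high-water
mark reported by PetscMemoryGetMaximumUsage(). A minimal sketch (not
my production code, error checking omitted) of how that number can be
queried on each rank; note that on some PETSc versions the high-water
tracking must first be enabled with PetscMemorySetGetMaximumUsage():

program memcheck
#include <petsc/finclude/petscsys.h>
  use petscsys
  implicit none
  PetscErrorCode :: ierr
  PetscLogDouble :: mem

  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
  ! ask PETSc to track the resident-memory high-water mark
  call PetscMemorySetGetMaximumUsage(ierr)

  ! ... create the DMPlex, distribute, assemble and solve here ...

  ! maximum resident set size of this rank so far, in bytes
  call PetscMemoryGetMaximumUsage(mem, ierr)
  write(*,'(a,f12.2)') 'max rank memory (MB): ', mem/1048576.d0

  call PetscFinalize(ierr)
end program memcheck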
Thanks,
Danyang
On 2019-10-09 2:08 p.m., Danyang Su wrote:
>
> Dear All,
>
> I have a question regarding the maximum memory usage in a scaling
> test. My code is written in Fortran and supports both structured
> grids (DMDA) and unstructured grids (DMPlex). Memory consumption is
> much larger when DMPlex is used, and this eventually causes an
> out_of_memory problem.
>
> Below are some tests using both structured and unstructured grids.
> The memory consumption of the code is estimated from all allocated
> arrays, and the PETSc memory consumption is estimated with
> PetscMemoryGetMaximumUsage().
>
> I just wonder why the PETSc memory consumption does not decrease as
> the number of processors increases. For the structured-grid case
> (scenarios 7-9), the memory consumption decreases as the number of
> processors increases. However, for the unstructured-grid case
> (scenarios 14-16), the PETSc memory remains almost unchanged. When I
> run a larger case, the code crashes because it runs out of memory.
> The same case works on another cluster with 480 GB of memory per
> node. Does this make sense?
>
> (solver in all runs: GMRES with Hypre preconditioner)
>
> no.  points    cell type  DMPlex  nprocs     nodes    mem/node  rank 0    rank 0 PETSc  runtime
>                                                       (GB)      mem (MB)  mem (MB)      (sec)
> ---  --------  ---------  ------  ---------  -------  --------  --------  ------------  -------
> 1    2121      rectangle  no      40         1        200       0.21      41.6
> 2    8241      rectangle  no      40         1        200       0.59      51.84
> 3    32481     rectangle  no      40         1        200       1.95      59.1
> 4    128961    rectangle  no      40         1        200       7.05      89.71
> 5    513921    rectangle  no      40         1        200       26.76     110.58
> 6    2051841   rectangle  no      40         1        200       104.21    232.05
> 7    8199681   rectangle  no      40         1        200       411.26    703.27        140.29
> 8    8199681   rectangle  no      80         2        200       206.6     387.25        62.04
> 9    8199681   rectangle  no      160        4        200       104.28    245.3         32.76
> 10   2121      triangle   yes     40         1        200       0.49      61.78
> 11   15090     triangle   yes     40         1        200       2.32      96.61
> 12   59847     triangle   yes     40         1        200       8.28      176.14
> 13   238568    triangle   yes     40         1        200       31.89     573.73
> 14   953433    triangle   yes     40         1        200       119.23    2102.54       44.11
> 15   953433    triangle   yes     80         2        200       72.99     2123.8        24.36
> 16   953433    triangle   yes     160        4        200       48.65     2076.25       14.87
> 17   55770     prism      yes     40         1        200       18.46     219.39
> 18   749814    prism      yes     40         1        200       149.86    2412.39
> 19   7000050   prism      yes     40 to 640  1 to 16  200       out_of_memory
> 20   7000050   prism      yes     64         2        480       890.92    17214.41
>
> The error output for scenario 19 is shown below:
>
> kernel messages produced during job executions:
> [Oct 9 10:41] mpiexec.hydra invoked oom-killer: gfp_mask=0x200da,
> order=0, oom_score_adj=0
> [ +0.010274] mpiexec.hydra cpuset=/ mems_allowed=0-1
> [ +0.006680] CPU: 2 PID: 144904 Comm: mpiexec.hydra Tainted:
> G OE ------------ 3.10.0-862.14.4.el7.x86_64 #1
> [ +0.013365] Hardware name: Lenovo ThinkSystem SD530
> -[7X21CTO1WW]-/-[7X21CTO1WW]-, BIOS -[TEE124N-1.40]- 06/12/2018
> [ +0.012866] Call Trace:
> [ +0.003945] [<ffffffffb3313754>] dump_stack+0x19/0x1b
> [ +0.006995] [<ffffffffb330e91f>] dump_header+0x90/0x229
> [ +0.007121] [<ffffffffb2cfa982>] ? ktime_get_ts64+0x52/0xf0
> [ +0.007451] [<ffffffffb2d5141f>] ? delayacct_end+0x8f/0xb0
> [ +0.007393] [<ffffffffb2d9ac94>] oom_kill_process+0x254/0x3d0
> [ +0.007592] [<ffffffffb2d9a73d>] ? oom_unkillable_task+0xcd/0x120
> [ +0.007978] [<ffffffffb2d9a7e6>] ? find_lock_task_mm+0x56/0xc0
> [ +0.007729] [<ffffffffb2d9b4d6>] out_of_memory+0x4b6/0x4f0
> [ +0.007358] [<ffffffffb330f423>] __alloc_pages_slowpath+0x5d6/0x724
> [ +0.008190] [<ffffffffb2da18b5>] __alloc_pages_nodemask+0x405/0x420
>
> Thanks,
>
> Danyang
>