[petsc-users] DMPlex memory problem in scaling test

Danyang Su danyang.su at gmail.com
Wed Oct 9 16:08:54 CDT 2019


Dear All,

I have a question regarding maximum memory usage in a scaling test. My code
is written in Fortran and supports both structured grids (DM) and
unstructured grids (DMPlex). Memory consumption is much larger when DMPlex
is used, and it eventually causes an out_of_memory problem.

Below are some tests using both structured and unstructured grids. The
memory consumed by my own code is estimated from all allocated arrays, and
the PETSc memory consumption is taken from PetscMemoryGetMaximumUsage.
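
For reference, here is a minimal Fortran sketch of how these peak-memory
numbers can be queried. It is illustrative only, not my production code: the
subroutine name and the MB conversion are placeholders, and it assumes the
Fortran bindings for PetscMemoryGetMaximumUsage and
PetscMallocGetMaximumUsage are available.

! Minimal sketch (illustrative only): report the two peak-memory numbers.
! Per the PETSc documentation, PetscMemorySetGetMaximumUsage() should be
! called once shortly after PetscInitialize so the peak is actually tracked.
subroutine report_peak_memory(ierr)
#include <petsc/finclude/petscsys.h>
  use petscsys
  implicit none
  PetscErrorCode ierr
  PetscLogDouble rss_max, malloc_max

  ! Peak resident set size of this process (Fortran arrays + PETSc objects)
  call PetscMemoryGetMaximumUsage(rss_max, ierr)
  CHKERRQ(ierr)
  ! Peak memory obtained through PetscMalloc (PETSc objects only)
  call PetscMallocGetMaximumUsage(malloc_max, ierr)
  CHKERRQ(ierr)

  print *, 'peak RSS (MB):        ', rss_max/1.0d6
  print *, 'peak PetscMalloc (MB):', malloc_max/1.0d6
end subroutine report_peak_memory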

I am wondering why the PETSc memory consumption does not decrease as the
number of processors increases. For the structured-grid cases (scenarios
7-9), memory consumption decreases as the number of processors increases.
For the unstructured-grid cases (scenarios 14-16), however, the PETSc memory
stays essentially unchanged. When I run a larger case, the code crashes
because it runs out of memory, yet the same case works on another cluster
with 480 GB of memory per node. Does this make sense?
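
For context, the DMPlex part of the code follows the usual
create-then-distribute pattern; the sketch below is a simplified
illustration only (the subroutine name is a placeholder, not the actual
code). I wonder whether the mesh that exists before DMPlexDistribute is
called is what keeps the rank-0 peak from shrinking as more processes are
added.

! Simplified sketch of the DMPlex distribution step (illustrative only).
! The mesh is created first and then redistributed; the pre-distribution DM
! is destroyed as soon as the parallel DM exists.
subroutine distribute_mesh(dm, ierr)
#include <petsc/finclude/petscdmplex.h>
  use petscdmplex
  implicit none
  DM dm, dmDist
  PetscInt overlap
  PetscErrorCode ierr

  ! Redistribute the mesh over all processes with zero cell overlap
  overlap = 0
  call DMPlexDistribute(dm, overlap, PETSC_NULL_SF, dmDist, ierr)
  CHKERRQ(ierr)

  ! On more than one process a new DM is returned; release the original DM
  ! so its pre-distribution storage is not kept around during the solve
  if (dmDist .ne. PETSC_NULL_DM) then
    call DMDestroy(dm, ierr)
    CHKERRQ(ierr)
    dm = dmDist
  end if
end subroutine distribute_mesh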

All runs use the GMRES solver with the Hypre preconditioner.

scenario  no. points  cell type  DMPlex  nprocs     no. nodes  mem per node (GB)  Rank 0 memory (MB)  Rank 0 PETSc memory (MB)  Runtime (sec)
--------  ----------  ---------  ------  ---------  ---------  -----------------  ------------------  ------------------------  -------------
1               2121  rectangle  no             40          1                200                0.21                      41.6
2               8241  rectangle  no             40          1                200                0.59                     51.84
3              32481  rectangle  no             40          1                200                1.95                      59.1
4             128961  rectangle  no             40          1                200                7.05                     89.71
5             513921  rectangle  no             40          1                200               26.76                    110.58
6            2051841  rectangle  no             40          1                200              104.21                    232.05
7            8199681  rectangle  no             40          1                200              411.26                    703.27         140.29
8            8199681  rectangle  no             80          2                200               206.6                    387.25          62.04
9            8199681  rectangle  no            160          4                200              104.28                     245.3          32.76
10              2121  triangle   yes            40          1                200                0.49                     61.78
11             15090  triangle   yes            40          1                200                2.32                     96.61
12             59847  triangle   yes            40          1                200                8.28                    176.14
13            238568  triangle   yes            40          1                200               31.89                    573.73
14            953433  triangle   yes            40          1                200              119.23                   2102.54          44.11
15            953433  triangle   yes            80          2                200               72.99                    2123.8          24.36
16            953433  triangle   yes           160          4                200               48.65                   2076.25          14.87
17             55770  prism      yes            40          1                200               18.46                    219.39
18            749814  prism      yes            40          1                200              149.86                   2412.39
19           7000050  prism      yes     40 to 640    1 to 16                200       out_of_memory
20           7000050  prism      yes            64          2                480              890.92                  17214.41

The error output for scenario 19 is shown below:

kernel messages produced during job executions:
[Oct 9 10:41] mpiexec.hydra invoked oom-killer: gfp_mask=0x200da, 
order=0, oom_score_adj=0
[  +0.010274] mpiexec.hydra cpuset=/ mems_allowed=0-1
[  +0.006680] CPU: 2 PID: 144904 Comm: mpiexec.hydra Tainted: 
G           OE  ------------   3.10.0-862.14.4.el7.x86_64 #1
[  +0.013365] Hardware name: Lenovo ThinkSystem SD530 
-[7X21CTO1WW]-/-[7X21CTO1WW]-, BIOS -[TEE124N-1.40]- 06/12/2018
[  +0.012866] Call Trace:
[  +0.003945]  [<ffffffffb3313754>] dump_stack+0x19/0x1b
[  +0.006995]  [<ffffffffb330e91f>] dump_header+0x90/0x229
[  +0.007121]  [<ffffffffb2cfa982>] ? ktime_get_ts64+0x52/0xf0
[  +0.007451]  [<ffffffffb2d5141f>] ? delayacct_end+0x8f/0xb0
[  +0.007393]  [<ffffffffb2d9ac94>] oom_kill_process+0x254/0x3d0
[  +0.007592]  [<ffffffffb2d9a73d>] ? oom_unkillable_task+0xcd/0x120
[  +0.007978]  [<ffffffffb2d9a7e6>] ? find_lock_task_mm+0x56/0xc0
[  +0.007729]  [<ffffffffb2d9b4d6>] *out_of_memory+0x4b6/0x4f0*
[  +0.007358]  [<ffffffffb330f423>] __alloc_pages_slowpath+0x5d6/0x724
[  +0.008190]  [<ffffffffb2da18b5>] __alloc_pages_nodemask+0x405/0x420

Thanks,

Danyang
