[petsc-users] Memory growth issue
Sanjay Govindjee
s_g at berkeley.edu
Wed May 29 23:51:54 CDT 2019
I am trying to track down a memory issue with my code; apologies in
advance for the longish message.
I am solving an FEA problem over a number of load steps, involving about 3000
right-hand-side and tangent assemblies and solves. The program is
mainly Fortran, with a C memory allocator.
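Schematically, each load step does roughly the following on the PETSc side
(a sketch in C; the actual assembly and driver live in the Fortran code, and
names like nsteps, A, b, x, ksp are placeholders):

  #include <petscksp.h>

  /* rough per-load-step driver; A, b, x, ksp are created and set up elsewhere */
  PetscErrorCode RunSteps(KSP ksp, Mat A, Vec b, Vec x, PetscInt nsteps)
  {
    PetscErrorCode ierr;
    PetscInt       step;

    for (step = 0; step < nsteps; step++) {
      ierr = MatZeroEntries(A);CHKERRQ(ierr);
      /* ... element loops assemble the tangent A and the right-hand side b ... */
      ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = VecAssemblyBegin(b);CHKERRQ(ierr);
      ierr = VecAssemblyEnd(b);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
      ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
      /* exchange solution values between processes (MPI_Isend/MPI_Recv/MPI_BARRIER) */
    }
    return 0;
  }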
When I run my code in strictly serial mode (no PETSc or MPI routines)
the memory stays constant over the whole run.
When I run it in parallel mode with PETSc solvers and num_processes=1,
the memory (max resident set size) also stays constant:

  PetscMalloc = 28,976, ProgramNativeMalloc = constant,
  Resident Size = 24,854,528 (constant)  [CG/JACOBI]
[PetscMalloc and Resident Size are as reported by PetscMallocGetCurrentUsage
and PetscMemoryGetCurrentUsage (summed across processes as needed);
ProgramNativeMalloc is reported by the program's own memory allocator.]
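In case it matters, the sampling is essentially the following (a minimal
sketch; the reporting routine in my code differs slightly but uses the same
calls):

  #include <petscsys.h>

  PetscErrorCode ReportMemory(MPI_Comm comm)
  {
    PetscLogDouble mal,res,malsum,ressum;
    PetscErrorCode ierr;

    ierr = PetscMallocGetCurrentUsage(&mal);CHKERRQ(ierr);  /* bytes from PetscMalloc */
    ierr = PetscMemoryGetCurrentUsage(&res);CHKERRQ(ierr);  /* resident size of this rank */
    /* sum over processes so the reported numbers are totals for the job */
    ierr = MPI_Allreduce(&mal,&malsum,1,MPI_DOUBLE,MPI_SUM,comm);CHKERRQ(ierr);
    ierr = MPI_Allreduce(&res,&ressum,1,MPI_DOUBLE,MPI_SUM,comm);CHKERRQ(ierr);
    ierr = PetscPrintf(comm,"PetscMalloc = %g  Resident Size = %g\n",malsum,ressum);CHKERRQ(ierr);
    return 0;
  }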
When I run it in parallel mode with PETSc solvers but num_processes=2,
the resident memory grows steadily during the run:

  PetscMalloc = 3,039,072 (constant), ProgramNativeMalloc = constant,
  Resident Size = 24,698,880 (start) -> 31,313,920 (finish)  [CG/JACOBI]
When I run it in parallel mode with PETSc solvers but num_processes=4,
the resident memory grows steadily during the run:

  PetscMalloc = 3,307,888 (constant), ProgramNativeMalloc = 1,427,584 (constant),
  Resident Size = 45,801,472 (start) -> 70,787,072 (finish)    [CG/JACOBI]

  PetscMalloc = 5,903,808 (constant), ProgramNativeMalloc = 1,427,584 (constant),
  Resident Size = 52,076,544 (start) -> 112,410,624 (finish)   [GMRES/BJACOBI]

  PetscMalloc = 3,188,944 (constant), ProgramNativeMalloc = 1,427,584 (constant),
  Resident Size = 381,480,960 (start) -> 712,798,208 (finish)  [SUPERLU]

  PetscMalloc = 6,539,408 (constant), ProgramNativeMalloc = 1,427,584 (constant),
  Resident Size = 278,671,360 (start) -> 591,048,704 (finish)  [MUMPS]
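(For anyone trying to reproduce this, the bracketed labels map to the standard
PETSc options, e.g.

  -ksp_type cg      -pc_type jacobi                                          (CG/JACOBI)
  -ksp_type gmres   -pc_type bjacobi                                         (GMRES/BJACOBI)
  -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type superlu_dist      (SUPERLU)
  -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps             (MUMPS)

where superlu_dist stands in for whichever SuperLU variant is configured for
the parallel runs.)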
The memory growth feels alarming but maybe I do not understand the
values in ru_maxrss from getrusage().
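For completeness, the value I am looking at comes from a call like this
(minimal sketch):

  #include <stdio.h>
  #include <sys/resource.h>

  /* ru_maxrss is the peak (high-water-mark) resident set size, not the
     current one, and its units differ: kilobytes on Linux, bytes on macOS. */
  void print_maxrss(void)
  {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF,&ru) == 0)
      printf("ru_maxrss = %ld\n",(long)ru.ru_maxrss);
  }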
My box (a MacBook Pro) has a broken Valgrind, so I need to get to a system
with a functional one; that said, the code has always been
Valgrind clean.
There are no Fortran pointers or allocatable arrays in the part
of the code being used. The program's C memory allocator keeps track of
itself, so I do not see the problem being there. The PETSc malloc is
also steady.
Other random hints:
1) If I comment out the call to KSPSolve and the call to my MPI data-exchange
routine (which passes solution values between processes after each solve
using MPI_Isend, MPI_Recv, MPI_BARRIER; a sketch appears after this list),
the memory growth essentially goes away.
2) If I comment out the call to my MPI data-exchange routine but leave
the call to KSPSolve, the problem remains but is substantially reduced
for CG/JACOBI, and is marginally reduced for the GMRES/BJACOBI, SUPERLU,
and MUMPS runs.
3) If I comment out the call to KSPSolve but leave the call to my MPI
data-exchange routine, the problem remains.
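For reference, the data-exchange routine mentioned in hint 1 follows this
pattern (a heavily simplified sketch; the real routine is Fortran, and the
buffers, counts, and neighbor lists come from the mesh partitioning):

  #include <stdlib.h>
  #include <mpi.h>

  /* Sketch: post nonblocking sends of solution values to each neighbor,
     receive from each neighbor, complete the sends, then synchronize. */
  void exchange(double **sendbuf, double **recvbuf, int *count,
                int *neigh, int nneigh, MPI_Comm comm)
  {
    MPI_Request *req = (MPI_Request*) malloc(nneigh*sizeof(MPI_Request));
    int i;

    for (i = 0; i < nneigh; i++)
      MPI_Isend(sendbuf[i],count[i],MPI_DOUBLE,neigh[i],0,comm,&req[i]);
    for (i = 0; i < nneigh; i++)
      MPI_Recv(recvbuf[i],count[i],MPI_DOUBLE,neigh[i],0,comm,MPI_STATUS_IGNORE);
    /* each MPI_Isend request has to be completed (or freed) eventually */
    MPI_Waitall(nneigh,req,MPI_STATUSES_IGNORE);
    free(req);
    MPI_Barrier(comm);
  }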
Any suggestions/hints of where to look will be great.
-sanjay