[petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

Thu Oct 4 03:42:46 CDT 2018

Hello all,

I'm using SLEPc 3.9.2 (and PETSc 3.9.3) to get the smallest real eigenvalue
(EPS_SMALLEST_REAL) of a matrix with the following characteristics:

* type: real, Hermitian, sparse
* linear size: 2333606220
* distributed over 2048 processes (64 nodes, 32 procs per node)

My code first preallocates the necessary memory with
*MatMPIAIJSetPreallocation*, then fills it with the values and finally it
calls the following functions to create the solver and diagonalize the
matrix:

EPSCreate(PETSC_COMM_WORLD, &solver);
EPSSetOperators(solver,matrix,NULL);
EPSSetProblemType(solver, EPS_HEP);
EPSSetType(solver, EPSLANCZOS);
EPSSetWhichEigenpairs(solver, EPS_SMALLEST_REAL);
EPSSetFromOptions(solver);
EPSSolve(solver);

I want to estimate the memory used by the program (at every step) for
larger problems, because I would like to keep it under 16 GB per node.
I've used the "memory usage" functions provided by PETSc, but something
happens during the solver stage that I can't explain. This brings up two
questions.

1) At each step I call four memory functions and print the value of mem
after each call:

mem = 0;
PetscMallocGetCurrentUsage(&mem);
PetscMallocGetMaximumUsage(&mem);
PetscMemoryGetCurrentUsage(&mem);
PetscMemoryGetMaximumUsage(&mem);

I've read some other questions on the mailing list regarding the same issue,
but I can't fully understand this. What is the difference between all of
them? What information are they actually giving me? (I know this is only a
"per process" output.) I copy the output of two steps of the program as an
example:

==================== step N ====================
MallocGetCurrent: 314513664.0 B
MallocGetMaximum: 332723328.0 B
MemoryGetCurrent: 539996160.0 B
MemoryGetMaximum: 0.0 B
==================== step N+1 ====================
MallocGetCurrent: 395902912.0 B
MallocGetMaximum: 415178624.0 B
MemoryGetCurrent: 623783936.0 B
MemoryGetMaximum: 623775744.0 B
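
As a rough check against the 16 GB per node target, the per-process values
above can be scaled to a whole node. This is only a sketch under an
assumption: that all 32 processes on a node use about the same amount as
the process that printed these numbers. The helper names per_node_bytes and
report are made up for illustration and are not part of PETSc.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a PETSc function): scale a per-process byte
   count to a whole node, ASSUMING every process on the node uses the
   same amount of memory. */
double per_node_bytes(double bytes_per_proc, int procs_per_node)
{
    assert(procs_per_node > 0);
    return bytes_per_proc * (double)procs_per_node;
}

/* Print one labelled per-node estimate in bytes and in GB (1 GB = 1e9 B). */
void report(const char *label, double bytes_per_proc, int procs_per_node)
{
    double node = per_node_bytes(bytes_per_proc, procs_per_node);
    printf("%-20s %.0f B per node (%.2f GB)\n", label, node, node / 1e9);
}
```

For instance, report("MemoryGetCurrent:", 539996160.0, 32) scales the
step N value to 17279877120 B per node. Summing resident memory over
processes can count pages shared between them more than once, so this is
better read as an upper estimate than as an exact figure.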

2) I was using this information to make the calculation of the memory
required per node to run my problem. Also, I'm able to login to the
computing node while running and I can check the memory consumption (with
*top*). The memory used that I see with top is more or less the same as the
one reported by the PETSc functions at the beginning. But during the
initialization of the solver and during the solve, *top* reports a
consumption twice as large as the one the functions report. Is it
possible to know where this extra memory consumption comes from? What
things does SLEPc allocate that need that much memory? I've been trying to
do the math but I think there are things I'm missing. I thought that part
of it comes from the "BV" that the option -eps_view reports:

BV Object: 2048 MPI processes
  type: svec
  17 columns of global length 2333606220
  vector orthogonalization method: modified Gram-Schmidt
  orthogonalization refinement: if needed (eta: 0.7071)
  block orthogonalization method: GS
  doing matmult as a single matrix-matrix product

But "17 * 2333606220 * 8 Bytes / #nodes" only explains one third or less of
the "extra" memory.
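
The estimate above can be written out as a small calculation. This is a
sketch only: it assumes 8-byte real scalars and a perfectly even split of
the BV columns over nodes (or processes), and the function name
bv_bytes_per_part is hypothetical, not a SLEPc call.

```c
#include <assert.h>

/* Hypothetical helper (not a SLEPc function): bytes needed to store
   ncols dense columns of global length n, divided evenly over `parts`
   (nodes or processes). ASSUMES 8-byte real scalars and an even split. */
double bv_bytes_per_part(double n, int ncols, int parts)
{
    assert(ncols > 0 && parts > 0);
    return n * (double)ncols * 8.0 / (double)parts;
}
```

With the values reported by -eps_view, bv_bytes_per_part(2333606220.0, 17, 64)
gives 4958913217.5 B, about 4.96 GB per node, and with 2048 parts it gives
about 155 MB per process.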

Ale