[petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

Ale Foggia amfoggia at gmail.com
Thu Oct 4 12:54:05 CDT 2018


Thank you both for your answers :)

Matt:
- Yes, sorry, I forgot to tell you that, but I've also called
PetscMemorySetGetMaximumUsage() right after initializing SLEPc. I've also
seen a strange behaviour: if I run the same code on my computer and on the
cluster *without* the command line option -malloc_dump, on the cluster the
output of PetscMallocGetCurrentUsage and PetscMallocGetMaximumUsage is
always zero, but that doesn't happen on my computer.
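
For reference, this is roughly the call order in question (a minimal
sketch, assuming the PETSc/SLEPc 3.9 APIs):

#include <slepceps.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = SlepcInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* Without this call, PetscMemoryGetMaximumUsage() has no high-water
     mark to track and reports 0 */
  ierr = PetscMemorySetGetMaximumUsage(); CHKERRQ(ierr);

  /* ... build the matrix, create the solver, solve ... */

  ierr = SlepcFinalize();
  return ierr;
}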

- This is the output of the code for the solving part (after EPSCreate and
after EPSSolve), and I've compared it with the output of *top* at those
moments of peak memory consumption. *top* shows, in one of its columns,
the resident set size (RES), and the numbers are around 1 GB per process,
while the closest number reported by the PETSc functions is the one given
by MemoryGetCurrentUsage, which is only 800 MB in the solving stage. Can
we consider those numbers to be the same, plus/minus something? Is it safe
to say that MemoryGetCurrentUsage is measuring the "ru_maxrss" member of
"rusage" (or something similar)? If that's the case, what do the other
functions report?

==================== SOLVER INIT ====================
MallocGetCurrent (init): 396096192.0 B
MallocGetMaximum (init): 415178624.0 B
MemoryGetCurrent (init): 624050176.0 B
MemoryGetMaximum (init): 623775744.0 B
==================== SOLVER ====================
MallocGetCurrent (solver): 560320256.0 B
MallocGetMaximum (solver): 560333440.0 B
MemoryGetCurrent (solver): 820961280.0 B
MemoryGetMaximum (solver): 623775744.0 B
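
As a cross-check against *top*, here is a minimal sketch of reading the
rusage counters directly (note the field is ru_maxrss, reported in
kilobytes on Linux):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
  struct rusage ru;

  /* RUSAGE_SELF: counters for the calling process only */
  if (getrusage(RUSAGE_SELF, &ru) == 0)
    printf("peak RSS (ru_maxrss): %ld KB\n", ru.ru_maxrss);
  return 0;
}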

Jose:
- By each step I mean each of the steps of the program needed to
diagonalize the matrix. For me, those are: creation of the basis,
preallocation of the matrix, setting the values of the matrix,
initializing the solver, solving/diagonalizing, and cleaning up. I'm only
diagonalizing once.

- Regarding the information provided by -log_view, it's confusing to me:
for example, it reports the creation of Vecs scattered across the various
stages that I've set up (with PetscLogStageRegister and
PetscLogStagePush/Pop), but almost all the destructions are reported in
the "Main Stage". What does that "Main Stage" cover? Why are there more
destructions there than creations? It's not completely clear to me how
things are presented there.
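
This is the kind of stage setup I mean (a sketch with made-up stage
names):

PetscLogStage stage_fill, stage_solve;

PetscLogStageRegister("MatFill", &stage_fill);
PetscLogStageRegister("Solve", &stage_solve);

PetscLogStagePush(stage_fill);
/* MatSetValues(...), MatAssemblyBegin/End(...) */
PetscLogStagePop();

PetscLogStagePush(stage_solve);
/* EPSSolve(solver); */
PetscLogStagePop();

/* Anything logged outside a Push/Pop pair ends up in the "Main Stage",
   e.g. destructions that happen after the last PetscLogStagePop(). */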

- Thanks for the suggestion about the solver. Does "faster convergence" for
Krylov-Schur mean less memory and less computation, or just less
computation?
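
In case it helps others reading, the switch being suggested would be (a
sketch; Krylov-Schur is the default type anyway):

EPSSetType(solver, EPSKRYLOVSCHUR);  /* instead of EPSLANCZOS */
/* or, without recompiling: ./program -eps_type krylovschur */
/* The basis size, and hence the BV memory, can be capped with
   EPSSetDimensions() or -eps_ncv <n>. */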

Ale


On Thu, Oct 4, 2018 at 13:12, Jose E. Roman (<jroman at dsic.upv.es>)
wrote:

> Regarding the SLEPc part:
> - What do you mean by "each step"? Are you calling EPSSolve() several
> times?
> - Yes, the BV object is generally what takes most of the memory. It is
> allocated at the beginning of EPSSolve(). Depending on the solver/options,
> other memory may be allocated as well.
> - You can also see the memory reported at the end of -log_view
> - I would suggest using the default solver Krylov-Schur - it will do
> Lanczos with implicit restart, which will give faster convergence than the
> EPSLANCZOS solver.
>
> Jose
>
>
> > On 4 Oct 2018, at 12:49, Matthew Knepley <knepley at gmail.com> wrote:
> >
> > On Thu, Oct 4, 2018 at 4:43 AM Ale Foggia <amfoggia at gmail.com> wrote:
> > Hello all,
> >
> > I'm using SLEPc 3.9.2 (and PETSc 3.9.3) to get the EPS_SMALLEST_REAL of
> a matrix with the following characteristics:
> >
> > * type: real, Hermitian, sparse
> > * linear size: 2333606220
> > * distributed in 2048 processes (64 nodes, 32 procs per node)
> >
> > My code first preallocates the necessary memory with
> *MatMPIAIJSetPreallocation*, then fills it with the values and finally it
> calls the following functions to create the solver and diagonalize the
> matrix:
> >
> > EPSCreate(PETSC_COMM_WORLD, &solver);
> > EPSSetOperators(solver,matrix,NULL);
> > EPSSetProblemType(solver, EPS_HEP);
> > EPSSetType(solver, EPSLANCZOS);
> > EPSSetWhichEigenpairs(solver, EPS_SMALLEST_REAL);
> > EPSSetFromOptions(solver);
> > EPSSolve(solver);
> >
> > I want to estimate, for larger problem sizes, the memory used by the
> program at every step, because I would like to keep it under 16 GB per
> node. I've used the "memory usage" functions provided by PETSc, but
> something happens during the solver stage that I can't explain. This
> brings up two questions.
> >
> > 1) In each step I put a call to four memory functions and between them I
> print the value of mem:
> >
> > Did you call PetscMemorySetGetMaximumUsage() first?
> >
> > We are computing https://en.wikipedia.org/wiki/Resident_set_size
> however we can. Usually with getrusage().
> > From this (https://www.binarytides.com/linux-command-check-memory-usage/),
> it looks like top also reports
> > paged out memory.
> >
> >    Matt
> >
> > PetscLogDouble mem = 0;
> > PetscMallocGetCurrentUsage(&mem);  printf("MallocGetCurrent: %.1f B\n", mem);
> > PetscMallocGetMaximumUsage(&mem);  printf("MallocGetMaximum: %.1f B\n", mem);
> > PetscMemoryGetCurrentUsage(&mem);  printf("MemoryGetCurrent: %.1f B\n", mem);
> > PetscMemoryGetMaximumUsage(&mem);  printf("MemoryGetMaximum: %.1f B\n", mem);
> >
> > I've read some other questions on the mailing list regarding the same
> issue but I can't fully understand this. What is the difference between
> all of them? What information are they actually giving me? (I know this
> is only a "per process" output). I copy the output of two steps of the
> program as an example:
> >
> > ==================== step N ====================
> > MallocGetCurrent: 314513664.0 B
> > MallocGetMaximum: 332723328.0 B
> > MemoryGetCurrent: 539996160.0 B
> > MemoryGetMaximum: 0.0 B
> > ==================== step N+1 ====================
> > MallocGetCurrent: 395902912.0 B
> > MallocGetMaximum: 415178624.0 B
> > MemoryGetCurrent: 623783936.0 B
> > MemoryGetMaximum: 623775744.0 B
> >
> > 2) I was using this information to calculate the memory required per
> node to run my problem. Also, I'm able to log in to the computing node
> while the job is running and check the memory consumption (with *top*).
> The memory usage that I see with top is more or less the same as the one
> reported by the PETSc functions at the beginning. But during the
> initialization of the solver and during the solving, *top* reports a
> consumption two times bigger than the one the functions report. Is it
> possible to know where this extra memory consumption comes from? What
> things does SLEPc allocate that need that much memory? I've been trying
> to do the math but I think there are things I'm missing. I thought that
> part of it comes from the "BV" that the option -eps_view reports:
> >
> > BV Object: 2048 MPI processes
> >   type: svec
> >   17 columns of global length 2333606220
> >   vector orthogonalization method: modified Gram-Schmidt
> >   orthogonalization refinement: if needed (eta: 0.7071)
> >   block orthogonalization method: GS
> >   doing matmult as a single matrix-matrix product
> >
> > But "17 * 2333606220 * 8 Bytes / #nodes" only explains on third or less
> of the "extra" memory.
> >
> > Ale
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
>
>

