[petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

Thu Oct 4 06:12:18 CDT 2018

Regarding the SLEPc part:
- What do you mean by "each step"? Are you calling EPSSolve() several times?
- Yes, the BV object is generally what takes most of the memory. It is allocated at the beginning of EPSSolve(). Depending on the solver/options, other memory may be allocated as well.
- You can also see the memory usage reported at the end of the -log_view output.
- I would suggest using the default solver, Krylov-Schur. It does Lanczos with implicit restart, which will converge faster than the EPSLANCZOS solver.

Jose


> On 4 Oct 2018, at 12:49, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Thu, Oct 4, 2018 at 4:43 AM Ale Foggia <amfoggia at gmail.com> wrote:
> Hello all,
> 
> I'm using SLEPc 3.9.2 (and PETSc 3.9.3) to compute the smallest real eigenvalue (EPS_SMALLEST_REAL) of a matrix with the following characteristics:
> 
> * type: real, Hermitian, sparse
> * linear size: 2333606220
> * distributed over 2048 processes (64 nodes, 32 procs per node)
> 
> My code first preallocates the necessary memory with *MatMPIAIJSetPreallocation*, then fills it with the values and finally it calls the following functions to create the solver and diagonalize the matrix:
> 
> EPSCreate(PETSC_COMM_WORLD, &solver);
> EPSSetOperators(solver,matrix,NULL);
> EPSSetProblemType(solver, EPS_HEP);
> EPSSetType(solver, EPSLANCZOS);
> EPSSetWhichEigenpairs(solver, EPS_SMALLEST_REAL);
> EPSSetFromOptions(solver);
> EPSSolve(solver);
> 
> I want to estimate the memory the program uses (at every step) for larger problems, because I would like to keep it under 16 GB per node. I've used the "memory usage" functions provided by PETSc, but something happens during the solver stage that I can't explain. This brings up two questions.
> 
> 1) In each step I put a call to four memory functions and between them I print the value of mem:
> 
> Did you call PetscMemorySetGetMaximumUsage() first?
> 
> We compute the resident set size (https://en.wikipedia.org/wiki/Resident_set_size) however we can, usually with getrusage().
> From this (https://www.binarytides.com/linux-command-check-memory-usage/), it looks like top also reports
> paged-out memory.
> 
>    Matt
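
For reference, a minimal sketch of reading the peak resident set size with getrusage(), the call mentioned above. This is not PETSc's implementation, only an illustration of the system call; the helper name peak_rss_raw is hypothetical, and the units of ru_maxrss depend on the platform (kilobytes on Linux, bytes on macOS).

```c
#include <sys/resource.h>

/* Hypothetical helper: peak resident set size of the calling process,
   as reported by getrusage(). The unit is platform dependent
   (kilobytes on Linux, bytes on macOS). Returns -1 on failure. */
long peak_rss_raw(void)
{
    struct rusage usage;

    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}
```

The value is a high-water mark for the process, so it never decreases during a run and says nothing about memory that has since been freed.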
>  
> mem = 0;
> PetscMallocGetCurrentUsage(&mem);
> PetscMallocGetMaximumUsage(&mem);
> PetscMemoryGetCurrentUsage(&mem);
> PetscMemoryGetMaximumUsage(&mem);
> 
> I've read other questions on the mailing list about the same issue, but I still can't fully understand it. What is the difference between these functions? What information are they actually giving me? (I know this is only a "per process" output.) I copy the output of two steps of the program as an example:
> 
> ==================== step N ====================
> MallocGetCurrent: 314513664.0 B
> MallocGetMaximum: 332723328.0 B
> MemoryGetCurrent: 539996160.0 B
> MemoryGetMaximum: 0.0 B
> ==================== step N+1 ====================
> MallocGetCurrent: 395902912.0 B
> MallocGetMaximum: 415178624.0 B
> MemoryGetCurrent: 623783936.0 B
> MemoryGetMaximum: 623775744.0 B
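
As a worked example, the growth between the two steps quoted above can be computed directly from the printed values. This is a small sketch in plain C; the helper names growth_bytes and bytes_to_mib are hypothetical and are not part of PETSc.

```c
#include <stdint.h>

/* Hypothetical helper: growth in bytes between two readings
   of the same memory counter. */
int64_t growth_bytes(int64_t before, int64_t after)
{
    return after - before;
}

/* Hypothetical helper: convert a byte count to MiB (2^20 bytes). */
double bytes_to_mib(int64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}
```

With the numbers above, growth_bytes(314513664, 395902912) gives 81389248 B (about 78 MiB) for MallocGetCurrent, and growth_bytes(539996160, 623783936) gives 83787776 B (about 80 MiB) for MemoryGetCurrent, all per process.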
> 
> 2) I was using this information to calculate the memory required per node to run my problem. I can also log in to the compute node during a run and check the memory consumption with *top*. At the beginning, the memory usage shown by *top* is more or less the same as the one reported by the PETSc functions. But during the initialization of the solver and during the solve, *top* reports a consumption twice as large as the one the functions report. Is it possible to find out where this extra memory consumption comes from? What does SLEPc allocate that needs that much memory? I've been trying to do the math but I think I'm missing something. I thought that part of it comes from the "BV" that the option -eps_view reports:
> 
> BV Object: 2048 MPI processes
>   type: svec
>   17 columns of global length 2333606220
>   vector orthogonalization method: modified Gram-Schmidt
>   orthogonalization refinement: if needed (eta: 0.7071)
>   block orthogonalization method: GS
>   doing matmult as a single matrix-matrix product
> 
> But "17 * 2333606220 * 8 Bytes / #nodes" only explains one third or less of the "extra" memory.
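
The estimate above can be written out as a short calculation. This is a sketch under stated assumptions: real double-precision scalars (8 bytes each), vectors distributed evenly, and only the BV columns counted, with no solver workspace or matrix storage. The helper names bv_total_bytes and bytes_per_part are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: total bytes needed to store `columns` vectors
   of `global_length` entries, `scalar_bytes` bytes per entry. */
uint64_t bv_total_bytes(uint64_t columns, uint64_t global_length,
                        uint64_t scalar_bytes)
{
    return columns * global_length * scalar_bytes;
}

/* Hypothetical helper: share of `total` bytes held by each of `parts`
   nodes or processes, assuming an even distribution
   (integer division, rounded down). */
uint64_t bytes_per_part(uint64_t total, uint64_t parts)
{
    return total / parts;
}
```

For the BV reported by -eps_view, bv_total_bytes(17, 2333606220, 8) is 317370445920 B in total. Divided over 64 nodes that is 4958913217 B (about 4.96 GB) per node, and over 2048 processes it is 154966038 B (about 155 MB) per process.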
> 
> Ale
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/