[petsc-users] petsc4py - Spike in memory usage when loading a matrix in parallel
Michael Werner
michael.werner at dlr.de
Thu Oct 7 10:35:44 CDT 2021
Currently I'm using psutil to query every process for its memory usage
and sum the results. However, the spike was only visible in top: I had a
call to psutil right before and after A.load(viewer), and both reported
only 50 GB of RAM usage. That's why I thought it might be directly tied
to loading the matrix. I also had the problem that a computation crashed
due to running out of memory while loading a matrix that should in
theory fit into memory. In that case I would expect the OS to free
unused memory immediately, right?
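For reference, the measurement is essentially the following sketch (a
minimal version of what I do, not the exact code; the path is a
placeholder and summed_rss is just a helper name I use here):

import psutil
from mpi4py import MPI
from petsc4py import PETSc

def summed_rss():
    # resident set size (RSS) of this rank, summed over all ranks, in bytes
    return MPI.COMM_WORLD.allreduce(psutil.Process().memory_info().rss,
                                    op=MPI.SUM)

viewer = PETSc.Viewer().createBinary("<path/to/existing/matrix>", "r",
                                     comm=PETSc.COMM_WORLD)
A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)

before = summed_rss()
A.load(viewer)
after = summed_rss()
if MPI.COMM_WORLD.rank == 0:
    print(f"summed RSS before/after A.load: "
          f"{before / 2**30:.1f} GiB / {after / 2**30:.1f} GiB")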
Concerning Barry's questions: the matrix is a sparse matrix, originally
created sequentially as SEQAIJ. However, it is then loaded as MPIAIJ,
and if I look at the memory usage of the various processes, they fill up
one after another, just as described. Is the origin of the matrix
somehow preserved in the binary file? I was under the impression that
the binary format is agnostic to the number of processes. I also varied
the number of processes between 1 and 60; as soon as I use more than one
process I can observe the spike (and it's always twice the memory, no
matter how many processes I'm using).
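For completeness, the variant with explicit type, sizes and
preallocation that I tried looks roughly like this (dim and nnz stand
for the values I take from the external program, the path is a
placeholder; it behaves the same as the plain create/load shown in my
original mail below):

from petsc4py import PETSc

viewer = PETSc.Viewer().createBinary("<path/to/existing/matrix>", "r",
                                     comm=PETSc.COMM_WORLD)
A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
A.setType(PETSc.Mat.Type.MPIAIJ)  # with more than one rank the loaded matrix is MPIAIJ anyway
A.setSizes(dim)                   # dim: global sizes (placeholder)
A.setPreallocationNNZ(nnz)        # nnz: nonzeros per row (placeholder)
A.load(viewer)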
I also tried running Valgrind with the --tool=massif option. However, I
don't know what to look for. I can send you the output file separately,
if it helps.
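In case it's useful, this is roughly how I invoked it (the script name
is a placeholder; when run under mpirun, each rank writes its own
massif.out.<pid> file):

mpirun -np 2 valgrind --tool=massif python3 load_matrix.py
ms_print massif.out.<pid>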
Best regards,
Michael
On 07.10.21 16:09, Matthew Knepley wrote:
> On Thu, Oct 7, 2021 at 10:03 AM Barry Smith <bsmith at petsc.dev> wrote:
>
>
> How many ranks are you using? Is it a sparse matrix with MPIAIJ?
>
> The intention is that for parallel runs the first rank reads in
> its own part of the matrix, then reads in the part of the next
> rank and sends it, then reads the part of the third rank and sends
> it etc. So there should not be too much of a blip in memory usage.
> You can run valgrind with the option for tracking memory usage to
> see exactly where in the code the blip occurs; it could be that a
> regression in the code has made it require more memory. But
> internal MPI buffers might explain some of the blip.
>
>
> Is it possible that we free the memory, but the OS has just not given
> back that memory for use yet? How are you measuring memory usage?
>
> Thanks,
>
> Matt
>
>
> Barry
>
>
> > On Oct 7, 2021, at 9:50 AM, Michael Werner <michael.werner at dlr.de> wrote:
> >
> > Hello,
> >
> > I noticed that there is a peak in memory consumption when I load an
> > existing matrix into PETSc. The matrix was previously created by an
> > external program and saved in the PETSc binary format.
> > The code I'm using in petsc4py is simple:
> >
> > viewer = PETSc.Viewer().createBinary(<path/to/existing/matrix>, "r",
> > comm=PETSc.COMM_WORLD)
> > A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
> > A.load(viewer)
> >
> > When I run this code in serial, the memory consumption of the process
> > is about 50 GB RAM, similar to the file size of the saved matrix.
> > However, if I run the code in parallel, for a few seconds the memory
> > consumption of the process doubles to around 100 GB RAM, before
> > dropping back down to around 50 GB RAM. So it seems as if, for some
> > reason, the matrix is copied after it is read into memory. Is there a
> > way to avoid this behaviour? Currently, it is a clear bottleneck in my
> > code.
> >
> > I tried setting the size of the matrix and explicitly preallocating
> > the necessary NNZ (with A.setSizes(dim) and A.setPreallocationNNZ(nnz),
> > respectively) before loading, but that didn't help.
> >
> > As mentioned above, I'm using petsc4py together with PETSc-3.16 on a
> > Linux workstation.
> >
> > Best regards,
> > Michael Werner
> >
> > --
> >
> > ____________________________________________________
> >
> > Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR)
> > Institut für Aerodynamik und Strömungstechnik | Bunsenstr. 10 | 37073 Göttingen
> >
> > Michael Werner
> > Telefon 0551 709-2627 | Telefax 0551 709-2811 | Michael.Werner at dlr.de
> > DLR.de
> >
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/