<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
I can understand that process 0 needs to have twice its own memory,
due to the procedure Barry explained. However, in my case every
process has twice the "necessary" memory. That doesn't seem right to
me. Especially with Barry's explanation in mind, it seems strange
that all processes show the same peak memory usage. If it were only
process 0, it wouldn't matter, because with enough processes the
overhead would be negligible. <br>
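<br>
For reference, the arithmetic with the valgrind figures quoted further
down in this thread (20 GB matrix file, 4 processes): each rank owns
roughly 5 GB. If only rank 0 temporarily held its own piece plus one
other rank's piece, the expected peaks would be about 5 + 5 = 10 GB on
rank 0 and about 5 GB on every other rank. Instead, valgrind reports
roughly 10.8 GB on every rank, i.e. 4 × 10.8 ≈ 43 GB in total, about
twice the 21 GB of the serial run.<br>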
<br>
Best regards,<br>
Michael<br>
<br>
<div class="moz-cite-prefix">On 07.10.21 18:32, Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAMYG4GkU=9Y=dWJCSJBUL7iZ85kzbXF2WCzq_f3HJe01ZYNpjg@mail.gmail.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">On Thu, Oct 7, 2021 at 11:59 AM Michael Werner
<<a href="mailto:michael.werner@dlr.de"
moz-do-not-send="true">michael.werner@dlr.de</a>> wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> It's twice the memory of the entire matrix (when stored
on one process). I also just sent you the valgrind
results, both for a serial run and a parallel run. The
size on disk of the matrix I used is 20 GB. <br>
In the serial run, valgrind shows a peak memory usage of
21 GB, while in the parallel run (with 4 processes) each
process shows a peak memory usage of 10.8 GB.<br>
</div>
</blockquote>
<div><br>
</div>
<div>Barry is right that at least proc 0 must have twice its
own memory, since it loads the other pieces. That makes 10 GB
sound correct.</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> Best regards,<br>
Michael<br>
<br>
<div>On 07.10.21 17:55, Barry Smith wrote:<br>
</div>
<blockquote type="cite"> <br>
<div><br>
<blockquote type="cite">
<div>On Oct 7, 2021, at 11:35 AM, Michael Werner
<<a href="mailto:michael.werner@dlr.de"
target="_blank" moz-do-not-send="true">michael.werner@dlr.de</a>>
wrote:</div>
<br>
<div>
<div> Currently I'm using psutil to query every
process for its memory usage and sum it up.
However, the spike was only visible in top (I
had a call to psutil right before and after
A.load(viewer), and both reported only 50 GB of
RAM usage). That's why I thought it might be
directly tied to loading the matrix. However, I
also had the problem that the computation
crashed due to running out of memory while
loading a matrix that should in theory fit into
memory. In that case I would expect the OS to
free unused memory immediately, right?<br>
<br>
Concerning Barry's questions: the matrix is a
sparse matrix and is originally created
sequentially as SEQAIJ. However, it is then
loaded as MPIAIJ, and if I look at the memory
usage of the various processes, they fill up one
after another, just as described. Is the origin
of the matrix somehow preserved in the binary
file? I was under the impression that the binary
format was agnostic to the number of processes?
</div>
</div>
</blockquote>
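<div><br>
For reference, a minimal Python sketch of the kind of per-rank
measurement described above, using psutil around Mat.load (the file
path is a placeholder; psutil only samples at these two points, so a
short-lived spike inside load() would not show up here):
<pre>
# Minimal sketch: per-rank RSS right before and after Mat.load (path is a placeholder)
import psutil
from petsc4py import PETSc

comm = PETSc.COMM_WORLD
rank = comm.getRank()
proc = psutil.Process()
rss_gb = lambda: proc.memory_info().rss / 1024**3  # resident set size of this rank, in GB

viewer = PETSc.Viewer().createBinary("matrix.dat", "r", comm=comm)
A = PETSc.Mat().create(comm=comm)

before = rss_gb()
A.load(viewer)
after = rss_gb()
print(f"[rank {rank}] RSS before load: {before:.1f} GB, after load: {after:.1f} GB")
</pre>
</div>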
<div><br>
</div>
The file format is independent of the number of
processes that created it.</div>
<div><br>
<blockquote type="cite">
<div>
<div>I also varied the number of processes between
1 and 60; as soon as I use more than one process
I can observe the spike (and it's always twice
the memory, no matter how many processes I'm
using).<br>
</div>
</div>
</blockquote>
<div><br>
</div>
Twice the size of the entire matrix (when stored on
one process) or twice the size of the resulting matrix
stored on the first rank? The latter is exactly as
expected, since rank 0 has to load the part of the
matrix destined for the next rank and hence for a
short time contains its own part of the matrix and the
part of one other rank.</div>
<div><br>
</div>
<div> Barry</div>
<div><br>
<blockquote type="cite">
<div>
<div> <br>
I also tried running Valgrind with the
--tool=massif option. However, I don't know what
to look for. I can send you the output file
separately, if it helps.<br>
<br>
Best regards,<br>
Michael <br>
<br>
<div>On 07.10.21 16:09, Matthew Knepley wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">On Thu, Oct 7, 2021 at 10:03
AM Barry Smith <<a
href="mailto:bsmith@petsc.dev"
target="_blank" moz-do-not-send="true">bsmith@petsc.dev</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"><br>
How many ranks are you using? Is it a
sparse matrix with MPIAIJ? <br>
<br>
The intention is that for parallel
runs the first rank reads in its own
part of the matrix, then reads in the
part of the next rank and sends it, then
reads the part of the third rank and
sends it, and so on. So there should not be too
much of a blip in memory usage. You can
run valgrind with the option for
tracking memory usage to see exactly
where in the code the blip occurs; it
could be that a regression in the code
makes it require more memory. But
internal MPI buffers might explain some
of the blip.<br>
</blockquote>
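<div><br>
For illustration, a rough Python (mpi4py) sketch of the loading
pattern described above; read_piece below is a purely illustrative
stand-in, not PETSc's actual reader:
<pre>
# Conceptual sketch of the streaming load described above (not PETSc's actual code):
# rank 0 reads one rank's piece at a time and forwards it, so its peak holds at most
# its own piece plus one other piece; every other rank only ever holds its own piece.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def read_piece(r):
    # Stand-in for reading rank r's rows from the binary file (purely illustrative).
    return [float(r)] * 1000

if rank == 0:
    my_piece = read_piece(0)
    for dest in range(1, size):
        piece = read_piece(dest)       # read the next rank's piece ...
        comm.send(piece, dest=dest)    # ... send it on, then drop the buffer
        del piece
else:
    my_piece = comm.recv(source=0)
</pre>
</div>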
<div><br>
</div>
<div>Is it possible that we free the
memory, but the OS has just not given
back that memory for use yet? How are
you measuring memory usage?</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Barry<br>
<br>
<br>
> On Oct 7, 2021, at 9:50 AM, Michael
Werner <<a
href="mailto:michael.werner@dlr.de"
target="_blank" moz-do-not-send="true">michael.werner@dlr.de</a>>
wrote:<br>
> <br>
> Hello,<br>
> <br>
> I noticed that there is a peak in
memory consumption when I load an<br>
> existing matrix into PETSc. The
matrix is previously created by an<br>
> external program and saved in the
PETSc binary format.<br>
> The code I'm using in petsc4py is
simple:<br>
> <br>
> viewer =
PETSc.Viewer().createBinary(&lt;path/to/existing/matrix&gt;,
"r",<br>
> comm=PETSc.COMM_WORLD)<br>
> A =
PETSc.Mat().create(comm=PETSc.COMM_WORLD)<br>
> A.load(viewer)<br>
> <br>
> When I run this code in serial, the
memory consumption of the process is<br>
> about 50GB RAM, similar to the file
size of the saved matrix. However,<br>
> if I run the code in parallel, for
a few seconds the memory consumption<br>
> of the process doubles to around
100GB RAM, before dropping back down to<br>
> around 50GB RAM. So it seems as if,
for some reason, the matrix is<br>
> copied after it is read into
memory. Is there a way to avoid this<br>
> behaviour? Currently, it is a clear
bottleneck in my code.<br>
> <br>
> I tried setting the size of the
matrix and to explicitly preallocate the<br>
> necessary NNZ (with A.setSizes(dim)
and A.setPreallocationNNZ(nnz),<br>
> respectively) before loading, but
that didn't help.<br>
> <br>
> As mentioned above, I'm using
petsc4py together with PETSc-3.16 on a<br>
> Linux workstation.<br>
> <br>
> Best regards,<br>
> Michael Werner<br>
> <br>
> -- <br>
> <br>
>
____________________________________________________<br>
> <br>
> Deutsches Zentrum für Luft- und
Raumfahrt e.V. (DLR)<br>
> Institut für Aerodynamik und
Strömungstechnik | Bunsenstr. 10 | 37073
Göttingen<br>
> <br>
> Michael Werner <br>
> Telefon 0551 709-2627 | Telefax
0551 709-2811 | <a
href="mailto:Michael.Werner@dlr.de"
target="_blank" moz-do-not-send="true">Michael.Werner@dlr.de</a><br>
> <a href="http://DLR.de"
target="_blank" moz-do-not-send="true">DLR.de</a><br>
> <br>
<br>
</blockquote>
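<div><br>
For completeness, a minimal petsc4py sketch of the pre-sizing and
preallocation attempt mentioned in the message above (the path, n, and
nnz are placeholders):
<pre>
# Sketch of setting sizes and preallocating NNZ before the load, as attempted above
# (path, n and nnz are placeholders; per the thread, this did not remove the peak).
from petsc4py import PETSc

comm = PETSc.COMM_WORLD
n = 1000      # placeholder: global number of rows/columns of the stored matrix
nnz = 10      # placeholder: estimated nonzeros per row

viewer = PETSc.Viewer().createBinary("matrix.dat", "r", comm=comm)
A = PETSc.Mat().create(comm=comm)
A.setType(PETSc.Mat.Type.MPIAIJ)
A.setSizes(((PETSc.DECIDE, n), (PETSc.DECIDE, n)))  # let PETSc choose the local sizes
A.setPreallocationNNZ(nnz)
A.load(viewer)
</pre>
</div>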
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters
take for granted before they
begin their experiments is
infinitely more interesting
than any results to which
their experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank"
moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before
they begin their experiments is infinitely more
interesting than any results to which their
experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="http://www.cse.buffalo.edu/~knepley/"
target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>