[petsc-users] petsc4py - Spike in memory usage when loading a matrix in parallel
Michael Werner
michael.werner at dlr.de
Thu Oct 7 10:59:57 CDT 2021
It's twice the memory of the entire matrix (when stored on one process).
I also just sent you the valgrind results, both for a serial run and a
parallel run. The size on disk of the matrix I used is 20 GB.
In the serial run, valgrind shows a peak memory usage of 21 GB, while in
the parallel run (with 4 processes) each process shows a peak memory
usage of 10.8 GB.
Best regards,
Michael
On 07.10.21 17:55, Barry Smith wrote:
>
>
>> On Oct 7, 2021, at 11:35 AM, Michael Werner <michael.werner at dlr.de> wrote:
>>
>> Currently I'm using psutil to query every process for its memory
>> usage and sum it up. However, the spike was only visible in top (I
>> had a call to psutil right before and after A.load(viewer), and both
>> reported only 50 GB of RAM usage). That's why I thought it might be
>> directly tied to loading the matrix. However, I also had the problem
>> that the computation crashed due to running out of memory while
>> loading a matrix that should in theory fit into memory. In that case
>> I would expect the OS to free unused memory immediately, right?
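>>
>> (Roughly, the check looks like the sketch below; it's a minimal
>> illustration rather than my exact script, and the file name, the
>> helper, and the mpi4py reduction are just placeholders for how one
>> could sum the per-process numbers.)
>>
>> import os
>> import psutil
>> from mpi4py import MPI
>> from petsc4py import PETSc
>>
>> def total_rss_gb(comm=MPI.COMM_WORLD):
>>     # resident set size of this MPI process, in bytes
>>     rss = psutil.Process(os.getpid()).memory_info().rss
>>     # sum over all ranks; only rank 0 receives the total
>>     total = comm.reduce(rss, op=MPI.SUM, root=0)
>>     return None if total is None else total / 1024**3
>>
>> viewer = PETSc.Viewer().createBinary("matrix.dat", "r",
>>                                      comm=PETSc.COMM_WORLD)
>> A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
>> before = total_rss_gb()
>> A.load(viewer)
>> after = total_rss_gb()
>> if PETSc.COMM_WORLD.getRank() == 0:
>>     print(f"RSS before load: {before:.1f} GB, after: {after:.1f} GB")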
>>
>> Concerning Barry's questions: the matrix is a sparse matrix and is
>> originally created sequentially as SEQAIJ. However, it is then loaded
>> as MPIAIJ, and if I look at the memory usage of the various
>> processes, they fill up one after another, just as described. Is the
>> origin of the matrix somehow preserved in the binary file? I was
>> under the impression that the binary format was agnostic to the
>> number of processes?
>
> The file format is independent of the number of processes that
> created it.
>
>> I also varied the number of processes between 1 and 60; as soon as I
>> use more than one process I can observe the spike (and it's always
>> twice the memory, no matter how many processes I'm using).
>
> Twice the size of the entire matrix (when stored on one process) or
> twice the size of the resulting matrix stored on the first rank? The
> latter is exactly as expected, since rank 0 has to load the part of
> the matrix destined for the next rank and hence for a short time
> contains its own part of the matrix and the part of one other rank.
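>
> (As a concrete example: for a 20 GB matrix on 4 ranks each rank's
> share is roughly 5 GB, so rank 0 would briefly hold about 10 GB while
> staging the next rank's portion, even though the total across all
> ranks stays near 20 GB once loading is finished.)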
>
> Barry
>
>>
>> I also tried running Valgrind with the --tool=massif option. However,
>> I don't know what to look for. I can send you the output file
>> separately, if it helps.
>>
>> Best regards,
>> Michael
>>
>> On 07.10.21 16:09, Matthew Knepley wrote:
>>> On Thu, Oct 7, 2021 at 10:03 AM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>
>>> How many ranks are you using? Is it a sparse matrix with MPIAIJ?
>>>
>>> The intention is that for parallel runs the first rank reads
>>> in its own part of the matrix, then reads in the part of the
>>> next rank and sends it, then reads the part of the third rank
>>> and sends it etc. So there should not be too much of a blip in
>>> memory usage. You can run valgrind with the option for tracking
>>> memory usage to see exactly where in the code the blip occurs;
>>> it could be that a regression in the code has made it require
>>> more memory. But internal MPI buffers might also explain some of the blip.
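>>>
>>> (As a rough sketch of how that could be run; load_matrix.py stands
>>> for the loading script, and the massif output file names depend on
>>> the process IDs, so treat this as an illustration:)
>>>
>>> # one massif heap profile per MPI rank
>>> mpiexec -n 4 valgrind --tool=massif python load_matrix.py
>>> # inspect the snapshots (including the peak) of one rank's profile
>>> ms_print massif.out.<pid>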
>>>
>>>
>>> Is it possible that we free the memory, but the OS has just not
>>> given back that memory for use yet? How are you measuring memory usage?
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>>
>>> Barry
>>>
>>>
>>> > On Oct 7, 2021, at 9:50 AM, Michael Werner <michael.werner at dlr.de> wrote:
>>> >
>>> > Hello,
>>> >
>>> > I noticed that there is a peak in memory consumption when I load an
>>> > existing matrix into PETSc. The matrix is previously created by an
>>> > external program and saved in the PETSc binary format.
>>> > The code I'm using in petsc4py is simple:
>>> >
>>> > viewer = PETSc.Viewer().createBinary(<path/to/existing/matrix>, "r",
>>> >                                      comm=PETSc.COMM_WORLD)
>>> > A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
>>> > A.load(viewer)
>>> >
>>> > When I run this code in serial, the memory consumption of the process is
>>> > about 50 GB RAM, similar to the file size of the saved matrix. However,
>>> > if I run the code in parallel, for a few seconds the memory consumption
>>> > of the process doubles to around 100 GB RAM, before dropping back down to
>>> > around 50 GB RAM. So it seems as if, for some reason, the matrix is
>>> > copied after it is read into memory. Is there a way to avoid this
>>> > behaviour? Currently, it is a clear bottleneck in my code.
>>> >
>>> > I tried setting the size of the matrix and explicitly preallocating the
>>> > necessary NNZ (with A.setSizes(dim) and A.setPreallocationNNZ(nnz),
>>> > respectively) before loading, but that didn't help.
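>>> >
>>> > (Roughly what I tried, as a sketch; dim and nnz stand for the known
>>> > global size and nonzero counts, and the explicit setType call is
>>> > just one way of spelling it out:)
>>> >
>>> > A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
>>> > A.setType(PETSc.Mat.Type.MPIAIJ)
>>> > A.setSizes(dim)
>>> > A.setPreallocationNNZ(nnz)
>>> > A.load(viewer)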
>>> >
>>> > As mentioned above, I'm using petsc4py together with PETSc-3.16 on a
>>> > Linux workstation.
>>> >
>>> > Best regards,
>>> > Michael Werner
>>> >
>>> > --
>>> >
>>> > ____________________________________________________
>>> >
>>> > Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR)
>>> > Institut für Aerodynamik und Strömungstechnik | Bunsenstr. 10 | 37073 Göttingen
>>> >
>>> > Michael Werner
>>> > Telefon 0551 709-2627 | Telefax 0551 709-2811 | Michael.Werner at dlr.de
>>> > DLR.de
>>> >
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which
>>> their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>
>