[petsc-users] petsc4py - Spike in memory usage when loading a matrix in parallel

Barry Smith bsmith at petsc.dev
Thu Oct 7 10:55:00 CDT 2021



> On Oct 7, 2021, at 11:35 AM, Michael Werner <michael.werner at dlr.de> wrote:
> 
> Currently I'm using psutil to query every process for its memory usage and sum it up. However, the spike was only visible in top (I had a call to psutil right before and after A.load(viewer), and both reported only 50 GB of RAM usage). That's why I thought it might be directly tied to loading the matrix. However, I also had the problem that the computation crashed due to running out of memory while loading a matrix that should, in theory, fit into memory. In that case I would expect the OS to free unused memory immediately, right?
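
  For reference, a minimal sketch of that kind of measurement, assuming psutil
  and mpi4py are available (the file name below is a placeholder): sample the
  resident set size on every rank immediately before and after the load and
  sum it across ranks. Point samples taken only before and after A.load() can
  miss a short-lived transient that top happens to catch in between.

    import psutil
    from mpi4py import MPI
    from petsc4py import PETSc

    comm = MPI.COMM_WORLD

    def rss_gb():
        # resident set size of this process, in GB
        return psutil.Process().memory_info().rss / 1e9

    before = comm.allreduce(rss_gb(), op=MPI.SUM)

    viewer = PETSc.Viewer().createBinary("matrix.dat", "r",  # placeholder path
                                         comm=PETSc.COMM_WORLD)
    A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
    A.load(viewer)

    after = comm.allreduce(rss_gb(), op=MPI.SUM)
    if comm.rank == 0:
        print(f"total RSS: {before:.1f} GB before load, {after:.1f} GB after")
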
> 
> Concerning Barry's questions: the matrix is a sparse matrix and is originally created sequentially as SEQAIJ. However, it is then loaded as MPIAIJ, and if I look at the memory usage of the various processes, they fill up one after another, just as described. Is the origin of the matrix somehow preserved in the binary file? I was under the impression that the binary format was agnostic to the number of processes?

 The file format is independent of the number of processes that created it.

> I also varied the number of processes between 1 and 60; as soon as I use more than one process, I can observe the spike (and it's always twice the memory, no matter how many processes I'm using).

  Twice the size of the entire matrix (when stored on one process) or twice the size of the resulting matrix stored on the first rank? The latter is exactly as expected, since rank 0 has to load the part of the matrix destined for the next rank and hence for a short time contains its own part of the matrix and the part of one other rank.
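
  As a rough back-of-the-envelope illustration of the difference between those
  two cases (assuming the 50 GB matrix is split evenly across the ranks):

    matrix_gb = 50.0
    for nranks in (2, 16, 60):
        local_gb = matrix_gb / nranks
        # expected transient on rank 0: its own part plus one other rank's part
        print(f"{nranks:3d} ranks: ~{local_gb:5.2f} GB per rank, "
              f"rank-0 peak ~{2 * local_gb:5.2f} GB")

  A transient of roughly twice the entire matrix, independent of the number of
  ranks, would not be explained by this hand-off alone.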

  Barry

> 
> I also tried running Valgrind with the --tool=massif option. However, I don't know what to look for. I can send you the output file separately, if it helps.
> 
> Best regards,
> Michael 
> 
> On 07.10.21 16:09, Matthew Knepley wrote:
>> On Thu, Oct 7, 2021 at 10:03 AM Barry Smith <bsmith at petsc.dev> wrote:
>> 
>>    How many ranks are you using? Is it a sparse matrix with MPIAIJ? 
>> 
>>    The intention is that for parallel runs the first rank reads in its own part of the matrix, then reads in the part of the next rank and sends it, then reads the part of the third rank and sends it, etc. So there should not be too much of a blip in memory usage. You can run valgrind with the option for tracking memory usage to see exactly where in the code the blip occurs; it could be that a regression in the code made it require more memory. But internal MPI buffers might explain some of the blip.
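
  Schematically, the pattern described above looks like the sketch below (an
  illustration only, not PETSc's actual MatLoad code; read_block() is a
  hypothetical stand-in for reading one rank's worth of rows from the file):

    from mpi4py import MPI

    def read_block(owner_rank, nbytes=1 << 20):
        # hypothetical: pretend each rank's rows occupy nbytes bytes
        return bytearray(nbytes)

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    if rank == 0:
        my_block = read_block(0)
        for r in range(1, size):
            other = read_block(r)     # transient: own block plus one other
            comm.send(other, dest=r)  # handed off before the next block is read
            del other
    else:
        my_block = comm.recv(source=0)

  At any moment rank 0 holds at most its own block plus one other rank's block,
  which is why only a modest blip is expected.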
>> 
>> Is it possible that we free the memory, but the OS has just not given back that memory for use yet? How are you measuring memory usage?
>> 
>>   Thanks,
>> 
>>      Matt
>>  
>>   Barry
>> 
>> 
>> > On Oct 7, 2021, at 9:50 AM, Michael Werner <michael.werner at dlr.de> wrote:
>> > 
>> > Hello,
>> > 
>> > I noticed that there is a peak in memory consumption when I load an
>> > existing matrix into PETSc. The matrix was previously created by an
>> > external program and saved in the PETSc binary format.
>> > The code I'm using in petsc4py is simple:
>> > 
>> > from petsc4py import PETSc
>> > 
>> > viewer = PETSc.Viewer().createBinary("<path/to/existing/matrix>", "r",
>> >                                      comm=PETSc.COMM_WORLD)
>> > A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
>> > A.load(viewer)
>> > 
>> > When I run this code in serial, the memory consumption of the process is
>> > about 50 GB of RAM, similar to the file size of the saved matrix. However,
>> > if I run the code in parallel, the memory consumption of the processes
>> > doubles to around 100 GB of RAM for a few seconds, before dropping back
>> > down to around 50 GB. So it seems as if, for some reason, the matrix is
>> > copied after it is read into memory. Is there a way to avoid this
>> > behaviour? Currently, it is a clear bottleneck in my code.
>> > 
>> > I tried setting the size of the matrix and explicitly preallocating the
>> > necessary NNZ (with A.setSizes(dim) and A.setPreallocationNNZ(nnz),
>> > respectively) before loading, but that didn't help.
>> > 
>> > As mentioned above, I'm using petsc4py together with PETSc-3.16 on a
>> > Linux workstation.
>> > 
>> > Best regards,
>> > Michael Werner
>> > 
>> > -- 
>> > 
>> > ____________________________________________________
>> > 
>> > Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR)
>> > Institut für Aerodynamik und Strömungstechnik | Bunsenstr. 10 | 37073 Göttingen
>> > 
>> > Michael Werner 
>> > Telefon 0551 709-2627 | Telefax 0551 709-2811 | Michael.Werner at dlr.de
>> > DLR.de
>> > 
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/
> 
