[petsc-users] petsc4py - Spike in memory usage when loading a matrix in parallel

Michael Werner michael.werner at dlr.de
Fri Oct 8 02:14:26 CDT 2021


I can understand that process 0 needs to hold twice its own memory, due
to the procedure Barry explained. However, in my case every process
peaks at twice the "necessary" memory, which doesn't seem right to me.
Especially with Barry's explanation in mind, it seems strange that all
processes show the same peak memory usage. If it were only process 0,
it wouldn't matter, because with enough processes the overhead would
become negligible.
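
To make the measurement concrete, here is how the transient per-rank
peak could be captured even when a single before/after query misses it;
a minimal sketch, assuming psutil and mpi4py are available (the
sampling thread, its interval, and the path placeholder are
illustrative, not taken from my actual code):

import threading
import time

import psutil
from mpi4py import MPI
from petsc4py import PETSc

def sample_rss(proc, peak, stop, interval=0.05):
    # Poll the resident set size in a background thread, so a short-lived
    # spike (visible in top, but missed by a single before/after query)
    # is still recorded. The sampling thread makes no MPI calls.
    while not stop.is_set():
        peak[0] = max(peak[0], proc.memory_info().rss)
        time.sleep(interval)

comm = MPI.COMM_WORLD
proc = psutil.Process()
peak = [proc.memory_info().rss]
stop = threading.Event()
sampler = threading.Thread(target=sample_rss, args=(proc, peak, stop))
sampler.start()

path = "<path/to/existing/matrix>"  # placeholder, as in the original snippet
viewer = PETSc.Viewer().createBinary(path, "r", comm=PETSc.COMM_WORLD)
A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
A.load(viewer)

stop.set()
sampler.join()
print(f"rank {comm.Get_rank()}: peak RSS during load: {peak[0] / 1e9:.1f} GB")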

Best regards,
Michael

On 07.10.21 18:32, Matthew Knepley wrote:
> On Thu, Oct 7, 2021 at 11:59 AM Michael Werner <michael.werner at dlr.de
> <mailto:michael.werner at dlr.de>> wrote:
>
>     It's twice the memory of the entire matrix (when stored on one
>     process). I also just sent you the valgrind results, both for a
>     serial run and a parallel run. The size on disk of the matrix I
>     used is 20 GB.
>     In the serial run, valgrind shows a peak memory usage of 21 GB,
>     while in the parallel run (with 4 processes) each process shows a
>     peak memory usage of 10.8 GB.
>
>
> Barry is right that at least proc 0 must have twice its own memory,
> since it loads the other pieces. That makes 10 GB sound correct.
>
>   Thanks,
>
>      Matt
>  
>
>     Best regards,
>     Michael
>
>     On 07.10.21 17:55, Barry Smith wrote:
>>
>>
>>>     On Oct 7, 2021, at 11:35 AM, Michael Werner
>>>     <michael.werner at dlr.de <mailto:michael.werner at dlr.de>> wrote:
>>>
>>>     Currently I'm using psutil to query every process for its memory
>>>     usage and sum it up. However, the spike was only visible in top
>>>     (I had a call to psutil right before and after A.load(viewer),
>>>     and both reported only 50 GB of RAM usage). That's why I thought
>>>     it might be directly tied to loading the matrix. However, I also
>>>     had the problem that the computation crashed due to running out
>>>     of memory while loading a matrix that should in theory fit into
>>>     memory. In that case I would expect the OS to free unused memory
>>>     immediately, right?
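>>>
>>>     In code, the per-process query and sum mentioned above is
>>>     essentially the following (a minimal sketch; the mpi4py
>>>     reduction and the variable names are illustrative):
>>>
>>>     import psutil
>>>     from mpi4py import MPI
>>>
>>>     comm = MPI.COMM_WORLD
>>>     # each rank reports its own resident set size ...
>>>     rss = psutil.Process().memory_info().rss
>>>     # ... and the values are summed over all ranks
>>>     total = comm.allreduce(rss, op=MPI.SUM)
>>>     if comm.Get_rank() == 0:
>>>         print(f"total RSS across ranks: {total / 1e9:.1f} GB")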
>>>
>>>     Concerning Barry's questions: the matrix is a sparse matrix and
>>>     is originally created sequentially as SEQAIJ. However, it is
>>>     then loaded as MPIAIJ, and if I look at the memory usage of the
>>>     various processes, they fill up one after another, just as
>>>     described. Is the origin of the matrix somehow preserved in the
>>>     binary file? I was under the impression that the binary format
>>>     was agnostic to the number of processes?
>>
>>      The file format is independent of the number of processes that
>>     created it.
>>
>>>     I also varied the number of processes between 1 and 60; as soon
>>>     as I use more than one process I can observe the spike (and it's
>>>     always twice the memory, no matter how many processes I'm using).
>>
>>       Twice the size of the entire matrix (when stored on one
>>     process) or twice the size of the resulting matrix stored on the
>>     first rank? The latter is exactly as expected, since rank 0 has
>>     to load the part of the matrix destined for the next rank and
>>     hence for a short time contains its own part of the matrix and
>>     the part of one other rank.
>>
>>       Barry
>>
>>>
>>>     I also tried running Valgrind with the --tool=massif option.
>>>     However, I don't know what to look for. I can send you the
>>>     output file separately, if it helps.
>>>
>>>     Best regards,
>>>     Michael
>>>
>>>     On 07.10.21 16:09, Matthew Knepley wrote:
>>>>     On Thu, Oct 7, 2021 at 10:03 AM Barry Smith <bsmith at petsc.dev
>>>>     <mailto:bsmith at petsc.dev>> wrote:
>>>>
>>>>
>>>>            How many ranks are you using? Is it a sparse matrix with
>>>>         MPIAIJ?
>>>>
>>>>            The intention is that for parallel runs the first rank
>>>>         reads in its own part of the matrix, then reads in the part
>>>>         of the next rank and sends it, then reads the part of the
>>>>         third rank and sends it etc. So there should not be too
>>>>         much of a blip in memory usage. You can run valgrind with
>>>>         the option for tracking memory usage to see exactly where
>>>>         in the code the blip occurs; it could be that a regression
>>>>         occurred in the code, making it require more memory. But
>>>>         internal MPI buffers might explain some of the blip.
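>>>>
>>>>         Conceptually, the described pattern is something like the
>>>>         following (an mpi4py/numpy illustration of the idea, not
>>>>         PETSc's actual implementation; the file name and chunk size
>>>>         are made up): rank 0 reads one rank's chunk at a time and
>>>>         sends it on immediately, so it never holds more than two
>>>>         chunks at once.
>>>>
>>>>         import numpy as np
>>>>         from mpi4py import MPI
>>>>
>>>>         comm = MPI.COMM_WORLD
>>>>         rank, size = comm.Get_rank(), comm.Get_size()
>>>>         nlocal = 1000  # illustrative number of values per rank
>>>>
>>>>         if rank == 0:
>>>>             # dummy binary file standing in for the matrix file
>>>>             np.arange(nlocal * size, dtype=np.float64).tofile("dummy.bin")
>>>>         comm.Barrier()
>>>>
>>>>         if rank == 0:
>>>>             with open("dummy.bin", "rb") as f:
>>>>                 # rank 0 keeps its own chunk ...
>>>>                 mine = np.fromfile(f, dtype=np.float64, count=nlocal)
>>>>                 for r in range(1, size):
>>>>                     # ... and reads and forwards one chunk per other
>>>>                     # rank, dropping it right after the send
>>>>                     chunk = np.fromfile(f, dtype=np.float64, count=nlocal)
>>>>                     comm.Send(chunk, dest=r)
>>>>                     del chunk
>>>>         else:
>>>>             mine = np.empty(nlocal, dtype=np.float64)
>>>>             comm.Recv(mine, source=0)
>>>>
>>>>         print(f"rank {rank} holds {mine.nbytes} bytes")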
>>>>
>>>>
>>>>     Is it possible that we free the memory, but the OS has just not
>>>>     given back that memory for use yet? How are you measuring
>>>>     memory usage?
>>>>
>>>>       Thanks,
>>>>
>>>>          Matt
>>>>      
>>>>
>>>>           Barry
>>>>
>>>>
>>>>         > On Oct 7, 2021, at 9:50 AM, Michael Werner
>>>>         <michael.werner at dlr.de <mailto:michael.werner at dlr.de>> wrote:
>>>>         >
>>>>         > Hello,
>>>>         >
>>>>         > I noticed that there is a peak in memory consumption when
>>>>         > I load an existing matrix into PETSc. The matrix is
>>>>         > previously created by an external program and saved in the
>>>>         > PETSc binary format.
>>>>         > The code I'm using in petsc4py is simple:
>>>>         >
>>>>         > viewer = PETSc.Viewer().createBinary(<path/to/existing/matrix>,
>>>>         >                                      "r", comm=PETSc.COMM_WORLD)
>>>>         > A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
>>>>         > A.load(viewer)
>>>>         >
>>>>         > When I run this code in serial, the memory consumption of
>>>>         > the process is about 50 GB RAM, similar to the file size of
>>>>         > the saved matrix. However, if I run the code in parallel,
>>>>         > for a few seconds the memory consumption of the process
>>>>         > doubles to around 100 GB RAM, before dropping back down to
>>>>         > around 50 GB RAM. So it seems as if, for some reason, the
>>>>         > matrix is copied after it is read into memory. Is there a
>>>>         > way to avoid this behaviour? Currently, it is a clear
>>>>         > bottleneck in my code.
>>>>         >
>>>>         > I tried setting the size of the matrix and explicitly
>>>>         > preallocating the necessary NNZ (with A.setSizes(dim) and
>>>>         > A.setPreallocationNNZ(nnz), respectively) before loading,
>>>>         > but that didn't help.
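>>>>         >
>>>>         > For reference, that attempt looked roughly like the
>>>>         > following (a minimal sketch; the concrete values of dim
>>>>         > and nnz are illustrative, and the explicit setType call is
>>>>         > an addition here, since preallocation is type-specific):
>>>>         >
>>>>         > dim = (1000000, 1000000)          # illustrative global sizes
>>>>         > nnz = 30                          # illustrative nonzeros per row
>>>>         > A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
>>>>         > A.setSizes(dim)
>>>>         > A.setType(PETSc.Mat.Type.MPIAIJ)  # preallocation is type-specific
>>>>         > A.setPreallocationNNZ(nnz)
>>>>         > A.load(viewer)                    # viewer as in the snippet above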
>>>>         >
>>>>         > As mentioned above, I'm using petsc4py together with
>>>>         > PETSc-3.16 on a Linux workstation.
>>>>         >
>>>>         > Best regards,
>>>>         > Michael Werner
>>>>         >
>>>>         > --
>>>>         >
>>>>         > ____________________________________________________
>>>>         >
>>>>         > Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR)
>>>>         > Institut für Aerodynamik und Strömungstechnik |
>>>>         > Bunsenstr. 10 | 37073 Göttingen
>>>>         >
>>>>         > Michael Werner
>>>>         > Telefon 0551 709-2627 | Telefax 0551 709-2811 |
>>>>         > Michael.Werner at dlr.de <mailto:Michael.Werner at dlr.de>
>>>>         > DLR.de <http://DLR.de>
>>>>
>>>>
>>>>
>>>>     -- 
>>>>     What most experimenters take for granted before they begin
>>>>     their experiments is infinitely more interesting than any
>>>>     results to which their experiments lead.
>>>>     -- Norbert Wiener
>>>>
>>>>     https://www.cse.buffalo.edu/~knepley/
>>>>     <http://www.cse.buffalo.edu/~knepley/>
>>>
>>
>
>
>
> -- 
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
