<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    I can understand that process 0 needs twice its own memory, for the
    reasons Barry explained. However, in my case every process uses
    twice the "necessary" memory, which doesn't seem right to me.
    Especially with Barry's explanation in mind, it seems strange that
    all processes show the same peak memory usage. If it were only
    process 0, it wouldn't matter, because with enough processes the
    overhead would become negligible. <br>
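    As an aside on how I measure this: psutil samples the instantaneous
    RSS, so a short-lived spike between two samples is easy to miss,
    whereas top (and valgrind massif) also catch the peak. A minimal
    stdlib sketch of a peak-RSS query on Linux (illustrative only, not
    my actual script):

```python
# Query the peak resident set size of the current process (Linux/macOS).
# Unlike an instantaneous RSS sample, the peak also reflects short-lived
# spikes such as the transient doubling during the matrix load.
import resource
import sys

def peak_rss_bytes():
    # ru_maxrss is reported in kilobytes on Linux, bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * scale

before = peak_rss_bytes()
buf = bytearray(50 * 1024 * 1024)   # allocate ~50 MB
del buf                             # freed again, but the peak remembers it
assert peak_rss_bytes() >= before
```

    Calling this right after A.load(viewer) on each rank would show the
    spike even if the instantaneous RSS has already dropped back.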
    <br>
    Best regards,<br>
    Michael<br>
    <br>
    <div class="moz-cite-prefix">On 07.10.21 18:32, Matthew Knepley
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAMYG4GkU=9Y=dWJCSJBUL7iZ85kzbXF2WCzq_f3HJe01ZYNpjg@mail.gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">On Thu, Oct 7, 2021 at 11:59 AM Michael Werner
          <<a href="mailto:michael.werner@dlr.de"
            moz-do-not-send="true">michael.werner@dlr.de</a>> wrote:<br>
        </div>
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div> It's twice the memory of the entire matrix (when stored
              on one process). I also just sent you the valgrind
              results, both for a serial run and a parallel run. The
              size on disk of the matrix I used is 20 GB. <br>
              In the serial run, valgrind shows a peak memory usage of
              21 GB, while in the parallel run (with 4 processes) each
              process shows a peak memory usage of 10.8 GB.<br>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>Barry is right that at least proc 0 must have twice its
            own memory, since it loads the other pieces. That makes
            10 GB sound correct.</div>
          <div><br>
          </div>
          <div>  Thanks,</div>
          <div><br>
          </div>
          <div>     Matt</div>
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div> Best regards,<br>
              Michael<br>
              <br>
              <div>On 07.10.21 17:55, Barry Smith wrote:<br>
              </div>
              <blockquote type="cite"> <br>
                <div><br>
                  <blockquote type="cite">
                    <div>On Oct 7, 2021, at 11:35 AM, Michael Werner
                      <<a href="mailto:michael.werner@dlr.de"
                        target="_blank" moz-do-not-send="true">michael.werner@dlr.de</a>>
                      wrote:</div>
                    <br>
                    <div>
                      <div> Currently I'm using psutil to query every
                        process for its memory usage and sum it up.
                        However, the spike was only visible in top (I
                        had a call to psutil right before and after
                        A.load(viewer), and both reported only 50 GB of
                        RAM usage). That's why I thought it might be
                        directly tied to loading the matrix. However, I
                        also had the problem that the computation
                        crashed due to running out of memory while
                        loading a matrix that should in theory fit into
                        memory. In that case I would expect the OS to
                        free unused memory immediately, right?<br>
                        <br>
                        Concerning Barry's questions: the matrix is a
                        sparse matrix and is originally created
                        sequentially as SEQAIJ. However, it is then
                        loaded as MPIAIJ, and if I look at the memory
                        usage of the various processes, they fill up one
                        after another, just as described. Is the origin
                        of the matrix somehow preserved in the binary
                        file? I was under the impression that the binary
                        format was agnostic to the number of processes?
                      </div>
                    </div>
                  </blockquote>
                  <div><br>
                  </div>
                   The file format is independent of the number of
                  processes that created it.</div>
                <div><br>
                  <blockquote type="cite">
                    <div>
                      <div>I also varied the number of processes between
                        1 and 60, as soon as I use more than one process
                        I can observe the spike (and its always twice
                        the memory, no matter how many processes I'm
                        using).<br>
                      </div>
                    </div>
                  </blockquote>
                  <div><br>
                  </div>
                    Twice the size of the entire matrix (when stored on
                  one process) or twice the size of the resulting matrix
                  stored on the first rank? The latter is exactly as
                  expected, since rank 0 has to load the part of the
                  matrix destined for the next rank and hence for a
                  short time contains its own part of the matrix and the
                  part of one other rank.</div>
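                <div>To make the expected transient concrete, here is a
                  toy sketch of that staged load (illustrative only, not
                  the actual PETSc implementation): rank 0 holds its own
                  part plus at most one other rank's part at any moment.

```python
# Toy simulation of the staged read-and-forward load: rank 0 reads each
# other rank's chunk in turn and sends it on, so its transient peak is
# its own chunk plus ONE other chunk -- not the whole matrix.
def staged_load(chunks):
    own = chunks[0]            # rank 0 keeps its own part permanently
    peak = len(own)
    for chunk in chunks[1:]:
        peak = max(peak, len(own) + len(chunk))  # chunk held only briefly
        # ... here the chunk would be sent to its owner and freed ...
    return peak

# Four equal 10-byte parts: rank 0's peak is 2 parts, i.e. half of the
# 40-byte "matrix" -- twice its own memory, not twice the whole matrix.
assert staged_load([b"x" * 10] * 4) == 20
```

                  So with this pattern, seeing every rank peak at twice
                  its own share would be surprising.</div>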
                <div><br>
                </div>
                <div>  Barry</div>
                <div><br>
                  <blockquote type="cite">
                    <div>
                      <div> <br>
                        I also tried running Valgrind with the
                        --tool=massif option. However, I don't know what
                        to look for. I can send you the output file
                        separately, if it helps.<br>
                        <br>
                        Best regards,<br>
                        Michael <br>
                        <br>
                        <div>On 07.10.21 16:09, Matthew Knepley wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div dir="ltr">On Thu, Oct 7, 2021 at 10:03
                              AM Barry Smith <<a
                                href="mailto:bsmith@petsc.dev"
                                target="_blank" moz-do-not-send="true">bsmith@petsc.dev</a>>
                              wrote:<br>
                            </div>
                            <div class="gmail_quote">
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex"><br>
                                   How many ranks are you using? Is it a
                                sparse matrix with MPIAIJ? <br>
                                <br>
                                   The intention is that for parallel
                                runs the first rank reads in its own
                                part of the matrix, then reads in the
                                part of the next rank and sends it, then
                                reads the part of the third rank and
                                sends it etc. So there should not be too
                                much of a blip in memory usage. You can
                                run valgrind with the option for
                                tracking memory usage to see exactly
                                where in the code the blip occurs; it
                                could be that a regression in the code
                                has made it require more memory. But
                                internal MPI buffers might explain some
                                of the blip.<br>
                              </blockquote>
                              <div><br>
                              </div>
                              <div>Is it possible that we free the
                                memory, but the OS has just not given
                                back that memory for use yet? How are
                                you measuring memory usage?</div>
                              <div><br>
                              </div>
                              <div>  Thanks,</div>
                              <div><br>
                              </div>
                              <div>     Matt</div>
                              <div> </div>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex">  
                                Barry<br>
                                <br>
                                <br>
                                > On Oct 7, 2021, at 9:50 AM, Michael
                                Werner <<a
                                  href="mailto:michael.werner@dlr.de"
                                  target="_blank" moz-do-not-send="true">michael.werner@dlr.de</a>>
                                wrote:<br>
                                > <br>
                                > Hello,<br>
                                > <br>
                                > I noticed that there is a peak in
                                memory consumption when I load an<br>
                                > existing matrix into PETSc. The
                                matrix is previously created by an<br>
                                > external program and saved in the
                                PETSc binary format.<br>
                                > The code I'm using in petsc4py is
                                simple:<br>
                                > <br>
                                > viewer =
                                PETSc.Viewer().createBinary(&lt;path/to/existing/matrix&gt;,
                                "r",<br>
                                > comm=PETSc.COMM_WORLD)<br>
                                > A =
                                PETSc.Mat().create(comm=PETSc.COMM_WORLD)<br>
                                > A.load(viewer)<br>
                                > <br>
                                > When I run this code in serial, the
                                memory consumption of the process is<br>
                                > about 50GB RAM, similar to the file
                                size of the saved matrix. However,<br>
                                > if I run the code in parallel, for
                                a few seconds the memory consumption<br>
                                > of the process doubles to around
                                100GB RAM, before dropping back down to<br>
                                > around 50GB RAM. So it seems as if,
                                for some reason, the matrix is<br>
                                > copied after it is read into
                                memory. Is there a way to avoid this<br>
                                > behaviour? Currently, it is a clear
                                bottleneck in my code.<br>
                                > <br>
                                > I tried setting the size of the
                                matrix and to explicitly preallocate the<br>
                                > necessary NNZ (with A.setSizes(dim)
                                and A.setPreallocationNNZ(nnz),<br>
                                > respectively) before loading, but
                                that didn't help.<br>
                                > <br>
                                > As mentioned above, I'm using
                                petsc4py together with PETSc-3.16 on a<br>
                                > Linux workstation.<br>
                                > <br>
                                > Best regards,<br>
                                > Michael Werner<br>
                                > <br>
                                > -- <br>
                                > <br>
                                >
                                ____________________________________________________<br>
                                > <br>
                                > Deutsches Zentrum für Luft- und
                                Raumfahrt e.V. (DLR)<br>
                                > Institut für Aerodynamik und
                                Strömungstechnik | Bunsenstr. 10 | 37073
                                Göttingen<br>
                                > <br>
                                > Michael Werner <br>
                                > Telefon 0551 709-2627 | Telefax
                                0551 709-2811 | <a
                                  href="mailto:Michael.Werner@dlr.de"
                                  target="_blank" moz-do-not-send="true">Michael.Werner@dlr.de</a><br>
                                > <a href="http://DLR.de"
                                  target="_blank" moz-do-not-send="true">DLR.de</a><br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                <br>
                              </blockquote>
                            </div>
                            <br clear="all">
                            <div><br>
                            </div>
                            -- <br>
                            <div dir="ltr">
                              <div dir="ltr">
                                <div>
                                  <div dir="ltr">
                                    <div>
                                      <div dir="ltr">
                                        <div>What most experimenters
                                          take for granted before they
                                          begin their experiments is
                                          infinitely more interesting
                                          than any results to which
                                          their experiments lead.<br>
                                          -- Norbert Wiener</div>
                                        <div><br>
                                        </div>
                                        <div><a
                                            href="http://www.cse.buffalo.edu/~knepley/"
                                            target="_blank"
                                            moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        <br>
                      </div>
                    </div>
                  </blockquote>
                </div>
                <br>
              </blockquote>
              <br>
            </div>
          </blockquote>
        </div>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        <div dir="ltr" class="gmail_signature">
          <div dir="ltr">
            <div>
              <div dir="ltr">
                <div>
                  <div dir="ltr">
                    <div>What most experimenters take for granted before
                      they begin their experiments is infinitely more
                      interesting than any results to which their
                      experiments lead.<br>
                      -- Norbert Wiener</div>
                    <div><br>
                    </div>
                    <div><a href="http://www.cse.buffalo.edu/~knepley/"
                        target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>