<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    I can understand that process 0 needs twice its own memory, for the
    reasons Barry explained. However, in my case every process uses
    twice the "necessary" memory, which doesn't seem right to me.
    Especially with Barry's explanation in mind, it seems strange that
    all processes show the same peak memory usage. If it were only
    process 0, it wouldn't matter, because with enough processes the
    overhead would become negligible. <br>
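    As an aside on how I measure this: psutil samples the instantaneous
    RSS, so a short-lived spike between two samples is easy to miss,
    whereas top (and valgrind massif) also catch the peak. A minimal
    stdlib sketch of a peak-RSS query on Linux (illustrative only, not
    my actual script):

```python
# Query the peak resident set size of the current process (Linux/macOS).
# Unlike an instantaneous RSS sample, the peak also reflects short-lived
# spikes such as the transient doubling during the matrix load.
import resource
import sys

def peak_rss_bytes():
    # ru_maxrss is reported in kilobytes on Linux, bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * scale

before = peak_rss_bytes()
buf = bytearray(50 * 1024 * 1024)   # allocate ~50 MB
del buf                             # freed again, but the peak remembers it
assert peak_rss_bytes() >= before
```

    Calling this right after A.load(viewer) on each rank would show the
    spike even if the instantaneous RSS has already dropped back.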
    <br>
    Best regards,<br>
    Michael<br>
    <br>
    <div class="moz-cite-prefix">On 07.10.21 18:32, Matthew Knepley
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAMYG4GkU=9Y=dWJCSJBUL7iZ85kzbXF2WCzq_f3HJe01ZYNpjg@mail.gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">On Thu, Oct 7, 2021 at 11:59 AM Michael Werner
          <<a href="mailto:michael.werner@dlr.de"
            moz-do-not-send="true">michael.werner@dlr.de</a>> wrote:<br>
        </div>
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div> It's twice the memory of the entire matrix (when stored
              on one process). I also just sent you the valgrind
              results, both for a serial run and a parallel run. The
              size on disk of the matrix I used is 20 GB. <br>
              In the serial run, valgrind shows a peak memory usage of
              21 GB, while in the parallel run (with 4 processes) each
              process shows a peak memory usage of 10.8 GB.<br>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>Barry is right that at least proc 0 must have twice its
            own memory, since it loads the other pieces. That makes
            10 GB sound correct.</div>
          <div><br>
          </div>
          <div>  Thanks,</div>
          <div><br>
          </div>
          <div>     Matt</div>
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div> Best regards,<br>
              Michael<br>
              <br>
              <div>On 07.10.21 17:55, Barry Smith wrote:<br>
              </div>
              <blockquote type="cite"> <br>
                <div><br>
                  <blockquote type="cite">
                    <div>On Oct 7, 2021, at 11:35 AM, Michael Werner
                      <<a href="mailto:michael.werner@dlr.de"
                        target="_blank" moz-do-not-send="true">michael.werner@dlr.de</a>>
                      wrote:</div>
                    <br>
                    <div>
                      <div> Currently I'm using psutil to query every
                        process for its memory usage and sum it up.
                        However, the spike was only visible in top (I
                        had a call to psutil right before and after
                        A.load(viewer), and both reported only 50 GB of
                        RAM usage). That's why I thought it might be
                        directly tied to loading the matrix. However, I
                        also had the problem that the computation
                        crashed due to running out of memory while
                        loading a matrix that should in theory fit into
                        memory. In that case I would expect the OS to
                        free unused memory immediately, right?<br>
                        <br>
                        Concerning Barry's questions: the matrix is a
                        sparse matrix and is originally created
                        sequentially as SEQAIJ. However, it is then
                        loaded as MPIAIJ, and if I look at the memory
                        usage of the various processes, they fill up one
                        after another, just as described. Is the origin
                        of the matrix somehow preserved in the binary
                        file? I was under the impression that the binary
                        format was agnostic to the number of processes?
                      </div>
                    </div>
                  </blockquote>
                  <div><br>
                  </div>
                   The file format is independent of the number of
                  processes that created it.</div>
                <div><br>
                  <blockquote type="cite">
                    <div>
                      <div>I also varied the number of processes between
                        1 and 60, as soon as I use more than one process
                        I can observe the spike (and its always twice
                        the memory, no matter how many processes I'm
                        using).<br>
                      </div>
                    </div>
                  </blockquote>
                  <div><br>
                  </div>
                    Twice the size of the entire matrix (when stored on
                  one process) or twice the size of the resulting matrix
                  stored on the first rank? The latter is exactly as
                  expected, since rank 0 has to load the part of the
                  matrix destined for the next rank and hence for a
                  short time contains its own part of the matrix and the
                  part of one other rank.</div>
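                <div>To make the expected transient concrete, here is a
                  toy sketch of that staged load (illustrative only, not
                  the actual PETSc implementation): rank 0 holds its own
                  part plus at most one other rank's part at any moment.

```python
# Toy simulation of the staged read-and-forward load: rank 0 reads each
# other rank's chunk in turn and sends it on, so its transient peak is
# its own chunk plus ONE other chunk -- not the whole matrix.
def staged_load(chunks):
    own = chunks[0]            # rank 0 keeps its own part permanently
    peak = len(own)
    for chunk in chunks[1:]:
        peak = max(peak, len(own) + len(chunk))  # chunk held only briefly
        # ... here the chunk would be sent to its owner and freed ...
    return peak

# Four equal 10-byte parts: rank 0's peak is 2 parts, i.e. half of the
# 40-byte "matrix" -- twice its own memory, not twice the whole matrix.
assert staged_load([b"x" * 10] * 4) == 20
```

                  So with this pattern, seeing every rank peak at twice
                  its own share would be surprising.</div>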
                <div><br>
                </div>
                <div>  Barry</div>
                <div><br>
                  <blockquote type="cite">
                    <div>
                      <div> <br>
                        I also tried running Valgrind with the
                        --tool=massif option. However, I don't know what
                        to look for. I can send you the output file
                        separately, if it helps.<br>
                        <br>
                        Best regards,<br>
                        Michael <br>
                        <br>
                        <div>On 07.10.21 16:09, Matthew Knepley wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div dir="ltr">On Thu, Oct 7, 2021 at 10:03
                              AM Barry Smith <<a
                                href="mailto:bsmith@petsc.dev"
                                target="_blank" moz-do-not-send="true">bsmith@petsc.dev</a>>
                              wrote:<br>
                            </div>
                            <div class="gmail_quote">
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex"><br>
                                   How many ranks are you using? Is it a
                                sparse matrix with MPIAIJ? <br>
                                <br>
                                   The intention is that for parallel
                                runs the first rank reads in its own
                                part of the matrix, then reads in the
                                part of the next rank and sends it, then
                                reads the part of the third rank and
                                sends it etc. So there should not be too
                                much of a blip in memory usage. You can
                                run valgrind with the option for
                                tracking memory usage to see exactly
                                where in the code the blip occurs; it
                                could be that a regression in the code
                                has made it require more memory. But
                                internal MPI buffers might explain some
                                of the blip.<br>
                              </blockquote>
                              <div><br>
                              </div>
                              <div>Is it possible that we free the
                                memory, but the OS has just not given
                                back that memory for use yet? How are
                                you measuring memory usage?</div>
                              <div><br>
                              </div>
                              <div>  Thanks,</div>
                              <div><br>
                              </div>
                              <div>     Matt</div>
                              <div> </div>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex">  
                                Barry<br>
                                <br>
                                <br>
                                > On Oct 7, 2021, at 9:50 AM, Michael
                                Werner <<a
                                  href="mailto:michael.werner@dlr.de"
                                  target="_blank" moz-do-not-send="true">michael.werner@dlr.de</a>>
                                wrote:<br>
                                > <br>
                                > Hello,<br>
                                > <br>
                                > I noticed that there is a peak in
                                memory consumption when I load an<br>
                                > existing matrix into PETSc. The
                                matrix is previously created by an<br>
                                > external program and saved in the
                                PETSc binary format.<br>
                                > The code I'm using in petsc4py is
                                simple:<br>
                                > <br>
                                > viewer =
                                PETSc.Viewer().createBinary(&lt;path/to/existing/matrix&gt;,
                                "r",<br>
                                > comm=PETSc.COMM_WORLD)<br>
                                > A =
                                PETSc.Mat().create(comm=PETSc.COMM_WORLD)<br>
                                > A.load(viewer)<br>
                                > <br>
                                > When I run this code in serial, the
                                memory consumption of the process is<br>
                                > about 50GB RAM, similar to the file
                                size of the saved matrix. However,<br>
                                > if I run the code in parallel, for
                                a few seconds the memory consumption<br>
                                > of the process doubles to around
                                100GB RAM, before dropping back down to<br>
                                > around 50GB RAM. So it seems as if,
                                for some reason, the matrix is<br>
                                > copied after it is read into
                                memory. Is there a way to avoid this<br>
                                > behaviour? Currently, it is a clear
                                bottleneck in my code.<br>
                                > <br>
                                > I tried setting the size of the
                                matrix and to explicitly preallocate the<br>
                                > necessary NNZ (with A.setSizes(dim)
                                and A.setPreallocationNNZ(nnz),<br>
                                > respectively) before loading, but
                                that didn't help.<br>
                                > <br>
                                > As mentioned above, I'm using
                                petsc4py together with PETSc-3.16 on a<br>
                                > Linux workstation.<br>
                                > <br>
                                > Best regards,<br>
                                > Michael Werner<br>
                                > <br>
                                > -- <br>
                                > <br>
                                >
                                ____________________________________________________<br>
                                > <br>
                                > Deutsches Zentrum für Luft- und
                                Raumfahrt e.V. (DLR)<br>
                                > Institut für Aerodynamik und
                                Strömungstechnik | Bunsenstr. 10 | 37073
                                Göttingen<br>
                                > <br>
                                > Michael Werner <br>
                                > Telefon 0551 709-2627 | Telefax
                                0551 709-2811 | <a
                                  href="mailto:Michael.Werner@dlr.de"
                                  target="_blank" moz-do-not-send="true">Michael.Werner@dlr.de</a><br>
                                > <a href="http://DLR.de"
                                  target="_blank" moz-do-not-send="true">DLR.de</a><br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                > <br>
                                <br>
                              </blockquote>
                            </div>
                            <br clear="all">
                            <div><br>
                            </div>
                            -- <br>
                            <div dir="ltr">
                              <div dir="ltr">
                                <div>
                                  <div dir="ltr">
                                    <div>
                                      <div dir="ltr">
                                        <div>What most experimenters
                                          take for granted before they
                                          begin their experiments is
                                          infinitely more interesting
                                          than any results to which
                                          their experiments lead.<br>
                                          -- Norbert Wiener</div>
                                        <div><br>
                                        </div>
                                        <div><a
                                            href="http://www.cse.buffalo.edu/~knepley/"
                                            target="_blank"
                                            moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        <br>
                      </div>
                    </div>
                  </blockquote>
                </div>
                <br>
              </blockquote>
              <br>
            </div>
          </blockquote>
        </div>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        <div dir="ltr" class="gmail_signature">
          <div dir="ltr">
            <div>
              <div dir="ltr">
                <div>
                  <div dir="ltr">
                    <div>What most experimenters take for granted before
                      they begin their experiments is infinitely more
                      interesting than any results to which their
                      experiments lead.<br>
                      -- Norbert Wiener</div>
                    <div><br>
                    </div>
                    <div><a href="http://www.cse.buffalo.edu/~knepley/"
                        target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>