<div dir="ltr">Your timing data in the first plot seems to have random integers (2,1,1) added to random iterations (0,2,12).<div>Perhaps there is a bug in your test setup?</div><div><br></div><div>Mark</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jun 3, 2022 at 6:42 AM Lidia <<a href="mailto:lidia.varsh@mail.ioffe.ru">lidia.varsh@mail.ioffe.ru</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p>Dear Matt, Barry,</p>
    <p>thank you for the information about openMP!</p>
    <p>Now all processes are loaded well. But we see a strange behaviour
      of the running times at different iterations; see the description below.
      Could you please explain the reason and how we can improve it?<br>
    </p>
    <p>We need to quickly solve a big (about 1e6 rows) square sparse
      non-symmetric linear system many times (about 1e5 times) consecutively.
      The matrix is constant at every iteration, and the right-hand-side
      vector B changes slowly (we think its change at every iteration
      should be less than 0.001%). So we use each previous solution
      vector X as the initial guess for the next iteration. An AMG
      preconditioner and the GMRES solver are used.<br>
    </p>
    <p>We have tested the code using a matrix with 631 000 rows, over
      15 consecutive iterations, using the vector X from the previous
      iteration. The right-hand-side vector B and matrix A are constant during
      the whole run. The time of the first iteration is large (about
      2 seconds) and decreases quickly over the next iterations
      (the average time of the last iterations was about 0.00008 s). But some
      iterations in the middle (# 2 and # 12) take a huge time, 0.999063
      seconds (see the attached figure with the time dynamics). This time of
      0.999 seconds does not depend on the size of the matrix or on the
      number of MPI processes; these time jumps also exist if we vary
      vector B. Why do these time jumps appear, and how can we avoid them?</p>
    <p>The ksp_monitor output for this run (covering 15 iterations)
      using 36 MPI processes and a file with the memory bandwidth
      information (testSpeed) are also attached. We can provide our C++
      script if it is needed.<br>
    </p>
    <p>Thanks a lot!<br>
    </p>
    Best,<br>
    Lidiia<br>
    <p><br>
    </p>
    <p><br>
    </p>
    <div>On 01.06.2022 21:14, Matthew Knepley
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr">On Wed, Jun 1, 2022 at 1:43 PM Lidia <<a href="mailto:lidia.varsh@mail.ioffe.ru" target="_blank">lidia.varsh@mail.ioffe.ru</a>>
          wrote:<br>
        </div>
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p>Dear Matt,</p>
              <p>Thank you for the rule of 10,000 variables per process!
                We have run ex.5 with a 1e4 x 1e4 matrix on our cluster
                and observed good performance scaling (see the figure
                "performance.png" - the dependence of the solve time in
                seconds on the number of cores). We used the GAMG
                preconditioner (multithreaded: we added the option
                "-pc_gamg_use_parallel_coarse_grid_solver")
                and the GMRES solver, and we set one OpenMP thread for
                every MPI process. Now ex.5 works well on many
                MPI processes! But the run uses about 100 GB of RAM.<br>
              </p>
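For reference, the run command had the following form (the process count and executable path here are illustrative, not our exact job script):

```shell
mpiexec -n 36 ./ex5 -ksp_type gmres -pc_type gamg \
    -pc_gamg_use_parallel_coarse_grid_solver -ksp_monitor
```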
              <p>How can we run ex.5 using many OpenMP threads without
                MPI? If we just change the run command, the cores
                are not loaded properly: usually just one core runs
                at 100% while the others are idle. Sometimes all cores
                run at 100% for about 1 second but then become
                idle again for about 30 seconds. Can the preconditioner use many
                threads, and how do we activate this option?</p>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>Maybe you could describe what you are trying to
            accomplish? Threads and processes are not really different,
            except for memory sharing. However, sharing large complex
            data structures rarely works. That is why they get
            partitioned and operate effectively as distributed memory.
            You would not really save memory by using</div>
          <div>threads in this instance, if that is your goal. This is
            detailed in the talks in this session (see 2016 PP
            Minisymposium on this page <a href="https://cse.buffalo.edu/~knepley/relacs.html" target="_blank">https://cse.buffalo.edu/~knepley/relacs.html</a>).</div>
          <div><br>
          </div>
          <div>  Thanks,</div>
          <div><br>
          </div>
          <div>     Matt</div>
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p>The solve time (the time spent in the solver) using
                60 OpenMP threads is now 511 seconds, while using 60
                MPI processes it is 13.19 seconds.</p>
              <p>ksp_monitor outputs for both cases (many OpenMP threads and
                many MPI processes) are attached.</p>
              <p><br>
              </p>
              <p>Thank you!</p>
              Best,<br>
              Lidia<br>
              <div><br>
              </div>
              <div>On 31.05.2022 15:21, Matthew Knepley wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">I have looked at the local logs. First,
                  you have run problems of size 12  and 24. As a rule of
                  thumb, you need 10,000
                  <div>variables per process in order to see good
                    speedup.</div>
                  <div><br>
                  </div>
                  <div>  Thanks,</div>
                  <div><br>
                  </div>
                  <div>     Matt</div>
                </div>
                <br>
                <div class="gmail_quote">
                  <div dir="ltr" class="gmail_attr">On Tue, May 31, 2022
                    at 8:19 AM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>>
                    wrote:<br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div dir="ltr">
                      <div dir="ltr">On Tue, May 31, 2022 at 7:39 AM
                        Lidia <<a href="mailto:lidia.varsh@mail.ioffe.ru" target="_blank">lidia.varsh@mail.ioffe.ru</a>>
                        wrote:<br>
                      </div>
                      <div class="gmail_quote">
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div>
                            <p>Matt, Mark, thank you much for your
                              answers!</p>
                            <p><br>
                            </p>
                            <p>Now we have run example #5 on our
                              computer cluster and on the local server
                              and again have not seen any performance
                              increase; for an unclear reason, the running
                              times on the local server are much better
                              than on the cluster.</p>
                          </div>
                        </blockquote>
                        <div>I suspect that you are trying to get
                          speedup without increasing the memory
                          bandwidth:</div>
                        <div><br>
                        </div>
                        <div>  <a href="https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup" target="_blank">https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup</a></div>
                        <div><br>
                        </div>
                        <div>  Thanks,</div>
                        <div><br>
                        </div>
                        <div>     Matt <br>
                        </div>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                          <div>
                            <p>Now we will try to run the PETSc example #5
                              inside a Docker container on our server
                              and see whether the problem is in our
                              environment. I'll write you the results of
                              this test as soon as we get them.</p>
                            <p>The ksp_monitor outputs for the 5th test at
                              the current local server configuration
                              (for 2 and 4 MPI processes) and for the
                              cluster (for 1 and 3 MPI processes) are
                              attached.</p>
                            <p><br>
                            </p>
                            <p>And one more question. Potentially we can
                              use 10 nodes with 96 threads on each node
                              of our cluster. Which combination of MPI
                              processes and OpenMP threads do you think
                              would be best for the 5th example?<br>
                            </p>
                            <p>Thank you!<br>
                            </p>
                            <p><br>
                            </p>
                            Best,<br>
                            Lidiia<br>
                            <div><br>
                            </div>
                            <div>On 31.05.2022 05:42, Mark Adams wrote:<br>
                            </div>
                            <blockquote type="cite">
                              <div dir="ltr">And if you see "NO" change
                                in performance I suspect the
                                solver/matrix is all on one processor.
                                <div>(PETSc does not use threads by
                                  default so threads should not change
                                  anything).</div>
                                <div><br>
                                </div>
                                <div>As Matt said, it is best to start
                                  with a PETSc example that does
                                  something like what you want (parallel
                                  linear solve, see
                                  src/ksp/ksp/tutorials for examples),
                                  and then add your code to it.</div>
                                <div>That way you get the basic
                                  infrastructure in place for you, which
                                  is pretty obscure to the uninitiated.</div>
                                <div><br>
                                </div>
                                <div>Mark</div>
                              </div>
                              <br>
                              <div class="gmail_quote">
                                <div dir="ltr" class="gmail_attr">On
                                  Mon, May 30, 2022 at 10:18 PM Matthew
                                  Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>>
                                  wrote:<br>
                                </div>
                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                  <div dir="ltr">
                                    <div dir="ltr">On Mon, May 30, 2022
                                      at 10:12 PM Lidia <<a href="mailto:lidia.varsh@mail.ioffe.ru" target="_blank">lidia.varsh@mail.ioffe.ru</a>>
                                      wrote:<br>
                                    </div>
                                    <div class="gmail_quote">
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear
                                        colleagues,<br>
                                        <br>
                                        Is there anyone here who has solved
                                        big sparse linear systems using
                                        PETSc?<br>
                                      </blockquote>
                                      <div><br>
                                      </div>
                                      <div>There are lots of
                                        publications with this kind of
                                        data. Here is one recent one: <a href="https://arxiv.org/abs/2204.01722" target="_blank">https://arxiv.org/abs/2204.01722</a></div>
                                      <div> </div>
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        We have found NO performance
                                        improvement while using more and
                                        more MPI <br>
                                        processes (1-2-3) and OpenMP
                                        threads (from 1 to 72 threads).
                                        Has anyone <br>
                                        faced this problem? Does
                                        anyone know any possible reasons
                                        for such <br>
                                        behaviour?<br>
                                      </blockquote>
                                      <div><br>
                                      </div>
                                      <div>Solver behavior is dependent
                                        on the input matrix. The only
                                        general-purpose solvers</div>
                                      <div>are direct, but they do not
                                        scale linearly and have high
                                        memory requirements.</div>
                                      <div><br>
                                      </div>
                                      <div>Thus, in order to make
                                        progress you will have to be
                                        specific about your matrices.</div>
                                      <div> </div>
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        We use an AMG preconditioner and
                                        the GMRES solver from the KSP package,
                                        as our <br>
                                        matrix is large (from 100 000 to
                                        1e+6 rows and columns), sparse,
                                        <br>
                                        non-symmetric, and includes both
                                        positive and negative values.
                                        But <br>
                                        performance problems also exist
                                        when using CG solvers with
                                        symmetric <br>
                                        matrices.<br>
                                      </blockquote>
                                      <div><br>
                                      </div>
                                      <div>There are many PETSc
                                        examples, such as example 5 for
                                        the Laplacian, that exhibit</div>
                                      <div>good scaling with both AMG
                                        and GMG.</div>
                                      <div> </div>
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        Could anyone help us set
                                        appropriate options for the
                                        preconditioner <br>
                                        and solver? We currently use default
                                        parameters; maybe they are not
                                        the best, <br>
                                        but we do not know a good
                                        combination. Or maybe you could
                                        suggest <br>
                                        other
                                        preconditioner+solver pairs for such
                                        tasks?<br>
                                        <br>
                                        I can provide more information:
                                        the matrices that we solve, the C++
                                        script <br>
                                        that runs the solve using PETSc, and
                                        any statistics obtained from our
                                        runs.<br>
                                      </blockquote>
                                      <div><br>
                                      </div>
                                      <div>First, please provide a
                                        description of the linear
                                        system, and the output of</div>
                                      <div><br>
                                      </div>
                                      <div>  -ksp_view
                                        -ksp_monitor_true_residual
                                        -ksp_converged_reason -log_view</div>
                                      <div><br>
                                      </div>
                                      <div>for each test case.</div>
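A full invocation might look like this (the executable name and process count are placeholders, not a specific recommendation):

```shell
mpiexec -n 4 ./your_app -ksp_view -ksp_monitor_true_residual \
    -ksp_converged_reason -log_view
```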
                                      <div><br>
                                      </div>
                                      <div>  Thanks,</div>
                                      <div><br>
                                      </div>
                                      <div>     Matt</div>
                                      <div> </div>
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        Thank you in advance!<br>
                                        <br>
                                        Best regards,<br>
                                        Lidiia Varshavchik,<br>
                                        Ioffe Institute, St. Petersburg,
                                        Russia<br>
                                      </blockquote>
                                    </div>
                                    <br clear="all">
                                    <div><br>
                                    </div>
                                    -- <br>
                                    <div dir="ltr">
                                      <div dir="ltr">
                                        <div>
                                          <div dir="ltr">
                                            <div>
                                              <div dir="ltr">
                                                <div>What most
                                                  experimenters take for
                                                  granted before they
                                                  begin their
                                                  experiments is
                                                  infinitely more
                                                  interesting than any
                                                  results to which their
                                                  experiments lead.<br>
                                                  -- Norbert Wiener</div>
                                                <div><br>
                                                </div>
                                                <div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                              </div>
                            </blockquote>
                          </div>
                        </blockquote>
                      </div>
                      <br clear="all">
                      <div><br>
                      </div>
                      -- <br>
                      <div dir="ltr">
                        <div dir="ltr">
                          <div>
                            <div dir="ltr">
                              <div>
                                <div dir="ltr">
                                  <div>What most experimenters take for
                                    granted before they begin their
                                    experiments is infinitely more
                                    interesting than any results to
                                    which their experiments lead.<br>
                                    -- Norbert Wiener</div>
                                  <div><br>
                                  </div>
                                  <div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                <br clear="all">
                <div><br>
                </div>
                -- <br>
                <div dir="ltr">
                  <div dir="ltr">
                    <div>
                      <div dir="ltr">
                        <div>
                          <div dir="ltr">
                            <div>What most experimenters take for
                              granted before they begin their
                              experiments is infinitely more interesting
                              than any results to which their
                              experiments lead.<br>
                              -- Norbert Wiener</div>
                            <div><br>
                            </div>
                            <div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </blockquote>
            </div>
          </blockquote>
        </div>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        <div dir="ltr">
          <div dir="ltr">
            <div>
              <div dir="ltr">
                <div>
                  <div dir="ltr">
                    <div>What most experimenters take for granted before
                      they begin their experiments is infinitely more
                      interesting than any results to which their
                      experiments lead.<br>
                      -- Norbert Wiener</div>
                    <div><br>
                    </div>
                    <div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
  </div>

</blockquote></div>