<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Dear Matt, Barry,</p>
<p>Thank you for the information about OpenMP!</p>
<p>Now all processes are loaded well. But we see strange behaviour
in the running times across iterations, described below.
Could you please explain the reason and suggest how we can improve it?<br>
</p>
<p>We need to quickly solve a big (about 1e6 rows) square sparse
non-symmetric matrix many times (about 1e5 times) consecutively.
The matrix is constant at every iteration, and the right-hand-side
vector B changes slowly (we estimate that its change at every
iteration is less than 0.001 %). So we use each previous solution
vector X as the initial guess for the next iteration. An AMG
preconditioner and the GMRES solver are used.<br>
</p>
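<p>For reference, this setup can be sketched with PETSc's C API roughly as
follows (a minimal sketch only: the assembly of A and the creation of the
vectors are omitted, and the program structure is illustrative, not a
verbatim excerpt of our code):</p>

```c
#include <petscksp.h>

/* Sketch: one constant matrix A, many consecutive solves with a slowly
   varying right-hand side b, reusing the previous solution x as the
   initial guess. Assembly of A and creation of b, x are omitted. */
int main(int argc, char **argv)
{
  Mat A;
  Vec b, x;
  KSP ksp;
  PC  pc;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* ... create and assemble A, create b and x here ... */

  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));                 /* A is set once  */
  PetscCall(KSPSetType(ksp, KSPGMRES));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCGAMG));                      /* algebraic MG   */
  PetscCall(KSPSetInitialGuessNonzero(ksp, PETSC_TRUE)); /* reuse last x   */
  PetscCall(KSPSetFromOptions(ksp));

  for (PetscInt it = 0; it < 100000; ++it) {
    /* ... update b slightly ... */
    PetscCall(KSPSolve(ksp, b, x)); /* A is unchanged, so the
                                       preconditioner setup is reused */
  }

  PetscCall(KSPDestroy(&ksp));
  PetscCall(PetscFinalize());
  return 0;
}
```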
<p>We have tested the code using a matrix with 631 000 rows over
15 consecutive iterations, using the vector X from the previous
iteration. In this test the right-hand-side vector B and the matrix
A were held constant for the whole run. The time of the first
iteration is large (about 2 seconds) and quickly decreases in the
following iterations (the average time of the last iterations was
about 0.00008 s). But some iterations in the middle (#2 and #12)
take a huge time of 0.999063 seconds (see the attached figure with
the time dynamics). This time of 0.999 seconds does not depend on
the size of the matrix or on the number of MPI processes, and the
jumps also persist if we vary the vector B. Why do these time jumps
appear, and how can we avoid them?</p>
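<p>(The per-iteration times above come from our own timers; PETSc's
built-in profiling can also be requested at run time. The executable name
below is a placeholder for our program:)</p>

```shell
# Print a stage-by-stage timing and flop summary at the end of the run
mpiexec -n 36 ./solver -ksp_type gmres -pc_type gamg -ksp_monitor -log_view
```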
<p>The ksp_monitor output for this run (covering 15 iterations)
using 36 MPI processes and a file with the memory bandwidth
information (testSpeed) are also attached. We can provide our C++
code if needed.<br>
</p>
<p>Thanks a lot!<br>
</p>
Best,<br>
Lidiia<br>
<div class="moz-cite-prefix">On 01.06.2022 21:14, Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAMYG4G=mrfv=sm9Ux5kvKZ9XvoWn4K-Ubm-N3mc3pUfFdQt5_Q@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">On Wed, Jun 1, 2022 at 1:43 PM Lidia <<a
href="mailto:lidia.varsh@mail.ioffe.ru"
moz-do-not-send="true" class="moz-txt-link-freetext">lidia.varsh@mail.ioffe.ru</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Dear Matt,</p>
<p>Thank you for the rule of 10,000 variables per process!
We have run ex. 5 with a 1e4 x 1e4 matrix on our cluster
and got good performance scaling (see the figure
"performance.png": the dependence of the solve time in
seconds on the number of cores). We used the GAMG
preconditioner (multithreaded: we added the option
"-pc_gamg_use_parallel_coarse_grid_solver")
and the GMRES solver, and set one OpenMP thread for
every MPI process. Now ex. 5 works well on many
MPI processes! But the run uses about 100 GB of RAM.<br>
</p>
<p>How can we run ex. 5 using many OpenMP threads without
MPI? If we just change the run command, the cores are not
loaded properly: usually just one core is loaded at 100 %
while the others are idle. Sometimes all cores work at
100 % for about 1 second but then become idle again for
about 30 seconds. Can the preconditioner use many
threads, and how do we activate this option?</p>
</div>
</blockquote>
<div><br>
</div>
<div>Maybe you could describe what you are trying to
accomplish? Threads and processes are not really different,
except for memory sharing. However, sharing large complex
data structures rarely works. That is why they get
partitioned and operate effectively as distributed memory.
You would not really save memory by using</div>
<div>threads in this instance, if that is your goal. This is
detailed in the talks in this session (see 2016 PP
Minisymposium on this page <a
href="https://cse.buffalo.edu/~knepley/relacs.html"
moz-do-not-send="true" class="moz-txt-link-freetext">https://cse.buffalo.edu/~knepley/relacs.html</a>).</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>The solve time (the time spent in the solver) is now
511 seconds using 60 OpenMP threads, versus 13.19
seconds using 60 MPI processes.</p>
<p>ksp_monitor outputs for both cases (many OpenMP threads
or many MPI processes) are attached.</p>
<p><br>
</p>
<p>Thank you!</p>
Best,<br>
Lidia<br>
<div><br>
</div>
<div>On 31.05.2022 15:21, Matthew Knepley wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">I have looked at the local logs. First,
you have run problems of size 12 and 24. As a rule of
thumb, you need 10,000
<div>variables per process in order to see good
speedup.</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, May 31, 2022
at 8:19 AM Matthew Knepley <<a
href="mailto:knepley@gmail.com" target="_blank"
moz-do-not-send="true"
class="moz-txt-link-freetext">knepley@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">On Tue, May 31, 2022 at 7:39 AM
Lidia <<a
href="mailto:lidia.varsh@mail.ioffe.ru"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">lidia.varsh@mail.ioffe.ru</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Matt, Mark, thank you very much for your
answers!</p>
<p><br>
</p>
<p>Now we have run example #5 on our
computer cluster and on the local server
and again have not seen any performance
increase; for an unclear reason, the
running times on the local server are
much better than on the cluster.</p>
</div>
</blockquote>
<div>I suspect that you are trying to get
speedup without increasing the memory
bandwidth:</div>
<div><br>
</div>
<div> <a
href="https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup</a></div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt <br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Now we will try to run PETSc example #5
inside a Docker container on our server
to see whether the problem is in our
environment. I'll write to you with the
results of this test as soon as we get
them.</p>
<p>The ksp_monitor outputs for the 5th
example in the current local server
configuration (for 2 and 4 MPI processes)
and on the cluster (for 1 and 3 MPI
processes) are attached.</p>
<p><br>
</p>
<p>And one more question. Potentially we can
use 10 nodes with 96 threads per node on
our cluster. Which combination of numbers
of MPI processes and OpenMP threads do
you think would be best for the 5th
example?<br>
</p>
<p>Thank you!<br>
</p>
<p><br>
</p>
Best,<br>
Lidiia<br>
<div><br>
</div>
<div>On 31.05.2022 05:42, Mark Adams wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">And if you see "NO" change
in performance I suspect the
solver/matrix is all on one processor.
<div>(PETSc does not use threads by
default so threads should not change
anything).</div>
<div><br>
</div>
<div>As Matt said, it is best to start
with a PETSc example that does
something like what you want (parallel
linear solve, see
src/ksp/ksp/tutorials for examples),
and then add your code to it.</div>
<div>That way you get the basic
infrastructure in place for you, which
is pretty obscure to the uninitiated.</div>
<div><br>
</div>
<div>Mark</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Mon, May 30, 2022 at 10:18 PM Matthew
Knepley <<a
href="mailto:knepley@gmail.com"
target="_blank"
moz-do-not-send="true"
class="moz-txt-link-freetext">knepley@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">On Mon, May 30, 2022
at 10:12 PM Lidia <<a
href="mailto:lidia.varsh@mail.ioffe.ru"
target="_blank"
moz-do-not-send="true"
class="moz-txt-link-freetext">lidia.varsh@mail.ioffe.ru</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
Dear">
rgb(204,204,204);padding-left:1ex">Dear
colleagues,<br>
<br>
Is there anyone here who has solved
big sparse linear systems using
PETSc?<br>
</blockquote>
<div><br>
</div>
<div>There are lots of
publications with this kind of
data. Here is one recent one: <a
href="https://arxiv.org/abs/2204.01722" target="_blank"
moz-do-not-send="true"
class="moz-txt-link-freetext">https://arxiv.org/abs/2204.01722</a></div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
We have found NO performance
improvement while using more and
more MPI <br>
processes (1-2-3) and OpenMP
threads (from 1 to 72). Has anyone <br>
faced this problem? Does anyone
know any possible reasons for such <br>
behaviour?<br>
</blockquote>
<div><br>
</div>
<div>Solver behavior is dependent
on the input matrix. The only
general-purpose solvers</div>
<div>are direct, but they do not
scale linearly and have high
memory requirements.</div>
<div><br>
</div>
<div>Thus, in order to make
progress you will have to be
specific about your matrices.</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
We use an AMG preconditioner and
the GMRES solver from the KSP
package, as our <br>
matrix is large (from 100 000 to
1e+6 rows and columns), sparse,
<br>
non-symmetric, and includes both
positive and negative values. But <br>
performance problems also exist
while using CG solvers with
symmetric <br>
matrices.<br>
</blockquote>
<div><br>
</div>
<div>There are many PETSc
examples, such as example 5 for
the Laplacian, that exhibit</div>
<div>good scaling with both AMG
and GMG.</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Could anyone help us to set
appropriate options for the
preconditioner <br>
and solver? Now we use default
parameters; maybe they are not
the best, <br>
but we do not know a good
combination. Or maybe you could
suggest <br>
other preconditioner+solver
pairs for such tasks?<br>
<br>
I can provide more information:
the matrices that we solve, the
C++ code <br>
used to run the solves with
PETSc, and any statistics
obtained from our runs.<br>
</blockquote>
<div><br>
</div>
<div>First, please provide a
description of the linear
system, and the output of</div>
<div><br>
</div>
<div> -ksp_view
-ksp_monitor_true_residual
-ksp_converged_reason -log_view</div>
<div><br>
</div>
<div>for each test case.</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Thank you in advance!<br>
<br>
Best regards,<br>
Lidiia Varshavchik,<br>
Ioffe Institute, St. Petersburg,
Russia<br>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most
experimenters take for
granted before they
begin their
experiments is
infinitely more
interesting than any
results to which their
experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank"
moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for
granted before they begin their
experiments is infinitely more
interesting than any results to
which their experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank"
moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for
granted before they begin their
experiments is infinitely more interesting
than any results to which their
experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before
they begin their experiments is infinitely more
interesting than any results to which their
experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="http://www.cse.buffalo.edu/~knepley/"
target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>