<div dir="ltr"><div dir="ltr">On Wed, Jun 1, 2022 at 1:43 PM Lidia <<a href="mailto:lidia.varsh@mail.ioffe.ru">lidia.varsh@mail.ioffe.ru</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <p>Dear Matt,</p>

    <p>Thank you for the rule of 10,000 variables per process! We ran
      ex5 with a 1e4 x 1e4 matrix on our cluster and got good
      scaling (see the figure "performance.png", which shows the
      solve time in seconds against the number of cores).
      We used the GAMG preconditioner (multithreaded: we added the
      option "-pc_gamg_use_parallel_coarse_grid_solver")
      and the GMRES solver, with one OpenMP thread per MPI
      process. ex5 now runs well on many MPI processes! However,
      the run uses about 100 GB of RAM.<br>

    </p>

    <p>How can we run ex5 with many OpenMP threads and without MPI? If we
      just change the run command, the
      cores are not loaded evenly: usually one core is at
      100 % and the others are idle. Sometimes all cores run at 100 %
      for about 1 second and then go idle again for about 30 seconds. Can
      the preconditioner use many threads, and how do we activate this

      option?</p></div></blockquote><div><br></div><div>Maybe you could describe what you are trying to accomplish? Threads and processes are not really different, except for memory sharing. However, sharing large complex data structures rarely works. That is why they get partitioned and operate effectively as distributed memory. You would not really save memory by using threads in this instance, if that is your goal. This is detailed in the talks in this session (see 2016 PP Minisymposium on this page <a href="https://cse.buffalo.edu/~knepley/relacs.html">https://cse.buffalo.edu/~knepley/relacs.html</a>).</div><div><br></div><div>  Thanks,</div><div><br></div><div>     Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>

    <p>The solve time (the time spent in the solver) is now 511 seconds
      with 60 OpenMP threads, versus 13.19 seconds with 60 MPI
      processes.</p>
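    <div>To put the two timings side by side, here is a small sketch of the arithmetic. It uses only the two numbers reported above; the variable names are illustrative and nothing here is PETSc output.</div>

```python
# Timings reported above for the ex5 solve.
openmp_seconds = 511.0   # 60 OpenMP threads, no MPI
mpi_seconds = 13.19      # 60 MPI processes

# Ratio of the two wall-clock times: how much slower the threads-only run is.
slowdown = openmp_seconds / mpi_seconds
print(f"OpenMP-only run is about {slowdown:.0f}x slower than the MPI run")
```

    <div>A gap this large suggests that most of the solve is not running in parallel in the threads-only configuration, which matches the observation that usually only one core is busy.</div>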

    <p>The ksp_monitor outputs for both cases (many OpenMP threads or many MPI
      processes) are attached.</p>

    <p><br>

    </p>

    <p>Thank you!</p>

    Best,<br>

    Lidia<br>

    <div><br>

    </div>

    <div>On 31.05.2022 15:21, Matthew Knepley

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">I have looked at the local logs. First, you have

        run problems of size 12  and 24. As a rule of thumb, you need

        10,000

        <div>variables per process in order to see good speedup.</div>
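        <div>As a rough illustration of this rule of thumb, the sketch below estimates how many processes a problem of a given size can keep busy. The helper name is hypothetical, and 10,000 is only the heuristic quoted above, not a hard limit.</div>

```python
def max_useful_processes(num_unknowns, unknowns_per_process=10_000):
    """Largest process count that still leaves each rank the rule-of-thumb workload.

    Assumption: the threshold is the 10,000-variables heuristic, used as a
    floor division; at least one process is always returned.
    """
    return max(1, num_unknowns // unknowns_per_process)


# Problem sizes mentioned in this thread: 12 and 24 unknowns, then 1e5 to 1e6 rows.
for n in (12, 24, 100_000, 1_000_000):
    print(n, "unknowns: up to", max_useful_processes(n), "process(es)")
```

        <div>By this estimate, problems of size 12 or 24 cannot be expected to speed up beyond a single process.</div>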

        <div><br>

        </div>

        <div>  Thanks,</div>

        <div><br>

        </div>

        <div>     Matt</div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Tue, May 31, 2022 at 8:19

          AM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>>

          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

          <div dir="ltr">

            <div dir="ltr">On Tue, May 31, 2022 at 7:39 AM Lidia <<a href="mailto:lidia.varsh@mail.ioffe.ru" target="_blank">lidia.varsh@mail.ioffe.ru</a>>

              wrote:<br>

            </div>

            <div class="gmail_quote">

              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                <div>

                  <p>Matt, Mark, thank you very much for your answers!</p>

                  <p><br>

                  </p>

                  <p>We have now run example 5 on our computer cluster
                    and on the local server, and again have not seen any
                    performance increase. For some unclear reason, the run
                    times on the local server are much better than on
                    the cluster.</p>

                </div>

              </blockquote>

              <div>I suspect that you are trying to get speedup without

                increasing the memory bandwidth:</div>

              <div><br>

              </div>

              <div>  <a href="https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup" target="_blank">https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup</a></div>

              <div><br>

              </div>

              <div>  Thanks,</div>

              <div><br>

              </div>

              <div>     Matt <br>

              </div>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                <div>

                  <p>We will now try to run PETSc example 5 inside a
                    Docker container on our server to see whether the
                    problem is in our environment. I will send you the
                    results of this test as soon as we have them.</p>
                  <p>The ksp_monitor outputs for example 5 with the
                    current local server configuration (for 2 and 4 MPI
                    processes) and on the cluster (for 1 and 3 MPI
                    processes) are attached.</p>

                  <p><br>

                  </p>

                  <p>One more question: we can potentially use 10
                    nodes with 96 threads per node on our cluster.
                    Which combination of MPI process count and
                    OpenMP thread count do you think would be best for
                    example 5?<br>

                  </p>

                  <p>Thank you!<br>

                  </p>

                  <p><br>

                  </p>

                  Best,<br>

                  Lidiia<br>

                  <div><br>

                  </div>

                  <div>On 31.05.2022 05:42, Mark Adams wrote:<br>

                  </div>

                  <blockquote type="cite">

                    <div dir="ltr">And if you see "NO" change in

                      performance I suspect the solver/matrix is all on

                      one processor.

                      <div>(PETSc does not use threads by default so

                        threads should not change anything).</div>

                      <div><br>

                      </div>

                      <div>As Matt said, it is best to start with a

                        PETSc example that does something like what you

                        want (parallel linear solve, see

                        src/ksp/ksp/tutorials for examples), and then

                        add your code to it.</div>

                      <div>That way you get the basic infrastructure in

                        place for you, which is pretty obscure to the

                        uninitiated.</div>

                      <div><br>

                      </div>

                      <div>Mark</div>

                    </div>

                    <br>

                    <div class="gmail_quote">

                      <div dir="ltr" class="gmail_attr">On Mon, May 30,

                        2022 at 10:18 PM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>>

                        wrote:<br>

                      </div>

                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                        <div dir="ltr">

                          <div dir="ltr">On Mon, May 30, 2022 at 10:12

                            PM Lidia <<a href="mailto:lidia.varsh@mail.ioffe.ru" target="_blank">lidia.varsh@mail.ioffe.ru</a>>

                            wrote:<br>

                          </div>

                          <div class="gmail_quote">

                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear

                              colleagues,<br>

                              <br>

                              Is there anyone here who has solved large sparse
                              linear systems using PETSc?<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>There are lots of publications with

                              this kind of data. Here is one recent

                              one: <a href="https://arxiv.org/abs/2204.01722" target="_blank">https://arxiv.org/abs/2204.01722</a></div>

                            <div> </div>

                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> We

                              have found NO performance improvement
                              when using more MPI <br>
                              processes (1, 2, 3) or more OpenMP threads
                              (from 1 to 72). Has anyone <br>
                              faced this problem? Does anyone know
                              possible reasons for such <br>
                              behaviour?<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>Solver behavior is dependent on the

                              input matrix. The only general-purpose

                              solvers</div>

                            <div>are direct, but they do not scale

                              linearly and have high memory

                              requirements.</div>

                            <div><br>

                            </div>

                            <div>Thus, in order to make progress you

                              will have to be specific about your

                              matrices.</div>

                            <div> </div>

                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> We use

                              an AMG preconditioner and the GMRES solver from
                              the KSP package, as our <br>
                              matrix is large (from 100 000 to 1e+6 rows
                              and columns), sparse, <br>
                              non-symmetric and includes both positive
                              and negative values. But <br>
                              performance problems also exist when
                              using CG solvers with symmetric <br>
                              matrices.<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>There are many PETSc examples, such as

                              example 5 for the Laplacian, that exhibit</div>

                            <div>good scaling with both AMG and GMG.</div>

                            <div> </div>

                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Could

                              anyone help us set appropriate options
                              for the preconditioner <br>
                              and solver? We currently use the default parameters,
                              which may not be the best, <br>
                              but we do not know a good combination. Or
                              could you suggest <br>
                              other preconditioner+solver pairs for
                              such problems?<br>
                              <br>
                              I can provide more information: the
                              matrices that we solve, the C++ code <br>
                              that runs the solve with PETSc, and any
                              statistics from our runs.<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>First, please provide a description of

                              the linear system, and the output of</div>

                            <div><br>

                            </div>

                            <div>  -ksp_view -ksp_monitor_true_residual

                              -ksp_converged_reason -log_view</div>

                            <div><br>

                            </div>

                            <div>for each test case.</div>
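                            <div>A minimal sketch of how those runs could be scripted, assuming the test program is an executable at ./ex5 launched through mpiexec (both the path and the helper name are assumptions for illustration). It only builds and prints the command lines and does not run anything.</div>

```python
import shlex

# Diagnostic options requested above.
DIAGNOSTIC_OPTIONS = [
    "-ksp_view",
    "-ksp_monitor_true_residual",
    "-ksp_converged_reason",
    "-log_view",
]


def diagnostic_command(executable, num_processes, extra_options=()):
    """Build an mpiexec command line that appends the diagnostic options."""
    return [
        "mpiexec", "-n", str(num_processes),
        executable,
        *extra_options,
        *DIAGNOSTIC_OPTIONS,
    ]


# One command per test case; "./ex5" is a placeholder for the real binary.
for nproc in (1, 2, 4):
    print(shlex.join(diagnostic_command("./ex5", nproc)))
```

                            <div>Redirecting each printed command's output to its own file keeps the results for each process count separate when they are sent to the list.</div>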

                            <div><br>

                            </div>

                            <div>  Thanks,</div>

                            <div><br>

                            </div>

                            <div>     Matt</div>

                            <div> </div>

                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Thank

                              you in advance!<br>

                              <br>

                              Best regards,<br>

                              Lidiia Varshavchik,<br>

                              Ioffe Institute, St. Petersburg, Russia<br>

                            </blockquote>

                          </div>

                          <br clear="all">

                          <div><br>

                          </div>

                          -- <br>

                          <div dir="ltr">

                            <div dir="ltr">

                              <div>

                                <div dir="ltr">

                                  <div>

                                    <div dir="ltr">

                                      <div>What most experimenters take

                                        for granted before they begin

                                        their experiments is infinitely

                                        more interesting than any

                                        results to which their

                                        experiments lead.<br>

                                        -- Norbert Wiener</div>

                                      <div><br>

                                      </div>

                                      <div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br>

                                      </div>

                                    </div>

                                  </div>

                                </div>

                              </div>

                            </div>

                          </div>

                        </div>

                      </blockquote>

                    </div>

                  </blockquote>

                </div>

              </blockquote>

            </div>

            <br clear="all">

            <div><br>

            </div>


          </div>

        </blockquote>

      </div>

      <br clear="all">

      <div><br>

      </div>


    </blockquote>

  </div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>