<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Dear colleagues,</p>
<p>Thank you very much for the help!</p>
<p>Now the code seems to be working well!<br>
</p>
Best,<br>
Lidiia<br>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 03.06.2022 15:19, Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAMYG4Gm+BCLxfyL4Q22zSgsxLbZvWbu7LiaQqui8sNhU4rofOg@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">On Fri, Jun 3, 2022 at 6:42 AM Lidia <<a
href="mailto:lidia.varsh@mail.ioffe.ru"
moz-do-not-send="true" class="moz-txt-link-freetext">lidia.varsh@mail.ioffe.ru</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Dear Matt, Barry,</p>
<p>thank you for the information about openMP!</p>
<p>Now all processes are loaded well. But we see strange
behaviour in the running times across iterations; see the
description below. Could you please explain the
reason and how we can improve it?<br>
</p>
<p>We need to quickly solve a big (about 1e6 rows) square
sparse non-symmetric matrix many times (about 1e5 times)
consecutively. The matrix is constant at every iteration,
and the right-hand-side vector B changes slowly (we think
that its change at every iteration should be less than
0.001%). So we use each previous solution vector X as the
initial guess for the next iteration. An AMG preconditioner
and the GMRES solver are used.<br>
</p>
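In PETSc terms, reusing the previous X corresponds to the nonzero-initial-guess setting; a sketch of the options we mean (the executable name and rank count here are placeholders, not our actual script):

```shell
# Illustrative run; "./our_solver" stands in for our C++ driver.
mpiexec -n 36 ./our_solver \
  -ksp_type gmres \
  -pc_type gamg \
  -ksp_initial_guess_nonzero true \
  -ksp_monitor_true_residual
```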
<p>We have tested the code using a matrix with 631 000
rows over 15 consecutive iterations, using vector X
from the previous iteration. The right-hand-side vector B
and matrix A are constant during the whole run. The time
of the first iteration is large (about 2 seconds) and
quickly decreases in the following iterations (the average
time of the last iterations was about 0.00008 s). But some
iterations in the middle (#2 and #12) take a huge time of
0.999063 seconds (see the attached figure with the time
dynamics). This time of 0.999 seconds does not depend on
the size of the matrix or on the number of MPI processes,
and these time jumps also exist if we vary vector B. Why do
these time jumps appear, and how can we avoid them?</p>
</div>
</blockquote>
<div><br>
</div>
<div>PETSc is not taking this time. It must come from
somewhere else in your code. Notice that no iterations are
taken for any subsequent solves, so no operations other than
the residual norm check (and preconditioner application) are
being performed.</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>The ksp_monitor output for this run (covering 15
iterations) using 36 MPI processes, and a file with the
memory bandwidth information (testSpeed), are also
attached. We can provide our C++ script if needed.<br>
</p>
<p>Thanks a lot!<br>
</p>
Best,<br>
Lidiia<br>
<p><br>
</p>
<p><br>
</p>
<div>On 01.06.2022 21:14, Matthew Knepley wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">On Wed, Jun 1, 2022 at 1:43 PM Lidia
<<a href="mailto:lidia.varsh@mail.ioffe.ru"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">lidia.varsh@mail.ioffe.ru</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Dear Matt,</p>
<p>Thank you for the rule of 10,000 variables
per process! We have run ex5 with a 1e4
x 1e4 matrix on our cluster and got good
performance scaling (see the figure
"performance.png": the solving
time in seconds versus the number of cores). We
have used the GAMG preconditioner (multithreaded: we
have added the option
"-pc_gamg_use_parallel_coarse_grid_solver")
and the GMRES solver. And we have set one OpenMP
thread for every MPI process. Now ex5 is
working well on many MPI processes! But the
run uses about 100 GB of RAM.<br>
</p>
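For reference, the run described above corresponds to a command line along these lines (the path and rank count are illustrative, not the exact command we used):

```shell
# Illustrative command; adjust the path to your PETSc build tree.
mpiexec -n 36 ./ex5 \
  -ksp_type gmres \
  -pc_type gamg \
  -pc_gamg_use_parallel_coarse_grid_solver \
  -ksp_monitor -log_view
```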
<p>How can we run ex5 using many OpenMP threads
without MPI? If we just change the run
command, the cores are not loaded normally:
usually just one core is loaded at 100% and
the others are idle. Sometimes all cores are
working at 100% for 1 second but then
become idle again for about 30 seconds. Can the
preconditioner use many threads, and how do we
activate this option?</p>
</div>
</blockquote>
<div><br>
</div>
<div>Maybe you could describe what you are trying to
accomplish? Threads and processes are not really
different, except for memory sharing. However,
sharing large complex data structures rarely
works. That is why they get partitioned and
operate effectively as distributed memory. You
would not really save memory by using</div>
<div>threads in this instance, if that is your goal.
This is detailed in the talks in this session (see
2016 PP Minisymposium on this page <a
href="https://cse.buffalo.edu/~knepley/relacs.html"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://cse.buffalo.edu/~knepley/relacs.html</a>).</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>The solving time (the time spent in the
solver) using 60 OpenMP threads is now 511
seconds, while using 60 MPI processes it is
13.19 seconds.</p>
<p>The ksp_monitor outputs for both cases (many OpenMP
threads and many MPI processes) are attached.</p>
<p><br>
</p>
<p>Thank you!</p>
Best,<br>
Lidia<br>
<div><br>
</div>
<div>On 31.05.2022 15:21, Matthew Knepley wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">I have looked at the local
logs. First, you have run problems of size
12 and 24. As a rule of thumb, you need
10,000
<div>variables per process in order to see
good speedup.</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue,
May 31, 2022 at 8:19 AM Matthew Knepley
<<a href="mailto:knepley@gmail.com"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">knepley@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">On Tue, May 31, 2022 at
7:39 AM Lidia <<a
href="mailto:lidia.varsh@mail.ioffe.ru"
target="_blank"
moz-do-not-send="true"
class="moz-txt-link-freetext">lidia.varsh@mail.ioffe.ru</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Matt, Mark, thank you very much for
your answers!</p>
<p><br>
</p>
<p>Now we have run example #5 on
our computer cluster and on the
local server and again have not
seen any performance increase;
for an unclear reason, the running
times on the local server are
much better than on the cluster.</p>
</div>
</blockquote>
<div>I suspect that you are trying to
get speedup without increasing the
memory bandwidth:</div>
<div><br>
</div>
<div> <a
href="https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup"
target="_blank"
moz-do-not-send="true"
class="moz-txt-link-freetext">https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup</a></div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt <br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Now we will try to run the PETSc
example #5 inside a Docker
container on our server and see
if the problem is in our
environment. I'll write to you with
the results of this test as soon
as we get them.</p>
<p>The ksp_monitor outputs for the
5th test in the current local
server configuration (for 2 and
4 MPI processes) and for the
cluster (for 1 and 3 MPI
processes) are attached.</p>
<p><br>
</p>
<p>And one more question:
potentially we can use 10 nodes
with 96 threads per node on
our cluster. Which combination
of MPI processes and OpenMP
threads do you think may be
the best for the 5th
example?<br>
</p>
<p>Thank you!<br>
</p>
<p><br>
</p>
Best,<br>
Lidiia<br>
<div><br>
</div>
<div>On 31.05.2022 05:42, Mark
Adams wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">And if you see
"NO" change in performance I
suspect the solver/matrix is
all on one processor.
<div>(PETSc does not use
threads by default so
threads should not change
anything).</div>
<div><br>
</div>
<div>As Matt said, it is best
to start with a PETSc
example that does something
like what you want (parallel
linear solve, see
src/ksp/ksp/tutorials for
examples), and then add your
code to it.</div>
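For example (assuming PETSC_DIR and PETSC_ARCH point at an existing PETSc build):

```shell
cd $PETSC_DIR/src/ksp/ksp/tutorials
make ex5                          # build one tutorial executable
mpiexec -n 4 ./ex5 -ksp_monitor   # run it on 4 MPI ranks
```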
<div>That way you get the
basic infrastructure in
place for you, which is
pretty obscure to the
uninitiated.</div>
<div><br>
</div>
<div>Mark</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr"
class="gmail_attr">On Mon,
May 30, 2022 at 10:18 PM
Matthew Knepley <<a
href="mailto:knepley@gmail.com"
target="_blank"
moz-do-not-send="true"
class="moz-txt-link-freetext">knepley@gmail.com</a>>
wrote:<br>
</div>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">On Mon, May
30, 2022 at 10:12 PM
Lidia <<a
href="mailto:lidia.varsh@mail.ioffe.ru"
target="_blank"
moz-do-not-send="true"
class="moz-txt-link-freetext">lidia.varsh@mail.ioffe.ru</a>> wrote:<br>
</div>
<div class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">Dear
colleagues,<br>
<br>
Is here anyone who
have solved big sparse
linear matrices using
PETSC?<br>
</blockquote>
<div><br>
</div>
<div>There are lots of
publications with this
kind of data. Here is
one recent one: <a
href="https://arxiv.org/abs/2204.01722"
target="_blank"
moz-do-not-send="true"
class="moz-txt-link-freetext">https://arxiv.org/abs/2204.01722</a></div>
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
We have found NO
performance
improvement while
using more and more
MPI <br>
processes (1-2-3) and
OpenMP threads (from
1 to 72). Has
anyone <br>
faced this problem?
Does anyone know any
possible reasons for
such <br>
behaviour?<br>
</blockquote>
<div><br>
</div>
<div>Solver behavior is
dependent on the input
matrix. The only
general-purpose
solvers</div>
<div>are direct, but
they do not scale
linearly and have high
memory requirements.</div>
<div><br>
</div>
<div>Thus, in order to
make progress you will
have to be specific
about your matrices.</div>
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
We use an AMG
preconditioner and
the GMRES solver from
the KSP package, as our <br>
matrix is large (from
100 000 to 1e+6 rows
and columns), sparse,
<br>
non-symmetric, and
includes both positive
and negative values.
But the <br>
performance problems
also exist when using
CG solvers with
symmetric <br>
matrices.<br>
</blockquote>
<div><br>
</div>
<div>There are many
PETSc examples, such
as example 5 for the
Laplacian, that
exhibit</div>
<div>good scaling with
both AMG and GMG.</div>
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
Could anyone help us
set appropriate
options for the
preconditioner <br>
and solver? We currently
use default parameters;
maybe they are not the
best, <br>
but we do not know a
good combination. Or
maybe you could
suggest other <br>
preconditioner+solver
pairs for such tasks?<br>
<br>
I can provide more
information: the
matrices that we
solve, the C++ script <br>
that runs the solve
using PETSc, and any
statistics obtained
from our runs.<br>
</blockquote>
<div><br>
</div>
<div>First, please
provide a description
of the linear system,
and the output of</div>
<div><br>
</div>
<div> -ksp_view
-ksp_monitor_true_residual
-ksp_converged_reason
-log_view</div>
<div><br>
</div>
<div>for each test case.</div>
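<div>That is, something like (the executable name is a placeholder):</div>

```shell
mpiexec -n 4 ./your_app \
  -ksp_view \
  -ksp_monitor_true_residual \
  -ksp_converged_reason \
  -log_view
```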
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
Thank you in advance!<br>
<br>
Best regards,<br>
Lidiia Varshavchik,<br>
Ioffe Institute, St.
Petersburg, Russia<br>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most
experimenters
take for
granted before
they begin
their
experiments is
infinitely
more
interesting
than any
results to
which their
experiments
lead.<br>
-- Norbert
Wiener</div>
<div><br>
</div>
<div><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters
take for granted before
they begin their
experiments is infinitely
more interesting than any
results to which their
experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank"
moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take
for granted before they begin
their experiments is infinitely
more interesting than any
results to which their
experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank"
moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for
granted before they begin their
experiments is infinitely more
interesting than any results to which
their experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before
they begin their experiments is infinitely more
interesting than any results to which their
experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="http://www.cse.buffalo.edu/~knepley/"
target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>