[petsc-users] Sparse linear system solving
Lidia
lidia.varsh at mail.ioffe.ru
Mon Jun 6 06:19:37 CDT 2022
Dear colleagues,
Thank you very much for the help!
Now the code seems to be working well!
Best,
Lidiia
On 03.06.2022 15:19, Matthew Knepley wrote:
> On Fri, Jun 3, 2022 at 6:42 AM Lidia <lidia.varsh at mail.ioffe.ru> wrote:
>
> Dear Matt, Barry,
>
> thank you for the information about openMP!
>
> Now all processes are loaded well. But we see strange behaviour of
> the running times at different iterations; see the description below.
> Could you please explain the reason to us and how we can improve it?
>
> We need to quickly solve a big (about 1e6 rows) square sparse
> non-symmetric linear system many times (about 1e5 times) consecutively.
> The matrix is constant at every iteration, and the right-hand-side
> vector B changes slowly (we think its change at every iteration
> should be less than 0.001 %). So we use each previous solution
> vector X as the initial guess for the next iteration. An AMG
> preconditioner and the GMRES solver are used.
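>
> Our solve loop is roughly like the sketch below (simplified, not our
> exact code; the names A, b, x and nsteps are placeholders, and error
> checking is omitted):
>
>     /* One KSP reused for every solve: the operator A is set once,
>        and the previous solution in x is used as the initial guess. */
>     KSP ksp;
>     PC  pc;
>     KSPCreate(PETSC_COMM_WORLD, &ksp);
>     KSPSetOperators(ksp, A, A);                 /* A is constant */
>     KSPSetType(ksp, KSPGMRES);
>     KSPGetPC(ksp, &pc);
>     PCSetType(pc, PCGAMG);                      /* PETSc's native AMG */
>     KSPSetInitialGuessNonzero(ksp, PETSC_TRUE); /* keep previous x */
>     KSPSetFromOptions(ksp);
>
>     for (PetscInt it = 0; it < nsteps; ++it) {
>       /* b changes by less than 0.001 % per iteration */
>       KSPSolve(ksp, b, x);  /* x: old solution on entry, new one on exit */
>     }
>     KSPDestroy(&ksp);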
>
> We have tested the code on a matrix with 631 000 rows over
> 15 consecutive iterations, using the vector X from the previous
> iteration as the initial guess. The right-hand-side vector B and the
> matrix A are constant during the whole run. The time of the first
> iteration is large (about 2 seconds) and quickly decreases in the
> following iterations (the average time of the last iterations was
> about 0.00008 s). But some iterations in the middle (# 2 and # 12)
> take a huge time, 0.999063 seconds (see the attached figure with the
> time dynamics). This time of 0.999 seconds does not depend on the
> size of the matrix or on the number of MPI processes, and these time
> jumps also occur if we vary the vector B. Why do these time jumps
> appear, and how can we avoid them?
>
>
> PETSc is not taking this time. It must come from somewhere else in
> your code. Notice that no iterations are taken for any subsequent
> solves, so no operations other than the residual norm check (and
> preconditioner application) are being performed.
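>
> One way to confirm this is to put each solve in its own logging stage,
> so that -log_view reports the KSPSolve time separately from the rest
> of the code (a sketch; the stage name is arbitrary):
>
>     PetscLogStage solve_stage;
>     PetscLogStageRegister("RepeatedSolve", &solve_stage);
>
>     PetscLogStagePush(solve_stage);
>     KSPSolve(ksp, b, x);
>     PetscLogStagePop();
>     /* If -log_view shows almost no time in this stage, the ~1 s
>        spikes are coming from code outside the solver calls. */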
>
> Thanks,
>
> Matt
>
> The ksp_monitor output for this run (covering the 15 iterations)
> using 36 MPI processes and a file with the memory bandwidth
> information (testSpeed) are also attached. We can provide our C++
> script if it is needed.
>
> Thanks a lot!
>
> Best,
> Lidiia
>
>
>
> On 01.06.2022 21:14, Matthew Knepley wrote:
>> On Wed, Jun 1, 2022 at 1:43 PM Lidia <lidia.varsh at mail.ioffe.ru>
>> wrote:
>>
>> Dear Matt,
>>
>> Thank you for the rule of 10,000 variables per process! We
>> have run ex.5 with a 1e4 x 1e4 matrix on our cluster and got
>> good performance scaling (see the figure "performance.png" -
>> the dependence of the solving time in seconds on the number of
>> cores). We have used the GAMG preconditioner (multithreaded: we
>> have added the option
>> "-pc_gamg_use_parallel_coarse_grid_solver") and the GMRES
>> solver, and we have set one OpenMP thread for every MPI
>> process. Now ex.5 works well on many MPI processes! But the
>> run uses about 100 GB of RAM.
>>
>> How can we run ex.5 using many OpenMP threads without MPI? If
>> we just change the run command, the cores are not loaded
>> normally: usually just one core is loaded at 100 % and the
>> others are idle. Sometimes all cores work at 100 % for about 1
>> second but then become idle again for about 30 seconds. Can the
>> preconditioner use many threads, and how do we activate this option?
>>
>>
>> Maybe you could describe what you are trying to accomplish?
>> Threads and processes are not really different, except for memory
>> sharing. However, sharing large complex data structures rarely
>> works. That is why they get partitioned and operate effectively
>> as distributed memory. You would not really save memory by using
>> threads in this instance, if that is your goal. This is detailed
>> in the talks in this session (see 2016 PP Minisymposium on this
>> page https://cse.buffalo.edu/~knepley/relacs.html).
>>
>> Thanks,
>>
>> Matt
>>
>> The solving time (the time of the solver work) using 60
>> OpenMP threads is now 511 seconds, while using 60 MPI
>> processes it is 13.19 seconds.
>>
>> The ksp_monitor outputs for both cases (many OpenMP threads or many
>> MPI processes) are attached.
>>
>>
>> Thank you!
>>
>> Best,
>> Lidia
>>
>> On 31.05.2022 15:21, Matthew Knepley wrote:
>>> I have looked at the local logs. First, you have run
>>> problems of size 12 and 24. As a rule of thumb, you need
>>> 10,000 variables per process in order to see good speedup.
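>>>
>>> For example, a system with 1e6 unknowns would make efficient use of
>>> at most roughly 100 processes (1e6 / 10,000), while with only 12 or
>>> 24 unknowns every process is dominated by communication and startup
>>> cost, so little speedup can be expected.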
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley
>>> <knepley at gmail.com> wrote:
>>>
>>> On Tue, May 31, 2022 at 7:39 AM Lidia
>>> <lidia.varsh at mail.ioffe.ru> wrote:
>>>
>>> Matt, Mark, thank you very much for your answers!
>>>
>>>
>>> Now we have run example # 5 on our computer cluster
>>> and on the local server, and also have not seen any
>>> performance increase; for an unclear reason, the running
>>> times on the local server are much better than on
>>> the cluster.
>>>
>>> I suspect that you are trying to get speedup without
>>> increasing the memory bandwidth:
>>>
>>> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> Now we will try to run the PETSc example #5 inside a
>>> Docker container on our server and see if the
>>> problem is in our environment. I'll write you the
>>> results of this test as soon as we get them.
>>>
>>> The ksp_monitor outputs for the 5th example at the current
>>> local server configuration (for 2 and 4 MPI
>>> processes) and for the cluster (for 1 and 3 MPI
>>> processes) are attached.
>>>
>>>
>>> And one more question. Potentially we can use 10
>>> nodes with 96 threads on each node on our cluster.
>>> Which combination of numbers of MPI processes and
>>> OpenMP threads do you think would be the best for
>>> the 5th example?
>>>
>>> Thank you!
>>>
>>>
>>> Best,
>>> Lidiia
>>>
>>> On 31.05.2022 05:42, Mark Adams wrote:
>>>> And if you see "NO" change in performance I suspect
>>>> the solver/matrix is all on one processor.
>>>> (PETSc does not use threads by default so threads
>>>> should not change anything).
>>>>
>>>> As Matt said, it is best to start with a PETSc
>>>> example that does something like what you want
>>>> (parallel linear solve, see src/ksp/ksp/tutorials
>>>> for examples), and then add your code to it.
>>>> That way you get the basic infrastructure in place
>>>> for you, which is pretty obscure to the uninitiated.
>>>>
>>>> Mark
>>>>
>>>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley
>>>> <knepley at gmail.com> wrote:
>>>>
>>>> On Mon, May 30, 2022 at 10:12 PM Lidia
>>>> <lidia.varsh at mail.ioffe.ru> wrote:
>>>>
>>>> Dear colleagues,
>>>>
>>>> Is there anyone here who has solved big sparse
>>>> linear systems using PETSc?
>>>>
>>>>
>>>> There are lots of publications with this kind
>>>> of data. Here is one recent one:
>>>> https://arxiv.org/abs/2204.01722
>>>>
>>>> We have found NO performance improvement
>>>> while using more and more MPI
>>>> processes (1-2-3) and OpenMP threads (from
>>>> 1 to 72 threads). Has anyone
>>>> faced this problem? Does anyone know any
>>>> possible reasons for such
>>>> behaviour?
>>>>
>>>>
>>>> Solver behavior is dependent on the input
>>>> matrix. The only general-purpose solvers
>>>> are direct, but they do not scale linearly and
>>>> have high memory requirements.
>>>>
>>>> Thus, in order to make progress you will have
>>>> to be specific about your matrices.
>>>>
>>>> We use an AMG preconditioner and the GMRES solver
>>>> from the KSP package, as our
>>>> matrix is large (from 100 000 to 1e+6 rows
>>>> and columns), sparse,
>>>> non-symmetric, and includes both positive
>>>> and negative values. But
>>>> performance problems also exist while using
>>>> CG solvers with symmetric
>>>> matrices.
>>>>
>>>>
>>>> There are many PETSc examples, such as example
>>>> 5 for the Laplacian, that exhibit
>>>> good scaling with both AMG and GMG.
>>>>
>>>> Could anyone help us to set appropriate
>>>> options for the preconditioner
>>>> and solver? Now we use the default parameters;
>>>> maybe they are not the best,
>>>> but we do not know a good combination. Or
>>>> maybe you could suggest other
>>>> preconditioner+solver pairs for
>>>> such tasks?
>>>>
>>>> I can provide more information: the
>>>> matrices that we solve, the C++ script
>>>> that runs the solve using PETSc, and any
>>>> statistics obtained from our runs.
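>>>>
>>>> For reference, our driver is essentially a minimal program like the
>>>> sketch below (simplified and illustrative, not our actual script;
>>>> the file name "system.dat" is a placeholder). It loads the matrix
>>>> and right-hand side from a PETSc binary file and takes the solver
>>>> and preconditioner from the command line, e.g. -ksp_type gmres
>>>> -pc_type gamg:
>>>>
>>>>     #include <petscksp.h>
>>>>
>>>>     int main(int argc, char **argv)
>>>>     {
>>>>       Mat         A;
>>>>       Vec         b, x;
>>>>       KSP         ksp;
>>>>       PetscViewer viewer;
>>>>
>>>>       PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
>>>>
>>>>       /* Load the matrix and the right-hand side from a binary file */
>>>>       PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "system.dat",
>>>>                                       FILE_MODE_READ, &viewer));
>>>>       PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
>>>>       PetscCall(MatLoad(A, viewer));
>>>>       PetscCall(VecCreate(PETSC_COMM_WORLD, &b));
>>>>       PetscCall(VecLoad(b, viewer));
>>>>       PetscCall(PetscViewerDestroy(&viewer));
>>>>       PetscCall(VecDuplicate(b, &x));
>>>>
>>>>       /* Solver and preconditioner are chosen at run time */
>>>>       PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
>>>>       PetscCall(KSPSetOperators(ksp, A, A));
>>>>       PetscCall(KSPSetFromOptions(ksp));
>>>>       PetscCall(KSPSolve(ksp, b, x));
>>>>
>>>>       PetscCall(KSPDestroy(&ksp));
>>>>       PetscCall(VecDestroy(&x));
>>>>       PetscCall(VecDestroy(&b));
>>>>       PetscCall(MatDestroy(&A));
>>>>       PetscCall(PetscFinalize());
>>>>       return 0;
>>>>     }
>>>>
>>>> (With older PETSc versions, PetscCall() can be replaced by the
>>>> ierr = ...; CHKERRQ(ierr); pattern.)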
>>>>
>>>>
>>>> First, please provide a description of the
>>>> linear system, and the output of
>>>>
>>>> -ksp_view -ksp_monitor_true_residual
>>>> -ksp_converged_reason -log_view
>>>>
>>>> for each test case.
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>> Thank you in advance!
>>>>
>>>> Best regards,
>>>> Lidiia Varshavchik,
>>>> Ioffe Institute, St. Petersburg, Russia
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/