[petsc-users] Sparse linear system solving

Mark Adams mfadams at lbl.gov
Fri Jun 3 07:17:41 CDT 2022


Your timing data in the first plot looks as if integer values (2, 1, 1
seconds) were added at seemingly random iterations (0, 2, 12).
Perhaps there is a bug in your test setup?

Mark

On Fri, Jun 3, 2022 at 6:42 AM Lidia <lidia.varsh at mail.ioffe.ru> wrote:

> Dear Matt, Barry,
>
> thank you for the information about OpenMP!
>
> Now all processes are loaded well. But we see strange behaviour of the
> running times at different iterations; see the description below. Could you
> please explain the reason to us and how we can improve it?
>
> We need to quickly solve a big (about 1e6 rows) square sparse
> non-symmetric system many times (about 1e5 times) consecutively. The matrix
> is constant at every iteration, and the right-hand side vector B changes
> slowly (we think that its change at every iteration should be less than
> 0.001 %). So we use each previous solution vector X as the initial guess for
> the next iteration. An AMG preconditioner and the GMRES solver are used.
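>
> A minimal sketch of how this loop can be set up with PETSc (simplified and
> illustrative, not our exact C++ script; the matrix A, the vectors b and x,
> and the count nsolves are assumed to be created and filled elsewhere):
>
>     #include <petscksp.h>
>
>     KSP ksp;
>     PC  pc;
>     KSPCreate(PETSC_COMM_WORLD, &ksp);
>     KSPSetOperators(ksp, A, A);                  /* A is constant for all solves */
>     KSPSetType(ksp, KSPGMRES);
>     KSPGetPC(ksp, &pc);
>     PCSetType(pc, PCGAMG);                       /* algebraic multigrid preconditioner */
>     KSPSetInitialGuessNonzero(ksp, PETSC_TRUE);  /* reuse X from the previous solve */
>     KSPSetReusePreconditioner(ksp, PETSC_TRUE);  /* keep the AMG setup between solves */
>     KSPSetFromOptions(ksp);
>     for (PetscInt it = 0; it < nsolves; ++it) {
>       /* b changes slightly between iterations; x still holds the previous solution */
>       KSPSolve(ksp, b, x);
>     }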
>
> We have tested the code using a matrix with 631,000 rows during 15
> consecutive iterations, using the vector X from the previous iteration as
> the initial guess. The right-hand side vector B and the matrix A are
> constant during the whole run. The time of the first iteration is large
> (about 2 seconds) and quickly decreases over the next iterations (the
> average time of the last iterations was about 0.00008 s). But some
> iterations in the middle (# 2 and # 12) take a huge time - 0.999063 seconds
> (see the attached figure with the time dynamics). This time of 0.999 seconds
> does not depend on the size of the matrix or on the number of MPI processes,
> and these time jumps also exist if we vary the vector B. Why do these time
> jumps appear, and how can we avoid them?
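>
> For completeness: one way such per-solve timings can be obtained in PETSc
> is with PetscTime() around each solve (a sketch only, not necessarily how
> our script measures them):
>
>     PetscLogDouble t0, t1;
>     PetscTime(&t0);
>     KSPSolve(ksp, b, x);
>     PetscTime(&t1);
>     PetscPrintf(PETSC_COMM_WORLD, "iteration %d: %g s\n", (int)it, (double)(t1 - t0));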
>
> The ksp_monitor output for this run (including 15 iterations) using 36 MPI
> processes and a file with the memory bandwidth information (testSpeed) are
> also attached. We can provide our C++ script if it is needed.
>
> Thanks a lot!
> Best,
> Lidiia
>
>
>
> On 01.06.2022 21:14, Matthew Knepley wrote:
>
> On Wed, Jun 1, 2022 at 1:43 PM Lidia <lidia.varsh at mail.ioffe.ru> wrote:
>
>> Dear Matt,
>>
>> Thank you for the rule of 10,000 variables per process! We have run ex. 5
>> with a 1e4 x 1e4 matrix on our cluster and got good performance scaling
>> (see the figure "performance.png" - the dependence of the solving time in
>> seconds on the number of cores). We used the GAMG preconditioner (with the
>> parallel coarse-grid solve enabled via the option
>> "-pc_gamg_use_parallel_coarse_grid_solver") and the GMRES solver, and we
>> set one OpenMP thread per MPI process. Now ex. 5 works well on many MPI
>> processes! But the run uses about 100 GB of RAM.
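>>
>> For reference, the run command was roughly of this form (a sketch; the
>> process count and the grid-size options of ex. 5 are placeholders):
>>
>>     mpiexec -n 60 ./ex5 -ksp_type gmres -pc_type gamg \
>>         -pc_gamg_use_parallel_coarse_grid_solver -ksp_monitor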
>>
>> How can we run ex. 5 using many OpenMP threads without MPI? If we just
>> change the run command, the cores are not loaded properly: usually just
>> one core is loaded at 100% and the others are idle. Sometimes all cores
>> work at 100% for about 1 second but then become idle again for about 30
>> seconds. Can the preconditioner use many threads, and how do we activate
>> this option?
>>
>
> Maybe you could describe what you are trying to accomplish? Threads and
> processes are not really different, except for memory sharing. However,
> sharing large complex data structures rarely works. That is why they get
> partitioned and operate effectively as distributed memory. You would not
> really save memory by using threads in this instance, if that is your goal.
> This is detailed in the talks in this session (see the 2016 PP Minisymposium
> on this page: https://cse.buffalo.edu/~knepley/relacs.html).
>
>   Thanks,
>
>      Matt
>
>
>> The solving time (the time of the solver work) using 60 OpenMP threads is
>> now 511 seconds, while using 60 MPI processes it is 13.19 seconds.
>>
>> The ksp_monitor outputs for both cases (many OpenMP threads or many MPI
>> processes) are attached.
>>
>>
>> Thank you!
>> Best,
>> Lidia
>>
>> On 31.05.2022 15:21, Matthew Knepley wrote:
>>
>> I have looked at the local logs. First, you have run problems of size 12
>> and 24. As a rule of thumb, you need 10,000
>> variables per process in order to see good speedup.
>>
>>   Thanks,
>>
>>      Matt
>>
>> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>>> On Tue, May 31, 2022 at 7:39 AM Lidia <lidia.varsh at mail.ioffe.ru> wrote:
>>>
>>>> Matt, Mark, thank you very much for your answers!
>>>>
>>>>
>>>> Now we have run example # 5 on our computer cluster and on the local
>>>> server and again have not seen any performance increase, but for an
>>>> unclear reason the running times on the local server are much better
>>>> than on the cluster.
>>>>
>>> I suspect that you are trying to get speedup without increasing the
>>> memory bandwidth:
>>>
>>>
>>> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>> Now we will try to run the PETSc example #5 inside a Docker container on
>>>> our server and see whether the problem is in our environment. I'll write
>>>> to you with the results of this test as soon as we get them.
>>>>
>>>> The ksp_monitor outputs for the 5th example with the current local server
>>>> configuration (for 2 and 4 MPI processes) and with the cluster (for 1 and
>>>> 3 MPI processes) are attached.
>>>>
>>>>
>>>> And one more question. Potentially we can use 10 nodes with 96 threads
>>>> on each node of our cluster. Which combination of the numbers of MPI
>>>> processes and OpenMP threads do you think might be best for the 5th
>>>> example?
>>>>
>>>> Thank you!
>>>>
>>>>
>>>> Best,
>>>> Lidiia
>>>>
>>>> On 31.05.2022 05:42, Mark Adams wrote:
>>>>
>>>> And if you see "NO" change in performance, I suspect the solver/matrix
>>>> is all on one processor.
>>>> (PETSc does not use threads by default, so threads should not change
>>>> anything.)
>>>>
>>>> As Matt said, it is best to start with a PETSc example that does
>>>> something like what you want (parallel linear solve, see
>>>> src/ksp/ksp/tutorials for examples), and then add your code to it.
>>>> That way you get the basic infrastructure in place for you, which is
>>>> pretty obscure to the uninitiated.
>>>>
>>>> Mark
>>>>
>>>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley <knepley at gmail.com>
>>>> wrote:
>>>>
>>>>> On Mon, May 30, 2022 at 10:12 PM Lidia <lidia.varsh at mail.ioffe.ru>
>>>>> wrote:
>>>>>
>>>>>> Dear colleagues,
>>>>>>
>>>>>> Is there anyone here who has solved big sparse linear systems using PETSc?
>>>>>>
>>>>>
>>>>> There are lots of publications with this kind of data. Here is one
>>>>> recent one: https://arxiv.org/abs/2204.01722
>>>>>
>>>>>
>>>>>> We have found NO performance improvement while using more and more MPI
>>>>>> processes (1-2-3) and OpenMP threads (from 1 to 72 threads). Has anyone
>>>>>> faced this problem? Does anyone know any possible reasons for such
>>>>>> behaviour?
>>>>>>
>>>>>
>>>>> Solver behavior is dependent on the input matrix. The only
>>>>> general-purpose solvers
>>>>> are direct, but they do not scale linearly and have high memory
>>>>> requirements.
>>>>>
>>>>> Thus, in order to make progress you will have to be specific about
>>>>> your matrices.
>>>>>
>>>>>
>>>>>> We use an AMG preconditioner and the GMRES solver from the KSP package,
>>>>>> as our matrix is large (from 100,000 to 1e+6 rows and columns), sparse,
>>>>>> non-symmetric, and includes both positive and negative values. But the
>>>>>> performance problems also exist when using CG solvers with symmetric
>>>>>> matrices.
>>>>>>
>>>>>
>>>>> There are many PETSc examples, such as example 5 for the Laplacian,
>>>>> that exhibit
>>>>> good scaling with both AMG and GMG.
>>>>>
>>>>>
>>>>>> Could anyone help us to set appropriate options for the preconditioner
>>>>>> and solver? Now we use the default parameters; maybe they are not the
>>>>>> best, but we do not know a good combination. Or maybe you could suggest
>>>>>> other preconditioner+solver pairs for such tasks?
>>>>>>
>>>>>> I can provide more information: the matrices that we solve, the C++
>>>>>> script that runs the solve using PETSc, and any statistics obtained
>>>>>> from our runs.
>>>>>>
>>>>>
>>>>> First, please provide a description of the linear system, and the
>>>>> output of
>>>>>
>>>>>   -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view
>>>>>
>>>>> for each test case.
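>>>>>
>>>>> For example, appended to your usual run command (the executable name
>>>>> and process count below are just placeholders):
>>>>>
>>>>>     mpiexec -n 4 ./your_solver -ksp_view -ksp_monitor_true_residual \
>>>>>         -ksp_converged_reason -log_view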
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>      Matt
>>>>>
>>>>>
>>>>>> Thank you in advance!
>>>>>>
>>>>>> Best regards,
>>>>>> Lidiia Varshavchik,
>>>>>> Ioffe Institute, St. Petersburg, Russia
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>