[petsc-users] Sparse linear system solving
Matthew Knepley
knepley at gmail.com
Wed Jun 1 13:14:02 CDT 2022
On Wed, Jun 1, 2022 at 1:43 PM Lidia <lidia.varsh at mail.ioffe.ru> wrote:
> Dear Matt,
>
> Thank you for the rule of 10,000 variables per process! We have run ex. 5
> with a 1e4 x 1e4 matrix on our cluster and got good performance scaling
> (see the figure "performance.png" - solving time in seconds versus the
> number of cores). We used the GAMG preconditioner (with the parallel
> coarse-grid option "-pc_gamg_use_parallel_coarse_grid_solver") and the
> GMRES solver, and we assigned one OpenMP thread to every MPI process. Now
> ex. 5 works well on many MPI processes! But the run uses about 100 GB of RAM.
>
> How can we run ex. 5 using many OpenMP threads without MPI? If we just
> change the run command, the cores are not loaded properly: usually just
> one core runs at 100% while the others are idle. Sometimes all cores run
> at 100% for about a second, but then they become idle again for about 30
> seconds. Can the preconditioner use many threads, and how do we activate
> that?
>
Maybe you could describe what you are trying to accomplish? Threads and
processes are not really different, except for memory sharing. However,
sharing large, complex data structures rarely works well; that is why they
get partitioned and operate effectively as distributed memory. You would not
really save memory by using threads in this instance, if that is your goal.
This is detailed in the talks in this session (see the 2016 PP Minisymposium
on this page: https://cse.buffalo.edu/~knepley/relacs.html).
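For reference, a pure-MPI run of ex5 with the configuration you describe
would look roughly like the line below (the executable path and process
count are placeholders; adjust them to your build and machine):

  mpiexec -n 60 ./ex5 -ksp_type gmres -pc_type gamg -pc_gamg_use_parallel_coarse_grid_solver -ksp_monitor

All of the parallelism then comes from the MPI processes; no OpenMP
threading is needed.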
Thanks,
Matt
> The solve time (the time spent in the solver) is now 511 seconds using 60
> OpenMP threads, versus 13.19 seconds using 60 MPI processes.
>
> The ksp_monitor outputs for both cases (many OpenMP threads and many MPI
> processes) are attached.
>
>
> Thank you!
> Best,
> Lidia
>
> On 31.05.2022 15:21, Matthew Knepley wrote:
>
> I have looked at the local logs. First, you have run problems of size 12
> and 24. As a rule of thumb, you need 10,000
> variables per process in order to see good speedup.
>
> Thanks,
>
> Matt
>
> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Tue, May 31, 2022 at 7:39 AM Lidia <lidia.varsh at mail.ioffe.ru> wrote:
>>
>>> Matt, Mark, thank you very much for your answers!
>>>
>>>
>>> Now we have run example 5 on our computer cluster and on the local
>>> server, and again have not seen any performance increase; for some
>>> unclear reason, the running times on the local server are much better
>>> than on the cluster.
>>>
>> I suspect that you are trying to get speedup without increasing the
>> memory bandwidth:
>>
>>
>> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
>>
>> Thanks,
>>
>> Matt
>>
>>> Now we will try to run PETSc example 5 inside a Docker container on our
>>> server to see whether the problem is in our environment. I'll send you
>>> the results of this test as soon as we have them.
>>>
>>> The ksp_monitor outputs for example 5 with the current local server
>>> configuration (for 2 and 4 MPI processes) and for the cluster (for 1 and
>>> 3 MPI processes) are attached.
>>>
>>>
>>> And one more question: we can potentially use 10 nodes with 96 threads
>>> each on our cluster. In your opinion, which combination of MPI processes
>>> and OpenMP threads would be best for example 5?
>>>
>>> Thank you!
>>>
>>>
>>> Best,
>>> Lidiia
>>>
>>> On 31.05.2022 05:42, Mark Adams wrote:
>>>
>>> And if you see "NO" change in performance I suspect the solver/matrix is
>>> all on one processor.
>>> (PETSc does not use threads by default so threads should not change
>>> anything).
>>>
>>> As Matt said, it is best to start with a PETSc example that does
>>> something like what you want (a parallel linear solve; see
>>> src/ksp/ksp/tutorials for examples) and then add your code to it.
>>> That way the basic infrastructure, which is pretty obscure to the
>>> uninitiated, is already in place for you.
>>>
>>> Mark
>>>
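For illustration, a minimal sketch of such a parallel KSP solve might look
like the following (this is not any particular tutorial; the 1D Laplacian,
the problem size, and the PETSc 3.17-style PetscCall error checking are
placeholders/assumptions):

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PetscInt n = 10000, i, Istart, Iend;   /* placeholder problem size */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Distributed matrix; a 1D Laplacian stands in for the real operator */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSetUp(A));
  PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  /* Vectors with the same parallel layout as A */
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  /* Solver and preconditioner are picked up from the options database */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

Because KSPSetFromOptions is called, the solver and preconditioner can then
be chosen at run time, e.g. "mpiexec -n 4 ./solve -ksp_type gmres -pc_type
gamg -ksp_monitor" (the binary name is a placeholder).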
>>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Mon, May 30, 2022 at 10:12 PM Lidia <lidia.varsh at mail.ioffe.ru>
>>>> wrote:
>>>>
>>>>> Dear colleagues,
>>>>>
>>>>> Is there anyone here who has solved large sparse linear systems using PETSc?
>>>>>
>>>>
>>>> There are lots of publications with this kind of data. Here is one
>>>> recent one: https://arxiv.org/abs/2204.01722
>>>>
>>>>
>>>>> We have found NO performance improvement when using more and more MPI
>>>>> processes (1-2-3) and OpenMP threads (from 1 to 72 threads). Has anyone
>>>>> faced this problem? Does anyone know possible reasons for such
>>>>> behaviour?
>>>>>
>>>>
>>>> Solver behavior is dependent on the input matrix. The only
>>>> general-purpose solvers
>>>> are direct, but they do not scale linearly and have high memory
>>>> requirements.
>>>>
>>>> Thus, in order to make progress you will have to be specific about your
>>>> matrices.
>>>>
>>>>
>>>>> We use the AMG preconditioner and the GMRES solver from the KSP
>>>>> package, as our matrices are large (from 1e5 to 1e6 rows and columns),
>>>>> sparse, non-symmetric, and contain both positive and negative values.
>>>>> But the performance problems also appear when using the CG solver with
>>>>> symmetric matrices.
>>>>>
>>>>
>>>> There are many PETSc examples, such as example 5 for the Laplacian,
>>>> that exhibit
>>>> good scaling with both AMG and GMG.
>>>>
>>>>
>>>>> Could anyone help us set appropriate options for the preconditioner
>>>>> and solver? Right now we use the default parameters, which may not be
>>>>> the best, but we do not know a good combination. Or could you perhaps
>>>>> suggest other preconditioner+solver pairs for such tasks?
>>>>>
>>>>> I can provide more information: the matrices that we solve, the C++
>>>>> code that runs the solve using PETSc, and any statistics from our runs.
>>>>>
>>>>
>>>> First, please provide a description of the linear system, and the
>>>> output of
>>>>
>>>> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view
>>>>
>>>> for each test case.
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
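Those monitoring and logging options can simply be appended to whatever run
command is used, for example (binary name and process count are
placeholders):

  mpiexec -n 4 ./ex5 -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view

and the resulting text output included with each test case.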
>>>>
>>>>> Thank you in advance!
>>>>>
>>>>> Best regards,
>>>>> Lidiia Varshavchik,
>>>>> Ioffe Institute, St. Petersburg, Russia
>>>>>
>>>>
>>>>
>>>
>>
>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/