[petsc-users] Sparse linear system solving

Barry Smith bsmith at petsc.dev
Wed Jun 1 13:08:51 CDT 2022


  PETSc is an MPI library; it is not an OpenMP library. Only some of the external packages that PETSc uses can use OpenMP; PETSc's own solvers, such as GAMG, will make essentially no use of OpenMP.
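
  For example, to use 60 cores you would launch 60 MPI ranks and keep OpenMP at one thread per rank, e.g. (a sketch; the executable name and option values here are illustrative):

      OMP_NUM_THREADS=1 mpiexec -n 60 ./ex5 -pc_type gamg -ksp_type gmres -ksp_monitor

  rather than run a single process with 60 OpenMP threads.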

  Barry


> On Jun 1, 2022, at 1:37 PM, Lidia <lidia.varsh at mail.ioffe.ru> wrote:
> 
> Dear Matt,
> 
> Thank you for the rule of 10,000 variables per process! We have run ex.5 with a 1e4 x 1e4 matrix on our cluster and obtained good scaling behavior (see the figure "performance.png" - the dependence of the solve time in seconds on the number of cores). We used the GAMG preconditioner (for multithreading we added the option "-pc_gamg_use_parallel_coarse_grid_solver") and the GMRES solver, and we set one OpenMP thread for every MPI process. Now ex.5 works well on many MPI processes! But the run uses about 100 GB of RAM.
> 
> How can we run ex.5 using many OpenMP threads without MPI? If we just change the run command, the cores are not loaded evenly: usually just one core runs at 100% while the others are idle. Sometimes all the cores run at 100% for about 1 second but then become idle again for about 30 seconds. Can the preconditioner use many threads, and how do we activate this option?
> 
> The solve time (the time spent in the solver) is now 511 seconds using 60 OpenMP threads, versus 13.19 seconds using 60 MPI processes.
> 
> The ksp_monitor output for both cases (many OpenMP threads and many MPI processes) is attached.
> 
> 
> 
> Thank you!
> 
> Best,
> Lidia
> 
> On 31.05.2022 15:21, Matthew Knepley wrote:
>> I have looked at the local logs. First, you have run problems of size 12 and 24. As a rule of thumb, you need 10,000
>> variables per process in order to see good speedup.
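>> For example, under this rule of thumb a system with 10^6 unknowns would be expected to show good speedup only up to roughly 100 processes.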
>> 
>>   Thanks,
>> 
>>      Matt
>> 
>> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley <knepley at gmail.com> wrote:
>> On Tue, May 31, 2022 at 7:39 AM Lidia <lidia.varsh at mail.ioffe.ru> wrote:
>> Matt, Mark, thank you much for your answers!
>> 
>> 
>> 
>> Now we have run example #5 both on our computer cluster and on the local server, and again have not seen any performance increase; for some unclear reason the running times on the local server are much better than on the cluster.
>> 
>> I suspect that you are trying to get speedup without increasing the memory bandwidth:
>> 
>>   https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
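>> 
>> You can measure the achievable memory bandwidth on your machines with the STREAMS benchmark that ships with PETSc (run from PETSC_DIR; the rank count below is illustrative):
>> 
>>   make streams NPMAX=8
>> 
>> The speedup it reports is roughly the best you can hope for from any memory-bound solver.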
>> 
>>   Thanks,
>> 
>>      Matt 
>> Now we will try to run the PETSc example #5 inside a Docker container on our server to see if the problem is in our environment. I will send you the results of this test as soon as we have them.
>> 
>> The ksp_monitor output for the 5th test in the current local server configuration (for 2 and 4 MPI processes) and on the cluster (for 1 and 3 MPI processes) is attached.
>> 
>> 
>> 
>> And one more question: we can potentially use 10 nodes with 96 threads per node on our cluster. Which combination of MPI processes and OpenMP threads do you think would be best for the 5th example?
>> 
>> Thank you!
>> 
>> 
>> 
>> Best,
>> Lidiia
>> 
>> On 31.05.2022 05:42, Mark Adams wrote:
>>> And if you see "NO" change in performance, I suspect the solver/matrix is all on one processor.
>>> (PETSc does not use threads by default, so threads should not change anything.)
>>> 
>>> As Matt said, it is best to start with a PETSc example that does something like what you want (parallel linear solve, see src/ksp/ksp/tutorials for examples), and then add your code to it.
>>> That way the basic infrastructure, which is pretty obscure to the uninitiated, is put in place for you.
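>>> 
>>> As a rough sketch of that basic infrastructure (a minimal assemble-and-solve skeleton, assuming a recent PETSc with PetscCall(); the 1D Laplacian below is only a stand-in for your own matrix):
>>> 
>>> #include <petscksp.h>
>>> 
>>> int main(int argc, char **argv)
>>> {
>>>   Mat      A;
>>>   Vec      x, b;
>>>   KSP      ksp;
>>>   PetscInt i, n = 100, rstart, rend;
>>> 
>>>   PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
>>> 
>>>   /* Create a matrix distributed across all MPI ranks */
>>>   PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
>>>   PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
>>>   PetscCall(MatSetFromOptions(A));
>>>   PetscCall(MatSetUp(A));
>>> 
>>>   /* Each rank fills only the rows it owns */
>>>   PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
>>>   for (i = rstart; i < rend; i++) {
>>>     if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
>>>     if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
>>>     PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
>>>   }
>>>   PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
>>>   PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
>>> 
>>>   PetscCall(MatCreateVecs(A, &x, &b));
>>>   PetscCall(VecSet(b, 1.0));
>>> 
>>>   /* Solver and preconditioner are chosen at run time, e.g. -ksp_type gmres -pc_type gamg */
>>>   PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
>>>   PetscCall(KSPSetOperators(ksp, A, A));
>>>   PetscCall(KSPSetFromOptions(ksp));
>>>   PetscCall(KSPSolve(ksp, b, x));
>>> 
>>>   PetscCall(KSPDestroy(&ksp));
>>>   PetscCall(VecDestroy(&x));
>>>   PetscCall(VecDestroy(&b));
>>>   PetscCall(MatDestroy(&A));
>>>   PetscCall(PetscFinalize());
>>>   return 0;
>>> }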
>>> 
>>> Mark
>>> 
>>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley <knepley at gmail.com> wrote:
>>> On Mon, May 30, 2022 at 10:12 PM Lidia <lidia.varsh at mail.ioffe.ru> wrote:
>>> Dear colleagues,
>>> 
>>> Is there anyone here who has solved big sparse linear systems using PETSc?
>>> 
>>> There are lots of publications with this kind of data. Here is one recent one: https://arxiv.org/abs/2204.01722
>>>  
>>> We have found NO performance improvement when using more and more MPI
>>> processes (1-2-3) and OpenMP threads (from 1 to 72). Has anyone
>>> faced this problem? Does anyone know possible reasons for such
>>> behaviour?
>>> 
>>> Solver behavior is dependent on the input matrix. The only general-purpose solvers
>>> are direct, but they do not scale linearly and have high memory requirements.
>>> 
>>> Thus, in order to make progress you will have to be specific about your matrices.
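>>> 
>>> (If you want to sanity-check an iterative solve against a direct one on a smaller problem, both can be chosen at run time, e.g. -ksp_type preonly -pc_type lu, optionally with -pc_factor_mat_solver_type mumps if your PETSc build includes MUMPS.)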
>>>  
>>> We use the AMG preconditioner and GMRES solver from the KSP package, as our
>>> matrix is large (from 100,000 to 1e+6 rows and columns), sparse,
>>> non-symmetric, and includes both positive and negative values. But the
>>> performance problems also occur when using the CG solver with symmetric
>>> matrices.
>>> 
>>> There are many PETSc examples, such as example 5 for the Laplacian, that exhibit
>>> good scaling with both AMG and GMG.
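>>> 
>>> A typical run of such an example looks something like this (the rank count and options are illustrative; see the example's source for its grid-size options):
>>> 
>>>   mpiexec -n 4 ./ex5 -pc_type gamg -ksp_type gmres -ksp_monitor_true_residual -ksp_converged_reason -log_view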
>>>  
>>> Could anyone help us set appropriate options for the preconditioner
>>> and solver? Currently we use the default parameters; maybe they are not
>>> the best, but we do not know a good combination. Or maybe you could
>>> suggest other preconditioner+solver pairs for such tasks?
>>> 
>>> I can provide more information: the matrices that we solve, the C++ code
>>> that runs the solve using PETSc, and any statistics obtained from our runs.
>>> 
>>> First, please provide a description of the linear system, and the output of
>>> 
>>>   -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view
>>> 
>>> for each test case.
>>> 
>>>   Thanks,
>>> 
>>>      Matt
>>>  
>>> Thank you in advance!
>>> 
>>> Best regards,
>>> Lidiia Varshavchik,
>>> Ioffe Institute, St. Petersburg, Russia
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/
> <performance.png><testOpenMP.txt><testMpi60.txt>
