# [petsc-users] Sparse linear system solving

Lidia lidia.varsh at mail.ioffe.ru
Fri Jun 3 05:36:32 CDT 2022

Dear Matt, Barry,

thank you for the information about openMP!

Now all processes are loaded well. But we see a strange behaviour of
running times at different iterations (see the description below). Could you
please explain the reason to us and how we can improve it?

We need to quickly solve a big (about 1e6 rows) square sparse
non-symmetric system many times (about 1e5 times) consecutively. The matrix
is constant at every iteration, and the right-hand-side vector B changes
slowly (we think that its change at every iteration should be less than
0.001 %). So we use each previous solution vector X as the initial guess
for the next iteration. An AMG preconditioner and the GMRES solver are used.
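
The warm-start idea described above can be illustrated with a minimal sketch. It does not use PETSc: a plain Gauss-Seidel iteration stands in for AMG+GMRES, and the 3x3 matrix, the vector `b`, and the function name `gauss_seidel` are hypothetical, chosen only to show how the number of iterations depends on the initial guess.

```python
def gauss_seidel(A, b, x0, rtol=1e-8, max_iter=1000):
    """Iterate on A x = b from the guess x0; return (x, sweeps used)."""
    n = len(b)
    x = list(x0)
    b_norm = sum(v * v for v in b) ** 0.5

    def residual_norm():
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        return sum(v * v for v in r) ** 0.5

    for sweep in range(max_iter):
        # Relative test against ||b||, checked before any work is done.
        if residual_norm() <= rtol * b_norm:
            return x, sweep
        for i in range(n):
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diag) / A[i][i]
    raise RuntimeError("no convergence within max_iter sweeps")


# Hypothetical small non-symmetric, diagonally dominant test system.
A = [[4.0, -1.0, 0.0], [-2.0, 5.0, -1.0], [0.0, -1.0, 3.0]]
b = [1.0, 2.0, 3.0]
b_next = [v * (1.0 + 1e-5) for v in b]  # a 0.001 % change in B

x_prev, cold_sweeps = gauss_seidel(A, b, [0.0, 0.0, 0.0])
_, same_b_sweeps = gauss_seidel(A, b, x_prev)         # unchanged B, warm start
_, warm_sweeps = gauss_seidel(A, b_next, x_prev)      # changed B, warm start
_, cold_next_sweeps = gauss_seidel(A, b_next, [0.0, 0.0, 0.0])

print(cold_sweeps, same_b_sweeps, warm_sweeps, cold_next_sweeps)
```

With an unchanged B the warm-started solve passes the tolerance test before doing any sweep, and with a slightly changed B it needs fewer sweeps than a start from zero. How large the saving is for a real AMG+GMRES solve depends on the matrix and the tolerance.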

We have tested the code using a matrix with 631 000 rows over 15
consecutive iterations, using the vector X from the previous iteration.
The right-hand-side vector B and the matrix A are constant during the whole run.
The time of the first iteration is large (about 2 seconds) and
quickly decreases for the following iterations (the average time of the last
iterations was about 0.00008 s). But some iterations in the middle (# 2
and # 12) take a huge time of about 0.999 second (see the attached figure with
the time dynamics). This time of 0.999 second does not depend on the
size of the matrix or on the number of MPI processes, and these time jumps also
exist if we vary the vector B. Why do these time jumps appear and how can we
avoid them?
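
One way to check whether such jumps come from the solver or from the measurement itself is to time each solve with a monotonic clock and to derive every printed unit from the same raw value. The sketch below is an assumption about how this could be done, not the timing code used for the attached run; the helper names `timed`, `ns_to_seconds` and `ns_to_microseconds` are hypothetical.

```python
import time


def timed(fn, *args):
    """Run fn(*args); return (result, elapsed nanoseconds) from a monotonic clock."""
    start = time.perf_counter_ns()
    result = fn(*args)
    return result, time.perf_counter_ns() - start


def ns_to_seconds(ns):
    return ns / 1e9


def ns_to_microseconds(ns):
    return ns / 1e3


# Stand-in workload; in the real code this would be the call that runs the solve.
_, elapsed_ns = timed(sum, range(100_000))
print(f"time {ns_to_seconds(elapsed_ns):.6f} s ({ns_to_microseconds(elapsed_ns):.3f} us)")
```

For comparison, the attached log prints pairs such as `0.000082 s(82206.132... us)`. The two figures agree only if the second one is read as nanoseconds, since 82206.132 ns is 0.000082 s, so the `us` label in that output may be worth double-checking.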

The ksp_monitor output for this run (15 iterations included) using 36
MPI processes and a file with the memory bandwidth information
(testSpeed) are also attached. We can provide our C++ code if it is
needed.
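
The iteration counts in the attached ksp_monitor output can be checked with a short calculation. The sketch assumes that the relative test compares the preconditioned residual norm with rtol times the preconditioned residual norm printed at iteration 0 of the first solve; this is a reading of the default PETSc convergence test, not something stated in this thread. All numbers are copied from the attached log.

```python
rtol = 1.0e-5                        # "Compute with tolerance 0.00001"
reference_norm = 1.868353493329e+08  # preconditioned residual, first solve, iteration 0
threshold = rtol * reference_norm    # about 1868.35


def converged(preconditioned_residual):
    return preconditioned_residual < threshold


print(converged(1.909590120581e+03))  # first solve, iteration 52
print(converged(1.625794696835e+03))  # first solve, iteration 53
print(converged(1.625794716222e+03))  # later solves, iteration 0
```

Under that assumption the first solve stops at iteration 53, and every later solve already satisfies the test at iteration 0 because X is reused and B is unchanged. Note that the true relative residual `||r(i)||/||b||` printed at that point is still 2.58e-02.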

Thanks a lot!

Best,
Lidiia

On 01.06.2022 21:14, Matthew Knepley wrote:
> On Wed, Jun 1, 2022 at 1:43 PM Lidia <lidia.varsh at mail.ioffe.ru> wrote:
>
>     Dear Matt,
>
>     Thank you for the rule of 10,000 variables per process! We have
>     run ex.5 with a 1e4 x 1e4 matrix on our cluster and got good
>     performance scaling (see the figure "performance.png": the
>     dependence of the solving time in seconds on the number of cores).
>     option "-pc_gamg_use_parallel_coarse_grid_solver") and GMRES
>     solver. And we have set one OpenMP thread for every MPI process.
>     Now ex.5 works well on many MPI processes! But the
>     run uses about 100 GB of RAM.
>
>     How can we run ex.5 using many OpenMP threads without MPI? If we
>     just change the run command, the cores are not loaded
>     normally: usually just one core is loaded at 100 % and the others are
>     idle. Sometimes all cores work at 100 % for 1 second but
>     then become idle again for about 30 seconds. Can the preconditioner
>     use many threads, and how do we activate this option?
>
>
> Maybe you could describe what you are trying to accomplish? Threads and
> processes are not really different, except for memory sharing.
> However, sharing large complex data structures rarely works. That is
> why they get partitioned and operate effectively as distributed
> memory. You would not really save memory by using
> threads in this instance, if that is your goal. This is detailed in
> the talks in this session (see 2016 PP Minisymposium on this page
> https://cse.buffalo.edu/~knepley/relacs.html).
>
>   Thanks,
>
>      Matt
>
>     The solving time (the time of the solver work) using 60 OpenMP
>     threads is now 511 seconds, while using 60 MPI processes it is
>     13.19 seconds.
>
>     The ksp_monitor outputs for both cases (many OpenMP threads or many MPI
>     processes) are attached.
>
>
>     Thank you!
>
>     Best,
>     Lidia
>
>     On 31.05.2022 15:21, Matthew Knepley wrote:
>>     I have looked at the local logs. First, you have run problems of
>>     size 12 and 24. As a rule of thumb, you need 10,000
>>     variables per process in order to see good speedup.
>>
>>       Thanks,
>>
>>          Matt
>>
>>     On Tue, May 31, 2022 at 8:19 AM Matthew Knepley
>>     <knepley at gmail.com> wrote:
>>
>>         On Tue, May 31, 2022 at 7:39 AM Lidia
>>         <lidia.varsh at mail.ioffe.ru> wrote:
>>
>>
>>
>>             Now we have run example # 5 on our computer cluster and
>>             on the local server and also have not seen any
>>             performance increase, but for an unclear reason the running
>>             times on the local server are much better than on the cluster.
>>
>>         I suspect that you are trying to get speedup without
>>         increasing the memory bandwidth:
>>
>>         https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
>>
>>           Thanks,
>>
>>              Matt
>>
>>             Now we will try to run PETSc example # 5 inside a docker
>>             container on our server and see if the problem is in our
>>             environment. I'll write you the results of this test as
>>             soon as we get them.
>>
>>             The ksp_monitor outputs for the 5th test with the current
>>             local server configuration (for 2 and 4 MPI processes)
>>             and for the cluster (for 1 and 3 MPI processes) are
>>             attached.
>>
>>
>>             And one more question. Potentially we can use 10 nodes
>>             and 96 threads on each node of our cluster. Which
>>             combination of the numbers of MPI processes and
>>             OpenMP threads do you think may be the best for the 5th example?
>>
>>             Thank you!
>>
>>
>>             Best,
>>             Lidiia
>>
>>             On 31.05.2022 05:42, Mark Adams wrote:
>>>             And if you see "NO" change in performance I suspect the
>>>             solver/matrix is all on one processor.
>>>             not change anything).
>>>
>>>             As Matt said, it is best to start with a PETSc
>>>             example that does something like what you want (parallel
>>>             linear solve, see src/ksp/ksp/tutorials for examples),
>>>             That way you get the basic infrastructure in place for
>>>             you, which is pretty obscure to the uninitiated.
>>>
>>>             Mark
>>>
>>>             On Mon, May 30, 2022 at 10:18 PM Matthew Knepley
>>>             <knepley at gmail.com> wrote:
>>>
>>>                 On Mon, May 30, 2022 at 10:12 PM Lidia
>>>                 <lidia.varsh at mail.ioffe.ru> wrote:
>>>
>>>                     Dear colleagues,
>>>
>>>                     Is there anyone here who has solved big sparse linear
>>>                     systems using PETSc?
>>>
>>>
>>>                 There are lots of publications with this kind of
>>>                 data. Here is one recent one:
>>>                 https://arxiv.org/abs/2204.01722
>>>
>>>                     We have found NO performance improvement while
>>>                     using more and more mpi
>>>                     processes (1-2-3) and open-mp threads (from 1 to
>>>                     faced to this problem? Does anyone know any
>>>                     possible reasons for such
>>>                     behaviour?
>>>
>>>
>>>                 Solver behavior is dependent on the input matrix.
>>>                 The only general-purpose solvers
>>>                 are direct, but they do not scale linearly and have
>>>                 high memory requirements.
>>>
>>>                 Thus, in order to make progress you will have to be
>>>
>>>                     We use AMG preconditioner and GMRES solver from
>>>                     KSP package, as our
>>>                     matrix is large (from 100 000 to 1e+6 rows and
>>>                     columns), sparse,
>>>                     non-symmetric and includes both positive and
>>>                     negative values. But
>>>                     performance problems also exist while using CG
>>>                     solvers with symmetric
>>>                     matrices.
>>>
>>>
>>>                 There are many PETSc examples, such as example 5 for
>>>                 the Laplacian, that exhibit
>>>                 good scaling with both AMG and GMG.
>>>
>>>                     Could anyone help us to set appropriate options
>>>                     of the preconditioner
>>>                     and solver? Now we use default parameters, maybe
>>>                     they are not the best,
>>>                     but we do not know a good combination. Or maybe
>>>                     you could suggest any
>>>                     other pairs of preconditioner+solver for such tasks?
>>>
>>>                     that we solve, c++ script
>>>                     to run solving using petsc and any statistics
>>>                     obtained by our runs.
>>>
>>>
>>>                 First, please provide a description of the linear
>>>                 system, and the output of
>>>
>>>                   -ksp_view -ksp_monitor_true_residual
>>>                 -ksp_converged_reason -log_view
>>>
>>>                 for each test case.
>>>
>>>                   Thanks,
>>>
>>>                      Matt
>>>
>>>
>>>                     Best regards,
>>>                     Lidiia Varshavchik,
>>>                     Ioffe Institute, St. Petersburg, Russia
>>>
>>>
>>>
>>>                 --
>>>                 What most experimenters take for granted before they
>>>                 begin their experiments is infinitely more
>>>                 interesting than any results to which their
>>>                 experiments lead.
>>>                 -- Norbert Wiener
>>>
>>>                 https://www.cse.buffalo.edu/~knepley/
>>>                 <http://www.cse.buffalo.edu/~knepley/>
>>>
>>
>>
>>         --
>>         What most experimenters take for granted before they begin
>>         their experiments is infinitely more interesting than any
>>         results to which their experiments lead.
>>         -- Norbert Wiener
>>
>>         https://www.cse.buffalo.edu/~knepley/
>>         <http://www.cse.buffalo.edu/~knepley/>
>>
>>
>>
>
>
>
-------------- next part --------------
[lida@head1 build]$ mpirun -n 36 ./petscTest -ksp_monitor -ksp_monitor_true_residual -ksp_converged_reason -log_view
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

Device name:           i40iw0
Device vendor ID:      0x8086
Device vendor part ID: 14290

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

Local device:         i40iw0
Local port:           1
CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
Mat size 630834
using block size is 1
5 17524 87620 105144
1 17524 17524 35048
7 17523 122667 140190
21 17523 367989 385512
27 17523 473127 490650
31 17523 543219 560742
2 17524 35048 52572
3 17524 52572 70096
4 17524 70096 87620
0 17524 0 17524
6 17523 105144 122667
8 17523 140190 157713
9 17523 157713 175236
11 17523 192759 210282
12 17523 210282 227805
13 17523 227805 245328
14 17523 245328 262851
20 17523 350466 367989
22 17523 385512 403035
23 17523 403035 420558
25 17523 438081 455604
26 17523 455604 473127
28 17523 490650 508173
30 17523 525696 543219
33 17523 578265 595788
34 17523 595788 613311
35 17523 613311 630834
15 17523 262851 280374
16 17523 280374 297897
17 17523 297897 315420
18 17523 315420 332943
19 17523 332943 350466
24 17523 420558 438081
29 17523 508173 525696
10 17523 175236 192759
32 17523 560742 578265
[head1.hpc:242461] 71 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[head1.hpc:242461] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[head1.hpc:242461] 71 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
Compute with tolerance 0.000010000000000000000818030539 solver is gmres

startPC
startSolv
0 KSP Residual norm 1.868353493329e+08
0 KSP preconditioned resid norm 1.868353493329e+08 true resid norm 2.165031654579e+06 ||r(i)||/||b|| 1.000000000000e+00
1 KSP Residual norm 1.132315559206e+08
1 KSP preconditioned resid norm 1.132315559206e+08 true resid norm 6.461246152989e+07 ||r(i)||/||b|| 2.984365673971e+01
2 KSP Residual norm 1.534820972084e+07
2 KSP preconditioned resid norm 1.534820972084e+07 true resid norm 2.426876823961e+07 ||r(i)||/||b|| 1.120942882672e+01
3 KSP Residual norm 7.539322505186e+06
3 KSP preconditioned resid norm 7.539322505186e+06 true resid norm 1.829739078019e+07 ||r(i)||/||b|| 8.451327139485e+00
4 KSP Residual norm 4.660669278808e+06
4 KSP preconditioned resid norm 4.660669278808e+06 true resid norm 1.744671242073e+07 ||r(i)||/||b|| 8.058409854574e+00
5 KSP Residual norm 3.223391594815e+06
5 KSP preconditioned resid norm 3.223391594815e+06 true resid norm 1.737561446785e+07 ||r(i)||/||b|| 8.025570633618e+00
6 KSP Residual norm 2.240424900880e+06
6 KSP preconditioned resid norm 2.240424900880e+06 true resid norm 1.683362112781e+07 ||r(i)||/||b|| 7.775230949719e+00
7 KSP Residual norm 1.623399472779e+06
7 KSP preconditioned resid norm 1.623399472779e+06 true resid norm 1.624000914301e+07 ||r(i)||/||b|| 7.501049284271e+00
8 KSP Residual norm 1.211518107569e+06
8 KSP preconditioned resid norm 1.211518107569e+06 true resid norm 1.558830757667e+07 ||r(i)||/||b|| 7.200036795627e+00
9 KSP Residual norm 9.642201969240e+05
9 KSP preconditioned resid norm 9.642201969240e+05 true resid norm 1.486473650844e+07 ||r(i)||/||b|| 6.865828717562e+00
10 KSP Residual norm 7.867651557046e+05
10 KSP preconditioned resid norm 7.867651557046e+05 true resid norm 1.396084153269e+07 ||r(i)||/||b|| 6.448331368812e+00
11 KSP Residual norm 7.078405789961e+05
11 KSP preconditioned resid norm 7.078405789961e+05 true resid norm 1.296873719329e+07 ||r(i)||/||b|| 5.990091260724e+00
12 KSP Residual norm 6.335098563709e+05
12 KSP preconditioned resid norm 6.335098563709e+05 true resid norm 1.164201582227e+07 ||r(i)||/||b|| 5.377295892022e+00
13 KSP Residual norm 5.397665070507e+05
13 KSP preconditioned resid norm 5.397665070507e+05 true resid norm 1.042661489959e+07 ||r(i)||/||b|| 4.815917992485e+00
14 KSP Residual norm 4.549629296863e+05
14 KSP preconditioned resid norm 4.549629296863e+05 true resid norm 9.420542232153e+06 ||r(i)||/||b|| 4.351226095114e+00
15 KSP Residual norm 3.627838605442e+05
15 KSP preconditioned resid norm 3.627838605442e+05 true resid norm 8.546289749804e+06 ||r(i)||/||b|| 3.947420229042e+00
16 KSP Residual norm 2.974632184520e+05
16 KSP preconditioned resid norm 2.974632184520e+05 true resid norm 7.707507230485e+06 ||r(i)||/||b|| 3.559997478181e+00
17 KSP Residual norm 2.584437744774e+05
17 KSP preconditioned resid norm 2.584437744774e+05 true resid norm 6.996748201244e+06 ||r(i)||/||b|| 3.231707114510e+00
18 KSP Residual norm 2.172287358399e+05
18 KSP preconditioned resid norm 2.172287358399e+05 true resid norm 6.008578157843e+06 ||r(i)||/||b|| 2.775284206646e+00
19 KSP Residual norm 1.807320553225e+05
19 KSP preconditioned resid norm 1.807320553225e+05 true resid norm 5.166440962968e+06 ||r(i)||/||b|| 2.386311974719e+00
20 KSP Residual norm 1.583700438237e+05
20 KSP preconditioned resid norm 1.583700438237e+05 true resid norm 4.613820989743e+06 ||r(i)||/||b|| 2.131063986978e+00
21 KSP Residual norm 1.413879944302e+05
21 KSP preconditioned resid norm 1.413879944302e+05 true resid norm 4.151504476178e+06 ||r(i)||/||b|| 1.917525994318e+00
22 KSP Residual norm 1.228172205521e+05
22 KSP preconditioned resid norm 1.228172205521e+05 true resid norm 3.630290527838e+06 ||r(i)||/||b|| 1.676784041545e+00
23 KSP Residual norm 1.084793002546e+05
23 KSP preconditioned resid norm 1.084793002546e+05 true resid norm 3.185566371074e+06 ||r(i)||/||b|| 1.471371729986e+00
24 KSP Residual norm 9.520569914833e+04
24 KSP preconditioned resid norm 9.520569914833e+04 true resid norm 2.811378949429e+06 ||r(i)||/||b|| 1.298539420189e+00
25 KSP Residual norm 8.331027569193e+04
25 KSP preconditioned resid norm 8.331027569193e+04 true resid norm 2.487128345424e+06 ||r(i)||/||b|| 1.148772277839e+00
26 KSP Residual norm 7.116546817077e+04
26 KSP preconditioned resid norm 7.116546817077e+04 true resid norm 2.128784852233e+06 ||r(i)||/||b|| 9.832580728002e-01
27 KSP Residual norm 6.107201042673e+04
27 KSP preconditioned resid norm 6.107201042673e+04 true resid norm 1.816742057822e+06 ||r(i)||/||b|| 8.391295591358e-01
28 KSP Residual norm 5.407959454186e+04
28 KSP preconditioned resid norm 5.407959454186e+04 true resid norm 1.590698721931e+06 ||r(i)||/||b|| 7.347230783285e-01
29 KSP Residual norm 4.859208455279e+04
29 KSP preconditioned resid norm 4.859208455279e+04 true resid norm 1.405619902078e+06 ||r(i)||/||b|| 6.492375753974e-01
30 KSP Residual norm 4.463327440008e+04
30 KSP preconditioned resid norm 4.463327440008e+04 true resid norm 1.258789113490e+06 ||r(i)||/||b|| 5.814183413104e-01
31 KSP Residual norm 3.927742507325e+04
31 KSP preconditioned resid norm 3.927742507325e+04 true resid norm 1.086402490838e+06 ||r(i)||/||b|| 5.017951994097e-01
32 KSP Residual norm 3.417683630748e+04
32 KSP preconditioned resid norm 3.417683630748e+04 true resid norm 9.566603594382e+05 ||r(i)||/||b|| 4.418689941159e-01
33 KSP Residual norm 3.002775921838e+04
33 KSP preconditioned resid norm 3.002775921838e+04 true resid norm 8.429546731968e+05 ||r(i)||/||b|| 3.893498145460e-01
34 KSP Residual norm 2.622152046131e+04
34 KSP preconditioned resid norm 2.622152046131e+04 true resid norm 7.578781071384e+05 ||r(i)||/||b|| 3.500540537296e-01
35 KSP Residual norm 2.264910466846e+04
35 KSP preconditioned resid norm 2.264910466846e+04 true resid norm 6.684892523160e+05 ||r(i)||/||b|| 3.087665027447e-01
36 KSP Residual norm 1.970721593805e+04
36 KSP preconditioned resid norm 1.970721593805e+04 true resid norm 5.905536805578e+05 ||r(i)||/||b|| 2.727690744422e-01
37 KSP Residual norm 1.666104858674e+04
37 KSP preconditioned resid norm 1.666104858674e+04 true resid norm 5.172223947409e+05 ||r(i)||/||b|| 2.388983060118e-01
38 KSP Residual norm 1.432004409785e+04
38 KSP preconditioned resid norm 1.432004409785e+04 true resid norm 4.593351142808e+05 ||r(i)||/||b|| 2.121609230559e-01
39 KSP Residual norm 1.211549914084e+04
39 KSP preconditioned resid norm 1.211549914084e+04 true resid norm 4.019170298644e+05 ||r(i)||/||b|| 1.856402556583e-01
40 KSP Residual norm 1.061599294842e+04
40 KSP preconditioned resid norm 1.061599294842e+04 true resid norm 3.586589723898e+05 ||r(i)||/||b|| 1.656599207828e-01
41 KSP Residual norm 9.577489574913e+03
41 KSP preconditioned resid norm 9.577489574913e+03 true resid norm 3.221505690964e+05 ||r(i)||/||b|| 1.487971635034e-01
42 KSP Residual norm 8.221576307371e+03
42 KSP preconditioned resid norm 8.221576307371e+03 true resid norm 2.745213067979e+05 ||r(i)||/||b|| 1.267978258965e-01
43 KSP Residual norm 6.898384710028e+03
43 KSP preconditioned resid norm 6.898384710028e+03 true resid norm 2.330710645170e+05 ||r(i)||/||b|| 1.076524973776e-01
44 KSP Residual norm 6.087330352788e+03
44 KSP preconditioned resid norm 6.087330352788e+03 true resid norm 2.058183089407e+05 ||r(i)||/||b|| 9.506480355857e-02
45 KSP Residual norm 5.207144067562e+03
45 KSP preconditioned resid norm 5.207144067562e+03 true resid norm 1.745194864065e+05 ||r(i)||/||b|| 8.060828396546e-02
46 KSP Residual norm 4.556037825199e+03
46 KSP preconditioned resid norm 4.556037825199e+03 true resid norm 1.551715592432e+05 ||r(i)||/||b|| 7.167172771584e-02
47 KSP Residual norm 3.856329202278e+03
47 KSP preconditioned resid norm 3.856329202278e+03 true resid norm 1.315660202980e+05 ||r(i)||/||b|| 6.076863588562e-02
48 KSP Residual norm 3.361878313389e+03
48 KSP preconditioned resid norm 3.361878313389e+03 true resid norm 1.147746368397e+05 ||r(i)||/||b|| 5.301291396685e-02
49 KSP Residual norm 2.894852363045e+03
49 KSP preconditioned resid norm 2.894852363045e+03 true resid norm 9.951811967458e+04 ||r(i)||/||b|| 4.596612685273e-02
50 KSP Residual norm 2.576639763678e+03
50 KSP preconditioned resid norm 2.576639763678e+03 true resid norm 8.828512403741e+04 ||r(i)||/||b|| 4.077775207151e-02
51 KSP Residual norm 2.176356645511e+03
51 KSP preconditioned resid norm 2.176356645511e+03 true resid norm 7.535533182060e+04 ||r(i)||/||b|| 3.480564898957e-02
52 KSP Residual norm 1.909590120581e+03
52 KSP preconditioned resid norm 1.909590120581e+03 true resid norm 6.643741378378e+04 ||r(i)||/||b|| 3.068657848177e-02
53 KSP Residual norm 1.625794696835e+03
53 KSP preconditioned resid norm 1.625794696835e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 53

#################################################################################
SOLV gmres iter 0
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 53 time 2.000408 s(2000407820.91200017929077148438 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 1
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000082 s(82206.13200000001234002411 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 2
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.999076 s(999076088.56700003147125244141 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 3
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000081 s(80689.84000000001105945557 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 4
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000079 s(79139.94299999999930150807 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 5
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000065 s(65399.49300000000948784873 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 6
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000080 s(79554.38999999999941792339 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 7
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000080 s(80431.21900000001187436283 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 8
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000080 s(80255.19100000000617001206 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 9
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000081 s(80568.19700000000011641532 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 10
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000078 s(78323.06299999999464489520 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 11
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000072 s(71933.38600000001315493137 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 12
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.999063 s(999063438.25300002098083496094 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 13
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000070 s(69632.13800000000628642738 us)
#################################################################################

startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0

#################################################################################
SOLV gmres iter 14
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000)   (converged reason is CONVERGED_RTOL) iterations 0 time 0.000073 s(73498.46099999999569263309 us)
#################################################################################

nohup: appending output to ‘nohup.out’
nohup: failed to run command ‘localc’: No such file or directory
****************************************************************************************************************************************************************
***                                WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                 ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: -------------------------------------------------------------------

./petscTest on a  named head1.hpc with 36 processors, by lida Fri Jun  3 13:23:29 2022
Using Petsc Release Version 3.17.1, unknown

                         Max       Max/Min     Avg       Total
Time (sec):           8.454e+01     1.440   5.941e+01
Objects:              7.030e+02     1.000   7.030e+02
Flops:                1.018e+09     2.522   5.062e+08  1.822e+10
Flops/sec:            1.734e+07     3.633   8.567e+06  3.084e+08
MPI Msg Count:        5.257e+04     1.584   4.249e+04  1.530e+06
MPI Msg Len (bytes):  1.453e+09    14.133   2.343e+04  3.585e+10
MPI Reductions:       7.800e+02     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 5.9406e+01 100.0%  1.8223e+10 100.0%  1.530e+06 100.0%  2.343e+04      100.0%  7.620e+02  97.7%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided         75 1.0 3.5652e+0155.4 0.00e+00 0.0 1.9e+04 8.0e+00 7.5e+01  8  0  1  0 10   8  0  1  0 10     0
BuildTwoSidedF        51 1.0 3.5557e+0157.2 0.00e+00 0.0 8.5e+03 3.8e+05 5.1e+01  8  0  1  9  7   8  0  1  9  7     0
MatMult             1503 1.0 2.8036e+00 1.3 6.78e+08 3.8 1.1e+06 2.4e+04 4.0e+00  4 53 74 75  1   4 53 74 75  1  3473
MatMultAdd           328 1.0 2.2706e-01 2.0 2.43e+07 2.3 1.5e+05 3.0e+03 0.0e+00  0  2 10  1  0   0  2 10  1  0  1985
MatMultTranspose     328 1.0 4.6323e-01 2.6 4.98e+07 4.7 1.5e+05 3.0e+03 4.0e+00  0  3 10  1  1   0  3 10  1  1  1090
MatSolve              82 1.0 2.3332e-04 2.0 1.23e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   190
MatLUFactorSym         1 1.0 1.2696e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 2.1874e-05 2.1 1.60e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    26
MatConvert             5 1.0 3.0352e-02 1.2 0.00e+00 0.0 5.7e+03 9.8e+03 4.0e+00  0  0  0  0  1   0  0  0  0  1     0
MatScale              12 1.0 9.6534e-03 2.1 2.13e+06 4.1 2.8e+03 2.0e+04 0.0e+00  0  0  0  0  0   0  0  0  0  0  2795
MatResidual          328 1.0 5.8371e-01 1.2 1.50e+08 4.7 2.3e+05 2.0e+04 0.0e+00  1 10 15 13  0   1 10 15 13  0  3018
MatAssemblyBegin      70 1.0 3.5348e+0142.1 0.00e+00 0.0 8.5e+03 3.8e+05 2.4e+01  9  0  1  9  3   9  0  1  9  3     0
MatAssemblyEnd        70 1.0 2.0657e+00 1.0 7.97e+06573.7 0.0e+00 0.0e+00 8.1e+01  3  0  0  0 10   3  0  0  0 11     7
MatGetRowIJ            1 1.0 4.5262e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMats       1 1.0 3.6275e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  1   0  0  0  0  1     0
MatCreateSubMat        2 1.0 2.2424e-03 1.1 0.00e+00 0.0 3.5e+01 1.2e+03 2.8e+01  0  0  0  0  4   0  0  0  0  4     0
MatGetOrdering         1 1.0 3.3737e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCoarsen             4 1.0 2.7961e-01 2.3 0.00e+00 0.0 3.1e+04 5.1e+04 2.4e+01  0  0  2  4  3   0  0  2  4  3     0
MatZeroEntries         4 1.0 6.2166e-04176.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAXPY                4 1.0 1.0507e-02 1.1 2.81e+04 1.6 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  1   0  0  0  0  1    63
MatTranspose           8 1.0 3.0698e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMatMultSym         12 1.0 1.5384e-01 1.2 0.00e+00 0.0 8.5e+03 2.0e+04 3.6e+01  0  0  1  0  5   0  0  1  0  5     0
MatMatMultNum         12 1.0 7.4942e-02 1.3 1.29e+07 8.0 2.8e+03 2.0e+04 4.0e+00  0  1  0  0  1   0  1  0  0  1  1636
MatPtAPSymbolic        4 1.0 5.7252e-01 1.0 0.00e+00 0.0 1.4e+04 4.8e+04 2.8e+01  1  0  1  2  4   1  0  1  2  4     0
MatPtAPNumeric         4 1.0 9.4753e-01 1.0 3.99e+0714.7 3.7e+03 1.0e+05 2.0e+01  2  1  0  1  3   2  1  0  1  3   256
MatTrnMatMultSym       1 1.0 2.3274e+00 1.0 0.00e+00 0.0 5.6e+03 7.0e+05 1.2e+01  4  0  0 11  2   4  0  0 11  2     0
MatRedundantMat        1 1.0 3.9176e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  1   0  0  0  0  1     0
MatMPIConcateSeq       1 1.0 3.0197e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetLocalMat        13 1.0 8.8896e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol         12 1.0 2.0289e-01 1.8 0.00e+00 0.0 2.0e+04 3.8e+04 0.0e+00  0  0  1  2  0   0  0  1  2  0     0
VecMDot               93 1.0 2.7982e-01 3.2 5.32e+07 1.0 0.0e+00 0.0e+00 9.3e+01  0 10  0  0 12   0 10  0  0 12  6711
VecNorm              209 1.0 3.7674e-01 4.3 6.40e+06 1.0 0.0e+00 0.0e+00 2.1e+02  0  1  0  0 27   0  1  0  0 27   591
VecScale             112 1.0 5.5057e-04 1.5 1.50e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 91073
VecCopy             1071 1.0 1.4028e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1293 1.0 6.6862e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               72 1.0 1.4381e-03 1.6 2.44e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 60574
VecAYPX             2036 1.0 2.0018e-02 1.7 1.96e+07 1.5 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0 23728
VecAXPBYCZ           656 1.0 7.2191e-03 1.7 2.30e+07 1.6 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0 74818
VecMAXPY             151 1.0 8.7415e-02 1.3 1.06e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 21  0  0  0   0 21  0  0  0 43052
VecAssemblyBegin      28 1.0 2.6426e-01100.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.7e+01  0  0  0  0  3   0  0  0  0  4     0
VecAssemblyEnd        28 1.0 4.3833e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    1356 1.0 1.7490e-02 1.5 9.52e+06 1.6 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 12767
VecScatterBegin     2341 1.0 5.3816e-01 5.2 0.00e+00 0.0 1.5e+06 2.0e+04 1.6e+01  1  0 95 81  2   1  0 95 81  2     0
VecScatterEnd       2341 1.0 2.4118e+00 1.9 2.55e+07368.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0    23
VecNormalize         112 1.0 7.0541e-02 2.8 4.50e+06 1.1 0.0e+00 0.0e+00 1.1e+02  0  1  0  0 14   0  1  0  0 15  2132
SFSetGraph            31 1.0 2.5626e-0218.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp               24 1.0 1.0973e-01 1.6 0.00e+00 0.0 2.9e+04 1.5e+04 2.4e+01  0  0  2  1  3   0  0  2  1  3     0
SFBcastBegin          28 1.0 2.3203e-02 3.2 0.00e+00 0.0 2.5e+04 5.7e+04 0.0e+00  0  0  2  4  0   0  0  2  4  0     0
SFBcastEnd            28 1.0 7.4158e-02 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFPack              2369 1.0 4.4483e-0118.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
SFUnpack            2369 1.0 3.6313e-0271.6 2.55e+07368.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1497
KSPSetUp              11 1.0 5.1386e-03 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  1   0  0  0  0  1     0
KSPSolve              15 1.0 3.4692e+00 1.0 9.40e+08 2.4 1.4e+06 1.9e+04 2.3e+02  6 95 90 73 30   6 95 90 73 30  4972
KSPGMRESOrthog        93 1.0 3.1573e-01 2.5 1.06e+08 1.0 0.0e+00 0.0e+00 9.3e+01  0 21  0  0 12   0 21  0  0 12 11896
PCGAMGGraph_AGG        4 1.0 3.0876e-01 1.0 1.83e+06 4.7 8.5e+03 1.3e+04 3.6e+01  1  0  1  0  5   1  0  1  0  5    70
PCGAMGCoarse_AGG       4 1.0 2.8281e+00 1.1 0.00e+00 0.0 4.8e+04 1.3e+05 4.7e+01  5  0  3 17  6   5  0  3 17  6     0
PCGAMGProl_AGG         4 1.0 3.2106e-01 1.8 0.00e+00 0.0 1.2e+04 2.4e+04 6.3e+01  0  0  1  1  8   0  0  1  1  8     0
PCGAMGPOpt_AGG         4 1.0 2.6704e-01 1.0 2.82e+07 3.0 4.5e+04 1.8e+04 1.6e+02  0  2  3  2 21   0  2  3  2 22  1589
GAMG: createProl       4 1.0 3.5902e+00 1.0 3.01e+07 3.0 1.1e+05 6.6e+04 3.1e+02  6  2  7 21 40   6  2  7 21 41   124
Create Graph         4 1.0 3.0346e-02 1.2 0.00e+00 0.0 5.7e+03 9.8e+03 4.0e+00  0  0  0  0  1   0  0  0  0  1     0
Filter Graph         4 1.0 2.8303e-01 1.0 1.83e+06 4.7 2.8e+03 2.0e+04 3.2e+01  0  0  0  0  4   0  0  0  0  4    76
MIS/Agg              4 1.0 2.7965e-01 2.3 0.00e+00 0.0 3.1e+04 5.1e+04 2.4e+01  0  0  2  4  3   0  0  2  4  3     0
SA: col data         4 1.0 2.1554e-02 1.2 0.00e+00 0.0 9.2e+03 2.9e+04 2.7e+01  0  0  1  1  3   0  0  1  1  4     0
SA: frmProl0         4 1.0 1.5549e-01 1.0 0.00e+00 0.0 2.6e+03 5.5e+03 2.0e+01  0  0  0  0  3   0  0  0  0  3     0
SA: smooth           4 1.0 1.8642e-01 1.0 2.16e+06 4.0 1.1e+04 2.0e+04 5.2e+01  0  0  1  1  7   0  0  1  1  7   148
GAMG: partLevel        4 1.0 1.5208e+00 1.0 3.99e+0714.7 1.8e+04 5.9e+04 1.0e+02  3  1  1  3 13   3  1  1  3 13   159
repartition          1 1.0 3.5234e-03 1.0 0.00e+00 0.0 8.0e+01 5.3e+02 5.3e+01  0  0  0  0  7   0  0  0  0  7     0
Invert-Sort          1 1.0 3.3143e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  1   0  0  0  0  1     0
Move A               1 1.0 1.1502e-03 1.2 0.00e+00 0.0 3.5e+01 1.2e+03 1.5e+01  0  0  0  0  2   0  0  0  0  2     0
Move P               1 1.0 1.4414e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01  0  0  0  0  2   0  0  0  0  2     0
PCGAMG Squ l00         1 1.0 2.3274e+00 1.0 0.00e+00 0.0 5.6e+03 7.0e+05 1.2e+01  4  0  0 11  2   4  0  0 11  2     0
PCGAMG Gal l00         1 1.0 1.2443e+00 1.0 1.06e+07 5.0 9.0e+03 1.1e+05 1.2e+01  2  1  1  3  2   2  1  1  3  2   135
PCGAMG Opt l00         1 1.0 1.4166e-01 1.0 5.88e+05 1.7 4.5e+03 4.8e+04 1.0e+01  0  0  0  1  1   0  0  0  1  1   130
PCGAMG Gal l01         1 1.0 2.5946e-01 1.0 2.64e+07543.5 6.8e+03 1.2e+04 1.2e+01  0  0  0  0  2   0  0  0  0  2   271
PCGAMG Opt l01         1 1.0 2.9430e-02 1.0 1.11e+06444.1 4.9e+03 1.5e+03 1.0e+01  0  0  0  0  1   0  0  0  0  1    90
PCGAMG Gal l02         1 1.0 1.2971e-02 1.0 3.34e+06 0.0 2.1e+03 1.3e+03 1.2e+01  0  0  0  0  2   0  0  0  0  2   343
PCGAMG Opt l02         1 1.0 3.6016e-03 1.0 2.61e+05 0.0 2.0e+03 2.1e+02 1.0e+01  0  0  0  0  1   0  0  0  0  1    97
PCGAMG Gal l03         1 1.0 5.7189e-04 1.1 1.45e+04 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  2   0  0  0  0  2    25
PCGAMG Opt l03         1 1.0 4.1255e-04 1.1 5.02e+03 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  1   0  0  0  0  1    12
PCSetUp                1 1.0 5.1101e+00 1.0 6.99e+07 5.5 1.3e+05 6.5e+04 4.6e+02  9  4  9 24 58   9  4  9 24 60   135
PCApply               82 1.0 2.5729e+00 1.0 7.17e+08 4.0 1.2e+06 1.6e+04 1.4e+01  4 49 80 53  2   4 49 80 53  2  3488
------------------------------------------------------------------------------------------------------------------------
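
The Mflop/s column in the table above can be cross-checked against the legend's formula, reading the factor as 1e-6. The sketch below is a rough Python illustration under one stated assumption: every rank performed the same number of flop, which is only justified for rows whose flop Max/Min ratio is 1.0. The function name `total_mflops` is hypothetical.

```python
def total_mflops(max_flop_per_rank: float, n_ranks: int, max_time_s: float) -> float:
    """Mflop/s as described in the legend: 1e-6 * (sum of flop) / (max time).

    Assumption: each rank did max_flop_per_rank flop, so the sum over ranks
    is max_flop_per_rank * n_ranks. This holds only when the flop ratio is 1.0.
    """
    return 1e-6 * (max_flop_per_rank * n_ranks) / max_time_s


# MatSolve row: 1.23e+03 flop (ratio 1.0), max time 2.3332e-04 s, 36 ranks.
print(round(total_mflops(1.23e3, 36, 2.3332e-4)))
```

For the MatSolve row this rounds to 190, the value printed in the Mflop/s column. Rows with a flop ratio well above 1.0, such as MatMult, cannot be reproduced this way because the per-rank flop counts are not listed.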

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container     8              2         1248     0.
              Matrix   119             66     25244856     0.
      Matrix Coarsen     4              4         2688     0.
              Vector   402            293     26939160     0.
           Index Set    67             58       711824     0.
   Star Forest Graph    49             28        36128     0.
       Krylov Solver    11              4       124000     0.
      Preconditioner    11              4         3712     0.
              Viewer     1              0            0     0.
         PetscRandom     4              4         2840     0.
    Distributed Mesh     9              4        20512     0.
     Discrete System     9              4         4096     0.
           Weak Form     9              4         2656     0.
========================================================================================================================
Average time to get PetscTime(): 2.86847e-08
Average time for MPI_Barrier(): 1.14387e-05
Average time for zero size MPI_Send(): 3.53196e-06
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_monitor_true_residual
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with 64 bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-python --prefix=/home/lida -with-mpi-dir=/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4 LDFLAGS="-L/home/lida/lib64 -L/home/lida/lib -L/home/lida/jdk/lib" CPPFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CXXFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" --with-debugging=no --with-64-bit-indices FOPTFLAGS="-O3 -march=native" --download-make
-----------------------------------------
Libraries compiled on 2022-05-25 10:03:14 on head1.hpc
Machine characteristics: Linux-3.10.0-1062.el7.x86_64-x86_64-with-centos-7.7.1908-Core
Using PETSc directory: /home/lida
Using PETSc arch:
-----------------------------------------

Using C compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 -fPIC -I/home/lida/include -I/home/lida/jdk/include -march=native -O3
Using Fortran compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O3 -march=native    -I/home/lida/include -I/home/lida/jdk/include -march=native -O3
-----------------------------------------

Using include paths: -I/home/lida/include -I/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/include
-----------------------------------------

Using libraries: -Wl,-rpath,/home/lida/lib -L/home/lida/lib -lpetsc -Wl,-rpath,/home/lida/lib64 -L/home/lida/lib64 -Wl,-rpath,/home/lida/lib -L/home/lida/lib -Wl,-rpath,/home/lida/jdk/lib -L/home/lida/jdk/lib -Wl,-rpath,/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -L/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -Wl,-rpath,/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -L/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -Wl,-rpath,/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -L/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib -lopenblas -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl
-----------------------------------------

-------------- next part --------------
[lida at head1 petsc]$ make streams NPMAX=8 2>/dev/null
/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc -o MPIVersion.o -c -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 -fPIC -I/home/lida/include -I/home/lida/jdk/include -march=native -O3   -I/home/lida/Code/petsc/include -I/home/lida/Code/petsc/arch-linux-c-opt/include -I/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/include  -I/home/lida/include -I/home/lida/jdk/include -march=native -O3   `pwd`/MPIVersion.c
Running streams with '/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpiexec --oversubscribe ' using 'NPMAX=8'
1  16106.3237   Rate (MB/s)
2  28660.2442   Rate (MB/s) 1.77944
3  42041.2053   Rate (MB/s) 2.61023
4  57109.2439   Rate (MB/s) 3.54577
5  66797.5164   Rate (MB/s) 4.14729
6  79516.0361   Rate (MB/s) 4.93695
7  88664.6509   Rate (MB/s) 5.50497
8 101902.1854   Rate (MB/s) 6.32685
------------------------------------------------
Unable to open matplotlib to plot speedup
Unable to open matplotlib to plot speedup
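
Since matplotlib was not available to plot the speedup, the same information can be tabulated by hand. The Python sketch below recomputes the speedup column from the rates printed above (rate on N processes divided by the rate on 1 process) and adds a parallel-efficiency figure (speedup divided by N). The efficiency column is not part of the streams output, and the names `rates_mb_s` and `scaling` are hypothetical.

```python
# Memory bandwidth rates (MB/s) copied from the streams output, keyed by process count.
rates_mb_s = {
    1: 16106.3237,
    2: 28660.2442,
    3: 42041.2053,
    4: 57109.2439,
    5: 66797.5164,
    6: 79516.0361,
    7: 88664.6509,
    8: 101902.1854,
}


def scaling(rates):
    """Map process count -> (speedup, efficiency) relative to the 1-process rate."""
    base = rates[1]
    return {n: (rate / base, rate / base / n) for n, rate in rates.items()}


for n, (speedup, efficiency) in scaling(rates_mb_s).items():
    print(f"{n}  speedup {speedup:.5f}  efficiency {efficiency:.2f}")
```

The recomputed speedups agree with the printed column to within rounding. At 8 processes the efficiency is about 0.79, so the bandwidth is still growing with process count at NPMAX=8.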
-------------- next part --------------
A non-text attachment was scrubbed...
Name: time per iterations.png
Type: image/png
Size: 15274 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20220603/c25582f5/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: without # 0,2,12 iterations.png
Type: image/png
Size: 17069 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20220603/c25582f5/attachment-0003.png>