[petsc-users] Sparse linear system solving
Lidia
lidia.varsh at mail.ioffe.ru
Fri Jun 3 05:36:32 CDT 2022
Dear Matt, Barry,
Thank you for the information about OpenMP!
Now all processes are loaded well. However, we see strange behaviour in the
running times at different iterations (see the description below). Could you
please explain the reason and tell us how we can improve it?
We need to quickly solve a large (about 1e6 rows) square sparse
non-symmetric linear system many times (about 1e5 times) consecutively. The
matrix is constant at every iteration, and the right-hand-side vector B
changes slowly (we think its change at every iteration should be less than
0.001 %). So we use each previous solution vector X as the initial guess
for the next iteration. An AMG preconditioner and the GMRES solver are used.
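
For clarity, here is a minimal sketch of the solve loop we have in mind
(illustrative only - the names A, b, x, nIterations and UpdateRHS are
placeholders, this is not our actual C++ code, and error checking is
omitted):

  /* Mat A, Vec b, x are created and filled elsewhere; A never changes. */
  KSP ksp;
  PC  pc;
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);                 /* constant matrix, set once       */
  KSPSetType(ksp, KSPGMRES);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCGAMG);                      /* algebraic multigrid (AMG)       */
  KSPSetInitialGuessNonzero(ksp, PETSC_TRUE); /* reuse x from the previous solve */
  KSPSetTolerances(ksp, 1e-5, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);
  KSPSetFromOptions(ksp);

  for (PetscInt it = 0; it < nIterations; ++it) {
    UpdateRHS(b, it);     /* placeholder: b changes by less than 0.001 %         */
    KSPSolve(ksp, b, x);  /* x enters as the initial guess, exits as the solution */
  }

With KSPSetInitialGuessNonzero enabled, the later solves can converge in very
few (or even zero) Krylov iterations, as in the log attached below.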
We have tested the code on a matrix with 631,000 rows over 15 consecutive
iterations, using the vector X from the previous iteration as the initial
guess. In this test the right-hand-side vector B and the matrix A are
constant during the whole run. The first iteration is slow (about 2
seconds), and the time quickly decreases in the following iterations (the
average time of the last iterations is about 0.00008 s). But some
iterations in the middle (# 2 and # 12) take a huge time - 0.999063 seconds
(see the attached figure with the time dynamics). This time of 0.999
seconds does not depend on the size of the matrix or on the number of MPI
processes, and these time jumps also appear if we vary the vector B. Why do
these time jumps appear, and how can we avoid them?
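
If it helps, this is roughly how such per-solve wall-clock times can be
measured in PETSc (a simplified sketch, not taken from our script):

  PetscLogDouble t0, t1;
  MPI_Barrier(PETSC_COMM_WORLD);   /* synchronize ranks before timing */
  PetscTime(&t0);
  KSPSolve(ksp, b, x);
  PetscTime(&t1);
  PetscPrintf(PETSC_COMM_WORLD, "iter %d: solve time %g s\n",
              (int)it, (double)(t1 - t0));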
The ksp_monitor output for this run (including all 15 iterations) using 36
MPI processes and a file with the memory bandwidth information (testSpeed)
are also attached. We can provide our C++ script if it is needed.
Thanks a lot!
Best,
Lidiia
On 01.06.2022 21:14, Matthew Knepley wrote:
> On Wed, Jun 1, 2022 at 1:43 PM Lidia <lidia.varsh at mail.ioffe.ru> wrote:
>
> Dear Matt,
>
> Thank you for the rule of 10,000 variables per process! We have
> run ex. 5 with a 1e4 x 1e4 matrix on our cluster and got good
> performance scaling (see the figure "performance.png" - the
> dependence of the solve time in seconds on the number of cores).
> We used the GAMG preconditioner (we added the option
> "-pc_gamg_use_parallel_coarse_grid_solver" for a parallel
> coarse-grid solve) and the GMRES solver, and we set one OpenMP
> thread per MPI process. Now ex. 5 works well on many MPI
> processes! But the run uses about 100 GB of RAM.
>
> How can we run ex. 5 using many OpenMP threads without MPI? If we
> just change the run command, the cores are not loaded
> properly: usually just one core runs at 100 % and the others are
> idle. Sometimes all cores run at 100 % for about 1 second, but
> then they become idle again for about 30 seconds. Can the
> preconditioner use many threads, and how do we activate this option?
>
>
> Maybe you could describe what you are trying to accomplish? Threads and
> processes are not really different, except for memory sharing.
> However, sharing large complex data structures rarely works. That is
> why they get partitioned and operate effectively as distributed
> memory. You would not really save memory by using
> threads in this instance, if that is your goal. This is detailed in
> the talks in this session (see 2016 PP Minisymposium on this page
> https://cse.buffalo.edu/~knepley/relacs.html).
>
> Thanks,
>
> Matt
>
> The solve time (the time spent in the solver) using 60 OpenMP
> threads is now 511 seconds, while using 60 MPI processes it is
> 13.19 seconds.
>
> The ksp_monitor outputs for both cases (many OpenMP threads or many
> MPI processes) are attached.
>
>
> Thank you!
>
> Best,
> Lidia
>
> On 31.05.2022 15:21, Matthew Knepley wrote:
>> I have looked at the local logs. First, you have run problems of
>> size 12 and 24. As a rule of thumb, you need 10,000
>> variables per process in order to see good speedup.
>>
>> Thanks,
>>
>> Matt
>>
>> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley
>> <knepley at gmail.com> wrote:
>>
>> On Tue, May 31, 2022 at 7:39 AM Lidia
>> <lidia.varsh at mail.ioffe.ru> wrote:
>>
>> Matt, Mark, thank you very much for your answers!
>>
>>
>> Now we have run example # 5 on our computer cluster and
>> on the local server and still have not seen any
>> performance increase, but for an unclear reason the running
>> times on the local server are much better than on the cluster.
>>
>> I suspect that you are trying to get speedup without
>> increasing the memory bandwidth:
>>
>> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
>>
>> Thanks,
>>
>> Matt
>>
>> Now we will try to run PETSc example # 5 inside a Docker
>> container on our server and see whether the problem is in our
>> environment. I will write to you with the results of this test
>> as soon as we get them.
>>
>> The ksp_monitor outputs for example 5 in the current
>> local server configuration (for 2 and 4 MPI processes)
>> and on the cluster (for 1 and 3 MPI processes) are
>> attached.
>>
>>
>> And one more question: potentially we can use 10 nodes
>> with 96 threads on each node of our cluster. Which
>> combination of MPI processes and OpenMP threads do you
>> think would be best for the 5th example?
>>
>> Thank you!
>>
>>
>> Best,
>> Lidiia
>>
>> On 31.05.2022 05:42, Mark Adams wrote:
>>> And if you see "NO" change in performance I suspect the
>>> solver/matrix is all on one processor.
>>> (PETSc does not use threads by default so threads should
>>> not change anything).
>>>
>>> As Matt said, it is best to start with a PETSc
>>> example that does something like what you want (parallel
>>> linear solve, see src/ksp/ksp/tutorials for examples),
>>> and then add your code to it.
>>> That way you get the basic infrastructure in place for
>>> you, which is pretty obscure to the uninitiated.
>>>
>>> Mark
>>>
>>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley
>>> <knepley at gmail.com> wrote:
>>>
>>> On Mon, May 30, 2022 at 10:12 PM Lidia
>>> <lidia.varsh at mail.ioffe.ru> wrote:
>>>
>>> Dear colleagues,
>>>
>>> Is there anyone here who has solved big sparse linear
>>> systems using PETSc?
>>>
>>>
>>> There are lots of publications with this kind of
>>> data. Here is one recent one:
>>> https://arxiv.org/abs/2204.01722
>>>
>>> We have found NO performance improvement while using more
>>> and more MPI processes (1-2-3) and OpenMP threads (from 1
>>> to 72 threads). Has anyone faced this problem? Does anyone
>>> know any possible reasons for such behaviour?
>>>
>>>
>>> Solver behavior is dependent on the input matrix.
>>> The only general-purpose solvers
>>> are direct, but they do not scale linearly and have
>>> high memory requirements.
>>>
>>> Thus, in order to make progress you will have to be
>>> specific about your matrices.
>>>
>>> We use an AMG preconditioner and the GMRES solver from
>>> the KSP package, as our matrix is large (from 100,000
>>> to 1e+6 rows and columns), sparse, non-symmetric, and
>>> includes both positive and negative values. But the
>>> performance problems also exist when using CG solvers
>>> with symmetric matrices.
>>>
>>>
>>> There are many PETSc examples, such as example 5 for
>>> the Laplacian, that exhibit
>>> good scaling with both AMG and GMG.
>>>
>>> Could anyone help us set appropriate options for the
>>> preconditioner and solver? Right now we use the default
>>> parameters; maybe they are not the best, but we do not
>>> know a good combination. Or maybe you could suggest
>>> other preconditioner+solver pairs for such tasks?
>>>
>>> I can provide more information: the matrices that we
>>> solve, the C++ script that runs the solve using PETSc,
>>> and any statistics obtained from our runs.
>>>
>>>
>>> First, please provide a description of the linear
>>> system, and the output of
>>>
>>> -ksp_view -ksp_monitor_true_residual
>>> -ksp_converged_reason -log_view
>>>
>>> for each test case.
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> Thank you in advance!
>>>
>>> Best regards,
>>> Lidiia Varshavchik,
>>> Ioffe Institute, St. Petersburg, Russia
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they
>>> begin their experiments is infinitely more
>>> interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin
>> their experiments is infinitely more interesting than any
>> results to which their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to
>> which their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
[lida at head1 build]$ mpirun -n 36 ./petscTest -ksp_monitor -ksp_monitor_true_residual -ksp_converged_reason -log_view
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
Local host: head1
Device name: i40iw0
Device vendor ID: 0x8086
Device vendor part ID: 14290
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: head1
Local device: i40iw0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
Mat size 630834
using block size is 1
5 17524 87620 105144
1 17524 17524 35048
7 17523 122667 140190
21 17523 367989 385512
27 17523 473127 490650
31 17523 543219 560742
2 17524 35048 52572
3 17524 52572 70096
4 17524 70096 87620
0 17524 0 17524
6 17523 105144 122667
8 17523 140190 157713
9 17523 157713 175236
11 17523 192759 210282
12 17523 210282 227805
13 17523 227805 245328
14 17523 245328 262851
20 17523 350466 367989
22 17523 385512 403035
23 17523 403035 420558
25 17523 438081 455604
26 17523 455604 473127
28 17523 490650 508173
30 17523 525696 543219
33 17523 578265 595788
34 17523 595788 613311
35 17523 613311 630834
15 17523 262851 280374
16 17523 280374 297897
17 17523 297897 315420
18 17523 315420 332943
19 17523 332943 350466
24 17523 420558 438081
29 17523 508173 525696
10 17523 175236 192759
32 17523 560742 578265
[head1.hpc:242461] 71 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[head1.hpc:242461] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[head1.hpc:242461] 71 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
Compute with tolerance 0.000010000000000000000818030539 solver is gmres
startPC
startSolv
0 KSP Residual norm 1.868353493329e+08
0 KSP preconditioned resid norm 1.868353493329e+08 true resid norm 2.165031654579e+06 ||r(i)||/||b|| 1.000000000000e+00
1 KSP Residual norm 1.132315559206e+08
1 KSP preconditioned resid norm 1.132315559206e+08 true resid norm 6.461246152989e+07 ||r(i)||/||b|| 2.984365673971e+01
2 KSP Residual norm 1.534820972084e+07
2 KSP preconditioned resid norm 1.534820972084e+07 true resid norm 2.426876823961e+07 ||r(i)||/||b|| 1.120942882672e+01
3 KSP Residual norm 7.539322505186e+06
3 KSP preconditioned resid norm 7.539322505186e+06 true resid norm 1.829739078019e+07 ||r(i)||/||b|| 8.451327139485e+00
4 KSP Residual norm 4.660669278808e+06
4 KSP preconditioned resid norm 4.660669278808e+06 true resid norm 1.744671242073e+07 ||r(i)||/||b|| 8.058409854574e+00
5 KSP Residual norm 3.223391594815e+06
5 KSP preconditioned resid norm 3.223391594815e+06 true resid norm 1.737561446785e+07 ||r(i)||/||b|| 8.025570633618e+00
6 KSP Residual norm 2.240424900880e+06
6 KSP preconditioned resid norm 2.240424900880e+06 true resid norm 1.683362112781e+07 ||r(i)||/||b|| 7.775230949719e+00
7 KSP Residual norm 1.623399472779e+06
7 KSP preconditioned resid norm 1.623399472779e+06 true resid norm 1.624000914301e+07 ||r(i)||/||b|| 7.501049284271e+00
8 KSP Residual norm 1.211518107569e+06
8 KSP preconditioned resid norm 1.211518107569e+06 true resid norm 1.558830757667e+07 ||r(i)||/||b|| 7.200036795627e+00
9 KSP Residual norm 9.642201969240e+05
9 KSP preconditioned resid norm 9.642201969240e+05 true resid norm 1.486473650844e+07 ||r(i)||/||b|| 6.865828717562e+00
10 KSP Residual norm 7.867651557046e+05
10 KSP preconditioned resid norm 7.867651557046e+05 true resid norm 1.396084153269e+07 ||r(i)||/||b|| 6.448331368812e+00
11 KSP Residual norm 7.078405789961e+05
11 KSP preconditioned resid norm 7.078405789961e+05 true resid norm 1.296873719329e+07 ||r(i)||/||b|| 5.990091260724e+00
12 KSP Residual norm 6.335098563709e+05
12 KSP preconditioned resid norm 6.335098563709e+05 true resid norm 1.164201582227e+07 ||r(i)||/||b|| 5.377295892022e+00
13 KSP Residual norm 5.397665070507e+05
13 KSP preconditioned resid norm 5.397665070507e+05 true resid norm 1.042661489959e+07 ||r(i)||/||b|| 4.815917992485e+00
14 KSP Residual norm 4.549629296863e+05
14 KSP preconditioned resid norm 4.549629296863e+05 true resid norm 9.420542232153e+06 ||r(i)||/||b|| 4.351226095114e+00
15 KSP Residual norm 3.627838605442e+05
15 KSP preconditioned resid norm 3.627838605442e+05 true resid norm 8.546289749804e+06 ||r(i)||/||b|| 3.947420229042e+00
16 KSP Residual norm 2.974632184520e+05
16 KSP preconditioned resid norm 2.974632184520e+05 true resid norm 7.707507230485e+06 ||r(i)||/||b|| 3.559997478181e+00
17 KSP Residual norm 2.584437744774e+05
17 KSP preconditioned resid norm 2.584437744774e+05 true resid norm 6.996748201244e+06 ||r(i)||/||b|| 3.231707114510e+00
18 KSP Residual norm 2.172287358399e+05
18 KSP preconditioned resid norm 2.172287358399e+05 true resid norm 6.008578157843e+06 ||r(i)||/||b|| 2.775284206646e+00
19 KSP Residual norm 1.807320553225e+05
19 KSP preconditioned resid norm 1.807320553225e+05 true resid norm 5.166440962968e+06 ||r(i)||/||b|| 2.386311974719e+00
20 KSP Residual norm 1.583700438237e+05
20 KSP preconditioned resid norm 1.583700438237e+05 true resid norm 4.613820989743e+06 ||r(i)||/||b|| 2.131063986978e+00
21 KSP Residual norm 1.413879944302e+05
21 KSP preconditioned resid norm 1.413879944302e+05 true resid norm 4.151504476178e+06 ||r(i)||/||b|| 1.917525994318e+00
22 KSP Residual norm 1.228172205521e+05
22 KSP preconditioned resid norm 1.228172205521e+05 true resid norm 3.630290527838e+06 ||r(i)||/||b|| 1.676784041545e+00
23 KSP Residual norm 1.084793002546e+05
23 KSP preconditioned resid norm 1.084793002546e+05 true resid norm 3.185566371074e+06 ||r(i)||/||b|| 1.471371729986e+00
24 KSP Residual norm 9.520569914833e+04
24 KSP preconditioned resid norm 9.520569914833e+04 true resid norm 2.811378949429e+06 ||r(i)||/||b|| 1.298539420189e+00
25 KSP Residual norm 8.331027569193e+04
25 KSP preconditioned resid norm 8.331027569193e+04 true resid norm 2.487128345424e+06 ||r(i)||/||b|| 1.148772277839e+00
26 KSP Residual norm 7.116546817077e+04
26 KSP preconditioned resid norm 7.116546817077e+04 true resid norm 2.128784852233e+06 ||r(i)||/||b|| 9.832580728002e-01
27 KSP Residual norm 6.107201042673e+04
27 KSP preconditioned resid norm 6.107201042673e+04 true resid norm 1.816742057822e+06 ||r(i)||/||b|| 8.391295591358e-01
28 KSP Residual norm 5.407959454186e+04
28 KSP preconditioned resid norm 5.407959454186e+04 true resid norm 1.590698721931e+06 ||r(i)||/||b|| 7.347230783285e-01
29 KSP Residual norm 4.859208455279e+04
29 KSP preconditioned resid norm 4.859208455279e+04 true resid norm 1.405619902078e+06 ||r(i)||/||b|| 6.492375753974e-01
30 KSP Residual norm 4.463327440008e+04
30 KSP preconditioned resid norm 4.463327440008e+04 true resid norm 1.258789113490e+06 ||r(i)||/||b|| 5.814183413104e-01
31 KSP Residual norm 3.927742507325e+04
31 KSP preconditioned resid norm 3.927742507325e+04 true resid norm 1.086402490838e+06 ||r(i)||/||b|| 5.017951994097e-01
32 KSP Residual norm 3.417683630748e+04
32 KSP preconditioned resid norm 3.417683630748e+04 true resid norm 9.566603594382e+05 ||r(i)||/||b|| 4.418689941159e-01
33 KSP Residual norm 3.002775921838e+04
33 KSP preconditioned resid norm 3.002775921838e+04 true resid norm 8.429546731968e+05 ||r(i)||/||b|| 3.893498145460e-01
34 KSP Residual norm 2.622152046131e+04
34 KSP preconditioned resid norm 2.622152046131e+04 true resid norm 7.578781071384e+05 ||r(i)||/||b|| 3.500540537296e-01
35 KSP Residual norm 2.264910466846e+04
35 KSP preconditioned resid norm 2.264910466846e+04 true resid norm 6.684892523160e+05 ||r(i)||/||b|| 3.087665027447e-01
36 KSP Residual norm 1.970721593805e+04
36 KSP preconditioned resid norm 1.970721593805e+04 true resid norm 5.905536805578e+05 ||r(i)||/||b|| 2.727690744422e-01
37 KSP Residual norm 1.666104858674e+04
37 KSP preconditioned resid norm 1.666104858674e+04 true resid norm 5.172223947409e+05 ||r(i)||/||b|| 2.388983060118e-01
38 KSP Residual norm 1.432004409785e+04
38 KSP preconditioned resid norm 1.432004409785e+04 true resid norm 4.593351142808e+05 ||r(i)||/||b|| 2.121609230559e-01
39 KSP Residual norm 1.211549914084e+04
39 KSP preconditioned resid norm 1.211549914084e+04 true resid norm 4.019170298644e+05 ||r(i)||/||b|| 1.856402556583e-01
40 KSP Residual norm 1.061599294842e+04
40 KSP preconditioned resid norm 1.061599294842e+04 true resid norm 3.586589723898e+05 ||r(i)||/||b|| 1.656599207828e-01
41 KSP Residual norm 9.577489574913e+03
41 KSP preconditioned resid norm 9.577489574913e+03 true resid norm 3.221505690964e+05 ||r(i)||/||b|| 1.487971635034e-01
42 KSP Residual norm 8.221576307371e+03
42 KSP preconditioned resid norm 8.221576307371e+03 true resid norm 2.745213067979e+05 ||r(i)||/||b|| 1.267978258965e-01
43 KSP Residual norm 6.898384710028e+03
43 KSP preconditioned resid norm 6.898384710028e+03 true resid norm 2.330710645170e+05 ||r(i)||/||b|| 1.076524973776e-01
44 KSP Residual norm 6.087330352788e+03
44 KSP preconditioned resid norm 6.087330352788e+03 true resid norm 2.058183089407e+05 ||r(i)||/||b|| 9.506480355857e-02
45 KSP Residual norm 5.207144067562e+03
45 KSP preconditioned resid norm 5.207144067562e+03 true resid norm 1.745194864065e+05 ||r(i)||/||b|| 8.060828396546e-02
46 KSP Residual norm 4.556037825199e+03
46 KSP preconditioned resid norm 4.556037825199e+03 true resid norm 1.551715592432e+05 ||r(i)||/||b|| 7.167172771584e-02
47 KSP Residual norm 3.856329202278e+03
47 KSP preconditioned resid norm 3.856329202278e+03 true resid norm 1.315660202980e+05 ||r(i)||/||b|| 6.076863588562e-02
48 KSP Residual norm 3.361878313389e+03
48 KSP preconditioned resid norm 3.361878313389e+03 true resid norm 1.147746368397e+05 ||r(i)||/||b|| 5.301291396685e-02
49 KSP Residual norm 2.894852363045e+03
49 KSP preconditioned resid norm 2.894852363045e+03 true resid norm 9.951811967458e+04 ||r(i)||/||b|| 4.596612685273e-02
50 KSP Residual norm 2.576639763678e+03
50 KSP preconditioned resid norm 2.576639763678e+03 true resid norm 8.828512403741e+04 ||r(i)||/||b|| 4.077775207151e-02
51 KSP Residual norm 2.176356645511e+03
51 KSP preconditioned resid norm 2.176356645511e+03 true resid norm 7.535533182060e+04 ||r(i)||/||b|| 3.480564898957e-02
52 KSP Residual norm 1.909590120581e+03
52 KSP preconditioned resid norm 1.909590120581e+03 true resid norm 6.643741378378e+04 ||r(i)||/||b|| 3.068657848177e-02
53 KSP Residual norm 1.625794696835e+03
53 KSP preconditioned resid norm 1.625794696835e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 53
#################################################################################
SOLV gmres iter 0
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 53 time 2.000408 s(2000407820.91200017929077148438 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 1
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000082 s(82206.13200000001234002411 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 2
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.999076 s(999076088.56700003147125244141 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 3
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000081 s(80689.84000000001105945557 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 4
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000079 s(79139.94299999999930150807 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 5
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000065 s(65399.49300000000948784873 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 6
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000080 s(79554.38999999999941792339 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 7
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000080 s(80431.21900000001187436283 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 8
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000080 s(80255.19100000000617001206 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 9
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000081 s(80568.19700000000011641532 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 10
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000078 s(78323.06299999999464489520 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 11
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000072 s(71933.38600000001315493137 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 12
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.999063 s(999063438.25300002098083496094 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 13
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000070 s(69632.13800000000628642738 us)
#################################################################################
startPC
startSolv
0 KSP Residual norm 1.625794716222e+03
0 KSP preconditioned resid norm 1.625794716222e+03 true resid norm 5.591842695771e+04 ||r(i)||/||b|| 2.582799509625e-02
Linear solve converged due to CONVERGED_RTOL iterations 0
#################################################################################
SOLV gmres iter 14
Relative error is 1.009408197(min 1.000000000, max 2.386602308), tril rel error 0.138442009(min 0.000000070, max 1.000000000) (converged reason is CONVERGED_RTOL) iterations 0 time 0.000073 s(73498.46099999999569263309 us)
#################################################################################
nohup: appending output to ‘nohup.out’
nohup: failed to run command ‘localc’: No such file or directory
**************************************** ***********************************************************************************************************************
*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
****************************************************************************************************************************************************************
------------------------------------------------------------------ PETSc Performance Summary: -------------------------------------------------------------------
./petscTest on a named head1.hpc with 36 processors, by lida Fri Jun 3 13:23:29 2022
Using Petsc Release Version 3.17.1, unknown
Max Max/Min Avg Total
Time (sec): 8.454e+01 1.440 5.941e+01
Objects: 7.030e+02 1.000 7.030e+02
Flops: 1.018e+09 2.522 5.062e+08 1.822e+10
Flops/sec: 1.734e+07 3.633 8.567e+06 3.084e+08
MPI Msg Count: 5.257e+04 1.584 4.249e+04 1.530e+06
MPI Msg Len (bytes): 1.453e+09 14.133 2.343e+04 3.585e+10
MPI Reductions: 7.800e+02 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 5.9406e+01 100.0% 1.8223e+10 100.0% 1.530e+06 100.0% 2.343e+04 100.0% 7.620e+02 97.7%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 75 1.0 3.5652e+0155.4 0.00e+00 0.0 1.9e+04 8.0e+00 7.5e+01 8 0 1 0 10 8 0 1 0 10 0
BuildTwoSidedF 51 1.0 3.5557e+0157.2 0.00e+00 0.0 8.5e+03 3.8e+05 5.1e+01 8 0 1 9 7 8 0 1 9 7 0
MatMult 1503 1.0 2.8036e+00 1.3 6.78e+08 3.8 1.1e+06 2.4e+04 4.0e+00 4 53 74 75 1 4 53 74 75 1 3473
MatMultAdd 328 1.0 2.2706e-01 2.0 2.43e+07 2.3 1.5e+05 3.0e+03 0.0e+00 0 2 10 1 0 0 2 10 1 0 1985
MatMultTranspose 328 1.0 4.6323e-01 2.6 4.98e+07 4.7 1.5e+05 3.0e+03 4.0e+00 0 3 10 1 1 0 3 10 1 1 1090
MatSolve 82 1.0 2.3332e-04 2.0 1.23e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 190
MatLUFactorSym 1 1.0 1.2696e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 2.1874e-05 2.1 1.60e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 26
MatConvert 5 1.0 3.0352e-02 1.2 0.00e+00 0.0 5.7e+03 9.8e+03 4.0e+00 0 0 0 0 1 0 0 0 0 1 0
MatScale 12 1.0 9.6534e-03 2.1 2.13e+06 4.1 2.8e+03 2.0e+04 0.0e+00 0 0 0 0 0 0 0 0 0 0 2795
MatResidual 328 1.0 5.8371e-01 1.2 1.50e+08 4.7 2.3e+05 2.0e+04 0.0e+00 1 10 15 13 0 1 10 15 13 0 3018
MatAssemblyBegin 70 1.0 3.5348e+0142.1 0.00e+00 0.0 8.5e+03 3.8e+05 2.4e+01 9 0 1 9 3 9 0 1 9 3 0
MatAssemblyEnd 70 1.0 2.0657e+00 1.0 7.97e+06573.7 0.0e+00 0.0e+00 8.1e+01 3 0 0 0 10 3 0 0 0 11 7
MatGetRowIJ 1 1.0 4.5262e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 1 1.0 3.6275e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 1 0 0 0 0 1 0
MatCreateSubMat 2 1.0 2.2424e-03 1.1 0.00e+00 0.0 3.5e+01 1.2e+03 2.8e+01 0 0 0 0 4 0 0 0 0 4 0
MatGetOrdering 1 1.0 3.3737e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 4 1.0 2.7961e-01 2.3 0.00e+00 0.0 3.1e+04 5.1e+04 2.4e+01 0 0 2 4 3 0 0 2 4 3 0
MatZeroEntries 4 1.0 6.2166e-04176.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 4 1.0 1.0507e-02 1.1 2.81e+04 1.6 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 1 0 0 0 0 1 63
MatTranspose 8 1.0 3.0698e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMultSym 12 1.0 1.5384e-01 1.2 0.00e+00 0.0 8.5e+03 2.0e+04 3.6e+01 0 0 1 0 5 0 0 1 0 5 0
MatMatMultNum 12 1.0 7.4942e-02 1.3 1.29e+07 8.0 2.8e+03 2.0e+04 4.0e+00 0 1 0 0 1 0 1 0 0 1 1636
MatPtAPSymbolic 4 1.0 5.7252e-01 1.0 0.00e+00 0.0 1.4e+04 4.8e+04 2.8e+01 1 0 1 2 4 1 0 1 2 4 0
MatPtAPNumeric 4 1.0 9.4753e-01 1.0 3.99e+0714.7 3.7e+03 1.0e+05 2.0e+01 2 1 0 1 3 2 1 0 1 3 256
MatTrnMatMultSym 1 1.0 2.3274e+00 1.0 0.00e+00 0.0 5.6e+03 7.0e+05 1.2e+01 4 0 0 11 2 4 0 0 11 2 0
MatRedundantMat 1 1.0 3.9176e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 1 0 0 0 0 1 0
MatMPIConcateSeq 1 1.0 3.0197e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetLocalMat 13 1.0 8.8896e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 12 1.0 2.0289e-01 1.8 0.00e+00 0.0 2.0e+04 3.8e+04 0.0e+00 0 0 1 2 0 0 0 1 2 0 0
VecMDot 93 1.0 2.7982e-01 3.2 5.32e+07 1.0 0.0e+00 0.0e+00 9.3e+01 0 10 0 0 12 0 10 0 0 12 6711
VecNorm 209 1.0 3.7674e-01 4.3 6.40e+06 1.0 0.0e+00 0.0e+00 2.1e+02 0 1 0 0 27 0 1 0 0 27 591
VecScale 112 1.0 5.5057e-04 1.5 1.50e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 91073
VecCopy 1071 1.0 1.4028e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1293 1.0 6.6862e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 72 1.0 1.4381e-03 1.6 2.44e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 60574
VecAYPX 2036 1.0 2.0018e-02 1.7 1.96e+07 1.5 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 23728
VecAXPBYCZ 656 1.0 7.2191e-03 1.7 2.30e+07 1.6 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 74818
VecMAXPY 151 1.0 8.7415e-02 1.3 1.06e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 21 0 0 0 0 21 0 0 0 43052
VecAssemblyBegin 28 1.0 2.6426e-01100.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.7e+01 0 0 0 0 3 0 0 0 0 4 0
VecAssemblyEnd 28 1.0 4.3833e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 1356 1.0 1.7490e-02 1.5 9.52e+06 1.6 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 12767
VecScatterBegin 2341 1.0 5.3816e-01 5.2 0.00e+00 0.0 1.5e+06 2.0e+04 1.6e+01 1 0 95 81 2 1 0 95 81 2 0
VecScatterEnd 2341 1.0 2.4118e+00 1.9 2.55e+07368.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 23
VecNormalize 112 1.0 7.0541e-02 2.8 4.50e+06 1.1 0.0e+00 0.0e+00 1.1e+02 0 1 0 0 14 0 1 0 0 15 2132
SFSetGraph 31 1.0 2.5626e-0218.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 24 1.0 1.0973e-01 1.6 0.00e+00 0.0 2.9e+04 1.5e+04 2.4e+01 0 0 2 1 3 0 0 2 1 3 0
SFBcastBegin 28 1.0 2.3203e-02 3.2 0.00e+00 0.0 2.5e+04 5.7e+04 0.0e+00 0 0 2 4 0 0 0 2 4 0 0
SFBcastEnd 28 1.0 7.4158e-02 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFPack 2369 1.0 4.4483e-0118.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
SFUnpack 2369 1.0 3.6313e-0271.6 2.55e+07368.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1497
KSPSetUp 11 1.0 5.1386e-03 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 15 1.0 3.4692e+00 1.0 9.40e+08 2.4 1.4e+06 1.9e+04 2.3e+02 6 95 90 73 30 6 95 90 73 30 4972
KSPGMRESOrthog 93 1.0 3.1573e-01 2.5 1.06e+08 1.0 0.0e+00 0.0e+00 9.3e+01 0 21 0 0 12 0 21 0 0 12 11896
PCGAMGGraph_AGG 4 1.0 3.0876e-01 1.0 1.83e+06 4.7 8.5e+03 1.3e+04 3.6e+01 1 0 1 0 5 1 0 1 0 5 70
PCGAMGCoarse_AGG 4 1.0 2.8281e+00 1.1 0.00e+00 0.0 4.8e+04 1.3e+05 4.7e+01 5 0 3 17 6 5 0 3 17 6 0
PCGAMGProl_AGG 4 1.0 3.2106e-01 1.8 0.00e+00 0.0 1.2e+04 2.4e+04 6.3e+01 0 0 1 1 8 0 0 1 1 8 0
PCGAMGPOpt_AGG 4 1.0 2.6704e-01 1.0 2.82e+07 3.0 4.5e+04 1.8e+04 1.6e+02 0 2 3 2 21 0 2 3 2 22 1589
GAMG: createProl 4 1.0 3.5902e+00 1.0 3.01e+07 3.0 1.1e+05 6.6e+04 3.1e+02 6 2 7 21 40 6 2 7 21 41 124
Create Graph 4 1.0 3.0346e-02 1.2 0.00e+00 0.0 5.7e+03 9.8e+03 4.0e+00 0 0 0 0 1 0 0 0 0 1 0
Filter Graph 4 1.0 2.8303e-01 1.0 1.83e+06 4.7 2.8e+03 2.0e+04 3.2e+01 0 0 0 0 4 0 0 0 0 4 76
MIS/Agg 4 1.0 2.7965e-01 2.3 0.00e+00 0.0 3.1e+04 5.1e+04 2.4e+01 0 0 2 4 3 0 0 2 4 3 0
SA: col data 4 1.0 2.1554e-02 1.2 0.00e+00 0.0 9.2e+03 2.9e+04 2.7e+01 0 0 1 1 3 0 0 1 1 4 0
SA: frmProl0 4 1.0 1.5549e-01 1.0 0.00e+00 0.0 2.6e+03 5.5e+03 2.0e+01 0 0 0 0 3 0 0 0 0 3 0
SA: smooth 4 1.0 1.8642e-01 1.0 2.16e+06 4.0 1.1e+04 2.0e+04 5.2e+01 0 0 1 1 7 0 0 1 1 7 148
GAMG: partLevel 4 1.0 1.5208e+00 1.0 3.99e+0714.7 1.8e+04 5.9e+04 1.0e+02 3 1 1 3 13 3 1 1 3 13 159
repartition 1 1.0 3.5234e-03 1.0 0.00e+00 0.0 8.0e+01 5.3e+02 5.3e+01 0 0 0 0 7 0 0 0 0 7 0
Invert-Sort 1 1.0 3.3143e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 1 0 0 0 0 1 0
Move A 1 1.0 1.1502e-03 1.2 0.00e+00 0.0 3.5e+01 1.2e+03 1.5e+01 0 0 0 0 2 0 0 0 0 2 0
Move P 1 1.0 1.4414e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 2 0 0 0 0 2 0
PCGAMG Squ l00 1 1.0 2.3274e+00 1.0 0.00e+00 0.0 5.6e+03 7.0e+05 1.2e+01 4 0 0 11 2 4 0 0 11 2 0
PCGAMG Gal l00 1 1.0 1.2443e+00 1.0 1.06e+07 5.0 9.0e+03 1.1e+05 1.2e+01 2 1 1 3 2 2 1 1 3 2 135
PCGAMG Opt l00 1 1.0 1.4166e-01 1.0 5.88e+05 1.7 4.5e+03 4.8e+04 1.0e+01 0 0 0 1 1 0 0 0 1 1 130
PCGAMG Gal l01 1 1.0 2.5946e-01 1.0 2.64e+07543.5 6.8e+03 1.2e+04 1.2e+01 0 0 0 0 2 0 0 0 0 2 271
PCGAMG Opt l01 1 1.0 2.9430e-02 1.0 1.11e+06444.1 4.9e+03 1.5e+03 1.0e+01 0 0 0 0 1 0 0 0 0 1 90
PCGAMG Gal l02 1 1.0 1.2971e-02 1.0 3.34e+06 0.0 2.1e+03 1.3e+03 1.2e+01 0 0 0 0 2 0 0 0 0 2 343
PCGAMG Opt l02 1 1.0 3.6016e-03 1.0 2.61e+05 0.0 2.0e+03 2.1e+02 1.0e+01 0 0 0 0 1 0 0 0 0 1 97
PCGAMG Gal l03 1 1.0 5.7189e-04 1.1 1.45e+04 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 2 0 0 0 0 2 25
PCGAMG Opt l03 1 1.0 4.1255e-04 1.1 5.02e+03 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 1 0 0 0 0 1 12
PCSetUp 1 1.0 5.1101e+00 1.0 6.99e+07 5.5 1.3e+05 6.5e+04 4.6e+02 9 4 9 24 58 9 4 9 24 60 135
PCApply 82 1.0 2.5729e+00 1.0 7.17e+08 4.0 1.2e+06 1.6e+04 1.4e+01 4 49 80 53 2 4 49 80 53 2 3488
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 8 2 1248 0.
Matrix 119 66 25244856 0.
Matrix Coarsen 4 4 2688 0.
Vector 402 293 26939160 0.
Index Set 67 58 711824 0.
Star Forest Graph 49 28 36128 0.
Krylov Solver 11 4 124000 0.
Preconditioner 11 4 3712 0.
Viewer 1 0 0 0.
PetscRandom 4 4 2840 0.
Distributed Mesh 9 4 20512 0.
Discrete System 9 4 4096 0.
Weak Form 9 4 2656 0.
========================================================================================================================
Average time to get PetscTime(): 2.86847e-08
Average time for MPI_Barrier(): 1.14387e-05
Average time for zero size MPI_Send(): 3.53196e-06
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_monitor_true_residual
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with 64 bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-python --prefix=/home/lida -with-mpi-dir=/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4 LDFLAGS="-L/home/lida/lib64 -L/home/lida/lib -L/home/lida/jdk/lib" CPPFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CXXFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" --with-debugging=no --with-64-bit-indices FOPTFLAGS="-O3 -march=native" --download-make
-----------------------------------------
Libraries compiled on 2022-05-25 10:03:14 on head1.hpc
Machine characteristics: Linux-3.10.0-1062.el7.x86_64-x86_64-with-centos-7.7.1908-Core
Using PETSc directory: /home/lida
Using PETSc arch:
-----------------------------------------
Using C compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 -fPIC -I/home/lida/include -I/home/lida/jdk/include -march=native -O3
Using Fortran compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O3 -march=native -I/home/lida/include -I/home/lida/jdk/include -march=native -O3
-----------------------------------------
Using include paths: -I/home/lida/include -I/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/include
-----------------------------------------
Using C linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc
Using Fortran linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90
Using libraries: -Wl,-rpath,/home/lida/lib -L/home/lida/lib -lpetsc -Wl,-rpath,/home/lida/lib64 -L/home/lida/lib64 -Wl,-rpath,/home/lida/lib -L/home/lida/lib -Wl,-rpath,/home/lida/jdk/lib -L/home/lida/jdk/lib -Wl,-rpath,/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -L/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -Wl,-rpath,/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -L/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -Wl,-rpath,/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -L/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib -lopenblas -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl
-----------------------------------------
[lida at head1 build]$
-------------- next part --------------
[lida at head1 petsc]$ export OMP_NUM_THREADS=1
[lida at head1 petsc]$ make streams NPMAX=8 2>/dev/null
/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc -o MPIVersion.o -c -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 -fPIC -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 -I/home/lida/Code/petsc/include -I/home/lida/Code/petsc/arch-linux-c-opt/include -I/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/include -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 `pwd`/MPIVersion.c
Running streams with '/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpiexec --oversubscribe ' using 'NPMAX=8'
1 16106.3237 Rate (MB/s)
2 28660.2442 Rate (MB/s) 1.77944
3 42041.2053 Rate (MB/s) 2.61023
4 57109.2439 Rate (MB/s) 3.54577
5 66797.5164 Rate (MB/s) 4.14729
6 79516.0361 Rate (MB/s) 4.93695
7 88664.6509 Rate (MB/s) 5.50497
8 101902.1854 Rate (MB/s) 6.32685
------------------------------------------------
Unable to open matplotlib to plot speedup
Unable to open matplotlib to plot speedup
-------------- next part --------------
A non-text attachment was scrubbed...
Name: time per iterations.png
Type: image/png
Size: 15274 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20220603/c25582f5/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: without # 0,2,12 iterations.png
Type: image/png
Size: 17069 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20220603/c25582f5/attachment-0003.png>