[petsc-users] Sparse linear system solving
Lidia
lidia.varsh at mail.ioffe.ru
Wed Jun 1 12:37:57 CDT 2022
Dear Matt,
Thank you for the rule of 10,000 variables per process! We have run ex5
with -m 10000 (a 1e4 x 1e4 problem) on our cluster and now see good
scaling behaviour (see the attached figure "performance.png": solve time
in seconds versus the number of cores). We used the GAMG preconditioner
(adding the option "-pc_gamg_use_parallel_coarse_grid_solver") and the
GMRES solver, with one OpenMP thread per MPI process. Now ex5 runs well
on many MPI processes, although the run uses about 100 GB of RAM.
How can we run ex5 with many OpenMP threads and without MPI? If we
simply change the run command, the cores are not loaded evenly: usually
only one core runs at 100% while the others stay idle. Sometimes all
cores run at 100% for about a second, but then they go idle again for
about 30 seconds. Can the preconditioner use many threads, and how do we
enable that?
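For reference, the two runs were launched roughly as follows (the full
command lines are recorded in the attached logs; the OMP_NUM_THREADS
settings are written out here for clarity):

  # single process, many OpenMP threads
  export OMP_NUM_THREADS=60
  ./ex5 -m 10000 -pc_type gamg -ksp_type gmres \
        -pc_gamg_use_parallel_coarse_grid_solver \
        -ksp_monitor -ksp_monitor_true_residual -ksp_converged_reason -log_view

  # 60 MPI processes, one OpenMP thread each
  export OMP_NUM_THREADS=1
  mpirun --oversubscribe -n 60 ./ex5 -m 10000 -pc_type gamg -ksp_type gmres \
        -pc_gamg_use_parallel_coarse_grid_solver \
        -ksp_monitor -ksp_monitor_true_residual -ksp_converged_reason -log_view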
The solve time (the time spent in the solver itself) is now 511 seconds
with 60 OpenMP threads, versus 13.19 seconds with 60 MPI processes, so
the threaded run is roughly 40 times slower.
The ksp_monitor outputs for both cases (many OpenMP threads and many
MPI processes) are attached.
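In those -log_view outputs, the quickest checks are the header line
reporting how many processors PETSc actually used and the KSPSolve event
row; for example (a sketch, assuming the output is saved to a file
run.log):

  grep -E "with [0-9]+ processor" run.log   # e.g. "... with 60 processors"
  grep "KSPSolve " run.log                  # wall time and flop rate of the solve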
Thank you!
Best,
Lidia
On 31.05.2022 15:21, Matthew Knepley wrote:
> I have looked at the local logs. First, you have run problems of size
> 12 and 24. As a rule of thumb, you need 10,000
> variables per process in order to see good speedup.
>
> Thanks,
>
> Matt
>
> On Tue, May 31, 2022 at 8:19 AM Matthew Knepley <knepley at gmail.com> wrote:
>
> On Tue, May 31, 2022 at 7:39 AM Lidia <lidia.varsh at mail.ioffe.ru>
> wrote:
>
> Matt, Mark, thank you very much for your answers!
>
>
> We have now run example 5 both on our computer cluster and on the
> local server, and again have not seen any performance increase;
> for a reason that is unclear to us, the running times on the local
> server are much better than on the cluster.
>
> I suspect that you are trying to get speedup without increasing
> the memory bandwidth:
>
> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
>
> Thanks,
>
> Matt
>
> Next we will try to run the PETSc example 5 inside a Docker
> container on our server and see whether the problem is in our
> environment. I will send you the results of this test as soon as
> we have them.
>
> The ksp_monitor outputs for example 5 in the current local
> server configuration (for 2 and 4 MPI processes) and on the
> cluster (for 1 and 3 MPI processes) are attached.
>
>
> And one more question. Potentially we can use 10 nodes on our
> cluster, with 96 threads on each node. In your opinion, which
> combination of MPI processes and OpenMP threads would be best
> for example 5?
>
> Thank you!
>
>
> Best,
> Lidiia
>
> On 31.05.2022 05:42, Mark Adams wrote:
>> And if you see "NO" change in performance I suspect the
>> solver/matrix is all on one processor.
>> (PETSc does not use threads by default so threads should not
>> change anything).
>>
>> As Matt said, it is best to start with a PETSc example that
>> does something like what you want (parallel linear solve, see
>> src/ksp/ksp/tutorials for examples), and then add your code
>> to it.
>> That way you get the basic infrastructure in place for you,
>> which is pretty obscure to the uninitiated.
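>>
>> For example, one possible way to build and run a tutorial example (a
>> sketch, assuming PETSC_DIR and PETSC_ARCH are set for your
>> installation):
>>
>>   cd $PETSC_DIR/src/ksp/ksp/tutorials
>>   make ex5                # build the example with the PETSc makefiles
>>   mpirun -n 4 ./ex5 -ksp_monitor -ksp_converged_reason -log_view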
>>
>> Mark
>>
>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley
>> <knepley at gmail.com> wrote:
>>
>> On Mon, May 30, 2022 at 10:12 PM Lidia
>> <lidia.varsh at mail.ioffe.ru> wrote:
>>
>> Dear colleagues,
>>
>> Is there anyone here who has solved large sparse linear
>> systems using PETSc?
>>
>>
>> There are lots of publications with this kind of data.
>> Here is one recent one: https://arxiv.org/abs/2204.01722
>>
>> We have found NO performance improvement when using more
>> and more MPI processes (1, 2, 3) and OpenMP threads (from 1
>> to 72). Has anyone else faced this problem? Does anyone know
>> possible reasons for such behaviour?
>>
>>
>> Solver behavior is dependent on the input matrix. The
>> only general-purpose solvers
>> are direct, but they do not scale linearly and have high
>> memory requirements.
>>
>> Thus, in order to make progress you will have to be
>> specific about your matrices.
>>
>> We use the AMG preconditioner and the GMRES solver from the
>> KSP package, as our matrix is large (from 100,000 to 1e+6
>> rows and columns), sparse, non-symmetric, and contains both
>> positive and negative values. But the performance problems
>> also appear when using the CG solver with symmetric matrices.
>>
>>
>> There are many PETSc examples, such as example 5 for the
>> Laplacian, that exhibit
>> good scaling with both AMG and GMG.
>>
>> Could anyone help us set appropriate options for the
>> preconditioner and solver? We currently use the default
>> parameters; they may not be the best, but we do not know a
>> good combination. Or perhaps you could suggest other
>> preconditioner+solver pairs for such problems?
>>
>> I can provide more information: the matrices that we solve,
>> the C++ code that runs the solve with PETSc, and any
>> statistics obtained from our runs.
>>
>>
>> First, please provide a description of the linear system,
>> and the output of
>>
>> -ksp_view -ksp_monitor_true_residual
>> -ksp_converged_reason -log_view
>>
>> for each test case.
>>
>> Thanks,
>>
>> Matt
>>
>> Thank you in advance!
>>
>> Best regards,
>> Lidiia Varshavchik,
>> Ioffe Institute, St. Petersburg, Russia
>>
>>
>>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: performance.png
Type: image/png
Size: 16053 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20220601/a441c088/attachment-0001.png>
-------------- next part --------------
[lida at head1 tutorials]$ ./ex5 -m 10000 -ksp_monitor -ksp_monitor_true_residual -ksp_converged_reason -log_view -pc_type gamg -ksp_type gmres -pc_gamg_use_parallel_coarse_grid_solver
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
Local host: head1
Device name: i40iw0
Device vendor ID: 0x8086
Device vendor part ID: 14290
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: head1
Local device: i40iw0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
[head1.hpc:274354] 1 more process has sent help message help-mpi-btl-openib.txt / no device params found
[head1.hpc:274354] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[head1.hpc:274354] 1 more process has sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
0 KSP Residual norm 2.037538277184e+11
0 KSP preconditioned resid norm 2.037538277184e+11 true resid norm 1.291188079508e+10 ||r(i)||/||b|| 1.000000000000e+00
1 KSP Residual norm 4.559847344082e+10
1 KSP preconditioned resid norm 4.559847344082e+10 true resid norm 1.145337105566e+10 ||r(i)||/||b|| 8.870412635802e-01
2 KSP Residual norm 1.458580410483e+10
2 KSP preconditioned resid norm 1.458580410483e+10 true resid norm 6.820359295573e+09 ||r(i)||/||b|| 5.282235333346e-01
3 KSP Residual norm 5.133668905377e+09
3 KSP preconditioned resid norm 5.133668905377e+09 true resid norm 3.443273018496e+09 ||r(i)||/||b|| 2.666747837238e-01
4 KSP Residual norm 1.822791754681e+09
4 KSP preconditioned resid norm 1.822791754681e+09 true resid norm 1.429794150530e+09 ||r(i)||/||b|| 1.107347700325e-01
5 KSP Residual norm 6.883552291389e+08
5 KSP preconditioned resid norm 6.883552291389e+08 true resid norm 5.284618300965e+08 ||r(i)||/||b|| 4.092833867378e-02
6 KSP Residual norm 2.738661252083e+08
6 KSP preconditioned resid norm 2.738661252083e+08 true resid norm 2.298184687591e+08 ||r(i)||/||b|| 1.779899244785e-02
7 KSP Residual norm 1.175295112233e+08
7 KSP preconditioned resid norm 1.175295112233e+08 true resid norm 9.785469137958e+07 ||r(i)||/||b|| 7.578655110947e-03
8 KSP Residual norm 4.823372166305e+07
8 KSP preconditioned resid norm 4.823372166305e+07 true resid norm 4.288291058318e+07 ||r(i)||/||b|| 3.321197838159e-03
9 KSP Residual norm 2.019815757215e+07
9 KSP preconditioned resid norm 2.019815757215e+07 true resid norm 1.776678838786e+07 ||r(i)||/||b|| 1.376003129973e-03
10 KSP Residual norm 8.776441360510e+06
10 KSP preconditioned resid norm 8.776441360510e+06 true resid norm 7.333797620917e+06 ||r(i)||/||b|| 5.679883308489e-04
11 KSP Residual norm 3.536170852140e+06
11 KSP preconditioned resid norm 3.536170852140e+06 true resid norm 3.517014965376e+06 ||r(i)||/||b|| 2.723859537734e-04
12 KSP Residual norm 1.369320429479e+06
12 KSP preconditioned resid norm 1.369320429479e+06 true resid norm 1.434993628816e+06 ||r(i)||/||b|| 1.111374594910e-04
Linear solve converged due to CONVERGED_RTOL iterations 12
time 511.480000 m=10000 n=10000
Norm of error 4.8462e+06, Iterations 12
0 KSP Residual norm 6.828607739124e+09
0 KSP preconditioned resid norm 6.828607739124e+09 true resid norm 2.081798084592e+10 ||r(i)||/||b|| 1.000000000000e+00
1 KSP Residual norm 1.592108138342e+08
1 KSP preconditioned resid norm 1.592108138342e+08 true resid norm 1.085557726631e+09 ||r(i)||/||b|| 5.214519768589e-02
2 KSP Residual norm 4.713015543535e+06
2 KSP preconditioned resid norm 4.713015543535e+06 true resid norm 2.310928708753e+07 ||r(i)||/||b|| 1.110063807752e-03
3 KSP Residual norm 3.998043547851e+05
3 KSP preconditioned resid norm 3.998043547851e+05 true resid norm 2.247029256835e+06 ||r(i)||/||b|| 1.079369451565e-04
4 KSP Residual norm 3.507419330164e+04
4 KSP preconditioned resid norm 3.507419330164e+04 true resid norm 2.008185753840e+05 ||r(i)||/||b|| 9.646400237870e-06
Linear solve converged due to CONVERGED_RTOL iterations 4
Norm of error 5.77295e+11, Iterations 4
**************************************** ***********************************************************************************************************************
*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
****************************************************************************************************************************************************************
------------------------------------------------------------------ PETSc Performance Summary: -------------------------------------------------------------------
./ex5 on a named head1.hpc with 1 processor, by lida Wed Jun 1 20:35:41 2022
Using Petsc Release Version 3.17.1, unknown
Max Max/Min Avg Total
Time (sec): 1.065e+03 1.000 1.065e+03
Objects: 7.090e+02 1.000 7.090e+02
Flops: 3.476e+11 1.000 3.476e+11 3.476e+11
Flops/sec: 3.263e+08 1.000 3.263e+08 3.263e+08
MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00
MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 3.4957e-04 0.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: Original Solve: 8.2717e+02 77.7% 2.5959e+11 74.7% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: Second Solve: 2.3804e+02 22.3% 8.8003e+10 25.3% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
--- Event Stage 1: Original Solve
MatMult 460 1.0 1.5530e+02 1.0 1.05e+11 1.0 0.0e+00 0.0e+00 0.0e+00 15 30 0 0 0 19 40 0 0 0 676
MatMultAdd 91 1.0 1.7280e+01 1.0 8.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 3 0 0 0 463
MatMultTranspose 91 1.0 2.3679e+01 1.0 8.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 3 3 0 0 0 338
MatSolve 13 1.0 6.4134e-05 1.0 5.85e+02 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9
MatLUFactorSym 1 1.0 2.6981e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 1.5310e-05 1.0 7.30e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5
MatConvert 7 1.0 1.2939e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
MatScale 21 1.0 3.5806e+00 1.0 2.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 572
MatResidual 91 1.0 2.5775e+01 1.0 1.86e+10 1.0 0.0e+00 0.0e+00 0.0e+00 2 5 0 0 0 3 7 0 0 0 723
MatAssemblyBegin 36 1.0 1.7278e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 36 1.0 2.1752e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 3 0 0 0 0 0
MatGetRowIJ 1 1.0 5.9204e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 5.1203e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 7 1.0 3.5076e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 4 0 0 0 0 0
MatAXPY 7 1.0 8.9612e+00 1.0 1.17e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 13
MatMatMultSym 7 1.0 1.7313e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatMatMultNum 7 1.0 1.0714e+01 1.0 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 163
MatPtAPSymbolic 7 1.0 5.7167e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0
MatPtAPNumeric 7 1.0 5.7185e+01 1.0 7.77e+09 1.0 0.0e+00 0.0e+00 0.0e+00 5 2 0 0 0 7 3 0 0 0 136
MatTrnMatMultSym 1 1.0 3.3972e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 4 0 0 0 0 0
MatGetSymTrans 7 1.0 3.8600e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 82 1.0 2.0620e+01 1.0 2.84e+10 1.0 0.0e+00 0.0e+00 0.0e+00 2 8 0 0 0 2 11 0 0 0 1378
VecNorm 105 1.0 1.4910e+00 1.0 8.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 3 0 0 0 5476
VecScale 90 1.0 8.9402e-01 1.0 2.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2888
VecCopy 294 1.0 1.4368e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
VecSet 236 1.0 7.5047e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 21 1.0 9.9549e-01 1.0 3.03e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 3047
VecAYPX 559 1.0 1.8945e+01 1.0 1.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00 2 4 0 0 0 2 5 0 0 0 709
VecAXPBYCZ 182 1.0 7.3185e+00 1.0 1.52e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 6 0 0 0 2071
VecMAXPY 102 1.0 2.6425e+01 1.0 4.88e+10 1.0 0.0e+00 0.0e+00 0.0e+00 2 14 0 0 0 3 19 0 0 0 1845
VecAssemblyBegin 1 1.0 5.1595e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 1 1.0 1.3132e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 441 1.0 2.1981e+01 1.0 7.34e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 3 3 0 0 0 334
VecNormalize 90 1.0 1.9910e+00 1.0 7.75e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 3 0 0 0 3891
KSPSetUp 16 1.0 7.4611e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
KSPSolve 1 1.0 2.9415e+02 1.0 2.00e+11 1.0 0.0e+00 0.0e+00 0.0e+00 28 58 0 0 0 36 77 0 0 0 680
KSPGMRESOrthog 82 1.0 3.5705e+01 1.0 5.68e+10 1.0 0.0e+00 0.0e+00 0.0e+00 3 16 0 0 0 4 22 0 0 0 1592
PCGAMGGraph_AGG 7 1.0 7.8044e+01 1.0 1.43e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 9 1 0 0 0 18
PCGAMGCoarse_AGG 7 1.0 1.2952e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 16 0 0 0 0 0
PCGAMGProl_AGG 7 1.0 5.1550e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 6 0 0 0 0 0
PCGAMGPOpt_AGG 7 1.0 1.0284e+02 1.0 4.90e+10 1.0 0.0e+00 0.0e+00 0.0e+00 10 14 0 0 0 12 19 0 0 0 476
GAMG: createProl 7 1.0 3.6476e+02 1.0 5.04e+10 1.0 0.0e+00 0.0e+00 0.0e+00 34 15 0 0 0 44 19 0 0 0 138
Create Graph 7 1.0 1.2939e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
Filter Graph 7 1.0 6.4028e+01 1.0 1.43e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 8 1 0 0 0 22
MIS/Agg 7 1.0 3.5146e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 4 0 0 0 0 0
SA: col data 7 1.0 7.9635e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: frmProl0 7 1.0 4.7866e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 6 0 0 0 0 0
SA: smooth 7 1.0 4.2317e+01 1.0 2.47e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 5 1 0 0 0 58
GAMG: partLevel 7 1.0 1.1435e+02 1.0 7.77e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11 2 0 0 0 14 3 0 0 0 68
PCGAMG Squ l00 1 1.0 3.3972e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 4 0 0 0 0 0
PCGAMG Gal l00 1 1.0 6.8174e+01 1.0 4.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 1 0 0 0 8 2 0 0 0 65
PCGAMG Opt l00 1 1.0 2.1367e+01 1.0 1.24e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 3 0 0 0 0 58
PCGAMG Gal l01 1 1.0 3.6124e+01 1.0 2.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 4 1 0 0 0 64
PCGAMG Opt l01 1 1.0 5.1169e+00 1.0 3.61e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 70
PCGAMG Gal l02 1 1.0 9.2202e+00 1.0 8.86e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 96
PCGAMG Opt l02 1 1.0 1.3495e+00 1.0 1.25e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 93
PCGAMG Gal l03 1 1.0 7.2320e-01 1.0 1.13e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 157
PCGAMG Opt l03 1 1.0 1.7474e-01 1.0 1.55e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 89
PCGAMG Gal l04 1 1.0 1.0741e-01 1.0 8.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 79
PCGAMG Opt l04 1 1.0 1.9007e-02 1.0 1.17e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 62
PCGAMG Gal l05 1 1.0 3.0491e-03 1.0 4.43e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 145
PCGAMG Opt l05 1 1.0 8.0217e-04 1.0 6.95e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 87
PCGAMG Gal l06 1 1.0 1.2688e-04 1.0 1.27e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 100
PCGAMG Opt l06 1 1.0 9.1779e-05 1.0 2.96e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 32
PCSetUp 1 1.0 4.8375e+02 1.0 5.82e+10 1.0 0.0e+00 0.0e+00 0.0e+00 45 17 0 0 0 58 22 0 0 0 120
PCApply 13 1.0 1.9865e+02 1.0 1.18e+11 1.0 0.0e+00 0.0e+00 0.0e+00 19 34 0 0 0 24 45 0 0 0 593
--- Event Stage 2: Second Solve
MatMult 160 1.0 5.5173e+01 1.0 3.60e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 10 0 0 0 23 41 0 0 0 652
MatMultAdd 25 1.0 1.4079e+00 1.0 5.61e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatMultTranspose 25 1.0 1.1887e-02 1.0 5.61e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 47
MatSolve 5 1.0 3.1421e-05 1.0 1.94e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 62
MatLUFactorSym 1 1.0 4.1040e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 2.0536e-05 1.0 8.30e+02 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 40
MatConvert 5 1.0 9.8007e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0
MatScale 15 1.0 2.0833e+00 1.0 1.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 480
MatResidual 25 1.0 7.8182e+00 1.0 5.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 3 6 0 0 0 640
MatAssemblyBegin 26 1.0 4.7507e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 26 1.0 6.1941e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 3 0 0 0 0 0
MatGetRowIJ 1 1.0 3.5968e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 3.4700e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 5 1.0 1.7509e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 7 0 0 0 0 0
MatZeroEntries 1 1.0 4.9738e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 5 1.0 3.0948e+00 1.0 2.54e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatMatMultSym 5 1.0 4.4847e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
MatMatMultNum 5 1.0 3.2775e+00 1.0 2.86e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatPtAPSymbolic 5 1.0 1.2886e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatPtAPNumeric 5 1.0 2.4792e+00 1.0 8.02e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatTrnMatMultSym 1 1.0 3.0547e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatGetSymTrans 5 1.0 1.3340e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 54 1.0 9.7819e+00 1.0 1.30e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 4 15 0 0 0 1329
VecNorm 67 1.0 5.1862e-01 1.0 4.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 5 0 0 0 8870
VecScale 60 1.0 4.1836e-01 1.0 1.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 3825
VecCopy 86 1.0 5.7537e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
VecSet 76 1.0 3.2515e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
VecAXPY 11 1.0 6.2773e-01 1.0 1.40e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 2230
VecAYPX 155 1.0 7.3351e+00 1.0 4.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 3 5 0 0 0 614
VecAXPBYCZ 50 1.0 2.7304e+00 1.0 5.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 6 0 0 0 1831
VecMAXPY 64 1.0 1.0069e+01 1.0 1.78e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 4 20 0 0 0 1768
VecPointwiseMult 155 1.0 9.0886e+00 1.0 3.10e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 4 4 0 0 0 341
VecNormalize 60 1.0 7.8309e-01 1.0 4.80e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 5 0 0 0 6130
KSPSetUp 12 1.0 3.8298e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
KSPSolve 1 1.0 8.0642e+01 1.0 4.81e+10 1.0 0.0e+00 0.0e+00 0.0e+00 8 14 0 0 0 34 55 0 0 0 596
KSPGMRESOrthog 54 1.0 1.7151e+01 1.0 2.60e+10 1.0 0.0e+00 0.0e+00 0.0e+00 2 7 0 0 0 7 30 0 0 0 1516
PCGAMGGraph_AGG 5 1.0 2.7212e+01 1.0 1.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 11 1 0 0 0 37
PCGAMGCoarse_AGG 5 1.0 5.3094e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 22 0 0 0 0 0
PCGAMGProl_AGG 5 1.0 6.3314e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 3 0 0 0 0 0
PCGAMGPOpt_AGG 5 1.0 5.8035e+01 1.0 3.76e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 11 0 0 0 24 43 0 0 0 648
GAMG: createProl 5 1.0 1.4493e+02 1.0 3.86e+10 1.0 0.0e+00 0.0e+00 0.0e+00 14 11 0 0 0 61 44 0 0 0 266
Create Graph 5 1.0 9.8007e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0
Filter Graph 5 1.0 1.6596e+01 1.0 1.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 7 1 0 0 0 60
MIS/Agg 5 1.0 1.7585e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 7 0 0 0 0 0
SA: col data 5 1.0 6.9388e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: frmProl0 5 1.0 2.9984e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
SA: smooth 5 1.0 1.3478e+01 1.0 4.24e+05 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 6 0 0 0 0 0
GAMG: partLevel 5 1.0 3.7679e+00 1.0 8.02e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
PCGAMG Squ l00 1 1.0 3.0548e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
PCGAMG Gal l00 1 1.0 3.7665e+00 1.0 6.32e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
PCGAMG Opt l00 1 1.0 7.7613e+00 1.0 2.25e+05 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 3 0 0 0 0 0
PCGAMG Gal l01 1 1.0 7.6855e-04 1.0 1.19e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 155
PCGAMG Opt l01 1 1.0 6.0377e-04 1.0 4.29e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 71
PCGAMG Gal l02 1 1.0 3.4782e-04 1.0 3.70e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106
PCGAMG Opt l02 1 1.0 2.2165e-04 1.0 1.33e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 60
PCGAMG Gal l03 1 1.0 1.3189e-04 1.0 1.08e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 82
PCGAMG Opt l03 1 1.0 9.0454e-05 1.0 3.83e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42
PCGAMG Gal l04 1 1.0 6.2573e-05 1.0 3.34e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 53
PCGAMG Opt l04 1 1.0 4.9897e-05 1.0 1.14e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 23
PCSetUp 1 1.0 1.5265e+02 1.0 3.86e+10 1.0 0.0e+00 0.0e+00 0.0e+00 14 11 0 0 0 64 44 0 0 0 253
PCApply 5 1.0 4.8881e+01 1.0 2.90e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 8 0 0 0 21 33 0 0 0 593
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 1 896 0.
--- Event Stage 1: Original Solve
Container 7 0 0 0.
Matrix 39 23 37853744688 0.
Matrix Coarsen 7 7 4704 0.
Vector 249 182 42116718600 0.
Krylov Solver 16 7 217000 0.
Preconditioner 16 7 6496 0.
Viewer 1 0 0 0.
PetscRandom 7 7 4970 0.
Index Set 12 9 8584 0.
Distributed Mesh 15 7 35896 0.
Star Forest Graph 30 14 16464 0.
Discrete System 15 7 7168 0.
Weak Form 15 7 4648 0.
--- Event Stage 2: Second Solve
Container 5 12 7488 0.
Matrix 28 44 42665589712 0.
Matrix Coarsen 5 5 3360 0.
Vector 156 223 48929583824 0.
Krylov Solver 11 20 200246 0.
Preconditioner 11 20 23064 0.
PetscRandom 5 5 3550 0.
Index Set 8 11 11264 0.
Distributed Mesh 10 18 92304 0.
Star Forest Graph 20 36 42336 0.
Discrete System 10 18 18432 0.
Weak Form 10 18 11952 0.
========================================================================================================================
Average time to get PetscTime(): 3.03611e-08
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_monitor_true_residual
-ksp_type gmres
-log_view
-m 10000
-pc_gamg_use_parallel_coarse_grid_solver
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with 64 bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-python --prefix=/home/lida -with-mpi-dir=/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4 LDFLAGS="-L/home/lida/lib64 -L/home/lida/lib -L/home/lida/jdk/lib" CPPFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CXXFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" --with-debugging=no --with-64-bit-indices FOPTFLAGS="-O3 -march=native" --download-make
-----------------------------------------
Libraries compiled on 2022-05-25 10:03:14 on head1.hpc
Machine characteristics: Linux-3.10.0-1062.el7.x86_64-x86_64-with-centos-7.7.1908-Core
Using PETSc directory: /home/lida
Using PETSc arch:
-----------------------------------------
Using C compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 -fPIC -I/home/lida/include -I/home/lida/jdk/include -march=native -O3
Using Fortran compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O3 -march=native -I/home/lida/include -I/home/lida/jdk/include -march=native -O3
-----------------------------------------
Using include paths: -I/home/lida/include -I/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/include
-----------------------------------------
Using C linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc
Using Fortran linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90
Using libraries: -Wl,-rpath,/home/lida/lib -L/home/lida/lib -lpetsc -Wl,-rpath,/home/lida/lib64 -L/home/lida/lib64 -Wl,-rpath,/home/lida/lib -L/home/lida/lib -Wl,-rpath,/home/lida/jdk/lib -L/home/lida/jdk/lib -Wl,-rpath,/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -L/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -Wl,-rpath,/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -L/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -Wl,-rpath,/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -L/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib -lopenblas -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl
-----------------------------------------
[lida at head1 tutorials]$
-------------- next part --------------
[lida at head1 tutorials]$ export OMP_NUM_THREADS=1
[lida at head1 tutorials]$ mpirun -n 60 ./ex5 -m 10000 -ksp_monitor -ksp_monitor_true_residual -ksp_converged_reason -log_view -pc_type gamg -ksp_type gmres -pc_gamg_use_parallel_coarse_grid_solver
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 60
slots that were requested by the application:
./ex5
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, Open MPI defaults to the number of processor cores
In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
[lida at head1 tutorials]$ mpirun --oversubscribe -n 60 ./ex5 -m 10000 -ksp_monitor -ksp_monitor_true_residual -ksp_converged_reason -log_view -pc_type gamg -ksp_type gmres -pc_gamg_use_parallel_coarse_grid_solver
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
Local host: head1
Device name: i40iw0
Device vendor ID: 0x8086
Device vendor part ID: 14290
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: head1
Local device: i40iw0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
[head1.hpc:19648] 119 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[head1.hpc:19648] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[head1.hpc:19648] 119 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
0 KSP Residual norm 4.355207026627e+09
0 KSP preconditioned resid norm 4.355207026627e+09 true resid norm 1.823690908212e+09 ||r(i)||/||b|| 1.000000000000e+00
1 KSP Residual norm 1.208748211681e+09
1 KSP preconditioned resid norm 1.208748211681e+09 true resid norm 6.155480046246e+08 ||r(i)||/||b|| 3.375286907736e-01
2 KSP Residual norm 4.517088284383e+08
2 KSP preconditioned resid norm 4.517088284383e+08 true resid norm 3.886324242477e+08 ||r(i)||/||b|| 2.131021339732e-01
3 KSP Residual norm 1.603295575511e+08
3 KSP preconditioned resid norm 1.603295575511e+08 true resid norm 1.620131718656e+08 ||r(i)||/||b|| 8.883806523136e-02
4 KSP Residual norm 5.544067857339e+07
4 KSP preconditioned resid norm 5.544067857339e+07 true resid norm 6.120149859376e+07 ||r(i)||/||b|| 3.355914004843e-02
5 KSP Residual norm 1.925294565414e+07
5 KSP preconditioned resid norm 1.925294565414e+07 true resid norm 2.393555310461e+07 ||r(i)||/||b|| 1.312478611196e-02
6 KSP Residual norm 5.919031529729e+06
6 KSP preconditioned resid norm 5.919031529729e+06 true resid norm 6.600310389533e+06 ||r(i)||/||b|| 3.619204526277e-03
7 KSP Residual norm 2.132637762468e+06
7 KSP preconditioned resid norm 2.132637762468e+06 true resid norm 2.301013174864e+06 ||r(i)||/||b|| 1.261734192183e-03
8 KSP Residual norm 7.288135118024e+05
8 KSP preconditioned resid norm 7.288135118024e+05 true resid norm 8.376703989009e+05 ||r(i)||/||b|| 4.593269589318e-04
9 KSP Residual norm 2.618419345570e+05
9 KSP preconditioned resid norm 2.618419345570e+05 true resid norm 2.924464805008e+05 ||r(i)||/||b|| 1.603596745390e-04
10 KSP Residual norm 9.736460918466e+04
10 KSP preconditioned resid norm 9.736460918466e+04 true resid norm 1.093493729815e+05 ||r(i)||/||b|| 5.996047492975e-05
11 KSP Residual norm 3.616464600646e+04
11 KSP preconditioned resid norm 3.616464600646e+04 true resid norm 4.287951581559e+04 ||r(i)||/||b|| 2.351249086262e-05
time 13.250000 m=10000 n=10000
Linear solve converged due to CONVERGED_RTOL iterations 11
time 10.790000 m=10000 n=10000
time 14.340000 m=10000 n=10000
time 11.910000 m=10000 n=10000
time 12.670000 m=10000 n=10000
time 14.900000 m=10000 n=10000
time 13.860000 m=10000 n=10000
time 12.550000 m=10000 n=10000
time 12.610000 m=10000 n=10000
time 11.110000 m=10000 n=10000
time 11.640000 m=10000 n=10000
time 11.980000 m=10000 n=10000
time 14.820000 m=10000 n=10000
time 16.180000 m=10000 n=10000
time 14.330000 m=10000 n=10000
time 13.850000 m=10000 n=10000
time 13.220000 m=10000 n=10000
time 10.220000 m=10000 n=10000
time 12.680000 m=10000 n=10000
time 16.240000 m=10000 n=10000
time 12.490000 m=10000 n=10000
time 16.070000 m=10000 n=10000
time 12.870000 m=10000 n=10000
time 12.170000 m=10000 n=10000
time 15.960000 m=10000 n=10000
time 13.630000 m=10000 n=10000
time 11.530000 m=10000 n=10000
time 13.700000 m=10000 n=10000
time 14.360000 m=10000 n=10000
time 11.690000 m=10000 n=10000
time 13.610000 m=10000 n=10000
time 12.800000 m=10000 n=10000
time 10.350000 m=10000 n=10000
time 14.680000 m=10000 n=10000
time 12.640000 m=10000 n=10000
time 10.860000 m=10000 n=10000
time 13.650000 m=10000 n=10000
time 14.190000 m=10000 n=10000
time 12.620000 m=10000 n=10000
time 12.860000 m=10000 n=10000
time 13.640000 m=10000 n=10000
time 14.790000 m=10000 n=10000
time 11.720000 m=10000 n=10000
time 13.300000 m=10000 n=10000
time 12.990000 m=10000 n=10000
time 13.100000 m=10000 n=10000
time 14.630000 m=10000 n=10000
time 14.170000 m=10000 n=10000
time 13.830000 m=10000 n=10000
time 12.600000 m=10000 n=10000
time 12.500000 m=10000 n=10000
time 12.050000 m=10000 n=10000
time 13.430000 m=10000 n=10000
time 11.790000 m=10000 n=10000
time 12.900000 m=10000 n=10000
time 11.200000 m=10000 n=10000
time 14.120000 m=10000 n=10000
time 15.230000 m=10000 n=10000
time 14.020000 m=10000 n=10000
time 13.360000 m=10000 n=10000
Norm of error 126115., Iterations 11
0 KSP Residual norm 1.051609452779e+09
0 KSP preconditioned resid norm 1.051609452779e+09 true resid norm 2.150197987965e+09 ||r(i)||/||b|| 1.000000000000e+00
1 KSP Residual norm 1.140665240186e+07
1 KSP preconditioned resid norm 1.140665240186e+07 true resid norm 7.877021908575e+07 ||r(i)||/||b|| 3.663393767766e-02
2 KSP Residual norm 9.303562428258e+05
2 KSP preconditioned resid norm 9.303562428258e+05 true resid norm 5.522945877123e+06 ||r(i)||/||b|| 2.568575502366e-03
3 KSP Residual norm 7.562974008642e+04
3 KSP preconditioned resid norm 7.562974008642e+04 true resid norm 4.308267545679e+05 ||r(i)||/||b|| 2.003660858113e-04
4 KSP Residual norm 6.241321855425e+03
4 KSP preconditioned resid norm 6.241321855425e+03 true resid norm 3.569774197924e+04 ||r(i)||/||b|| 1.660207207850e-05
Linear solve converged due to CONVERGED_RTOL iterations 4
Norm of error 9.5902e+09, Iterations 4
**************************************** ***********************************************************************************************************************
*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
****************************************************************************************************************************************************************
------------------------------------------------------------------ PETSc Performance Summary: -------------------------------------------------------------------
./ex5 on a named head1.hpc with 60 processors, by lida Wed Jun 1 20:39:05 2022
Using Petsc Release Version 3.17.1, unknown
Max Max/Min Avg Total
Time (sec): 8.038e+01 1.001 8.036e+01
Objects: 1.450e+03 1.000 1.450e+03
Flops: 5.486e+09 1.002 5.485e+09 3.291e+11
Flops/sec: 6.827e+07 1.003 6.825e+07 4.095e+09
MPI Msg Count: 3.320e+03 2.712 2.535e+03 1.521e+05
MPI Msg Len (bytes): 5.412e+07 1.926 2.092e+04 3.183e+09
MPI Reductions: 1.547e+03 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 2.1349e-01 0.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 2.000e+00 0.1%
1: Original Solve: 5.8264e+01 72.5% 2.4065e+11 73.1% 1.086e+05 71.4% 2.402e+04 82.0% 9.080e+02 58.7%
2: Second Solve: 2.1880e+01 27.2% 8.8426e+10 26.9% 4.348e+04 28.6% 1.318e+04 18.0% 6.190e+02 40.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
--- Event Stage 1: Original Solve
BuildTwoSided 99 1.0 5.4267e+00 1.8 0.00e+00 0.0 5.5e+03 8.0e+00 9.9e+01 5 0 4 0 6 7 0 5 0 11 0
BuildTwoSidedF 59 1.0 4.8310e+00 2.2 0.00e+00 0.0 2.8e+03 7.3e+04 5.9e+01 4 0 2 6 4 6 0 3 8 6 0
MatMult 430 1.0 1.0317e+01 1.3 1.64e+09 1.0 5.2e+04 2.8e+04 7.0e+00 11 30 34 46 0 16 41 48 56 1 9505
MatMultAdd 84 1.0 1.1148e+00 1.8 1.23e+08 1.0 7.7e+03 8.8e+03 0.0e+00 1 2 5 2 0 2 3 7 3 0 6638
MatMultTranspose 84 1.0 1.9097e+00 1.7 1.24e+08 1.0 9.0e+03 8.2e+03 7.0e+00 2 2 6 2 0 2 3 8 3 1 3880
MatSolve 12 1.0 8.0690e-05 3.1 3.36e+02 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 250
MatLUFactorSym 1 1.0 6.1153e-05 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 5.7829e-05 5.0 3.60e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37
MatConvert 8 1.0 7.6993e-01 1.9 0.00e+00 0.0 1.6e+03 1.3e+04 7.0e+00 1 0 1 1 0 1 0 1 1 1 0
MatScale 21 1.0 3.2515e-01 2.0 3.42e+07 1.0 8.1e+02 2.6e+04 0.0e+00 0 1 1 1 0 0 1 1 1 0 6310
MatResidual 84 1.0 1.8980e+00 1.5 2.87e+08 1.0 9.8e+03 2.6e+04 0.0e+00 2 5 6 8 0 3 7 9 10 0 9073
MatAssemblyBegin 117 1.0 4.0990e+00 2.4 0.00e+00 0.0 2.8e+03 7.3e+04 4.0e+01 3 0 2 6 3 5 0 3 8 4 0
MatAssemblyEnd 117 1.0 6.4010e+00 1.4 1.21e+05 2.3 0.0e+00 0.0e+00 1.4e+02 7 0 0 0 9 9 0 0 0 15 1
MatGetRowIJ 1 1.0 5.1278e-0513.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 1 1.0 2.7949e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 4 1.0 1.4416e-01 1.1 0.00e+00 0.0 2.9e+02 1.4e+03 5.6e+01 0 0 0 0 4 0 0 0 0 6 0
MatGetOrdering 1 1.0 1.2932e-02469.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 7 1.0 1.8915e+00 1.2 0.00e+00 0.0 2.0e+04 8.6e+03 9.5e+01 2 0 13 5 6 3 0 19 7 10 0
MatZeroEntries 7 1.0 2.4975e-02 6.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 7 1.0 1.1375e+00 1.2 1.95e+06 1.0 0.0e+00 0.0e+00 7.0e+00 1 0 0 0 0 2 0 0 0 1 103
MatTranspose 14 1.0 3.0706e-01 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMultSym 21 1.0 3.4042e+00 1.1 0.00e+00 0.0 2.4e+03 2.6e+04 6.3e+01 4 0 2 2 4 5 0 2 2 7 0
MatMatMultNum 21 1.0 1.7535e+00 1.6 8.62e+07 1.0 8.1e+02 2.6e+04 7.0e+00 2 2 1 1 0 2 2 1 1 1 2946
MatPtAPSymbolic 7 1.0 3.8899e+00 1.0 0.00e+00 0.0 5.3e+03 4.9e+04 4.9e+01 5 0 3 8 3 7 0 5 10 5 0
MatPtAPNumeric 7 1.0 2.3929e+00 1.1 1.34e+08 1.0 1.9e+03 1.0e+05 3.5e+01 3 2 1 6 2 4 3 2 7 4 3361
MatTrnMatMultSym 1 1.0 5.1090e+00 1.0 0.00e+00 0.0 3.5e+02 1.9e+05 1.1e+01 6 0 0 2 1 9 0 0 3 1 0
MatRedundantMat 1 1.0 2.8010e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMPIConcateSeq 1 1.0 3.4418e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetLocalMat 22 1.0 1.1056e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatGetBrAoCol 21 1.0 2.0194e-01 2.8 0.00e+00 0.0 5.7e+03 4.7e+04 0.0e+00 0 0 4 8 0 0 0 5 10 0 0
MatGetSymTrans 2 1.0 2.8332e-01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 81 1.0 3.5909e+00 1.5 4.34e+08 1.0 0.0e+00 0.0e+00 8.1e+01 4 8 0 0 5 5 11 0 0 9 7248
VecNorm 103 1.0 2.5986e+00 1.5 1.29e+08 1.0 0.0e+00 0.0e+00 1.0e+02 3 2 0 0 7 4 3 0 0 11 2988
VecScale 89 1.0 1.6507e-01 1.8 4.14e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 15040
VecCopy 272 1.0 5.8546e-01 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecSet 315 1.0 3.5993e-01 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 20 1.0 1.9678e-01 2.2 4.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 14398
VecAYPX 516 1.0 8.4424e-01 1.9 2.07e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 5 0 0 0 14681
VecAXPBYCZ 168 1.0 3.6343e-01 2.1 2.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 6 0 0 0 38501
VecMAXPY 100 1.0 1.6303e+00 1.4 7.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 13 0 0 0 2 18 0 0 0 26841
VecAssemblyBegin 21 1.0 7.7428e-01 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+01 1 0 0 0 1 1 0 0 0 2 0
VecAssemblyEnd 21 1.0 8.0603e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 413 1.0 8.0314e-01 1.5 1.15e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 3 0 0 0 8566
VecScatterBegin 650 1.0 5.5847e-01 2.3 0.00e+00 0.0 7.4e+04 2.4e+04 2.6e+01 1 0 49 57 2 1 0 68 69 3 0
VecScatterEnd 650 1.0 6.7772e+00 2.7 1.47e+05 2.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 8 0 0 0 0 1
VecNormalize 89 1.0 1.7682e+00 1.4 1.24e+08 1.0 0.0e+00 0.0e+00 8.9e+01 2 2 0 0 6 3 3 0 0 10 4212
SFSetGraph 52 1.0 4.5486e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 40 1.0 1.0026e+00 2.2 0.00e+00 0.0 8.3e+03 1.4e+04 4.0e+01 1 0 5 4 3 1 0 8 4 4 0
SFBcastBegin 102 1.0 1.1024e-0214.9 0.00e+00 0.0 1.9e+04 7.7e+03 0.0e+00 0 0 12 5 0 0 0 17 6 0 0
SFBcastEnd 102 1.0 4.4157e-0110.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFPack 752 1.0 3.4799e-02 8.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFUnpack 752 1.0 1.5714e-03 2.5 1.47e+05 2.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5420
KSPSetUp 17 1.0 4.5984e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 1 1 0 0 0 2 0
KSPSolve 1 1.0 1.7258e+01 1.0 3.02e+09 1.0 5.9e+04 2.3e+04 5.9e+01 21 55 39 43 4 30 75 55 52 6 10498
KSPGMRESOrthog 81 1.0 4.4162e+00 1.4 8.68e+08 1.0 0.0e+00 0.0e+00 8.1e+01 5 16 0 0 5 6 22 0 0 9 11787
PCGAMGGraph_AGG 7 1.0 6.6074e+00 1.0 2.39e+07 1.0 2.4e+03 1.7e+04 6.3e+01 8 0 2 1 4 11 1 2 2 7 217
PCGAMGCoarse_AGG 7 1.0 9.2609e+00 1.0 0.00e+00 0.0 2.2e+04 1.6e+04 1.2e+02 11 0 14 11 8 16 0 20 14 13 0
PCGAMGProl_AGG 7 1.0 3.7914e+00 1.1 0.00e+00 0.0 3.9e+03 2.2e+04 1.1e+02 5 0 3 3 7 6 0 4 3 12 0
PCGAMGPOpt_AGG 7 1.0 9.7580e+00 1.0 8.12e+08 1.0 1.3e+04 2.4e+04 2.9e+02 12 15 8 10 19 17 20 12 12 32 4991
GAMG: createProl 7 1.0 2.9629e+01 1.0 8.36e+08 1.0 4.1e+04 1.9e+04 5.8e+02 37 15 27 25 37 51 21 38 30 64 1692
Create Graph 7 1.0 7.6994e-01 1.9 0.00e+00 0.0 1.6e+03 1.3e+04 7.0e+00 1 0 1 1 0 1 0 1 1 1 0
Filter Graph 7 1.0 5.9877e+00 1.1 2.39e+07 1.0 8.1e+02 2.6e+04 5.6e+01 7 0 1 1 4 10 1 1 1 6 240
MIS/Agg 7 1.0 1.8917e+00 1.2 0.00e+00 0.0 2.0e+04 8.6e+03 9.5e+01 2 0 13 5 6 3 0 19 7 10 0
SA: col data 7 1.0 8.5757e-01 1.1 0.00e+00 0.0 3.0e+03 2.4e+04 4.8e+01 1 0 2 2 3 1 0 3 3 5 0
SA: frmProl0 7 1.0 2.5043e+00 1.0 0.00e+00 0.0 9.2e+02 1.5e+04 3.5e+01 3 0 1 0 2 4 0 1 1 4 0
SA: smooth 7 1.0 4.9217e+00 1.0 3.62e+07 1.0 3.3e+03 2.6e+04 9.4e+01 6 1 2 3 6 8 1 3 3 10 441
GAMG: partLevel 7 1.0 6.8353e+00 1.0 1.34e+08 1.0 7.9e+03 5.7e+04 1.9e+02 8 2 5 14 12 12 3 7 17 21 1177
repartition 2 1.0 4.0880e-01 1.0 0.00e+00 0.0 7.0e+02 6.2e+02 1.1e+02 1 0 0 0 7 1 0 1 0 12 0
Invert-Sort 2 1.0 6.3940e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
Move A 2 1.0 1.2997e-01 1.1 0.00e+00 0.0 2.9e+02 1.4e+03 3.0e+01 0 0 0 0 2 0 0 0 0 3 0
Move P 2 1.0 2.4461e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.2e+01 0 0 0 0 2 0 0 0 0 4 0
PCGAMG Squ l00 1 1.0 5.1090e+00 1.0 0.00e+00 0.0 3.5e+02 1.9e+05 1.1e+01 6 0 0 2 1 9 0 0 3 1 0
PCGAMG Gal l00 1 1.0 3.9555e+00 1.1 7.69e+07 1.0 9.4e+02 1.5e+05 1.3e+01 5 1 1 4 1 7 2 1 5 1 1167
PCGAMG Opt l00 1 1.0 2.4544e+00 1.1 1.67e+07 1.0 4.7e+02 8.0e+04 1.1e+01 3 0 0 1 1 4 0 0 1 1 407
PCGAMG Gal l01 1 1.0 1.6669e+00 1.0 3.97e+07 1.0 9.4e+02 1.8e+05 1.3e+01 2 1 1 5 1 3 1 1 6 1 1428
PCGAMG Opt l01 1 1.0 5.6540e-01 1.0 5.09e+06 1.0 4.7e+02 5.0e+04 1.1e+01 1 0 0 1 1 1 0 0 1 1 540
PCGAMG Gal l02 1 1.0 4.9954e-01 1.0 1.54e+07 1.1 9.4e+02 1.1e+05 1.3e+01 1 0 1 3 1 1 0 1 4 1 1829
PCGAMG Opt l02 1 1.0 2.3346e-01 1.1 1.91e+06 1.0 4.7e+02 3.2e+04 1.1e+01 0 0 0 0 1 0 0 0 1 1 488
PCGAMG Gal l03 1 1.0 2.1878e-01 1.0 2.19e+06 1.4 9.4e+02 3.6e+04 1.2e+01 0 0 1 1 1 0 0 1 1 1 567
PCGAMG Opt l03 1 1.0 1.3572e-01 1.1 2.49e+05 1.1 4.7e+02 1.1e+04 1.0e+01 0 0 0 0 1 0 0 0 0 1 108
PCGAMG Gal l04 1 1.0 1.1188e-01 1.2 1.95e+05 2.2 2.4e+03 3.6e+03 1.2e+01 0 0 2 0 1 0 0 2 0 1 93
PCGAMG Opt l04 1 1.0 1.3686e-01 1.1 2.16e+04 1.7 9.4e+02 1.6e+03 1.0e+01 0 0 1 0 1 0 0 1 0 1 9
PCGAMG Gal l05 1 1.0 3.8335e-03 1.4 3.21e+04 0.0 9.5e+02 5.7e+02 1.2e+01 0 0 1 0 1 0 0 1 0 1 133
PCGAMG Opt l05 1 1.0 1.3979e-02 1.1 4.02e+03 0.0 4.3e+02 2.7e+02 1.0e+01 0 0 0 0 1 0 0 0 0 1 5
PCGAMG Gal l06 1 1.0 2.1886e-03 1.5 8.55e+03 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 1 4
PCGAMG Opt l06 1 1.0 1.5361e-02 1.0 2.63e+03 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 1 0 0 0 0 1 0
PCSetUp 1 1.0 3.7111e+01 1.0 9.70e+08 1.0 4.9e+04 2.5e+04 8.3e+02 46 18 32 39 54 64 24 45 47 92 1568
PCApply 12 1.0 1.1146e+01 1.1 1.82e+09 1.0 5.7e+04 2.0e+04 2.3e+01 13 33 37 36 1 18 45 52 44 3 9768
--- Event Stage 2: Second Solve
BuildTwoSided 73 1.0 1.9759e+00 1.9 0.00e+00 0.0 6.8e+03 8.0e+00 7.3e+01 2 0 4 0 5 7 0 16 0 12 0
BuildTwoSidedF 44 1.0 1.2189e+00 2.0 0.00e+00 0.0 8.6e+03 1.7e+04 4.4e+01 1 0 6 5 3 4 0 20 25 7 0
MatMult 160 1.0 4.2652e+00 1.4 6.02e+08 1.0 1.6e+04 2.3e+04 4.0e+00 5 11 11 12 0 17 41 38 67 1 8474
MatMultAdd 25 1.0 1.8994e-01 3.6 6.17e+05 1.0 2.1e+03 1.7e+02 0.0e+00 0 0 1 0 0 0 0 5 0 0 193
MatMultTranspose 25 1.0 7.2792e-0119.4 6.18e+05 1.0 2.9e+03 1.5e+02 5.0e+00 0 0 2 0 0 1 0 7 0 1 51
MatSolve 5 1.0 3.6131e-05 1.8 1.99e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3305
MatLUFactorSym 1 1.0 9.1382e-05 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 3.8696e-05 5.9 9.09e+02 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1409
MatConvert 6 1.0 4.6379e-01 2.2 0.00e+00 0.0 9.6e+02 1.0e+04 5.0e+00 0 0 1 0 0 1 0 2 2 1 0
MatScale 15 1.0 2.0882e-01 2.2 1.69e+07 1.0 4.8e+02 2.0e+04 0.0e+00 0 0 0 0 0 1 1 1 2 0 4848
MatResidual 25 1.0 6.9403e-01 2.1 8.38e+07 1.0 2.4e+03 2.0e+04 0.0e+00 1 2 2 2 0 2 6 6 8 0 7240
MatAssemblyBegin 87 1.0 8.9302e-01 2.2 0.00e+00 0.0 8.6e+03 1.7e+04 3.0e+01 1 0 6 5 2 3 0 20 25 5 0
MatAssemblyEnd 87 1.0 2.6614e+00 1.3 9.98e+04 1.0 0.0e+00 0.0e+00 1.0e+02 3 0 0 0 7 11 0 0 0 16 2
MatGetRowIJ 1 1.0 1.2436e-021825.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 1 1.0 5.4925e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 1 0
MatCreateSubMat 4 1.0 3.1275e-01 1.1 0.00e+00 0.0 3.0e+02 2.0e+02 5.6e+01 0 0 0 0 4 1 0 1 0 9 0
MatGetOrdering 1 1.0 1.2514e-02516.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 5 1.0 1.0904e+00 1.8 0.00e+00 0.0 2.9e+03 6.3e+02 1.5e+01 1 0 2 0 1 4 0 7 0 2 0
MatZeroEntries 6 1.0 5.8985e-02 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 5 1.0 4.8172e-01 1.2 2.35e+04 1.0 0.0e+00 0.0e+00 5.0e+00 1 0 0 0 0 2 0 0 0 1 3
MatTranspose 10 1.0 3.2074e-02 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMultSym 15 1.0 1.1367e+00 1.0 0.00e+00 0.0 1.4e+03 7.0e+03 4.5e+01 1 0 1 0 3 5 0 3 2 7 0
MatMatMultNum 15 1.0 2.4953e-01 1.2 1.02e+06 1.0 4.8e+02 5.1e+02 5.0e+00 0 0 0 0 0 1 0 1 0 1 241
MatPtAPSymbolic 5 1.0 8.1101e-01 1.1 0.00e+00 0.0 2.9e+03 4.2e+03 3.5e+01 1 0 2 0 2 4 0 7 2 6 0
MatPtAPNumeric 5 1.0 3.9561e-01 1.2 1.58e+06 1.0 9.6e+02 2.2e+03 2.5e+01 0 0 1 0 2 2 0 2 0 4 235
MatTrnMatMultSym 1 1.0 9.9313e-01 1.0 0.00e+00 0.0 3.5e+02 2.2e+03 1.1e+01 1 0 0 0 1 4 0 1 0 2 0
MatRedundantMat 1 1.0 5.4990e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 1 0
MatMPIConcateSeq 1 1.0 5.8523e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetLocalMat 16 1.0 5.0405e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
MatGetBrAoCol 15 1.0 1.0643e-01 2.6 0.00e+00 0.0 3.4e+03 6.5e+03 0.0e+00 0 0 2 1 0 0 0 8 4 0 0
MatGetSymTrans 2 1.0 7.1719e-02 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 54 1.0 1.5885e+00 1.7 2.17e+08 1.0 0.0e+00 0.0e+00 5.4e+01 2 4 0 0 3 6 15 0 0 9 8198
VecNorm 67 1.0 1.6017e+00 1.4 7.67e+07 1.0 0.0e+00 0.0e+00 6.7e+01 2 1 0 0 4 6 5 0 0 11 2875
VecScale 60 1.0 1.0996e-01 1.7 2.67e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 14571
VecCopy 86 1.0 2.5385e-01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
VecSet 106 1.0 1.8434e-01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 11 1.0 9.2483e-02 2.2 2.33e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 15142
VecAYPX 155 1.0 3.6418e-01 2.3 7.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 5 0 0 0 12379
VecAXPBYCZ 50 1.0 1.6760e-01 2.9 8.35e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 6 0 0 0 29894
VecMAXPY 64 1.0 6.9542e-01 1.6 2.97e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 3 20 0 0 0 25634
VecAssemblyBegin 16 1.0 5.5466e-01 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 1 2 0 0 0 2 0
VecAssemblyEnd 16 1.0 7.5968e-05 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 155 1.0 3.8336e-01 1.8 5.18e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 4 0 0 0 8103
VecScatterBegin 242 1.0 3.2776e-01 2.7 0.00e+00 0.0 2.5e+04 1.6e+04 1.9e+01 0 0 17 12 1 1 0 58 69 3 0
VecScatterEnd 242 1.0 2.3472e+00 2.9 1.22e+03 3.5 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 7 0 0 0 0 0
VecNormalize 60 1.0 1.3091e+00 1.6 8.01e+07 1.0 0.0e+00 0.0e+00 6.0e+01 1 1 0 0 4 5 5 0 0 10 3672
SFSetGraph 39 1.0 8.6654e-04 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 29 1.0 8.1719e-01 2.8 0.00e+00 0.0 4.9e+03 2.2e+03 2.9e+01 1 0 3 0 2 3 0 11 2 5 0
SFBcastBegin 20 1.0 2.9462e-04 2.5 0.00e+00 0.0 1.9e+03 7.5e+02 0.0e+00 0 0 1 0 0 0 0 4 0 0 0
SFBcastEnd 20 1.0 2.5962e-01143.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFPack 262 1.0 1.7704e-02100.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFUnpack 262 1.0 1.3156e-02131.3 1.22e+03 3.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3
KSPSetUp 13 1.0 1.7844e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 1 1 0 0 0 2 0
KSPSolve 1 1.0 5.4889e+00 1.0 8.05e+08 1.0 1.6e+04 1.7e+04 3.2e+01 7 15 11 9 2 25 55 37 48 5 8798
KSPGMRESOrthog 54 1.0 1.9637e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 5.4e+01 2 8 0 0 3 7 29 0 0 9 13263
PCGAMGGraph_AGG 5 1.0 2.2872e+00 1.0 1.68e+07 1.0 1.4e+03 1.3e+04 4.5e+01 3 0 1 1 3 10 1 3 3 7 439
PCGAMGCoarse_AGG 5 1.0 3.2731e+00 1.0 0.00e+00 0.0 4.3e+03 9.2e+02 3.7e+01 4 0 3 0 2 15 0 10 1 6 0
PCGAMGProl_AGG 5 1.0 1.2727e+00 1.0 0.00e+00 0.0 2.3e+03 4.5e+02 7.9e+01 2 0 1 0 5 6 0 5 0 13 0
PCGAMGPOpt_AGG 5 1.0 5.7655e+00 1.0 6.29e+08 1.0 7.4e+03 1.4e+04 2.0e+02 7 11 5 3 13 26 43 17 19 33 6544
GAMG: createProl 5 1.0 1.2524e+01 1.0 6.46e+08 1.0 1.5e+04 8.4e+03 3.7e+02 16 12 10 4 24 57 44 36 23 59 3093
Create Graph 5 1.0 4.6379e-01 2.2 0.00e+00 0.0 9.6e+02 1.0e+04 5.0e+00 0 0 1 0 0 1 0 2 2 1 0
Filter Graph 5 1.0 2.0622e+00 1.1 1.68e+07 1.0 4.8e+02 2.0e+04 4.0e+01 2 0 0 0 3 9 1 1 2 6 487
MIS/Agg 5 1.0 1.0905e+00 1.8 0.00e+00 0.0 2.9e+03 6.3e+02 1.5e+01 1 0 2 0 1 4 0 7 0 2 0
SA: col data 5 1.0 5.3167e-01 1.2 0.00e+00 0.0 1.7e+03 5.1e+02 3.4e+01 1 0 1 0 2 2 0 4 0 5 0
SA: frmProl0 5 1.0 5.6108e-01 1.1 0.00e+00 0.0 6.0e+02 2.8e+02 2.5e+01 1 0 0 0 2 2 0 1 0 4 0
SA: smooth 5 1.0 2.0259e+00 1.1 4.34e+05 1.0 1.9e+03 5.4e+03 6.6e+01 2 0 1 0 4 9 0 4 2 11 13
GAMG: partLevel 5 1.0 1.8885e+00 1.0 1.58e+06 1.0 4.5e+03 3.2e+03 1.7e+02 2 0 3 0 11 9 0 10 3 27 49
repartition 2 1.0 7.2098e-01 1.0 0.00e+00 0.0 6.7e+02 1.0e+02 1.1e+02 1 0 0 0 7 3 0 2 0 17 0
Invert-Sort 2 1.0 7.2351e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 2 0
Move A 2 1.0 1.6566e-01 1.2 0.00e+00 0.0 3.0e+02 2.0e+02 3.0e+01 0 0 0 0 2 1 0 1 0 5 0
Move P 2 1.0 1.9534e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.2e+01 0 0 0 0 2 1 0 0 0 5 0
PCGAMG Squ l00 1 1.0 9.9313e-01 1.0 0.00e+00 0.0 3.5e+02 2.2e+03 1.1e+01 1 0 0 0 1 4 0 1 0 2 0
PCGAMG Gal l00 1 1.0 7.0349e-01 1.0 9.22e+05 1.0 9.4e+02 1.2e+04 1.3e+01 1 0 1 0 1 3 0 2 2 2 78
PCGAMG Opt l00 1 1.0 1.1336e+00 1.1 2.00e+05 1.0 4.7e+02 2.1e+04 1.1e+01 1 0 0 0 1 5 0 1 2 2 11
PCGAMG Gal l01 1 1.0 1.3150e-01 1.1 4.76e+05 1.1 9.4e+02 2.1e+03 1.2e+01 0 0 1 0 1 1 0 2 0 2 210
PCGAMG Opt l01 1 1.0 1.3884e-01 1.1 6.11e+04 1.0 4.7e+02 6.0e+02 1.0e+01 0 0 0 0 1 1 0 1 0 2 26
PCGAMG Gal l02 1 1.0 1.7012e-01 1.1 1.76e+05 1.3 9.4e+02 1.2e+03 1.2e+01 0 0 1 0 1 1 0 2 0 2 57
PCGAMG Opt l02 1 1.0 3.7929e-02 1.0 2.31e+04 1.1 4.7e+02 3.9e+02 1.0e+01 0 0 0 0 1 0 0 1 0 2 35
PCGAMG Gal l03 1 1.0 1.2656e-01 1.1 1.65e+04 1.6 9.4e+02 2.8e+02 1.2e+01 0 0 1 0 1 1 0 2 0 2 7
PCGAMG Opt l03 1 1.0 1.6197e-03 1.3 2.75e+03 1.4 4.7e+02 1.4e+02 1.0e+01 0 0 0 0 1 0 0 1 0 2 90
PCGAMG Gal l04 1 1.0 7.2816e-02 1.4 4.91e+03 0.0 6.4e+01 6.3e+01 1.2e+01 0 0 0 0 1 0 0 0 0 2 0
PCGAMG Opt l04 1 1.0 1.1088e-01 1.2 1.50e+03 0.0 3.2e+01 5.5e+01 1.0e+01 0 0 0 0 1 0 0 0 0 2 0
PCSetUp 1 1.0 1.5702e+01 1.0 6.47e+08 1.0 2.0e+04 7.2e+03 5.8e+02 19 12 13 5 38 72 44 46 25 94 2473
PCApply 5 1.0 3.6198e+00 1.2 4.87e+08 1.0 1.5e+04 1.3e+04 1.7e+01 4 9 10 6 1 15 33 35 34 3 8064
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 1 896 0.
--- Event Stage 1: Original Solve
Container 14 4 2496 0.
Matrix 195 108 1577691960 0.
Matrix Coarsen 7 7 4704 0.
Vector 352 266 734665864 0.
Index Set 110 97 704896 0.
Star Forest Graph 82 49 63224 0.
Krylov Solver 17 7 217000 0.
Preconditioner 17 7 6496 0.
Viewer 1 0 0 0.
PetscRandom 7 7 4970 0.
Distributed Mesh 15 7 35896 0.
Discrete System 15 7 7168 0.
Weak Form 15 7 4648 0.
--- Event Stage 2: Second Solve
Container 10 20 12480 0.
Matrix 144 231 1909487728 0.
Matrix Coarsen 5 5 3360 0.
Vector 237 323 870608184 0.
Index Set 88 101 154152 0.
Star Forest Graph 59 92 117152 0.
Krylov Solver 12 22 203734 0.
Preconditioner 12 22 25080 0.
PetscRandom 5 5 3550 0.
Distributed Mesh 10 18 92304 0.
Discrete System 10 18 18432 0.
Weak Form 10 18 11952 0.
========================================================================================================================
Average time to get PetscTime(): 3.64147e-08
Average time for MPI_Barrier(): 0.00739469
Average time for zero size MPI_Send(): 0.000168472
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_monitor_true_residual
-ksp_type gmres
-log_view
-m 10000
-pc_gamg_use_parallel_coarse_grid_solver
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with 64 bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-python --prefix=/home/lida -with-mpi-dir=/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4 LDFLAGS="-L/home/lida/lib64 -L/home/lida/lib -L/home/lida/jdk/lib" CPPFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CXXFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" CFLAGS="-I/home/lida/include -I/home/lida/jdk/include -march=native -O3" --with-debugging=no --with-64-bit-indices FOPTFLAGS="-O3 -march=native" --download-make
-----------------------------------------
Libraries compiled on 2022-05-25 10:03:14 on head1.hpc
Machine characteristics: Linux-3.10.0-1062.el7.x86_64-x86_64-with-centos-7.7.1908-Core
Using PETSc directory: /home/lida
Using PETSc arch:
-----------------------------------------
Using C compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc -I/home/lida/include -I/home/lida/jdk/include -march=native -O3 -fPIC -I/home/lida/include -I/home/lida/jdk/include -march=native -O3
Using Fortran compiler: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O3 -march=native -I/home/lida/include -I/home/lida/jdk/include -march=native -O3
-----------------------------------------
Using include paths: -I/home/lida/include -I/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/include
-----------------------------------------
Using C linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpicc
Using Fortran linker: /opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/bin/mpif90
Using libraries: -Wl,-rpath,/home/lida/lib -L/home/lida/lib -lpetsc -Wl,-rpath,/home/lida/lib64 -L/home/lida/lib64 -Wl,-rpath,/home/lida/lib -L/home/lida/lib -Wl,-rpath,/home/lida/jdk/lib -L/home/lida/jdk/lib -Wl,-rpath,/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -L/opt/ohpc/pub/mpi/openmpi3-gnu8/3.1.4/lib -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib64 -Wl,-rpath,/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -L/home/lida/intel/oneapi/mkl/2022.0.2/lib/intel64 -Wl,-rpath,/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -L/opt/software/intel/compilers_and_libraries_2020.2.254/linux/tbb/lib/intel64/gcc4.8 -Wl,-rpath,/opt/ohpc/pub/compiler/gcc/8.3.0/lib -L/opt/ohpc/pub/compiler/gcc/8.3.0/lib -lopenblas -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl
-----------------------------------------
[lida at head1 tutorials]$