[petsc-users] PETSc (3.9.0) GAMG weak scaling test issue
"Alberto F. Martín"
amartin at cimne.upc.edu
Wed Nov 7 10:02:15 CST 2018
Dear All,
we are performing a weak scaling test of the PETSc (v3.9.0) GAMG
preconditioner applied to the linear system arising from the conforming
unfitted FE discretization (using Q1 Lagrangian FEs) of a 3D Poisson
problem, where the boundary of the domain (a popcorn flake) is described
as a zero level set embedded within a uniform background (Cartesian-like)
hexahedral mesh. Details of the underlying FEM formulation can be made
available on demand if you believe this might be helpful; let me just
point out that it is designed to address the well-known ill-conditioning
issues of unfitted FE discretizations caused by the small cut cell problem.
The weak scaling test is set up as follows. We start from a single-cube
background mesh and refine it uniformly in several steps, until we have
approximately 10**3 (load1), 20**3 (load2), or 40**3 (load3) hexahedra
per MPI task when distributing the mesh over 4 MPI tasks. The benchmark
is then scaled such that each larger problem is obtained by uniformly
refining the mesh of the previous scale and running it on 8x the number
of MPI tasks used at the previous scale. As a result, we obtain three
weak scaling curves, one for each of the fixed loads per MPI task above,
on the following total numbers of MPI tasks: 4, 32, 262, 2097, 16777.
The underlying mesh is not partitioned among MPI tasks using ParMETIS
(unstructured multilevel graph partitioning), nor optimally by hand, but
following the so-called Z-shaped space-filling curve provided by an
underlying octree-like mesh handler (i.e., the p4est library).
I configured the preconditioned linear solver as follows:
-ksp_type cg
-ksp_monitor
-ksp_rtol 1.0e-6
-ksp_converged_reason
-ksp_max_it 500
-ksp_norm_type unpreconditioned
-ksp_view
-log_view
-pc_type gamg
-pc_gamg_type agg
-mg_levels_esteig_ksp_type cg
-mg_coarse_sub_pc_type cholesky
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_agg_nsmooths 1
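
For completeness, below is a minimal C sketch (not our actual driver, which
is Fortran-based and reads these options from a petscrc file) of how an
equivalent CG + GAMG configuration could be set up through the PETSc API;
the function name and the assumption of an already assembled matrix A are
purely illustrative:

  #include <petscksp.h>

  /* Illustrative sketch only: configure a CG + smoothed-aggregation GAMG
     solver equivalent to the command-line options listed above. "A" is
     assumed to be an already assembled parallel matrix. */
  PetscErrorCode SetupSolver(Mat A, KSP *ksp)
  {
    PC             pc;
    PetscErrorCode ierr;

    ierr = KSPCreate(PETSC_COMM_WORLD, ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(*ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetType(*ksp, KSPCG);CHKERRQ(ierr);             /* -ksp_type cg */
    ierr = KSPSetTolerances(*ksp, 1.0e-6, PETSC_DEFAULT,
                            PETSC_DEFAULT, 500);CHKERRQ(ierr);/* -ksp_rtol, -ksp_max_it */
    ierr = KSPSetNormType(*ksp, KSP_NORM_UNPRECONDITIONED);CHKERRQ(ierr);

    ierr = KSPGetPC(*ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);               /* -pc_type gamg */
    ierr = PCGAMGSetType(pc, PCGAMGAGG);CHKERRQ(ierr);        /* -pc_gamg_type agg */
    ierr = PCGAMGSetNSmooths(pc, 1);CHKERRQ(ierr);            /* -pc_gamg_agg_nsmooths 1 */
    ierr = PCGAMGSetSquareGraph(pc, 0);CHKERRQ(ierr);         /* -pc_gamg_square_graph 0 */
    ierr = PCGAMGSetProcEqLim(pc, 50);CHKERRQ(ierr);          /* -pc_gamg_process_eq_limit 50 */

    /* Options addressing sub-solvers by prefix are easiest to keep in the
       options database (in our runs everything is passed this way anyway). */
    ierr = PetscOptionsSetValue(NULL, "-mg_levels_esteig_ksp_type", "cg");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL, "-mg_coarse_sub_pc_type", "cholesky");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL, "-mg_coarse_sub_pc_factor_mat_ordering_type", "nd");CHKERRQ(ierr);

    ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr); /* also picks up -ksp_view, -ksp_monitor, etc. */
    return 0;
  }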
Raw timings (in seconds) of the preconditioner setup and PCG iterative
solution stages, together with the number of iterations, are as follows:
**preconditioner setup**
(load1): [0.02542160451, 0.05169247743, 0.09266782179, 0.2426272957,
13.64161944]
(load2): [0.1239175797 , 0.1885528499 , 0.2719282564 , 0.4783878336,
13.37947339]
(load3): [0.6565349903 , 0.9435049873 , 1.299908397 , 1.916243652
, 16.02904088]
**PCG stage**
(load1): [0.003287350759, 0.008163803257, 0.03565631993, 0.08343045413,
0.6937994603]
(load2): [0.0205939794 , 0.03594723623 , 0.07593298424,
0.1212046621 , 0.6780373845]
(load3): [0.1310882876 , 0.3214917686 , 0.5532023879 ,
0.766881627 , 1.485446003]
**number of PCG iterations**
(load1): [5, 8, 11, 13, 13]
(load2): [7, 10, 12, 13, 13]
(load3): [8, 10, 12, 13, 13]
It can be observed that both the number of linear solver iterations and
the PCG stage timings scale remarkably well (in the weak sense), but there
is a significant time increase in the preconditioner setup stage when
scaling the problem from 2097 to 16777 MPI tasks (e.g., 1.916243652 vs.
16.02904088 sec. with 40**3 cells per MPI task).
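To put that last point in numbers, the following throw-away snippet (not
part of the benchmark driver; the array values are just the load3 setup
timings listed above) computes the weak-scaling efficiency relative to the
4-task run:

  #include <stdio.h>

  /* Weak-scaling efficiency of the load3 preconditioner setup, with the
     4-task run as baseline; ideal weak scaling keeps the time constant (100%). */
  int main(void)
  {
    const int    ntasks[5] = {4, 32, 262, 2097, 16777};
    const double setup[5]  = {0.6565349903, 0.9435049873, 1.299908397,
                              1.916243652, 16.02904088};
    for (int i = 0; i < 5; ++i)
      printf("%6d tasks: setup = %9.4f s, efficiency = %5.1f %%\n",
             ntasks[i], setup[i], 100.0*setup[0]/setup[i]);
    return 0;
  }

which gives approximately 100.0, 69.6, 50.5, 34.3 and 4.1 per cent, i.e.,
the last scaling step alone accounts for almost an order-of-magnitude loss.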
I gathered the combined output of -ksp_view and -log_view (only) for all
the points of the load3 weak scaling test (find them attached to this
message). Please note that within each run I execute these two stages up
to three times, and this influences the absolute timings given in -log_view.
Looking at the output of -log_view, it is very strange to me, e.g., that
the stage labelled "Graph" does not scale properly, as it is just a call
to MatDuplicate when the block size of the matrix is 1 (our case), and I
guess it is a purely local operation that does not require any
communication. What am I missing here? The load does not seem to be
unbalanced, judging from the "Ratio" column.
I wonder whether the observed behaviour is to be expected, or whether it
is caused by a misconfiguration of the solver on our side. I played (quite
a lot) with several parameter-value combinations, and the configuration
above is the one that led to the fastest execution among those tested
(which may well be an incomplete set; I can provide further details if
helpful). Any feedback based on your experience that helps us find the
cause(s) of this issue and a way to mitigate it would be highly appreciated.
Thanks very much in advance!
Best regards,
Alberto.
--
Alberto F. Martín-Huertas
Senior Researcher, PhD. Computational Science
Centre Internacional de Mètodes Numèrics a l'Enginyeria (CIMNE)
Parc Mediterrani de la Tecnologia, UPC
Esteve Terradas 5, Building C3, Office 215,
08860 Castelldefels (Barcelona, Spain)
Tel.: (+34) 9341 34223
e-mail: amartin at cimne.upc.edu
FEMPAR project co-founder
web: http://www.fempar.org
-------------- next part --------------
KSP Object: 4 MPI processes
type: cg
maximum iterations=500, initial guess is zero
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using externally compute Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 0
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 4 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 4 MPI processes
type: bjacobi
number of blocks = 4
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: cholesky
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqsbaij
rows=6, cols=6
package used to perform factorization: petsc
total: nonzeros=21, allocated nonzeros=21
total number of mallocs used during MatSetValues calls =0
block size is 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=6, cols=6
total: nonzeros=36, allocated nonzeros=36
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 2 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=6, cols=6
total: nonzeros=36, allocated nonzeros=36
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 2 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 4 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.129951, max = 1.42946
eigenvalues estimate via cg min 0.51315, max 1.29951
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 4 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 4 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=201, cols=201
total: nonzeros=24313, allocated nonzeros=24313
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 4 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.132036, max = 1.4524
eigenvalues estimate via cg min 0.0839922, max 1.32036
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 4 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 4 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=4621, cols=4621
total: nonzeros=369149, allocated nonzeros=369149
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 4 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.167146, max = 1.83861
eigenvalues estimate via cg min 0.0634859, max 1.67146
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 4 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 4 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=63511, cols=63511
total: nonzeros=2301395, allocated nonzeros=38106600
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=63511, cols=63511
total: nonzeros=2301395, allocated nonzeros=38106600
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/scratch/upc26/upc26229/build_rel_fempar_cell_agg_ompi/FEMPAR/bin/par_test_h_adaptive_poisson_unfitted on a arch-linux2-c-opt named s14r2b46 with 4 processors, by upc26229 Wed Nov 7 01:07:35 2018
Using Petsc Release Version 3.9.0, Apr, 07, 2018
Max Max/Min Avg Total
Time (sec): 1.076e+02 1.00000 1.076e+02
Objects: 9.890e+02 1.00304 9.868e+02
Flop: 6.620e+08 1.09228 6.334e+08 2.533e+09
Flop/sec: 6.150e+06 1.09228 5.884e+06 2.353e+07
MPI Messages: 3.141e+03 1.04997 3.054e+03 1.222e+04
MPI Message Lengths: 1.331e+07 1.02147 4.319e+03 5.277e+07
MPI Reductions: 1.427e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.0765e+02 100.0% 2.5334e+09 100.0% 1.222e+04 100.0% 4.319e+03 100.0% 1.414e+03 99.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 9 1.0 1.0124e-03 2.9 0.00e+00 0.0 5.4e+01 8.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSidedF 75 1.0 1.8647e-0111.1 0.00e+00 0.0 3.2e+02 5.0e+04 0.0e+00 0 0 3 30 0 0 0 3 30 0 0
VecMDot 90 1.0 4.3525e-03 1.9 5.94e+06 1.1 0.0e+00 0.0e+00 9.0e+01 0 1 0 0 6 0 1 0 0 6 5180
VecTDot 237 1.0 1.6390e-02 3.5 3.88e+06 1.1 0.0e+00 0.0e+00 2.4e+02 0 1 0 0 17 0 1 0 0 17 897
VecNorm 225 1.0 7.4959e-03 2.6 3.28e+06 1.1 0.0e+00 0.0e+00 2.2e+02 0 0 0 0 16 0 0 0 0 16 1661
VecScale 99 1.0 2.6664e-04 1.2 5.94e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 8457
VecCopy 105 1.0 5.5974e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 402 1.0 5.1273e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 237 1.0 1.1864e-03 1.1 3.88e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 12396
VecAYPX 678 1.0 3.3991e-03 1.2 6.00e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 6695
VecAXPBYCZ 288 1.0 2.0727e-03 1.1 8.64e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 15825
VecMAXPY 99 1.0 1.9828e-03 1.3 7.02e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 13441
VecAssemblyBegin 24 1.0 3.5199e-03 1.1 0.00e+00 0.0 6.0e+01 3.6e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 24 1.0 1.2138e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 99 1.0 6.6968e-04 1.1 5.94e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3367
VecScatterBegin 810 1.0 5.6504e-03 1.0 0.00e+00 0.0 9.3e+03 2.7e+03 0.0e+00 0 0 76 47 0 0 0 76 47 0 0
VecScatterEnd 810 1.0 1.4432e-02 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 9 1.0 1.5864e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 99 1.0 1.6803e-03 1.7 1.78e+06 1.1 0.0e+00 0.0e+00 9.9e+01 0 0 0 0 7 0 0 0 0 7 4026
MatMult 636 1.0 1.9035e-01 1.0 3.12e+08 1.1 7.6e+03 3.0e+03 0.0e+00 0 47 62 44 0 0 47 62 44 0 6275
MatMultAdd 72 1.0 1.0663e-02 1.1 6.23e+06 1.1 6.5e+02 5.0e+02 0.0e+00 0 1 5 1 0 0 1 5 1 0 2248
MatMultTranspose 72 1.0 1.3565e-02 1.3 6.23e+06 1.1 6.5e+02 5.0e+02 0.0e+00 0 1 5 1 0 0 1 5 1 0 1767
MatSolve 24 0.0 4.1796e-05 0.0 1.58e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 38
MatSOR 531 1.0 2.2511e-01 1.1 2.33e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 35 0 0 0 0 35 0 0 0 3950
MatCholFctrSym 3 1.0 4.2330e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCholFctrNum 3 1.0 4.0101e-05 1.8 1.80e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 9 1.0 1.6046e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 27 1.0 5.5906e-03 1.1 5.00e+06 1.1 1.1e+02 2.9e+03 0.0e+00 0 1 1 1 0 0 1 1 1 0 3428
MatResidual 72 1.0 2.0435e-02 1.1 3.38e+07 1.1 8.6e+02 2.9e+03 0.0e+00 0 5 7 5 0 0 5 7 5 0 6330
MatAssemblyBegin 168 1.0 2.3268e-01 1.4 0.00e+00 0.0 2.6e+02 6.1e+04 0.0e+00 0 0 2 30 0 0 0 2 30 0 0
MatAssemblyEnd 168 1.0 1.3291e-01 1.1 0.00e+00 0.0 7.8e+02 8.4e+02 3.6e+02 0 0 6 1 25 0 0 6 1 25 0
MatGetRow 162018 1.1 1.9420e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 0.0 4.2650e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 6 1.0 1.6496e-03 1.0 0.00e+00 0.0 4.8e+01 5.1e+01 9.6e+01 0 0 0 0 7 0 0 0 0 7 0
MatGetOrdering 3 0.0 2.7078e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 9 1.0 7.8692e-03 1.1 0.00e+00 0.0 7.0e+02 2.2e+03 2.7e+01 0 0 6 3 2 0 0 6 3 2 0
MatZeroEntries 9 1.0 1.0505e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 21 1.4 3.8399e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.5e+01 0 0 0 0 1 0 0 0 0 1 0
MatAXPY 9 1.0 1.6539e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 9 1.0 7.2559e-02 1.0 4.22e+06 1.1 6.3e+02 2.1e+03 1.1e+02 0 1 5 2 8 0 1 5 2 8 223
MatMatMultSym 9 1.0 5.8217e-02 1.0 0.00e+00 0.0 5.2e+02 1.9e+03 1.1e+02 0 0 4 2 8 0 0 4 2 8 0
MatMatMultNum 9 1.0 1.4338e-02 1.0 4.22e+06 1.1 1.1e+02 2.9e+03 0.0e+00 0 1 1 1 0 0 1 1 1 0 1128
MatPtAP 9 1.0 3.5600e-01 1.0 5.53e+07 1.1 9.5e+02 1.7e+04 1.4e+02 0 9 8 30 9 0 9 8 30 10 605
MatPtAPSymbolic 9 1.0 2.5158e-01 1.0 0.00e+00 0.0 6.2e+02 1.4e+04 6.3e+01 0 0 5 16 4 0 0 5 16 4 0
MatPtAPNumeric 9 1.0 1.0436e-01 1.0 5.53e+07 1.1 3.3e+02 2.2e+04 7.2e+01 0 9 3 14 5 0 9 3 14 5 2064
MatGetLocalMat 27 1.0 1.0135e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 27 1.0 7.4055e-03 1.2 0.00e+00 0.0 7.6e+02 9.9e+03 0.0e+00 0 0 6 14 0 0 0 6 14 0 0
KSPGMRESOrthog 90 1.0 5.7355e-03 1.4 1.19e+07 1.1 0.0e+00 0.0e+00 9.0e+01 0 2 0 0 6 0 2 0 0 6 7863
KSPSetUp 36 1.0 3.4705e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 2 0 0 0 0 2 0
KSPSolve 3 1.0 4.3135e-01 1.0 5.40e+08 1.1 7.9e+03 2.6e+03 3.6e+02 0 81 64 39 25 0 81 64 39 26 4786
PCGAMGGraph_AGG 9 1.0 2.2263e-01 1.0 4.22e+06 1.1 3.2e+02 1.9e+03 1.1e+02 0 1 3 1 8 0 1 3 1 8 73
PCGAMGCoarse_AGG 9 1.0 1.1966e-02 1.0 0.00e+00 0.0 7.0e+02 2.2e+03 2.7e+01 0 0 6 3 2 0 0 6 3 2 0
PCGAMGProl_AGG 9 1.0 4.8133e-02 1.0 0.00e+00 0.0 3.8e+02 1.7e+03 1.4e+02 0 0 3 1 10 0 0 3 1 10 0
PCGAMGPOpt_AGG 9 1.0 1.4798e-01 1.0 6.22e+07 1.1 1.7e+03 2.6e+03 3.7e+02 0 9 14 8 26 0 9 14 8 26 1605
GAMG: createProl 9 1.0 4.3246e-01 1.0 6.64e+07 1.1 3.1e+03 2.3e+03 6.5e+02 0 10 26 14 45 0 10 26 14 46 586
Graph 18 1.0 2.2127e-01 1.0 4.22e+06 1.1 3.2e+02 1.9e+03 1.1e+02 0 1 3 1 8 0 1 3 1 8 73
MIS/Agg 9 1.0 7.9907e-03 1.1 0.00e+00 0.0 7.0e+02 2.2e+03 2.7e+01 0 0 6 3 2 0 0 6 3 2 0
SA: col data 9 1.0 5.8624e-03 1.1 0.00e+00 0.0 2.2e+02 2.7e+03 3.6e+01 0 0 2 1 3 0 0 2 1 3 0
SA: frmProl0 9 1.0 4.0594e-02 1.0 0.00e+00 0.0 1.7e+02 4.9e+02 7.2e+01 0 0 1 0 5 0 0 1 0 5 0
SA: smooth 9 1.0 9.3119e-02 1.0 5.00e+06 1.1 6.3e+02 2.1e+03 1.3e+02 0 1 5 2 9 0 1 5 2 9 206
GAMG: partLevel 9 1.0 3.5860e-01 1.0 5.53e+07 1.1 1.0e+03 1.6e+04 2.9e+02 0 9 8 30 20 0 9 8 30 20 601
repartition 3 1.0 1.7322e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 0
Invert-Sort 3 1.0 1.5528e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
Move A 3 1.0 1.1300e-03 1.0 0.00e+00 0.0 3.0e+01 6.6e+01 5.1e+01 0 0 0 0 4 0 0 0 0 4 0
Move P 3 1.0 7.7750e-04 1.0 0.00e+00 0.0 1.8e+01 2.6e+01 5.1e+01 0 0 0 0 4 0 0 0 0 4 0
PCSetUp 6 1.0 7.9483e-01 1.0 1.22e+08 1.1 4.1e+03 5.7e+03 9.9e+02 1 19 34 44 69 1 19 34 44 70 590
PCSetUpOnBlocks 24 1.0 5.2736e-04 1.1 1.80e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 24 1.0 4.0983e-01 1.0 5.07e+08 1.1 7.6e+03 2.5e+03 2.9e+02 0 76 62 36 20 0 76 62 36 20 4727
SFSetGraph 9 1.0 2.1600e-06 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 9 1.0 1.6055e-03 1.7 0.00e+00 0.0 1.6e+02 1.8e+03 0.0e+00 0 0 1 1 0 0 0 1 1 0 0
SFBcastBegin 45 1.0 5.1013e-04 1.1 0.00e+00 0.0 5.4e+02 2.4e+03 0.0e+00 0 0 4 2 0 0 0 4 2 0 0
SFBcastEnd 45 1.0 7.3911e-04 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 429 429 16885896 0.
Matrix 252 252 628801440 0.
Matrix Coarsen 9 9 6156 0.
Index Set 147 147 299856 0.
Vec Scatter 57 57 79848 0.
Krylov Solver 36 36 314928 0.
Preconditioner 27 27 29544 0.
Viewer 5 4 3584 0.
PetscRandom 18 18 12492 0.
Star Forest Graph 9 9 8496 0.
========================================================================================================================
Average time to get PetscTime(): 4.1601e-08
Average time for MPI_Barrier(): 1.76301e-06
Average time for zero size MPI_Send(): 1.59626e-06
#PETSc Option Table entries:
--prefix run_a0b0c0d0e0f0g0h0i0_n4_l3
-aggrmeth alla_serial
-beta 10.0
-betaest .true.
-check .false.
-datadt data_distribution_fully_assembled
-dm 3
-dom -1.0
-in_space .true.
-ksp_converged_reason
-ksp_max_it 500
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.0e-6
-ksp_type cg
-ksp_view
-l 1
-levelset popcorn
-levelsettol 1.0e-6
-log_view
-lsdom 0.0
-maxl 6
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-mg_coarse_sub_pc_type cholesky
-mg_levels_esteig_ksp_type cg
-no_signal_handler
-nruns 3
-order 1
-pc_gamg_agg_nsmooths 1
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_type agg
-pc_type gamg
-petscrc /gpfs/scratch/upc26/upc26229/par_cell_aggr_poisson/paper/weak_scal_ompi/2nd-w-scal/petscrc-0
-tt 1
-uagg .true.
-wratio 10
-wsolution .false.
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/apps/INTEL/2017.4/mkl --with-debugging=0 --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
-----------------------------------------
Libraries compiled on 2018-06-04 18:55:32 on login1
Machine characteristics: Linux-4.4.103-92.56-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -wd1572 -g -O3
Using Fortran compiler: mpif90 -fPIC -g -O3
-----------------------------------------
Using include paths: -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/include -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -L/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/apps/INTEL/2017.4/mkl/lib/intel64 -L/apps/INTEL/2017.4/mkl/lib/intel64 -Wl,-rpath,/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -L/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -Wl,-rpath,/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -L/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
Linear solve converged due to CONVERGED_RTOL iterations 9
KSP Object: 32 MPI processes
type: cg
maximum iterations=500, initial guess is zero
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 32 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using externally compute Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 0
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 32 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 32 MPI processes
type: bjacobi
number of blocks = 32
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: cholesky
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqsbaij
rows=35, cols=35
package used to perform factorization: petsc
total: nonzeros=630, allocated nonzeros=630
total number of mallocs used during MatSetValues calls =0
block size is 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=35, cols=35
total: nonzeros=1225, allocated nonzeros=1225
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 7 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 32 MPI processes
type: mpiaij
rows=35, cols=35
total: nonzeros=1225, allocated nonzeros=1225
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 7 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 32 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.140301, max = 1.54331
eigenvalues estimate via cg min 0.150843, max 1.40301
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 32 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 32 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 32 MPI processes
type: mpiaij
rows=1654, cols=1654
total: nonzeros=302008, allocated nonzeros=302008
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
using I-node (on process 0) routines: found 15 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 32 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.135428, max = 1.48971
eigenvalues estimate via cg min 0.0330649, max 1.35428
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 32 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 32 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 32 MPI processes
type: mpiaij
rows=38899, cols=38899
total: nonzeros=3088735, allocated nonzeros=3088735
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 32 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.196606, max = 2.16267
eigenvalues estimate via cg min 0.0475838, max 1.96606
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 32 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 32 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 32 MPI processes
type: mpiaij
rows=508459, cols=508459
total: nonzeros=16204885, allocated nonzeros=305075400
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 32 MPI processes
type: mpiaij
rows=508459, cols=508459
total: nonzeros=16204885, allocated nonzeros=305075400
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/scratch/upc26/upc26229/build_rel_fempar_cell_agg_ompi/FEMPAR/bin/par_test_h_adaptive_poisson_unfitted on a arch-linux2-c-opt named s15r1b25 with 32 processors, by upc26229 Wed Nov 7 01:09:23 2018
Using Petsc Release Version 3.9.0, Apr, 07, 2018
Max Max/Min Avg Total
Time (sec): 1.621e+02 1.00000 1.621e+02
Objects: 9.890e+02 1.00304 9.861e+02
Flop: 7.802e+08 2.25680 6.170e+08 1.974e+10
Flop/sec: 4.812e+06 2.25680 3.806e+06 1.218e+08
MPI Messages: 2.457e+04 2.04836 1.917e+04 6.134e+05
MPI Message Lengths: 3.844e+07 2.16684 1.506e+03 9.236e+08
MPI Reductions: 1.469e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.6212e+02 100.0% 1.9744e+10 100.0% 6.134e+05 100.0% 1.506e+03 100.0% 1.456e+03 99.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 9 1.0 7.2859e-03 6.2 0.00e+00 0.0 2.5e+03 8.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSidedF 75 1.0 7.6260e+0063.2 0.00e+00 0.0 1.2e+04 2.1e+04 0.0e+00 2 0 2 27 0 2 0 2 27 0 0
VecMDot 90 1.0 3.3601e-02 7.0 7.73e+06 3.0 0.0e+00 0.0e+00 9.0e+01 0 1 0 0 6 0 1 0 0 6 5391
VecTDot 243 1.0 1.1469e-0117.2 5.28e+06 3.0 0.0e+00 0.0e+00 2.4e+02 0 1 0 0 17 0 1 0 0 17 1082
VecNorm 228 1.0 6.2426e-02 8.2 4.39e+06 3.0 0.0e+00 0.0e+00 2.3e+02 0 1 0 0 16 0 1 0 0 16 1650
VecScale 99 1.0 1.0887e-03 6.2 7.73e+05 3.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 16641
VecCopy 114 1.0 1.3520e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 438 1.0 1.0085e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 243 1.0 6.7886e-03 3.4 5.28e+06 3.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 18279
VecAYPX 753 1.0 9.3539e-03 3.1 8.63e+06 3.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 21626
VecAXPBYCZ 324 1.0 5.0166e-03 2.0 1.26e+07 3.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 59097
VecMAXPY 99 1.0 4.4716e-03 4.9 9.14e+06 3.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 47884
VecAssemblyBegin 24 1.0 1.1001e-02 2.2 0.00e+00 0.0 1.7e+03 2.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 24 1.0 3.4029e-04 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 99 1.0 1.1331e-03 2.3 7.73e+05 3.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 15989
VecScatterBegin 885 1.0 3.6779e-02 2.2 0.00e+00 0.0 4.6e+05 9.8e+02 0.0e+00 0 0 75 49 0 0 0 75 49 0 0
VecScatterEnd 885 1.0 3.2675e-0112.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 9 1.0 2.0615e-03 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 99 1.0 8.0360e-03 3.0 2.32e+06 3.0 0.0e+00 0.0e+00 9.9e+01 0 0 0 0 7 0 0 0 0 7 6764
MatMult 693 1.0 4.7642e-01 1.3 3.73e+08 2.2 3.9e+05 1.1e+03 0.0e+00 0 48 64 46 0 0 48 64 46 0 19814
MatMultAdd 81 1.0 2.5011e-02 1.8 9.13e+06 2.9 2.7e+04 2.4e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 8668
MatMultTranspose 81 1.0 4.0727e-02 2.3 9.13e+06 2.9 2.7e+04 2.4e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 5323
MatSolve 27 0.0 2.1556e-04 0.0 6.52e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 302
MatSOR 585 1.0 4.9140e-01 1.9 2.65e+08 2.2 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 13575
MatCholFctrSym 3 1.0 6.5050e-03313.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCholFctrNum 3 1.0 3.2232e-03996.2 1.05e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 9 1.0 2.4005e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 27 1.0 1.1253e-02 2.1 5.57e+06 2.3 5.1e+03 1.0e+03 0.0e+00 0 1 1 1 0 0 1 1 1 0 12589
MatResidual 81 1.0 7.8485e-02 2.2 4.17e+07 2.2 4.6e+04 1.0e+03 0.0e+00 0 5 7 5 0 0 5 7 5 0 13482
MatAssemblyBegin 168 1.0 7.7038e+0036.5 0.00e+00 0.0 9.9e+03 2.5e+04 0.0e+00 2 0 2 26 0 2 0 2 26 0 0
MatAssemblyEnd 168 1.0 2.8208e-01 1.5 0.00e+00 0.0 3.7e+04 3.1e+02 3.6e+02 0 0 6 1 25 0 0 6 1 25 0
MatGetRow 210816 3.0 2.5672e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 0.0 2.7154e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 6 1.0 1.1316e-02 1.0 0.00e+00 0.0 5.9e+02 1.4e+02 9.6e+01 0 0 0 0 7 0 0 0 0 7 0
MatGetOrdering 3 0.0 7.1425e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 9 1.0 2.3828e-02 1.2 0.00e+00 0.0 5.4e+04 7.5e+02 6.0e+01 0 0 9 4 4 0 0 9 4 4 0
MatZeroEntries 9 1.0 1.7683e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 21 1.4 5.7901e-03 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.5e+01 0 0 0 0 1 0 0 0 0 1 0
MatAXPY 9 1.0 2.3933e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 9 1.0 1.0349e-01 1.0 4.64e+06 2.2 3.0e+04 7.4e+02 1.1e+02 0 1 5 2 7 0 1 5 2 7 1136
MatMatMultSym 9 1.0 8.4956e-02 1.0 0.00e+00 0.0 2.4e+04 6.8e+02 1.1e+02 0 0 4 2 7 0 0 4 2 7 0
MatMatMultNum 9 1.0 1.9249e-02 1.1 4.64e+06 2.2 5.1e+03 1.0e+03 0.0e+00 0 1 1 1 0 0 1 1 1 0 6108
MatPtAP 9 1.0 5.3590e-01 1.0 6.57e+07 2.3 4.7e+04 5.9e+03 1.4e+02 0 8 8 30 9 0 8 8 30 9 3097
MatPtAPSymbolic 9 1.0 3.5159e-01 1.0 0.00e+00 0.0 2.9e+04 5.2e+03 6.3e+01 0 0 5 16 4 0 0 5 16 4 0
MatPtAPNumeric 9 1.0 1.8422e-01 1.0 6.57e+07 2.3 1.8e+04 6.9e+03 7.2e+01 0 8 3 14 5 0 8 3 14 5 9008
MatGetLocalMat 27 1.0 1.3379e-02 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 27 1.0 2.5772e-02 1.3 0.00e+00 0.0 3.6e+04 3.6e+03 0.0e+00 0 0 6 14 0 0 0 6 14 0 0
KSPGMRESOrthog 90 1.0 3.4506e-02 4.2 1.55e+07 3.0 0.0e+00 0.0e+00 9.0e+01 0 2 0 0 6 0 2 0 0 6 10501
KSPSetUp 36 1.0 7.0966e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 2 0 0 0 0 2 0
KSPSolve 3 1.0 9.5439e-01 1.0 6.41e+08 2.2 3.9e+05 9.8e+02 3.7e+02 1 82 64 42 25 1 82 64 42 26 16969
PCGAMGGraph_AGG 9 1.0 2.9486e-01 1.0 4.64e+06 2.2 1.5e+04 6.9e+02 1.1e+02 0 1 2 1 7 0 1 2 1 7 399
PCGAMGCoarse_AGG 9 1.0 2.6490e-02 1.1 0.00e+00 0.0 5.4e+04 7.5e+02 6.0e+01 0 0 9 4 4 0 0 9 4 4 0
PCGAMGProl_AGG 9 1.0 8.0748e-02 1.0 0.00e+00 0.0 1.5e+04 8.1e+02 1.4e+02 0 0 2 1 10 0 0 2 1 10 0
PCGAMGPOpt_AGG 9 1.0 2.4086e-01 1.0 6.97e+07 2.3 8.1e+04 9.3e+02 3.7e+02 0 9 13 8 25 0 9 13 8 25 7357
GAMG: createProl 9 1.0 6.4623e-01 1.0 7.44e+07 2.3 1.7e+05 8.4e+02 6.8e+02 0 10 27 15 46 0 10 27 15 47 2924
Graph 18 1.0 2.8964e-01 1.0 4.64e+06 2.2 1.5e+04 6.9e+02 1.1e+02 0 1 2 1 7 0 1 2 1 7 406
MIS/Agg 9 1.0 2.3962e-02 1.2 0.00e+00 0.0 5.4e+04 7.5e+02 6.0e+01 0 0 9 4 4 0 0 9 4 4 0
SA: col data 9 1.0 9.0052e-03 1.1 0.00e+00 0.0 1.0e+04 1.0e+03 3.6e+01 0 0 2 1 2 0 0 2 1 2 0
SA: frmProl0 9 1.0 6.9267e-02 1.0 0.00e+00 0.0 4.7e+03 3.3e+02 7.2e+01 0 0 1 0 5 0 0 1 0 5 0
SA: smooth 9 1.0 1.3371e-01 1.0 5.57e+06 2.3 3.0e+04 7.4e+02 1.3e+02 0 1 5 2 9 0 1 5 2 9 1059
GAMG: partLevel 9 1.0 5.5452e-01 1.0 6.57e+07 2.3 4.8e+04 5.8e+03 2.9e+02 0 8 8 30 20 0 8 8 30 20 2993
repartition 3 1.0 1.7596e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 0
Invert-Sort 3 1.0 6.5844e-04 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
Move A 3 1.0 6.4088e-03 1.1 0.00e+00 0.0 4.0e+02 1.6e+02 5.1e+01 0 0 0 0 3 0 0 0 0 4 0
Move P 3 1.0 5.6291e-03 1.1 0.00e+00 0.0 1.9e+02 9.1e+01 5.1e+01 0 0 0 0 3 0 0 0 0 4 0
PCSetUp 6 1.0 1.2264e+00 1.0 1.40e+08 2.3 2.1e+05 2.0e+03 1.0e+03 1 18 35 45 70 1 18 35 45 70 2894
PCSetUpOnBlocks 27 1.0 7.7373e-03 1.6 1.05e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 27 1.0 9.0805e-01 1.1 6.00e+08 2.2 3.8e+05 9.3e+02 2.9e+02 1 77 62 38 20 1 77 62 38 20 16705
SFSetGraph 9 1.0 2.6450e-06 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 9 1.0 8.5383e-03 2.5 0.00e+00 0.0 7.6e+03 6.9e+02 0.0e+00 0 0 1 1 0 0 0 1 1 0 0
SFBcastBegin 78 1.0 4.1889e-03 2.7 0.00e+00 0.0 4.7e+04 7.6e+02 0.0e+00 0 0 8 4 0 0 0 8 4 0 0
SFBcastEnd 78 1.0 5.7714e-03 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 429 429 10070544 0.
Matrix 252 252 362421120 0.
Matrix Coarsen 9 9 6156 0.
Index Set 147 147 365304 0.
Vec Scatter 57 57 79800 0.
Krylov Solver 36 36 314928 0.
Preconditioner 27 27 29544 0.
Viewer 5 4 3584 0.
PetscRandom 18 18 12492 0.
Star Forest Graph 9 9 8496 0.
========================================================================================================================
Average time to get PetscTime(): 4.1537e-08
Average time for MPI_Barrier(): 3.98215e-06
Average time for zero size MPI_Send(): 1.49463e-06
#PETSc Option Table entries:
--prefix run_a0b0c0d0e0f0g0h0i0_n5_l3
-aggrmeth alla_serial
-beta 10.0
-betaest .true.
-check .false.
-datadt data_distribution_fully_assembled
-dm 3
-dom -1.0
-in_space .true.
-ksp_converged_reason
-ksp_max_it 500
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.0e-6
-ksp_type cg
-ksp_view
-l 1
-levelset popcorn
-levelsettol 1.0e-6
-log_view
-lsdom 0.0
-maxl 7
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-mg_coarse_sub_pc_type cholesky
-mg_levels_esteig_ksp_type cg
-no_signal_handler
-nruns 3
-order 1
-pc_gamg_agg_nsmooths 1
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_type agg
-pc_type gamg
-petscrc /gpfs/scratch/upc26/upc26229/par_cell_aggr_poisson/paper/weak_scal_ompi/2nd-w-scal/petscrc-0
-tt 1
-uagg .true.
-wratio 10
-wsolution .false.
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/apps/INTEL/2017.4/mkl --with-debugging=0 --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
-----------------------------------------
Libraries compiled on 2018-06-04 18:55:32 on login1
Machine characteristics: Linux-4.4.103-92.56-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -wd1572 -g -O3
Using Fortran compiler: mpif90 -fPIC -g -O3
-----------------------------------------
Using include paths: -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/include -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -L/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/apps/INTEL/2017.4/mkl/lib/intel64 -L/apps/INTEL/2017.4/mkl/lib/intel64 -Wl,-rpath,/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -L/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -Wl,-rpath,/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -L/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------
Ending run at Wed Nov 7 01:09:23 CET 2018
Ending script at Wed Nov 7 01:09:23 CET 2018
-------------- next part --------------
KSP Object: 262 MPI processes
type: cg
maximum iterations=500, initial guess is zero
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 262 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using externally compute Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 0
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 262 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 262 MPI processes
type: bjacobi
number of blocks = 262
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: cholesky
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqsbaij
rows=4, cols=4
package used to perform factorization: petsc
total: nonzeros=10, allocated nonzeros=10
total number of mallocs used during MatSetValues calls =0
block size is 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=4, cols=4
total: nonzeros=16, allocated nonzeros=16
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 1 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=4, cols=4
total: nonzeros=16, allocated nonzeros=16
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 1 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 262 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.129006, max = 1.41907
eigenvalues estimate via cg min 0.482341, max 1.29006
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 262 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 262 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=284, cols=284
total: nonzeros=47942, allocated nonzeros=47942
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 262 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.160435, max = 1.76479
eigenvalues estimate via cg min 0.0880722, max 1.60435
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 262 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 262 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=13842, cols=13842
total: nonzeros=2801068, allocated nonzeros=2801068
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 262 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.135811, max = 1.49392
eigenvalues estimate via cg min 0.036202, max 1.35811
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 262 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 262 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=319856, cols=319856
total: nonzeros=25116236, allocated nonzeros=25116236
total number of mallocs used during MatSetValues calls =0
using scalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 262 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.298538, max = 3.28392
eigenvalues estimate via cg min 0.0506704, max 2.98538
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_4_esteig_) 262 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 262 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=4068981, cols=4068981
total: nonzeros=120055495, allocated nonzeros=2441388600
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=4068981, cols=4068981
total: nonzeros=120055495, allocated nonzeros=2441388600
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/scratch/upc26/upc26229/build_rel_fempar_cell_agg_ompi/FEMPAR/bin/par_test_h_adaptive_poisson_unfitted on a arch-linux2-c-opt named s07r1b55 with 262 processors, by upc26229 Wed Nov 7 01:14:20 2018
Using Petsc Release Version 3.9.0, Apr, 07, 2018
Max Max/Min Avg Total
Time (sec): 2.359e+02 1.00000 2.359e+02
Objects: 1.355e+03 1.00222 1.352e+03
Flop: 8.832e+08 0.00000 6.569e+08 1.721e+11
Flop/sec: 3.743e+06 0.00000 2.784e+06 7.295e+08
MPI Messages: 5.577e+04 5069.63636 3.334e+04 8.736e+06
MPI Message Lengths: 4.973e+07 1130198.90909 1.019e+03 8.904e+09
MPI Reductions: 2.072e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.3593e+02 100.0% 1.7210e+11 100.0% 8.736e+06 100.0% 1.019e+03 100.0% 2.059e+03 99.4%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 12 1.0 1.6995e-02 7.1 0.00e+00 0.0 3.0e+04 8.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSidedF 102 1.0 1.0505e+0133.9 0.00e+00 0.0 1.3e+05 1.6e+04 0.0e+00 3 0 1 24 0 3 0 1 24 0 0
VecMDot 120 1.0 1.0321e-01 9.5 7.43e+06 0.0 0.0e+00 0.0e+00 1.2e+02 0 1 0 0 6 0 1 0 0 6 14077
VecTDot 318 1.0 1.3620e+0061.0 5.57e+06 0.0 0.0e+00 0.0e+00 3.2e+02 0 1 0 0 15 0 1 0 0 15 802
VecNorm 300 1.0 2.2448e-0110.2 4.46e+06 0.0 0.0e+00 0.0e+00 3.0e+02 0 1 0 0 14 0 1 0 0 15 3894
VecScale 132 1.0 4.2758e-0329.9 7.43e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33981
VecCopy 174 1.0 1.9950e-0355.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 651 1.0 1.4187e-0314.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 318 1.0 2.0724e-02148.6 5.57e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 52686
VecAYPX 1194 1.0 1.5941e-02106.7 9.89e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 121379
VecAXPBYCZ 528 1.0 9.4921e-03156.6 1.49e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 306144
VecMAXPY 132 1.0 7.9654e-0356.3 8.78e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 215576
VecAssemblyBegin 33 1.0 1.9882e-02 3.0 0.00e+00 0.0 1.7e+04 1.7e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 33 1.0 4.4571e-03149.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 132 1.0 1.8402e-0362.2 7.43e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 78959
VecScatterBegin 1371 1.0 6.3832e-02215.6 0.00e+00 0.0 6.4e+06 7.3e+02 0.0e+00 0 0 73 52 0 0 0 73 52 0 0
VecScatterEnd 1371 1.0 1.2430e+006808.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 12 1.0 2.0955e-03958.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 132 1.0 2.2506e-02 4.3 2.23e+06 0.0 0.0e+00 0.0e+00 1.3e+02 0 0 0 0 6 0 0 0 0 6 19368
MatMult 1065 1.0 1.1689e+001737.8 4.26e+08 0.0 5.4e+06 8.1e+02 0.0e+00 0 48 62 49 0 0 48 62 49 0 71034
MatMultAdd 132 1.0 8.6626e-021090.7 1.08e+07 0.0 3.7e+05 1.9e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 24436
MatMultTranspose 132 1.0 5.6369e-013353.5 1.08e+07 0.0 3.7e+05 1.9e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 3755
MatSolve 33 0.0 5.7641e-05 0.0 9.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 16
MatSOR 924 1.0 7.8933e-015774.2 3.08e+08 0.0 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 74391
MatCholFctrSym 3 1.0 1.0960e-02545.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCholFctrNum 3 1.0 6.4946e-032309.9 1.20e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 12 1.0 3.4631e-0230.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 36 1.0 2.6028e-02184.6 5.48e+06 0.0 6.1e+04 7.7e+02 0.0e+00 0 1 1 1 0 0 1 1 1 0 41516
MatResidual 132 1.0 1.8136e-011415.7 5.00e+07 0.0 6.7e+05 7.7e+02 0.0e+00 0 6 8 6 0 0 6 8 6 0 53866
MatAssemblyBegin 231 1.0 1.0487e+0118.5 0.00e+00 0.0 1.1e+05 1.9e+04 0.0e+00 3 0 1 23 0 3 0 1 23 0 0
MatAssemblyEnd 231 1.0 4.6673e-01 1.6 0.00e+00 0.0 5.3e+05 2.0e+02 5.0e+02 0 0 6 1 24 0 0 6 1 24 0
MatGetRow 202635 0.0 2.5265e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 0.0 1.2548e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 12 1.0 5.2781e-02 1.0 0.00e+00 0.0 5.4e+03 4.8e+02 1.9e+02 0 0 0 0 9 0 0 0 0 9 0
MatGetOrdering 3 0.0 1.7776e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 12 1.0 4.3461e-02 1.3 0.00e+00 0.0 1.1e+06 4.1e+02 1.2e+02 0 0 13 5 6 0 0 13 5 6 0
MatZeroEntries 12 1.0 1.7174e-03325.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 24 1.3 6.3856e-02 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 0
MatAXPY 12 1.0 2.7936e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 12 1.0 3.3930e-01 1.1 4.54e+06 0.0 3.5e+05 5.5e+02 1.5e+02 0 1 4 2 7 0 1 4 2 7 2618
MatMatMultSym 12 1.0 2.5782e-01 1.0 0.00e+00 0.0 2.9e+05 5.1e+02 1.4e+02 0 0 3 2 7 0 0 3 2 7 0
MatMatMultNum 12 1.0 5.1971e-02 1.1 4.54e+06 0.0 6.1e+04 7.7e+02 0.0e+00 0 1 1 1 0 0 1 1 1 0 17089
MatPtAP 12 1.0 1.0058e+00 1.0 6.57e+07 0.0 6.6e+05 3.9e+03 1.8e+02 0 7 8 29 9 0 7 8 29 9 12715
MatPtAPSymbolic 12 1.0 5.2416e-01 1.0 0.00e+00 0.0 3.5e+05 4.0e+03 8.4e+01 0 0 4 16 4 0 0 4 16 4 0
MatPtAPNumeric 12 1.0 4.7255e-01 1.0 6.57e+07 0.0 3.2e+05 3.8e+03 9.6e+01 0 7 4 13 5 0 7 4 13 5 27062
MatGetLocalMat 36 1.0 1.3884e-0237.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 36 1.0 5.1478e-0293.5 0.00e+00 0.0 4.3e+05 2.7e+03 0.0e+00 0 0 5 13 0 0 0 5 13 0 0
KSPGMRESOrthog 120 1.0 1.0809e-01 7.9 1.49e+07 0.0 0.0e+00 0.0e+00 1.2e+02 0 2 0 0 6 0 2 0 0 6 26883
KSPSetUp 45 1.0 1.8355e-02 6.9 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 3 1.0 1.5817e+00 1.0 7.46e+08 0.0 5.6e+06 7.3e+02 4.9e+02 1 84 64 45 23 1 84 64 45 24 91555
PCGAMGGraph_AGG 12 1.0 3.4556e-01 1.0 4.54e+06 0.0 1.8e+05 5.2e+02 1.4e+02 0 1 2 1 7 0 1 2 1 7 2570
PCGAMGCoarse_AGG 12 1.0 4.4724e-02 1.2 0.00e+00 0.0 1.1e+06 4.1e+02 1.2e+02 0 0 13 5 6 0 0 13 5 6 0
PCGAMGProl_AGG 12 1.0 3.5267e-01 1.0 0.00e+00 0.0 1.7e+05 6.4e+02 1.9e+02 0 0 2 1 9 0 0 2 1 9 0
PCGAMGPOpt_AGG 12 1.0 5.1901e-01 1.0 6.89e+07 0.0 9.6e+05 6.9e+02 5.0e+02 0 8 11 7 24 0 8 11 7 24 26218
GAMG: createProl 12 1.0 1.2634e+00 1.0 7.34e+07 0.0 2.4e+06 5.4e+02 9.5e+02 1 8 28 15 46 1 8 28 15 46 11473
Graph 24 1.0 3.3748e-01 1.0 4.54e+06 0.0 1.8e+05 5.2e+02 1.4e+02 0 1 2 1 7 0 1 2 1 7 2632
MIS/Agg 12 1.0 4.3569e-02 1.3 0.00e+00 0.0 1.1e+06 4.1e+02 1.2e+02 0 0 13 5 6 0 0 13 5 6 0
SA: col data 12 1.0 1.2331e-02 1.1 0.00e+00 0.0 1.2e+05 7.7e+02 4.8e+01 0 0 1 1 2 0 0 1 1 2 0
SA: frmProl0 12 1.0 3.3684e-01 1.0 0.00e+00 0.0 4.7e+04 2.9e+02 9.6e+01 0 0 1 0 5 0 0 1 0 5 0
SA: smooth 12 1.0 3.7167e-01 1.1 5.48e+06 0.0 3.5e+05 5.5e+02 1.7e+02 0 1 4 2 8 0 1 4 2 8 2907
GAMG: partLevel 12 1.0 1.0910e+00 1.0 6.57e+07 0.0 6.7e+05 3.9e+03 4.9e+02 0 7 8 29 24 0 7 8 29 24 11721
repartition 6 1.0 8.1465e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01 0 0 0 0 2 0 0 0 0 2 0
Invert-Sort 6 1.0 3.8278e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0
Move A 6 1.0 4.3095e-02 1.0 0.00e+00 0.0 2.9e+03 8.1e+02 1.0e+02 0 0 0 0 5 0 0 0 0 5 0
Move P 6 1.0 1.6401e-02 1.1 0.00e+00 0.0 2.5e+03 9.0e+01 1.0e+02 0 0 0 0 5 0 0 0 0 5 0
PCSetUp 6 1.0 2.4101e+00 1.0 1.38e+08 0.0 3.1e+06 1.3e+03 1.5e+03 1 16 36 44 73 1 16 36 44 73 11321
PCSetUpOnBlocks 33 1.0 1.9296e-02 5.9 1.20e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 33 1.0 1.4915e+00 5.2 6.98e+08 0.0 5.4e+06 6.9e+02 3.8e+02 1 79 62 42 19 1 79 62 42 19 90795
SFSetGraph 12 1.0 3.8510e-06 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 1.7290e-02 3.4 0.00e+00 0.0 9.1e+04 5.2e+02 0.0e+00 0 0 1 1 0 0 0 1 1 0 0
SFBcastBegin 144 1.0 8.3527e-0354.1 0.00e+00 0.0 1.0e+06 4.0e+02 0.0e+00 0 0 12 5 0 0 0 12 5 0 0
SFBcastEnd 144 1.0 1.6454e-02500.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 573 573 1139304 0.
Matrix 348 348 6730200 0.
Matrix Coarsen 12 12 8208 0.
Index Set 222 222 228240 0.
Vec Scatter 81 81 113448 0.
Krylov Solver 45 45 415944 0.
Preconditioner 33 33 35448 0.
Viewer 5 4 3584 0.
PetscRandom 24 24 16656 0.
Star Forest Graph 12 12 11328 0.
========================================================================================================================
Average time to get PetscTime(): 4.40516e-08
Average time for MPI_Barrier(): 1.38046e-05
Average time for zero size MPI_Send(): 1.46144e-06
#PETSc Option Table entries:
--prefix run_a0b0c0d0e0f0g0h0i0_n6_l3
-aggrmeth alla_serial
-beta 10.0
-betaest .true.
-check .false.
-datadt data_distribution_fully_assembled
-dm 3
-dom -1.0
-in_space .true.
-ksp_converged_reason
-ksp_max_it 500
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.0e-6
-ksp_type cg
-ksp_view
-l 1
-levelset popcorn
-levelsettol 1.0e-6
-log_view
-lsdom 0.0
-maxl 8
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-mg_coarse_sub_pc_type cholesky
-mg_levels_esteig_ksp_type cg
-no_signal_handler
-nruns 3
-order 1
-pc_gamg_agg_nsmooths 1
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_type agg
-pc_type gamg
-petscrc /gpfs/scratch/upc26/upc26229/par_cell_aggr_poisson/paper/weak_scal_ompi/2nd-w-scal/petscrc-0
-tt 1
-uagg .true.
-wratio 10
-wsolution .false.
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/apps/INTEL/2017.4/mkl --with-debugging=0 --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
-----------------------------------------
Libraries compiled on 2018-06-04 18:55:32 on login1
Machine characteristics: Linux-4.4.103-92.56-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -wd1572 -g -O3
Using Fortran compiler: mpif90 -fPIC -g -O3
-----------------------------------------
Using include paths: -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/include -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -L/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/apps/INTEL/2017.4/mkl/lib/intel64 -L/apps/INTEL/2017.4/mkl/lib/intel64 -Wl,-rpath,/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -L/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -Wl,-rpath,/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -L/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------
Ending run at Wed Nov 7 01:14:21 CET 2018
Ending script at Wed Nov 7 01:14:21 CET 2018
-------------- next part --------------
KSP Object: 2097 MPI processes
type: cg
maximum iterations=500, initial guess is zero
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 2097 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using externally computed Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 0
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 2097 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 2097 MPI processes
type: bjacobi
number of blocks = 2097
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: cholesky
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqsbaij
rows=36, cols=36
package used to perform factorization: petsc
total: nonzeros=666, allocated nonzeros=666
total number of mallocs used during MatSetValues calls =0
block size is 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=36, cols=36
total: nonzeros=1296, allocated nonzeros=1296
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 8 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=36, cols=36
total: nonzeros=1296, allocated nonzeros=1296
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 8 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 2097 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.167617, max = 1.84379
eigenvalues estimate via cg min 0.106378, max 1.67617
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
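Reading the translation [a b; c d] above as used_min = a*cg_min + b*cg_max and used_max = c*cg_min + d*cg_max (my reading of the notation, consistent with the printed numbers), one recovers the bounds actually used:
    0*0.106378 + 0.1*1.67617 = 0.167617
    0*0.106378 + 1.1*1.67617 = 1.84379 (to the printed precision)
i.e. the Chebyshev bounds on this level are derived solely from the largest CG eigenvalue estimate.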
KSP Object: (mg_levels_1_esteig_) 2097 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 2097 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=2304, cols=2304
total: nonzeros=598220, allocated nonzeros=598220
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 2097 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.14326, max = 1.57586
eigenvalues estimate via cg min 0.0397147, max 1.4326
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 2097 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 2097 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=112580, cols=112580
total: nonzeros=23420598, allocated nonzeros=23420598
total number of mallocs used during MatSetValues calls =0
using scalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 2097 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.136096, max = 1.49705
eigenvalues estimate via cg min 0.0338371, max 1.36096
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 2097 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 2097 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=2597459, cols=2597459
total: nonzeros=202116477, allocated nonzeros=202116477
total number of mallocs used during MatSetValues calls =0
using scalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 2097 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.335857, max = 3.69443
eigenvalues estimate via cg min 0.0542715, max 3.35857
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_4_esteig_) 2097 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 2097 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=32552439, cols=32552439
total: nonzeros=920267663, allocated nonzeros=19531463400
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=32552439, cols=32552439
total: nonzeros=920267663, allocated nonzeros=19531463400
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/scratch/upc26/upc26229/build_rel_fempar_cell_agg_ompi/FEMPAR/bin/par_test_h_adaptive_poisson_unfitted on a arch-linux2-c-opt named s07r2b02 with 2097 processors, by upc26229 Wed Nov 7 01:15:12 2018
Using Petsc Release Version 3.9.0, Apr, 07, 2018
Max Max/Min Avg Total
Time (sec): 2.458e+02 1.00000 2.458e+02
Objects: 1.355e+03 1.00222 1.352e+03
Flop: 9.818e+08 0.00000 6.789e+08 1.424e+12
Flop/sec: 3.994e+06 0.00000 2.762e+06 5.791e+09
MPI Messages: 1.103e+05 10027.81818 4.452e+04 9.336e+07
MPI Message Lengths: 5.692e+07 1293601.40909 8.203e+02 7.658e+10
MPI Reductions: 2.216e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.4583e+02 100.0% 1.4237e+12 100.0% 9.336e+07 100.0% 8.203e+02 100.0% 2.203e+03 99.4%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 12 1.0 4.6126e-02 2.1 0.00e+00 0.0 2.9e+05 8.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSidedF 102 1.0 1.1057e+0119.1 0.00e+00 0.0 1.1e+06 1.5e+04 0.0e+00 3 0 1 22 0 3 0 1 22 0 0
VecMDot 120 1.0 1.3730e-01 4.0 7.37e+06 0.0 0.0e+00 0.0e+00 1.2e+02 0 1 0 0 5 0 1 0 0 5 84752
VecTDot 324 1.0 1.8522e+0019.8 5.77e+06 0.0 0.0e+00 0.0e+00 3.2e+02 0 1 0 0 15 0 1 0 0 15 4929
VecNorm 303 1.0 3.3777e-01 3.6 4.55e+06 0.0 0.0e+00 0.0e+00 3.0e+02 0 1 0 0 14 0 1 0 0 14 21298
VecScale 132 1.0 6.1834e-0351.4 7.37e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 188203
VecCopy 186 1.0 2.5507e-03101.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 696 1.0 4.9682e-0364.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 324 1.0 1.8679e-02143.7 5.77e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 488834
VecAYPX 1293 1.0 1.8702e-02166.9 1.06e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 895518
VecAXPBYCZ 576 1.0 1.2857e-02233.9 1.61e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1974899
VecMAXPY 132 1.0 8.0905e-03119.3 8.71e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1699935
VecAssemblyBegin 33 1.0 2.7789e-02 2.2 0.00e+00 0.0 1.3e+05 1.8e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 33 1.0 5.7797e-03196.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 132 1.0 3.0804e-03165.9 7.37e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 377790
VecScatterBegin 1470 1.0 9.9138e-02609.8 0.00e+00 0.0 6.5e+07 6.3e+02 0.0e+00 0 0 69 53 0 0 0 69 53 0 0
VecScatterEnd 1470 1.0 1.6594e+0012314.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 12 1.0 2.2327e-031311.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 132 1.0 5.1942e-02 1.7 2.21e+06 0.0 0.0e+00 0.0e+00 1.3e+02 0 0 0 0 6 0 0 0 0 6 67213
MatMult 1140 1.0 1.5379e+002703.7 4.83e+08 0.0 5.4e+07 7.0e+02 0.0e+00 0 48 58 50 0 0 48 58 50 0 447478
MatMultAdd 144 1.0 3.0789e-014466.1 1.18e+07 0.0 4.2e+06 1.5e+02 0.0e+00 0 1 5 1 0 0 1 5 1 0 59866
MatMultTranspose 144 1.0 7.5095e-015720.6 1.18e+07 0.0 4.2e+06 1.5e+02 0.0e+00 0 1 5 1 0 0 1 5 1 0 24545
MatSolve 36 0.0 1.6145e-04 0.0 9.20e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 570
MatSOR 996 1.0 9.8081e-018037.0 3.40e+08 0.0 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 497713
MatCholFctrSym 3 1.0 1.0764e-02556.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCholFctrNum 3 1.0 7.9658e-032816.4 1.08e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 12 1.0 3.8779e-0251.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 36 1.0 3.0256e-02564.4 5.74e+06 0.0 5.7e+05 6.7e+02 0.0e+00 0 1 1 1 0 0 1 1 1 0 278106
MatResidual 144 1.0 2.4749e-012126.4 5.77e+07 0.0 6.9e+06 6.7e+02 0.0e+00 0 6 7 6 0 0 6 7 6 0 333518
MatAssemblyBegin 231 1.0 1.1035e+0115.1 0.00e+00 0.0 9.7e+05 1.7e+04 0.0e+00 3 0 1 22 0 3 0 1 22 0 0
MatAssemblyEnd 231 1.0 5.6362e-01 1.5 0.00e+00 0.0 5.5e+06 1.6e+02 5.0e+02 0 0 6 1 23 0 0 6 1 23 0
MatGetRow 201060 0.0 2.9343e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 0.0 4.4128e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 12 1.0 1.0294e-01 1.1 0.00e+00 0.0 2.6e+05 1.3e+02 1.9e+02 0 0 0 0 9 0 0 0 0 9 0
MatGetOrdering 3 0.0 1.7719e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 12 1.0 1.3539e-01 1.2 0.00e+00 0.0 1.7e+07 3.0e+02 2.5e+02 0 0 18 7 11 0 0 18 7 11 0
MatZeroEntries 12 1.0 1.8496e-03427.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 24 1.3 6.3806e-02 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 0
MatAXPY 12 1.0 3.7202e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 12 1.0 7.9738e-01 1.5 4.80e+06 0.0 3.3e+06 4.9e+02 1.5e+02 0 0 3 2 7 0 0 3 2 7 8626
MatMatMultSym 12 1.0 4.7848e-01 1.0 0.00e+00 0.0 2.7e+06 4.5e+02 1.4e+02 0 0 3 2 6 0 0 3 2 7 0
MatMatMultNum 12 1.0 7.1849e-02 1.1 4.80e+06 0.0 5.7e+05 6.7e+02 0.0e+00 0 0 1 1 0 0 0 1 1 0 95734
MatPtAP 12 1.0 1.3269e+00 1.0 6.99e+07 0.0 6.5e+06 3.3e+03 1.9e+02 1 7 7 28 8 1 7 7 28 8 75292
MatPtAPSymbolic 12 1.0 7.0335e-01 1.0 0.00e+00 0.0 3.2e+06 3.6e+03 8.4e+01 0 0 3 15 4 0 0 3 15 4 0
MatPtAPNumeric 12 1.0 6.0478e-01 1.0 6.99e+07 0.0 3.3e+06 3.0e+03 9.6e+01 0 7 4 13 4 0 7 4 13 4 165193
MatGetLocalMat 36 1.0 1.9234e-0255.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 36 1.0 1.0496e-01201.1 0.00e+00 0.0 4.0e+06 2.4e+03 0.0e+00 0 0 4 13 0 0 0 4 13 0 0
KSPGMRESOrthog 120 1.0 1.4184e-01 3.7 1.47e+07 0.0 0.0e+00 0.0e+00 1.2e+02 0 2 0 0 5 0 2 0 0 5 164085
KSPSetUp 45 1.0 7.5013e-02 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 3 1.0 2.1762e+00 1.0 8.36e+08 0.0 5.7e+07 6.3e+02 5.0e+02 1 85 61 47 22 1 85 61 47 22 556264
PCGAMGGraph_AGG 12 1.0 4.0398e-01 1.0 4.80e+06 0.0 1.7e+06 4.5e+02 1.4e+02 0 0 2 1 6 0 0 2 1 7 17026
PCGAMGCoarse_AGG 12 1.0 1.3612e-01 1.1 0.00e+00 0.0 1.7e+07 3.0e+02 2.5e+02 0 0 18 7 11 0 0 18 7 11 0
PCGAMGProl_AGG 12 1.0 4.2615e-01 1.0 0.00e+00 0.0 1.5e+06 5.7e+02 1.9e+02 0 0 2 1 9 0 0 2 1 9 0
PCGAMGPOpt_AGG 12 1.0 1.0610e+00 1.0 7.12e+07 0.0 9.0e+06 6.1e+02 5.0e+02 0 7 10 7 22 0 7 10 7 23 100280
GAMG: createProl 12 1.0 2.0248e+00 1.0 7.60e+07 0.0 2.9e+07 4.2e+02 1.1e+03 1 8 31 16 49 1 8 31 16 49 55945
Graph 24 1.0 3.9569e-01 1.0 4.80e+06 0.0 1.7e+06 4.5e+02 1.4e+02 0 0 2 1 6 0 0 2 1 7 17383
MIS/Agg 12 1.0 1.3554e-01 1.2 0.00e+00 0.0 1.7e+07 3.0e+02 2.5e+02 0 0 18 7 11 0 0 18 7 11 0
SA: col data 12 1.0 2.5961e-02 1.1 0.00e+00 0.0 1.1e+06 6.7e+02 4.8e+01 0 0 1 1 2 0 0 1 1 2 0
SA: frmProl0 12 1.0 3.9272e-01 1.0 0.00e+00 0.0 3.9e+05 2.7e+02 9.6e+01 0 0 0 0 4 0 0 0 0 4 0
SA: smooth 12 1.0 8.4174e-01 1.4 5.74e+06 0.0 3.3e+06 4.9e+02 1.7e+02 0 1 3 2 8 0 1 3 2 8 9996
GAMG: partLevel 12 1.0 1.5243e+00 1.0 6.99e+07 0.0 6.8e+06 3.2e+03 4.9e+02 1 7 7 28 22 1 7 7 28 22 65544
repartition 6 1.0 4.4372e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01 0 0 0 0 2 0 0 0 0 2 0
Invert-Sort 6 1.0 2.8916e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0
Move A 6 1.0 6.3890e-02 1.1 0.00e+00 0.0 1.2e+05 2.7e+02 1.0e+02 0 0 0 0 5 0 0 0 0 5 0
Move P 6 1.0 4.9374e-02 1.2 0.00e+00 0.0 1.5e+05 1.6e+01 1.0e+02 0 0 0 0 5 0 0 0 0 5 0
PCSetUp 6 1.0 3.6222e+00 1.0 1.46e+08 0.0 3.6e+07 9.5e+02 1.6e+03 1 15 38 44 74 1 15 38 44 75 58854
PCSetUpOnBlocks 36 1.0 2.1539e-02117.9 1.08e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 36 1.0 1.9656e+00 5.3 7.80e+08 0.0 5.5e+07 5.9e+02 3.8e+02 1 79 59 43 17 1 79 59 43 17 575586
SFSetGraph 12 1.0 5.8748e-06 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 5.2335e-02 2.0 0.00e+00 0.0 8.6e+05 4.5e+02 0.0e+00 0 0 1 1 0 0 0 1 1 0 0
SFBcastBegin 273 1.0 2.4409e-02186.6 0.00e+00 0.0 1.6e+07 2.9e+02 0.0e+00 0 0 17 6 0 0 0 17 6 0 0
SFBcastEnd 273 1.0 4.6278e-02920.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 573 573 1019832 0.
Matrix 348 348 1351536 0.
Matrix Coarsen 12 12 8208 0.
Index Set 222 222 418080 0.
Vec Scatter 81 81 113400 0.
Krylov Solver 45 45 415944 0.
Preconditioner 33 33 35448 0.
Viewer 5 4 3584 0.
PetscRandom 24 24 16656 0.
Star Forest Graph 12 12 11328 0.
========================================================================================================================
Average time to get PetscTime(): 4.2282e-08
Average time for MPI_Barrier(): 1.84676e-05
Average time for zero size MPI_Send(): 1.59141e-06
#PETSc Option Table entries:
--prefix run_a0b0c0d0e0f0g0h0i0_n7_l3
-aggrmeth alla_serial
-beta 10.0
-betaest .true.
-check .false.
-datadt data_distribution_fully_assembled
-dm 3
-dom -1.0
-in_space .true.
-ksp_converged_reason
-ksp_max_it 500
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.0e-6
-ksp_type cg
-ksp_view
-l 1
-levelset popcorn
-levelsettol 1.0e-6
-log_view
-lsdom 0.0
-maxl 9
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-mg_coarse_sub_pc_type cholesky
-mg_levels_esteig_ksp_type cg
-no_signal_handler
-nruns 3
-order 1
-pc_gamg_agg_nsmooths 1
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_type agg
-pc_type gamg
-petscrc /gpfs/scratch/upc26/upc26229/par_cell_aggr_poisson/paper/weak_scal_ompi/2nd-w-scal/petscrc-0
-tt 1
-uagg .true.
-wratio 10
-wsolution .false.
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/apps/INTEL/2017.4/mkl --with-debugging=0 --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
-----------------------------------------
Libraries compiled on 2018-06-04 18:55:32 on login1
Machine characteristics: Linux-4.4.103-92.56-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -wd1572 -g -O3
Using Fortran compiler: mpif90 -fPIC -g -O3
-----------------------------------------
Using include paths: -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/include -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -L/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/apps/INTEL/2017.4/mkl/lib/intel64 -L/apps/INTEL/2017.4/mkl/lib/intel64 -Wl,-rpath,/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -L/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -Wl,-rpath,/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -L/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------
Ending run at Wed Nov 7 01:15:13 CET 2018
Ending script at Wed Nov 7 01:15:13 CET 2018
-------------- next part --------------
Linear solve converged due to CONVERGED_RTOL iterations 12
KSP Object: 16777 MPI processes
type: cg
maximum iterations=500, initial guess is zero
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 16777 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=6 cycles=v
Cycles per PCApply=1
Using externally computed Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 0
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 16777 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 16777 MPI processes
type: bjacobi
number of blocks = 16777
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: cholesky
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqsbaij
rows=4, cols=4
package used to perform factorization: petsc
total: nonzeros=10, allocated nonzeros=10
total number of mallocs used during MatSetValues calls =0
block size is 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=4, cols=4
total: nonzeros=16, allocated nonzeros=16
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 1 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=4, cols=4
total: nonzeros=16, allocated nonzeros=16
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 1 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 16777 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.0999843, max = 1.09983
eigenvalues estimate via cg min 0.575611, max 0.999843
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 16777 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 16777 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=269, cols=269
total: nonzeros=46217, allocated nonzeros=46217
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 16777 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.18053, max = 1.98584
eigenvalues estimate via cg min 0.0637775, max 1.8053
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 16777 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 16777 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=18451, cols=18451
total: nonzeros=5470355, allocated nonzeros=5470355
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 16777 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.156694, max = 1.72364
eigenvalues estimate via cg min 0.0434381, max 1.56694
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 16777 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 16777 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=908791, cols=908791
total: nonzeros=191134331, allocated nonzeros=191134331
total number of mallocs used during MatSetValues calls =0
using scalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 16777 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.13616, max = 1.49776
eigenvalues estimate via cg min 0.0335059, max 1.3616
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_4_esteig_) 16777 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 16777 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=20910556, cols=20910556
total: nonzeros=1618051660, allocated nonzeros=1618051660
total number of mallocs used during MatSetValues calls =0
using scalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 5 -------------------------------
KSP Object: (mg_levels_5_) 16777 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.336389, max = 3.70028
eigenvalues estimate via cg min 0.0534122, max 3.36389
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_5_esteig_) 16777 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_5_) 16777 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=260421387, cols=260421387
total: nonzeros=7197955643, allocated nonzeros=156252832200
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=260421387, cols=260421387
total: nonzeros=7197955643, allocated nonzeros=156252832200
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/scratch/upc26/upc26229/build_rel_fempar_cell_agg_ompi/FEMPAR/bin/par_test_h_adaptive_poisson_unfitted on a arch-linux2-c-opt named s02r2b25 with 16777 processors, by upc26229 Wed Nov 7 01:31:35 2018
Using Petsc Release Version 3.9.0, Apr, 07, 2018
Max Max/Min Avg Total
Time (sec): 3.137e+02 1.00000 3.137e+02
Objects: 1.745e+03 1.00172 1.742e+03
Flop: 9.833e+08 0.00000 6.678e+08 1.120e+13
Flop/sec: 3.134e+06 0.00000 2.129e+06 3.571e+10
MPI Messages: 2.180e+05 19813.90909 5.157e+04 8.652e+08
MPI Message Lengths: 6.226e+07 1414906.81818 7.366e+02 6.374e+11
MPI Reductions: 3.011e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.1371e+02 100.0% 1.1204e+13 100.0% 8.652e+08 100.0% 7.366e+02 100.0% 2.998e+03 99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 15 1.0 1.2635e-01 2.4 0.00e+00 0.0 2.3e+06 8.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSidedF 123 1.0 1.4200e+01 7.9 0.00e+00 0.0 9.0e+06 1.5e+04 0.0e+00 4 0 1 21 0 4 0 1 21 0 0
VecMDot 150 1.0 4.6170e-01 1.7 7.48e+06 0.0 0.0e+00 0.0e+00 1.5e+02 0 1 0 0 5 0 1 0 0 5 201728
VecTDot 384 1.0 3.1098e+00 4.0 5.80e+06 0.0 0.0e+00 0.0e+00 3.8e+02 0 1 0 0 13 0 1 0 0 13 23494
VecNorm 369 1.0 1.1910e+00 1.5 4.59e+06 0.0 0.0e+00 0.0e+00 3.7e+02 0 1 0 0 12 0 1 0 0 12 48340
VecScale 165 1.0 1.2617e-02212.0 7.49e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 738275
VecCopy 231 1.0 1.0534e-02327.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 864 1.0 1.9701e-02189.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 387 1.0 4.0720e-02309.4 5.80e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1794326
VecAYPX 1608 1.0 2.3452e-02168.1 1.07e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 5715622
VecAXPBYCZ 720 1.0 1.8540e-02276.4 1.63e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 10961725
VecMAXPY 165 1.0 1.5714e-02398.8 8.85e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 7005508
VecAssemblyBegin 42 1.0 4.3601e-01 1.5 0.00e+00 0.0 1.0e+06 1.8e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 42 1.0 5.7755e-03134.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 165 1.0 2.8011e-03138.9 7.49e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3325353
VecScatterBegin 1842 1.0 1.0872e-01510.7 0.00e+00 0.0 5.1e+08 6.4e+02 0.0e+00 0 0 59 51 0 0 0 59 51 0 0
VecScatterEnd 1842 1.0 2.2453e+0013155.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 15 1.0 3.1115e-031343.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 165 1.0 5.2679e-01 1.3 2.25e+06 0.0 0.0e+00 0.0e+00 1.6e+02 0 0 0 0 5 0 0 0 0 6 53045
MatMult 1416 1.0 1.9601e+002744.6 4.82e+08 0.0 4.3e+08 7.2e+02 0.0e+00 0 48 50 48 0 0 48 50 48 0 2757975
MatMultAdd 180 1.0 7.8791e-019183.2 1.33e+07 0.0 3.3e+07 1.6e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 186808
MatMultTranspose 180 1.0 1.0899e+006594.6 1.33e+07 0.0 3.3e+07 1.6e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 135049
MatSolve 36 0.0 4.2692e-05 0.0 1.01e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 24
MatSOR 1245 1.0 1.0514e+007119.8 3.46e+08 0.0 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 3644239
MatCholFctrSym 3 1.0 1.4223e-02755.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCholFctrNum 3 1.0 8.4996e-033106.3 1.20e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 15 1.0 4.5698e-0228.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 45 1.0 1.0334e-011406.4 5.72e+06 0.0 4.5e+06 6.9e+02 0.0e+00 0 1 1 0 0 0 1 1 0 0 641987
MatResidual 180 1.0 3.4803e-012420.0 5.74e+07 0.0 5.4e+07 6.9e+02 0.0e+00 0 6 6 6 0 0 6 6 6 0 1864527
MatAssemblyBegin 306 1.0 1.3789e+01 5.9 0.00e+00 0.0 8.0e+06 1.7e+04 0.0e+00 4 0 1 21 0 4 0 1 21 0 0
MatAssemblyEnd 306 1.0 1.3870e+01 1.0 0.00e+00 0.0 4.6e+07 1.5e+02 6.5e+02 4 0 5 1 22 4 0 5 1 22 0
MatGetRow 204147 0.0 3.1561e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 0.0 1.3432e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 18 1.0 7.7521e+00 1.0 0.00e+00 0.0 1.4e+06 2.2e+02 2.8e+02 2 0 0 0 9 2 0 0 0 9 0
MatGetOrdering 3 0.0 1.4079e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 15 1.0 1.0908e+00 1.1 0.00e+00 0.0 2.6e+08 2.4e+02 5.3e+02 0 0 30 10 18 0 0 30 10 18 0
MatZeroEntries 15 1.0 2.3229e-03397.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 27 1.3 3.1142e-01 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 1 0 0 0 0 1 0
MatAXPY 15 1.0 2.4644e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 15 1.0 6.6292e+00 1.0 4.79e+06 0.0 2.6e+07 4.9e+02 1.9e+02 2 0 3 2 6 2 0 3 2 6 8157
MatMatMultSym 15 1.0 6.2354e+00 1.0 0.00e+00 0.0 2.1e+07 4.5e+02 1.8e+02 2 0 2 2 6 2 0 2 2 6 0
MatMatMultNum 15 1.0 1.1974e-01 1.3 4.79e+06 0.0 4.5e+06 6.9e+02 0.0e+00 0 0 1 0 0 0 0 1 0 0 451599
MatPtAP 15 1.0 7.7822e+00 1.0 7.71e+07 0.0 5.5e+07 3.2e+03 2.3e+02 2 7 6 27 8 2 7 6 27 8 101415
MatPtAPSymbolic 15 1.0 4.5700e+00 1.0 0.00e+00 0.0 2.5e+07 3.7e+03 1.0e+02 1 0 3 15 3 1 0 3 15 4 0
MatPtAPNumeric 15 1.0 3.3127e+00 1.0 7.71e+07 0.0 2.9e+07 2.8e+03 1.2e+02 1 7 3 13 4 1 7 3 13 4 238246
MatGetLocalMat 45 1.0 1.9456e-0244.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 45 1.0 1.0016e-01150.0 0.00e+00 0.0 3.2e+07 2.5e+03 0.0e+00 0 0 4 12 0 0 0 4 12 0 0
KSPGMRESOrthog 150 1.0 4.6669e-01 1.7 1.50e+07 0.0 0.0e+00 0.0e+00 1.5e+02 0 2 0 0 5 0 2 0 0 5 399162
KSPSetUp 54 1.0 1.4842e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 3 1.0 3.9999e+00 1.1 8.39e+08 0.0 4.5e+08 6.4e+02 5.9e+02 1 85 52 45 20 1 85 52 45 20 2380078
PCGAMGGraph_AGG 15 1.0 6.2019e+00 1.0 4.79e+06 0.0 1.4e+07 4.6e+02 1.8e+02 2 0 2 1 6 2 0 2 1 6 8719
PCGAMGCoarse_AGG 15 1.0 1.0930e+00 1.1 0.00e+00 0.0 2.6e+08 2.4e+02 5.3e+02 0 0 30 10 18 0 0 30 10 18 0
PCGAMGProl_AGG 15 1.0 7.1075e+00 1.0 0.00e+00 0.0 1.2e+07 5.8e+02 2.4e+02 2 0 1 1 8 2 0 1 1 8 0
PCGAMGPOpt_AGG 15 1.0 1.1478e+01 1.0 7.11e+07 0.0 7.1e+07 6.2e+02 6.2e+02 4 8 8 7 21 4 8 8 7 21 73252
GAMG: createProl 15 1.0 2.5825e+01 1.0 7.59e+07 0.0 3.5e+08 3.4e+02 1.6e+03 8 8 41 19 52 8 8 41 19 53 34652
Graph 30 1.0 6.1862e+00 1.0 4.79e+06 0.0 1.4e+07 4.6e+02 1.8e+02 2 0 2 1 6 2 0 2 1 6 8741
MIS/Agg 15 1.0 1.0910e+00 1.1 0.00e+00 0.0 2.6e+08 2.4e+02 5.3e+02 0 0 30 10 18 0 0 30 10 18 0
SA: col data 15 1.0 2.3531e+00 1.0 0.00e+00 0.0 9.0e+06 6.9e+02 6.0e+01 1 0 1 1 2 1 0 1 1 2 0
SA: frmProl0 15 1.0 2.8294e+00 1.0 0.00e+00 0.0 3.2e+06 2.7e+02 1.2e+02 1 0 0 0 4 1 0 0 0 4 0
SA: smooth 15 1.0 7.7540e+00 1.0 5.72e+06 0.0 2.6e+07 4.9e+02 2.2e+02 2 1 3 2 7 2 1 3 2 7 8556
GAMG: partLevel 15 1.0 1.9884e+01 1.0 7.71e+07 0.0 5.6e+07 3.1e+03 6.8e+02 6 7 6 28 23 6 7 6 28 23 39691
repartition 9 1.0 1.2793e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.4e+01 0 0 0 0 2 0 0 0 0 2 0
Invert-Sort 9 1.0 1.6471e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01 1 0 0 0 1 1 0 0 0 1 0
Move A 9 1.0 3.9110e+00 1.0 0.00e+00 0.0 5.1e+05 5.6e+02 1.5e+02 1 0 0 0 5 1 0 0 0 5 0
Move P 9 1.0 3.9752e+00 1.0 0.00e+00 0.0 8.8e+05 2.1e+01 1.5e+02 1 0 0 0 5 1 0 0 0 5 0
PCSetUp 6 1.0 4.7888e+01 1.0 1.51e+08 0.0 4.1e+08 7.2e+02 2.3e+03 15 15 47 46 78 15 15 47 46 78 35168
PCSetUpOnBlocks 36 1.0 2.2114e-0248.6 1.20e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 36 1.0 3.2792e+00 2.8 7.84e+08 0.0 4.4e+08 6.1e+02 4.8e+02 1 79 50 41 16 1 79 50 41 16 2713702
SFSetGraph 15 1.0 1.1425e-05 9.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 15 1.0 1.4317e-01 1.9 0.00e+00 0.0 6.8e+06 4.6e+02 0.0e+00 0 0 1 0 0 0 0 1 0 0 0
SFBcastBegin 564 1.0 3.8685e-02183.9 0.00e+00 0.0 2.5e+08 2.4e+02 0.0e+00 0 0 29 9 0 0 0 29 9 0 0
SFBcastEnd 564 1.0 3.0283e-013088.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 735 735 1523448 0.
Matrix 444 444 11007960 0.
Matrix Coarsen 15 15 10260 0.
Index Set 303 303 2069856 0.
Vec Scatter 105 105 147192 0.
Krylov Solver 54 54 516960 0.
Preconditioner 39 39 41352 0.
Viewer 5 4 3584 0.
PetscRandom 30 30 20820 0.
Star Forest Graph 15 15 14160 0.
========================================================================================================================
Average time to get PetscTime(): 4.25614e-08
Average time for MPI_Barrier(): 0.00116431
Average time for zero size MPI_Send(): 1.79813e-06
#PETSc Option Table entries:
--prefix run_a0b0c0d0e0f0g0h0i0_n8_l3
-aggrmeth alla_serial
-beta 10.0
-betaest .true.
-check .false.
-datadt data_distribution_fully_assembled
-dm 3
-dom -1.0
-in_space .true.
-ksp_converged_reason
-ksp_max_it 500
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.0e-6
-ksp_type cg
-ksp_view
-l 1
-levelset popcorn
-levelsettol 1.0e-6
-log_view
-lsdom 0.0
-maxl 10
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-mg_coarse_sub_pc_type cholesky
-mg_levels_esteig_ksp_type cg
-no_signal_handler
-nruns 3
-order 1
-pc_gamg_agg_nsmooths 1
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_type agg
-pc_type gamg
-petscrc /gpfs/scratch/upc26/upc26229/par_cell_aggr_poisson/paper/weak_scal_ompi/2nd-w-scal/petscrc-0
-tt 1
-uagg .true.
-wratio 10
-wsolution .false.
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/apps/INTEL/2017.4/mkl --with-debugging=0 --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
-----------------------------------------
Libraries compiled on 2018-06-04 18:55:32 on login1
Machine characteristics: Linux-4.4.103-92.56-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -wd1572 -g -O3
Using Fortran compiler: mpif90 -fPIC -g -O3
-----------------------------------------
Using include paths: -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/include -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -L/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/apps/INTEL/2017.4/mkl/lib/intel64 -L/apps/INTEL/2017.4/mkl/lib/intel64 -Wl,-rpath,/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -L/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -Wl,-rpath,/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -L/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------
Ending run at Wed Nov 7 01:31:37 CET 2018
Ending script at Wed Nov 7 01:31:37 CET 2018