[petsc-users] GAMG for the unsymmetrical matrix
Kong, Fande
fande.kong at inl.gov
Fri Apr 7 16:29:47 CDT 2017
Thanks, Barry.
It works.
GAMG is about three times better than ASM in terms of the number of linear
iterations (23 vs. 61 in the attached logs), but it is roughly five times
slower overall (SNESSolve ~2.0e+03 s vs. ~3.7e+02 s). Any suggestions for
improving the performance of GAMG? The log files are attached.
Fande,
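For reference, the option tables at the end of the two attached logs show that the runs differ only in the preconditioner options (the hypre entries appear in both tables but are unused with these -pc_type settings):

  ASM run:  -pc_type asm
  GAMG run: -pc_type gamg -pc_mg_levels 2 -pc_gamg_sym_graph true -pc_use_amat false
  Common:   -ksp_gmres_restart 100 -snes_mf_operator -snes_view -log_view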
On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.kong at inl.gov> wrote:
> >
> > Thanks, Mark and Barry,
> >
> > It works pretty well in terms of the number of linear iterations (using
> > "-pc_gamg_sym_graph true"), but the compute time is terrible. I am using
> > the two-level method via "-pc_mg_levels 2". The compute time is larger than
> > with other preconditioning options because a matrix-free method is used on
> > the fine level, and in my particular problem the function evaluation is
> > expensive.
> >
> > I am using "-snes_mf_operator 1" to turn on Jacobian-free Newton, but I do
> > not think I want to make the preconditioning part matrix-free. Do you guys
> > know how to turn off the matrix-free method for GAMG?
>
> -pc_use_amat false
>
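A minimal C sketch of what that option does programmatically, assuming a SNES set up with -snes_mf_operator (Amat is the MFFD operator, Pmat the assembled matrix); the helper name below is made up, and PCSetUseAmat() is the call behind -pc_use_amat:

#include <petscsnes.h>

static PetscErrorCode UsePmatInsidePC(SNES snes)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  /* Apply the preconditioner's internal operator applications (e.g. the
     multigrid fine-level residual) with the assembled Pmat, not the
     matrix-free Amat. */
  ierr = PCSetUseAmat(pc,PETSC_FALSE);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

It can be called any time after the solver options are set, e.g. right before SNESSolve().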
> >
> > Here is the detailed solver:
> >
> > SNES Object: 384 MPI processes
> > type: newtonls
> > maximum iterations=200, maximum function evaluations=10000
> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
> > total number of linear solver iterations=20
> > total number of function evaluations=166
> > norm schedule ALWAYS
> > SNESLineSearch Object: 384 MPI processes
> > type: bt
> > interpolation: cubic
> > alpha=1.000000e-04
> > maxstep=1.000000e+08, minlambda=1.000000e-12
> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
> > maximum iterations=40
> > KSP Object: 384 MPI processes
> > type: gmres
> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > GMRES: happy breakdown tolerance 1e-30
> > maximum iterations=100, initial guess is zero
> > tolerances: relative=0.001, absolute=1e-50, divergence=10000.
> > right preconditioning
> > using UNPRECONDITIONED norm type for convergence test
> > PC Object: 384 MPI processes
> > type: gamg
> > MG: type is MULTIPLICATIVE, levels=2 cycles=v
> > Cycles per PCApply=1
> > Using Galerkin computed coarse grid matrices
> > GAMG specific options
> > Threshold for dropping small values from graph 0.
> > AGG specific options
> > Symmetric graph true
> > Coarse grid solver -- level -------------------------------
> > KSP Object: (mg_coarse_) 384 MPI processes
> > type: preonly
> > maximum iterations=10000, initial guess is zero
> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > left preconditioning
> > using NONE norm type for convergence test
> > PC Object: (mg_coarse_) 384 MPI processes
> > type: bjacobi
> > block Jacobi: number of blocks = 384
> > Local solve is same for all blocks, in the following KSP and PC objects:
> > KSP Object: (mg_coarse_sub_) 1 MPI processes
> > type: preonly
> > maximum iterations=1, initial guess is zero
> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > left preconditioning
> > using NONE norm type for convergence test
> > PC Object: (mg_coarse_sub_) 1 MPI processes
> > type: lu
> > LU: out-of-place factorization
> > tolerance for zero pivot 2.22045e-14
> > using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> > matrix ordering: nd
> > factor fill ratio given 5., needed 1.31367
> > Factored matrix follows:
> > Mat Object: 1 MPI processes
> > type: seqaij
> > rows=37, cols=37
> > package used to perform factorization: petsc
> > total: nonzeros=913, allocated nonzeros=913
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node routines
> > linear system matrix = precond matrix:
> > Mat Object: 1 MPI processes
> > type: seqaij
> > rows=37, cols=37
> > total: nonzeros=695, allocated nonzeros=695
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node routines
> > linear system matrix = precond matrix:
> > Mat Object: 384 MPI processes
> > type: mpiaij
> > rows=18145, cols=18145
> > total: nonzeros=1709115, allocated nonzeros=1709115
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> > Down solver (pre-smoother) on level 1 -------------------------------
> > KSP Object: (mg_levels_1_) 384 MPI processes
> > type: chebyshev
> > Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673
> > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
> > KSP Object: (mg_levels_1_esteig_) 384 MPI processes
> > type: gmres
> > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > GMRES: happy breakdown tolerance 1e-30
> > maximum iterations=10, initial guess is zero
> > tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
> > left preconditioning
> > using PRECONDITIONED norm type for convergence test
> > maximum iterations=2
> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > left preconditioning
> > using nonzero initial guess
> > using NONE norm type for convergence test
> > PC Object: (mg_levels_1_) 384 MPI processes
> > type: sor
> > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> > linear system matrix followed by preconditioner matrix:
> > Mat Object: 384 MPI processes
> > type: mffd
> > rows=3020875, cols=3020875
> > Matrix-free approximation:
> > err=1.49012e-08 (relative error in function evaluation)
> > Using wp compute h routine
> > Does not compute normU
> > Mat Object: () 384 MPI processes
> > type: mpiaij
> > rows=3020875, cols=3020875
> > total: nonzeros=215671710, allocated nonzeros=241731750
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> > Up solver (post-smoother) same as down solver (pre-smoother)
> > linear system matrix followed by preconditioner matrix:
> > Mat Object: 384 MPI processes
> > type: mffd
> > rows=3020875, cols=3020875
> > Matrix-free approximation:
> > err=1.49012e-08 (relative error in function evaluation)
> > Using wp compute h routine
> > Does not compute normU
> > Mat Object: () 384 MPI processes
> > type: mpiaij
> > rows=3020875, cols=3020875
> > total: nonzeros=215671710, allocated nonzeros=241731750
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> >
> >
> > Fande,
> >
> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > >> Does this mean that GAMG works for the symmetrical matrix only?
> > >
> > > No, it means that for a non-symmetric nonzero structure you need the
> > > extra flag. So use the extra flag. The reason we don't always use the flag
> > > is that it adds extra cost and isn't needed if the matrix already has a
> > > symmetric nonzero structure.
> >
> > BTW, if you have a symmetric non-zero structure you can just set
> > '-pc_gamg_threshold -1.0'; note the "or" in the message.
> >
> > If you want to mess with the threshold then you need to use the
> > symmetrized flag.
> >
>
>
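A small C sketch of the two choices Mark describes above, set from code instead of the command line; the function name is made up and the PetscOptionsSetValue() signature is the PETSc 3.7 one (NULL selects the global options database):

#include <petscsys.h>

static PetscErrorCode SetGAMGGraphOptions(void)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* Non-symmetric nonzero structure: symmetrize the graph GAMG aggregates on
     (the "extra flag" mentioned above). */
  ierr = PetscOptionsSetValue(NULL,"-pc_gamg_sym_graph","true");CHKERRQ(ierr);
  /* With a symmetric nonzero structure one could instead leave the flag off
     and simply disable graph thresholding:
       ierr = PetscOptionsSetValue(NULL,"-pc_gamg_threshold","-1.0");CHKERRQ(ierr);
     Using a nonzero threshold on a non-symmetric structure still requires
     -pc_gamg_sym_graph true. */
  PetscFunctionReturn(0);
}

These calls must happen before PCSetFromOptions()/SNESSetFromOptions() reads the options database.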
-------------- next part --------------
Time Step 10, time = 0.1
dt = 0.01
0 Nonlinear |R| = 2.004779e-03
0 Linear |R| = 2.004779e-03
1 Linear |R| = 1.080152e-03
2 Linear |R| = 5.066679e-04
3 Linear |R| = 3.045271e-04
4 Linear |R| = 1.925133e-04
5 Linear |R| = 1.404396e-04
6 Linear |R| = 1.087962e-04
7 Linear |R| = 9.433190e-05
8 Linear |R| = 8.650164e-05
9 Linear |R| = 7.511298e-05
10 Linear |R| = 6.116103e-05
11 Linear |R| = 5.097880e-05
12 Linear |R| = 4.528093e-05
13 Linear |R| = 4.238188e-05
14 Linear |R| = 3.852598e-05
15 Linear |R| = 3.211727e-05
16 Linear |R| = 2.655089e-05
17 Linear |R| = 2.308499e-05
18 Linear |R| = 1.988423e-05
19 Linear |R| = 1.686685e-05
20 Linear |R| = 1.453042e-05
21 Linear |R| = 1.227912e-05
22 Linear |R| = 9.829701e-06
23 Linear |R| = 7.695993e-06
24 Linear |R| = 6.092649e-06
25 Linear |R| = 5.293533e-06
26 Linear |R| = 4.583670e-06
27 Linear |R| = 3.427266e-06
28 Linear |R| = 2.442730e-06
29 Linear |R| = 1.855485e-06
1 Nonlinear |R| = 1.855485e-06
0 Linear |R| = 1.855485e-06
1 Linear |R| = 1.626392e-06
2 Linear |R| = 1.505583e-06
3 Linear |R| = 1.258325e-06
4 Linear |R| = 8.295100e-07
5 Linear |R| = 6.184171e-07
6 Linear |R| = 5.114149e-07
7 Linear |R| = 4.146942e-07
8 Linear |R| = 3.335395e-07
9 Linear |R| = 2.647491e-07
10 Linear |R| = 2.099801e-07
11 Linear |R| = 1.774148e-07
12 Linear |R| = 1.508766e-07
13 Linear |R| = 1.214361e-07
14 Linear |R| = 1.009707e-07
15 Linear |R| = 9.148193e-08
16 Linear |R| = 8.608036e-08
17 Linear |R| = 7.997930e-08
18 Linear |R| = 7.004223e-08
19 Linear |R| = 5.671891e-08
20 Linear |R| = 4.909039e-08
21 Linear |R| = 4.690188e-08
22 Linear |R| = 4.309895e-08
23 Linear |R| = 3.325854e-08
24 Linear |R| = 2.375529e-08
25 Linear |R| = 1.690025e-08
26 Linear |R| = 1.237871e-08
27 Linear |R| = 8.720643e-09
28 Linear |R| = 5.961891e-09
29 Linear |R| = 4.283073e-09
30 Linear |R| = 3.126338e-09
31 Linear |R| = 2.185008e-09
32 Linear |R| = 1.411854e-09
2 Nonlinear |R| = 1.411854e-09
SNES Object: 384 MPI processes
type: newtonls
maximum iterations=200, maximum function evaluations=10000
tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
total number of linear solver iterations=61
total number of function evaluations=66
norm schedule ALWAYS
SNESLineSearch Object: 384 MPI processes
type: bt
interpolation: cubic
alpha=1.000000e-04
maxstep=1.000000e+08, minlambda=1.000000e-12
tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
maximum iterations=40
KSP Object: 384 MPI processes
type: gmres
GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=100, initial guess is zero
tolerances: relative=0.001, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 384 MPI processes
type: asm
Additive Schwarz: total subdomain blocks = 384, amount of overlap = 1
Additive Schwarz: restriction/interpolation type - RESTRICT
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (sub_) 1 MPI processes
type: ilu
ILU: out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=20493, cols=20493
package used to perform factorization: petsc
total: nonzeros=1270950, allocated nonzeros=1270950
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: () 1 MPI processes
type: seqaij
rows=20493, cols=20493
total: nonzeros=1270950, allocated nonzeros=1270950
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix followed by preconditioner matrix:
Mat Object: 384 MPI processes
type: mffd
rows=3020875, cols=3020875
Matrix-free approximation:
err=1.49012e-08 (relative error in function evaluation)
Using wp compute h routine
Does not compute normU
Mat Object: () 384 MPI processes
type: mpiaij
rows=3020875, cols=3020875
total: nonzeros=215671710, allocated nonzeros=241731750
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Solve Converged!
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r4i0n1 with 384 processors, by kongf Tue Mar 14 16:28:04 2017
Using Petsc Release Version 3.7.5, unknown
Max Max/Min Avg Total
Time (sec): 4.387e+02 1.00001 4.387e+02
Objects: 1.279e+03 1.00000 1.279e+03
Flops: 4.230e+09 1.99161 2.946e+09 1.131e+12
Flops/sec: 9.642e+06 1.99162 6.716e+06 2.579e+09
MPI Messages: 2.935e+05 4.95428 1.810e+05 6.951e+07
MPI Message Lengths: 3.105e+09 3.16103 1.072e+04 7.449e+11
MPI Reductions: 5.022e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.3875e+02 100.0% 1.1314e+12 100.0% 6.951e+07 100.0% 1.072e+04 100.0% 5.022e+04 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot 20 1.0 3.2134e-03 2.4 4.53e+05 2.3 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 37601
VecMDot 839 1.0 6.7209e-01 1.2 3.52e+08 2.3 0.0e+00 0.0e+00 8.4e+02 0 8 0 0 2 0 8 0 0 2 139634
VecNorm 1802 1.0 6.7932e+00 2.5 4.08e+07 2.3 0.0e+00 0.0e+00 1.8e+03 1 1 0 0 4 1 1 0 0 4 1603
VecScale 3877 1.0 1.0508e-01 1.4 1.34e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 439546
VecCopy 4153 1.0 7.2803e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 5493 1.0 5.1735e-01 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 5365 1.0 4.0282e-01 2.3 3.01e+08 1.4 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 251646
VecWAXPY 884 1.0 5.5227e-02 3.5 1.97e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 95341
VecMAXPY 864 1.0 1.7126e-01 2.6 3.71e+08 2.3 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 577621
VecAssemblyBegin 15491 1.0 1.3738e+02 3.0 0.00e+00 0.0 8.9e+06 1.8e+04 4.6e+04 28 0 13 22 93 28 0 13 22 93 0
VecAssemblyEnd 15491 1.0 7.9072e-0128.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 13390 1.0 2.5097e+00 3.6 0.00e+00 0.0 5.9e+07 8.4e+03 2.8e+01 0 0 85 67 0 0 0 85 67 0 0
VecScatterEnd 13362 1.0 5.7428e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecReduceArith 55 1.0 1.2808e-03 2.2 1.25e+06 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 259431
VecReduceComm 25 1.0 5.5003e-02 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 2.5e+01 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 864 1.0 4.4664e+00 3.5 2.93e+07 2.3 0.0e+00 0.0e+00 8.6e+02 1 1 0 0 2 1 1 0 0 2 1753
MatMult MF 859 1.0 3.1339e+02 1.0 4.12e+08 1.4 5.7e+07 9.6e+03 4.2e+04 71 12 81 73 83 71 12 81 73 83 439
MatMult 859 1.0 3.1340e+02 1.0 4.12e+08 1.4 5.7e+07 9.6e+03 4.2e+04 71 12 81 73 83 71 12 81 73 83 439
MatSolve 864 1.0 2.1255e+00 2.0 1.83e+09 2.1 0.0e+00 0.0e+00 0.0e+00 0 43 0 0 0 0 43 0 0 0 226791
MatLUFactorNum 25 1.0 1.0920e+00 2.4 1.20e+09 2.5 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 267745
MatILUFactorSym 13 1.0 1.0606e-01 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 150 1.0 2.0643e+00 1.2 0.00e+00 0.0 1.7e+05 1.7e+05 2.0e+02 0 0 0 4 0 0 0 0 4 0 0
MatAssemblyEnd 150 1.0 4.3198e+00 1.1 0.00e+00 0.0 1.9e+04 1.1e+03 2.1e+02 1 0 0 0 0 1 0 0 0 0 0
MatGetRowIJ 13 1.0 1.3113e-0513.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 25 1.0 4.4022e+00 2.8 0.00e+00 0.0 5.9e+05 8.4e+04 7.5e+01 1 0 1 7 0 1 0 1 7 0 0
MatGetOrdering 13 1.0 1.7283e-0217.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 13 1.0 2.0244e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 29 1.0 5.0908e-02 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 52 2.0 5.5351e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 0 0 0 0 0 0 0
SNESSolve 13 1.0 3.7214e+02 1.0 4.21e+09 2.0 6.6e+07 1.0e+04 4.8e+04 85100 95 92 95 85100 95 92 95 3026
SNESFunctionEval 897 1.0 3.2606e+02 1.0 3.62e+08 1.3 5.9e+07 9.6e+03 4.3e+04 74 11 85 76 85 74 11 85 76 85 384
SNESJacobianEval 25 1.0 3.4770e+01 1.0 1.95e+07 1.4 2.3e+06 2.3e+04 1.9e+03 8 1 3 7 4 8 1 3 7 4 195
SNESLineSearch 25 1.0 1.8090e+01 1.0 2.57e+07 1.4 3.1e+06 1.0e+04 2.3e+03 4 1 4 4 5 4 1 4 4 5 475
BuildTwoSided 25 1.0 4.6378e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 25 1.0 2.7061e-04 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 25 1.0 4.6412e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 25 1.0 8.1301e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 839 1.0 8.0119e-01 1.2 7.03e+08 2.3 0.0e+00 0.0e+00 8.4e+02 0 17 0 0 2 0 17 0 0 2 234277
KSPSetUp 50 1.0 3.0220e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 25 1.0 3.1444e+02 1.0 4.16e+09 2.0 6.0e+07 9.9e+03 4.3e+04 72 98 86 80 85 72 98 86 80 85 3526
PCSetUp 50 1.0 5.4896e+00 2.4 1.20e+09 2.5 7.1e+05 7.0e+04 1.8e+02 1 26 1 7 0 1 26 1 7 0 53260
PCSetUpOnBlocks 25 1.0 1.1928e+00 2.4 1.20e+09 2.5 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 245124
PCApply 864 1.0 2.4803e+00 2.0 1.83e+09 2.1 4.1e+06 4.4e+03 0.0e+00 0 43 6 2 0 0 43 6 2 0 194354
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 740 740 732012968 0.
Vector Scatter 76 76 1212680 0.
Index Set 176 176 4673716 0.
IS L to G Mapping 33 33 3228828 0.
MatMFFD 13 13 10088 0.
Matrix 45 45 364469360 0.
SNES 13 13 17316 0.
SNESLineSearch 13 13 12896 0.
DMSNES 13 13 8632 0.
Distributed Mesh 13 13 60320 0.
Star Forest Bipartite Graph 51 51 43248 0.
Discrete System 13 13 11232 0.
Krylov Solver 26 26 2223520 0.
DMKSP interface 13 13 8424 0.
Preconditioner 26 26 25688 0.
Viewer 15 13 10816 0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 2.08554e-06
#PETSc Option Table entries:
--n-threads=1
-i treat-cube_transient.i
-ksp_gmres_restart 100
-log_view
-pc_hypre_boomeramg_max_iter 4
-pc_hypre_boomeramg_strong_threshold 0.7
-pc_hypre_boomeramg_tol 1.0e-6
-pc_hypre_type boomeramg
-pc_type asm
-snes_mf_operator
-snes_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --download-hypre=1 --with-ssl=0 --with-debugging=no --with-pic=1 --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 -CC=mpicc -CXX=mpicxx -FC=mpif90 -F77=mpif77 -F90=mpif90 -CFLAGS="-fPIC -fopenmp" -CXXFLAGS="-fPIC -fopenmp" -FFLAGS="-fPIC -fopenmp" -FCFLAGS="-fPIC -fopenmp" -F90FLAGS="-fPIC -fopenmp" -F77FLAGS="-fPIC -fopenmp" PETSC_DIR=/home/kongf/workhome/projects/petsc -download-cmake=1
-----------------------------------------
Libraries compiled on Tue Feb 7 16:47:41 2017 on falcon1
Machine characteristics: Linux-3.0.101-84.1.11909.0.PTF-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /home/kongf/workhome/projects/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -fopenmp -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -fopenmp -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lsuperlu_dist -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lHYPRE -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -lmpichcxx -lstdc++ -lscalapack -lflapack -lfblas -lX11 -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -ldl -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -lmpich -lopa -lmpl -lgomp -lgcc_s -lpthread -ldl
-----------------------------------------
-------------- next part --------------
Time Step 10, time = 0.1
dt = 0.01
0 Nonlinear |R| = 2.004778e-03
0 Linear |R| = 2.004778e-03
1 Linear |R| = 4.440581e-04
2 Linear |R| = 1.283930e-04
3 Linear |R| = 9.874954e-05
4 Linear |R| = 6.589984e-05
5 Linear |R| = 4.483411e-05
6 Linear |R| = 2.787575e-05
7 Linear |R| = 1.435839e-05
8 Linear |R| = 8.720579e-06
9 Linear |R| = 3.704796e-06
10 Linear |R| = 2.317054e-06
11 Linear |R| = 9.060942e-07
1 Nonlinear |R| = 9.060942e-07
0 Linear |R| = 9.060942e-07
1 Linear |R| = 6.874101e-07
2 Linear |R| = 3.052995e-07
3 Linear |R| = 1.728171e-07
4 Linear |R| = 7.805237e-08
5 Linear |R| = 5.011253e-08
6 Linear |R| = 2.903814e-08
7 Linear |R| = 2.421108e-08
8 Linear |R| = 1.594860e-08
9 Linear |R| = 1.116189e-08
10 Linear |R| = 4.372907e-09
11 Linear |R| = 1.575997e-09
12 Linear |R| = 5.765413e-10
2 Nonlinear |R| = 5.765413e-10
SNES Object: 384 MPI processes
type: newtonls
maximum iterations=200, maximum function evaluations=10000
tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
total number of linear solver iterations=23
total number of function evaluations=28
norm schedule ALWAYS
SNESLineSearch Object: 384 MPI processes
type: bt
interpolation: cubic
alpha=1.000000e-04
maxstep=1.000000e+08, minlambda=1.000000e-12
tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
maximum iterations=40
KSP Object: 384 MPI processes
type: gmres
GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=100, initial guess is zero
tolerances: relative=0.001, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 384 MPI processes
type: gamg
MG: type is MULTIPLICATIVE, levels=2 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
GAMG specific options
Threshold for dropping small values from graph 0.
AGG specific options
Symmetric graph true
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 384 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 384 MPI processes
type: bjacobi
block Jacobi: number of blocks = 384
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5., needed 1.31367
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=37, cols=37
package used to perform factorization: petsc
total: nonzeros=913, allocated nonzeros=913
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=37, cols=37
total: nonzeros=695, allocated nonzeros=695
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 384 MPI processes
type: mpiaij
rows=18145, cols=18145
total: nonzeros=1709115, allocated nonzeros=1709115
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 384 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.138116, max = 1.51927
Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 384 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 384 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: () 384 MPI processes
type: mpiaij
rows=3020875, cols=3020875
total: nonzeros=215671710, allocated nonzeros=241731750
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix followed by preconditioner matrix:
Mat Object: 384 MPI processes
type: mffd
rows=3020875, cols=3020875
Matrix-free approximation:
err=1.49012e-08 (relative error in function evaluation)
Using wp compute h routine
Does not compute normU
Mat Object: () 384 MPI processes
type: mpiaij
rows=3020875, cols=3020875
total: nonzeros=215671710, allocated nonzeros=241731750
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Solve Converged!
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r4i4n2 with 384 processors, by kongf Fri Apr 7 13:36:35 2017
Using Petsc Release Version 3.7.5, unknown
Max Max/Min Avg Total
Time (sec): 2.266e+03 1.00001 2.266e+03
Objects: 6.020e+03 1.00000 6.020e+03
Flops: 1.064e+10 2.27050 7.337e+09 2.817e+12
Flops/sec: 4.695e+06 2.27050 3.237e+06 1.243e+09
MPI Messages: 3.459e+05 5.11666 2.112e+05 8.111e+07
MPI Message Lengths: 3.248e+09 3.35280 9.453e+03 7.667e+11
MPI Reductions: 4.610e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.2663e+03 100.0% 2.8172e+12 100.0% 8.111e+07 100.0% 9.453e+03 100.0% 4.610e+04 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot 20 1.0 6.1171e-01 1.6 4.53e+05 2.3 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 198
VecMDot 1091 1.0 3.4823e+01 1.7 1.05e+08 2.3 0.0e+00 0.0e+00 1.1e+03 1 1 0 0 2 1 1 0 0 2 803
VecNorm 1943 1.0 6.9656e+01 1.6 3.66e+07 2.3 0.0e+00 0.0e+00 1.9e+03 3 0 0 0 4 3 0 0 0 4 140
VecScale 2928 1.0 1.1091e-01 2.8 7.24e+07 1.4 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 219463
VecCopy 3086 1.0 6.0201e-01 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 7168 1.0 4.2314e-01 7.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 3263 1.0 3.7908e-01 4.1 1.59e+08 1.4 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 138504
VecAYPX 4112 1.0 1.1982e-01 4.2 3.59e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 80071
VecAXPBYCZ 2056 1.0 7.5538e-02 3.3 7.18e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 254030
VecWAXPY 743 1.0 7.8864e-02 4.9 1.65e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 55963
VecMAXPY 1196 1.0 7.9660e-02 3.3 1.23e+08 2.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 411137
VecAssemblyBegin 12333 1.0 1.1090e+03 1.2 0.00e+00 0.0 7.6e+06 1.9e+04 3.7e+04 48 0 9 19 80 48 0 9 19 80 0
VecAssemblyEnd 12333 1.0 4.2957e-0124.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 440 1.0 2.2301e-02 5.7 3.12e+06 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37433
VecScatterBegin 13638 1.0 2.3693e+00 4.9 0.00e+00 0.0 6.4e+07 5.6e+03 2.8e+01 0 0 79 46 0 0 0 79 46 0 0
VecScatterEnd 13610 1.0 2.1648e+0213.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecSetRandom 40 1.0 4.5372e-02 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 55 1.0 1.3552e-03 2.7 1.25e+06 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 245191
VecReduceComm 25 1.0 2.3911e+00 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.5e+01 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 1196 1.0 2.8596e+01 1.1 2.95e+07 2.3 0.0e+00 0.0e+00 1.2e+03 1 0 0 0 3 1 0 0 0 3 275
MatMult MF 718 1.0 1.4078e+03 1.0 2.00e+08 1.4 4.2e+07 8.2e+03 3.2e+04 62 2 52 45 69 62 2 52 45 69 46
MatMult 4195 1.0 1.4272e+03 1.0 3.33e+09 2.2 5.8e+07 6.6e+03 3.2e+04 63 32 72 50 69 63 32 72 50 69 627
MatMultAdd 514 1.0 9.7981e+0016.1 3.84e+07 2.4 2.0e+06 1.3e+02 0.0e+00 0 0 2 0 0 0 0 2 0 0 995
MatMultTranspose 514 1.0 6.0183e+0019.9 3.84e+07 2.4 2.0e+06 1.3e+02 0.0e+00 0 0 2 0 0 0 0 2 0 0 1620
MatSolve 316 1.3 1.7905e-0219.7 1.76e+06 4.6 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 18236
MatSOR 3524 1.0 6.6987e+00 3.9 2.50e+09 2.6 0.0e+00 0.0e+00 0.0e+00 0 23 0 0 0 0 23 0 0 0 97291
MatLUFactorSym 25 1.0 1.7944e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 25 1.0 2.2082e-03 6.0 2.10e+0610.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 136111
MatConvert 40 1.0 2.6915e-01 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 120 1.0 1.0204e+0022.5 3.86e+07 2.3 1.9e+05 2.9e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 10018
MatResidual 514 1.0 5.3226e+01 1.1 4.35e+08 2.3 3.7e+06 4.2e+03 1.1e+03 2 4 5 2 2 2 4 5 2 2 2165
MatAssemblyBegin 1010 1.0 6.0257e+01 2.2 0.00e+00 0.0 1.7e+06 3.5e+04 8.4e+02 2 0 2 8 2 2 0 2 8 2 0
MatAssemblyEnd 1010 1.0 7.7316e+01 1.0 0.00e+00 0.0 2.5e+06 4.6e+02 2.1e+03 3 0 3 0 5 3 0 3 0 5 0
MatGetRow 1078194 2.3 2.4485e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 25 1.2 3.7956e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrix 30 1.0 1.6949e+01 1.0 0.00e+00 0.0 1.2e+05 2.8e+02 5.1e+02 1 0 0 0 1 1 0 0 0 1 0
MatGetOrdering 25 1.2 1.8878e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 40 1.0 1.5944e+01 1.1 0.00e+00 0.0 2.6e+06 2.3e+03 3.0e+02 1 0 3 1 1 1 0 3 1 1 0
MatZeroEntries 69 1.0 7.3145e-02 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 90 1.4 1.1229e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.8e+01 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 40 1.0 3.4301e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatTranspose 20 1.0 1.2561e+01 1.1 0.00e+00 0.0 7.1e+05 2.0e+04 2.4e+02 1 0 1 2 1 1 0 1 2 1 0
MatMatMult 40 1.0 2.6365e+01 1.0 3.56e+07 2.3 1.2e+06 1.4e+03 6.4e+02 1 0 1 0 1 1 0 1 0 1 358
MatMatMultSym 40 1.0 2.3430e+01 1.0 0.00e+00 0.0 9.8e+05 1.1e+03 5.6e+02 1 0 1 0 1 1 0 1 0 1 0
MatMatMultNum 40 1.0 2.9809e+00 1.1 3.56e+07 2.3 1.9e+05 2.9e+03 8.0e+01 0 0 0 0 0 0 0 0 0 0 3170
MatPtAP 40 1.0 3.1763e+01 1.0 2.59e+08 2.3 2.7e+06 2.6e+03 6.8e+02 1 2 3 1 1 1 2 3 1 1 2012
MatPtAPSymbolic 40 1.0 1.7240e+01 1.1 0.00e+00 0.0 1.2e+06 4.6e+03 2.8e+02 1 0 1 1 1 1 0 1 1 1 0
MatPtAPNumeric 40 1.0 1.5004e+01 1.1 2.59e+08 2.3 1.5e+06 1.0e+03 4.0e+02 1 2 2 0 1 1 2 2 0 1 4259
MatTrnMatMult 25 1.0 1.1522e+02 1.0 4.05e+09 2.3 7.5e+05 2.6e+05 4.8e+02 5 37 1 25 1 5 37 1 25 1 9105
MatTrnMatMultSym 25 1.0 7.3735e+01 1.0 0.00e+00 0.0 6.3e+05 1.0e+05 4.2e+02 3 0 1 8 1 3 0 1 8 1 0
MatTrnMatMultNum 25 1.0 4.1508e+01 1.0 4.05e+09 2.3 1.2e+05 1.1e+06 5.0e+01 2 37 0 17 0 2 37 0 17 0 25275
MatGetLocalMat 170 1.0 6.0506e-01 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 120 1.0 3.7906e+00 5.3 0.00e+00 0.0 1.3e+06 5.0e+03 0.0e+00 0 0 2 1 0 0 0 2 1 0 0
SNESSolve 13 1.0 1.9975e+03 1.0 1.06e+10 2.3 7.8e+07 9.1e+03 4.3e+04 88100 96 92 94 88100 96 92 94 1408
SNESFunctionEval 756 1.0 1.4539e+03 1.0 1.62e+08 1.4 4.4e+07 8.3e+03 3.3e+04 64 2 55 48 71 64 2 55 48 71 38
SNESJacobianEval 25 1.0 1.0415e+02 1.0 1.95e+07 1.4 2.3e+06 2.3e+04 1.9e+03 5 0 3 7 4 5 0 3 7 4 65
SNESLineSearch 25 1.0 1.0113e+02 1.0 2.57e+07 1.4 3.1e+06 1.0e+04 2.3e+03 4 0 4 4 5 4 0 4 4 5 85
BuildTwoSided 85 1.0 5.0838e+00 1.5 0.00e+00 0.0 1.5e+05 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 85 1.0 3.2002e-02 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFBcastBegin 382 1.0 3.1338e+00 1.4 0.00e+00 0.0 2.6e+06 2.3e+03 0.0e+00 0 0 3 1 0 0 0 3 1 0 0
SFBcastEnd 382 1.0 5.2611e+00 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 45 1.0 2.5858e+00 1.5 0.00e+00 0.0 2.4e+05 1.8e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 45 1.0 3.6487e-01253.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 1091 1.0 3.4858e+01 1.7 2.09e+08 2.3 0.0e+00 0.0e+00 1.1e+03 1 2 0 0 2 1 2 0 0 2 1604
KSPSetUp 195 1.0 2.9202e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+01 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 25 1.0 1.7661e+03 1.0 1.06e+10 2.3 7.2e+07 8.6e+03 3.9e+04 78 99 88 80 84 78 99 88 80 84 1582
PCGAMGGraph_AGG 40 1.0 3.5930e+01 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 7.6e+02 2 0 2 4 2 2 0 2 4 2 263
PCGAMGCoarse_AGG 40 1.0 1.4450e+02 1.0 4.05e+09 2.3 4.0e+06 5.1e+04 1.2e+03 6 37 5 27 3 6 37 5 27 3 7260
PCGAMGProl_AGG 40 1.0 3.2209e+01 1.0 0.00e+00 0.0 9.8e+05 2.9e+03 9.6e+02 1 0 1 0 2 1 0 1 0 2 0
PCGAMGPOpt_AGG 40 1.0 6.3251e+01 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 1.9e+03 3 4 4 1 4 3 4 4 1 4 1987
GAMG: createProl 40 1.0 2.7631e+02 1.0 4.56e+09 2.3 9.6e+06 2.5e+04 4.8e+03 12 42 12 32 10 12 42 12 32 10 4286
Graph 80 1.0 3.5926e+01 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 7.6e+02 2 0 2 4 2 2 0 2 4 2 263
MIS/Agg 40 1.0 1.5945e+01 1.1 0.00e+00 0.0 2.6e+06 2.3e+03 3.0e+02 1 0 3 1 1 1 0 3 1 1 0
SA: col data 40 1.0 1.3401e+01 1.1 0.00e+00 0.0 4.2e+05 6.1e+03 4.0e+02 1 0 1 0 1 1 0 1 0 1 0
SA: frmProl0 40 1.0 1.4033e+01 1.1 0.00e+00 0.0 5.6e+05 4.6e+02 4.0e+02 1 0 1 0 1 1 0 1 0 1 0
SA: smooth 40 1.0 6.3251e+01 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 1.9e+03 3 4 4 1 4 3 4 4 1 4 1987
GAMG: partLevel 40 1.0 5.8738e+01 1.0 2.59e+08 2.3 2.9e+06 2.5e+03 1.5e+03 3 2 4 1 3 3 2 4 1 3 1088
repartition 35 1.0 3.3741e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+01 0 0 0 0 0 0 0 0 0 0 0
Invert-Sort 15 1.0 2.7445e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 0 0 0 0 0 0 0
Move A 15 1.0 9.3221e+00 1.0 0.00e+00 0.0 6.6e+04 4.9e+02 2.7e+02 0 0 0 0 1 0 0 0 0 1 0
Move P 15 1.0 8.7196e+00 1.0 0.00e+00 0.0 5.7e+04 3.6e+01 2.7e+02 0 0 0 0 1 0 0 0 0 1 0
PCSetUp 50 1.0 3.4248e+02 1.0 4.81e+09 2.3 1.2e+07 2.0e+04 6.5e+03 15 44 15 33 14 15 44 15 33 14 3645
PCSetUpOnBlocks 316 1.0 2.1314e-02 6.3 2.10e+0610.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14102
PCApply 316 1.0 7.8870e+02 1.0 5.52e+09 2.4 4.0e+07 4.4e+03 1.7e+04 34 52 49 23 37 34 52 49 23 37 1863
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 2951 2951 828338752 0.
Vector Scatter 353 353 367264 0.
Index Set 833 833 6198336 0.
IS L to G Mapping 33 33 3228828 0.
MatMFFD 13 13 10088 0.
Matrix 1334 1334 3083683516 0.
Matrix Coarsen 40 40 25120 0.
SNES 13 13 17316 0.
SNESLineSearch 13 13 12896 0.
DMSNES 13 13 8632 0.
Distributed Mesh 13 13 60320 0.
Star Forest Bipartite Graph 111 111 94128 0.
Discrete System 13 13 11232 0.
Krylov Solver 123 123 4660776 0.
DMKSP interface 13 13 8424 0.
Preconditioner 123 123 117692 0.
PetscRandom 13 13 8294 0.
Viewer 15 13 10816 0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.0217308
Average time for zero size MPI_Send(): 0.000133693
#PETSc Option Table entries:
--n-threads=1
-i treat-cube_transient.i
-ksp_gmres_restart 100
-log_view
-pc_gamg_sym_graph true
-pc_hypre_boomeramg_max_iter 4
-pc_hypre_boomeramg_strong_threshold 0.7
-pc_hypre_boomeramg_tol 1.0e-6
-pc_hypre_type boomeramg
-pc_mg_levels 2
-pc_type gamg
-pc_use_amat false
-snes_mf_operator
-snes_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --download-hypre=1 --with-ssl=0 --with-debugging=no --with-pic=1 --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 -CC=mpicc -CXX=mpicxx -FC=mpif90 -F77=mpif77 -F90=mpif90 -CFLAGS="-fPIC -fopenmp" -CXXFLAGS="-fPIC -fopenmp" -FFLAGS="-fPIC -fopenmp" -FCFLAGS="-fPIC -fopenmp" -F90FLAGS="-fPIC -fopenmp" -F77FLAGS="-fPIC -fopenmp" PETSC_DIR=/home/kongf/workhome/projects/petsc -download-cmake=1
-----------------------------------------
Libraries compiled on Tue Feb 7 16:47:41 2017 on falcon1
Machine characteristics: Linux-3.0.101-84.1.11909.0.PTF-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /home/kongf/workhome/projects/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -fopenmp -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -fopenmp -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lsuperlu_dist -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lHYPRE -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -lmpichcxx -lstdc++ -lscalapack -lflapack -lfblas -lX11 -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -ldl -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -lmpich -lopa -lmpl -lgomp -lgcc_s -lpthread -ldl
-----------------------------------------