[petsc-users] GAMG for the unsymmetrical matrix
Kong, Fande
fande.kong at inl.gov
Fri Apr 7 16:29:47 CDT 2017
Thanks, Barry.
It works.
GAMG is about three times better than ASM in terms of the number of linear
iterations (23 vs. 61 in the attached logs), but it is roughly five times
slower overall (SNESSolve ~2.0e+03 s vs. ~3.7e+02 s). Any suggestions for
improving the performance of GAMG? The log files are attached.
Fande,
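For reference, the option tables at the end of the two attached logs show that the runs differ only in the preconditioner options (the hypre entries appear in both tables but are unused with these -pc_type settings):

  ASM run:  -pc_type asm
  GAMG run: -pc_type gamg -pc_mg_levels 2 -pc_gamg_sym_graph true -pc_use_amat false
  Common:   -ksp_gmres_restart 100 -snes_mf_operator -snes_view -log_view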
On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.kong at inl.gov> wrote:
> >
> > Thanks, Mark and Barry,
> >
> > It works pretty well in terms of the number of linear iterations (using
> > "-pc_gamg_sym_graph true"), but the compute time is terrible. I am using
> > the two-level method via "-pc_mg_levels 2". The compute time is larger than
> > with other preconditioning options because a matrix-free method is used on
> > the fine level, and in my particular problem the function evaluation is
> > expensive.
> >
> > I am using "-snes_mf_operator 1" to turn on Jacobian-free Newton, but I do
> > not think I want to make the preconditioning part matrix-free. Do you guys
> > know how to turn off the matrix-free method for GAMG?
>
> -pc_use_amat false
>
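A minimal C sketch of what that option does programmatically, assuming a SNES set up with -snes_mf_operator (Amat is the MFFD operator, Pmat the assembled matrix); the helper name below is made up, and PCSetUseAmat() is the call behind -pc_use_amat:

#include <petscsnes.h>

static PetscErrorCode UsePmatInsidePC(SNES snes)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  /* Apply the preconditioner's internal operator applications (e.g. the
     multigrid fine-level residual) with the assembled Pmat, not the
     matrix-free Amat. */
  ierr = PCSetUseAmat(pc,PETSC_FALSE);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

It can be called any time after the solver options are set, e.g. right before SNESSolve().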
> >
> > Here is the detailed solver:
> >
> > SNES Object: 384 MPI processes
> > type: newtonls
> > maximum iterations=200, maximum function evaluations=10000
> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
> > total number of linear solver iterations=20
> > total number of function evaluations=166
> > norm schedule ALWAYS
> > SNESLineSearch Object: 384 MPI processes
> > type: bt
> > interpolation: cubic
> > alpha=1.000000e-04
> > maxstep=1.000000e+08, minlambda=1.000000e-12
> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
> > maximum iterations=40
> > KSP Object: 384 MPI processes
> > type: gmres
> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > GMRES: happy breakdown tolerance 1e-30
> > maximum iterations=100, initial guess is zero
> > tolerances: relative=0.001, absolute=1e-50, divergence=10000.
> > right preconditioning
> > using UNPRECONDITIONED norm type for convergence test
> > PC Object: 384 MPI processes
> > type: gamg
> > MG: type is MULTIPLICATIVE, levels=2 cycles=v
> > Cycles per PCApply=1
> > Using Galerkin computed coarse grid matrices
> > GAMG specific options
> > Threshold for dropping small values from graph 0.
> > AGG specific options
> > Symmetric graph true
> > Coarse grid solver -- level -------------------------------
> > KSP Object: (mg_coarse_) 384 MPI processes
> > type: preonly
> > maximum iterations=10000, initial guess is zero
> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > left preconditioning
> > using NONE norm type for convergence test
> > PC Object: (mg_coarse_) 384 MPI processes
> > type: bjacobi
> > block Jacobi: number of blocks = 384
> > Local solve is same for all blocks, in the following KSP and PC objects:
> > KSP Object: (mg_coarse_sub_) 1 MPI processes
> > type: preonly
> > maximum iterations=1, initial guess is zero
> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > left preconditioning
> > using NONE norm type for convergence test
> > PC Object: (mg_coarse_sub_) 1 MPI processes
> > type: lu
> > LU: out-of-place factorization
> > tolerance for zero pivot 2.22045e-14
> > using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> > matrix ordering: nd
> > factor fill ratio given 5., needed 1.31367
> > Factored matrix follows:
> > Mat Object: 1 MPI processes
> > type: seqaij
> > rows=37, cols=37
> > package used to perform factorization: petsc
> > total: nonzeros=913, allocated nonzeros=913
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node routines
> > linear system matrix = precond matrix:
> > Mat Object: 1 MPI processes
> > type: seqaij
> > rows=37, cols=37
> > total: nonzeros=695, allocated nonzeros=695
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node routines
> > linear system matrix = precond matrix:
> > Mat Object: 384 MPI processes
> > type: mpiaij
> > rows=18145, cols=18145
> > total: nonzeros=1709115, allocated nonzeros=1709115
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> > Down solver (pre-smoother) on level 1 -------------------------------
> > KSP Object: (mg_levels_1_) 384 MPI processes
> > type: chebyshev
> > Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673
> > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
> > KSP Object: (mg_levels_1_esteig_) 384 MPI processes
> > type: gmres
> > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > GMRES: happy breakdown tolerance 1e-30
> > maximum iterations=10, initial guess is zero
> > tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
> > left preconditioning
> > using PRECONDITIONED norm type for convergence test
> > maximum iterations=2
> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > left preconditioning
> > using nonzero initial guess
> > using NONE norm type for convergence test
> > PC Object: (mg_levels_1_) 384 MPI processes
> > type: sor
> > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> > linear system matrix followed by preconditioner matrix:
> > Mat Object: 384 MPI processes
> > type: mffd
> > rows=3020875, cols=3020875
> > Matrix-free approximation:
> > err=1.49012e-08 (relative error in function evaluation)
> > Using wp compute h routine
> > Does not compute normU
> > Mat Object: () 384 MPI processes
> > type: mpiaij
> > rows=3020875, cols=3020875
> > total: nonzeros=215671710, allocated nonzeros=241731750
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> > Up solver (post-smoother) same as down solver (pre-smoother)
> > linear system matrix followed by preconditioner matrix:
> > Mat Object: 384 MPI processes
> > type: mffd
> > rows=3020875, cols=3020875
> > Matrix-free approximation:
> > err=1.49012e-08 (relative error in function evaluation)
> > Using wp compute h routine
> > Does not compute normU
> > Mat Object: () 384 MPI processes
> > type: mpiaij
> > rows=3020875, cols=3020875
> > total: nonzeros=215671710, allocated nonzeros=241731750
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> >
> >
> > Fande,
> >
> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > >> Does this mean that GAMG works for the symmetrical matrix only?
> > >
> > > No, it means that for a non-symmetric nonzero structure you need the
> > > extra flag. So use the extra flag. The reason we don't always use the flag
> > > is that it adds extra cost and isn't needed if the matrix already has a
> > > symmetric nonzero structure.
> >
> > BTW, if you have a symmetric non-zero structure you can just set
> > '-pc_gamg_threshold -1.0'; note the "or" in the message.
> >
> > If you want to mess with the threshold then you need to use the
> > symmetrized flag.
> >
>
>
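A small C sketch of the two choices Mark describes above, set from code instead of the command line; the function name is made up and the PetscOptionsSetValue() signature is the PETSc 3.7 one (NULL selects the global options database):

#include <petscsys.h>

static PetscErrorCode SetGAMGGraphOptions(void)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* Non-symmetric nonzero structure: symmetrize the graph GAMG aggregates on
     (the "extra flag" mentioned above). */
  ierr = PetscOptionsSetValue(NULL,"-pc_gamg_sym_graph","true");CHKERRQ(ierr);
  /* With a symmetric nonzero structure one could instead leave the flag off
     and simply disable graph thresholding:
       ierr = PetscOptionsSetValue(NULL,"-pc_gamg_threshold","-1.0");CHKERRQ(ierr);
     Using a nonzero threshold on a non-symmetric structure still requires
     -pc_gamg_sym_graph true. */
  PetscFunctionReturn(0);
}

These calls must happen before PCSetFromOptions()/SNESSetFromOptions() reads the options database.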
-------------- next part --------------
Time Step 10, time = 0.1
dt = 0.01
0 Nonlinear |R| = 2.004779e-03
0 Linear |R| = 2.004779e-03
1 Linear |R| = 1.080152e-03
2 Linear |R| = 5.066679e-04
3 Linear |R| = 3.045271e-04
4 Linear |R| = 1.925133e-04
5 Linear |R| = 1.404396e-04
6 Linear |R| = 1.087962e-04
7 Linear |R| = 9.433190e-05
8 Linear |R| = 8.650164e-05
9 Linear |R| = 7.511298e-05
10 Linear |R| = 6.116103e-05
11 Linear |R| = 5.097880e-05
12 Linear |R| = 4.528093e-05
13 Linear |R| = 4.238188e-05
14 Linear |R| = 3.852598e-05
15 Linear |R| = 3.211727e-05
16 Linear |R| = 2.655089e-05
17 Linear |R| = 2.308499e-05
18 Linear |R| = 1.988423e-05
19 Linear |R| = 1.686685e-05
20 Linear |R| = 1.453042e-05
21 Linear |R| = 1.227912e-05
22 Linear |R| = 9.829701e-06
23 Linear |R| = 7.695993e-06
24 Linear |R| = 6.092649e-06
25 Linear |R| = 5.293533e-06
26 Linear |R| = 4.583670e-06
27 Linear |R| = 3.427266e-06
28 Linear |R| = 2.442730e-06
29 Linear |R| = 1.855485e-06
1 Nonlinear |R| = 1.855485e-06
0 Linear |R| = 1.855485e-06
1 Linear |R| = 1.626392e-06
2 Linear |R| = 1.505583e-06
3 Linear |R| = 1.258325e-06
4 Linear |R| = 8.295100e-07
5 Linear |R| = 6.184171e-07
6 Linear |R| = 5.114149e-07
7 Linear |R| = 4.146942e-07
8 Linear |R| = 3.335395e-07
9 Linear |R| = 2.647491e-07
10 Linear |R| = 2.099801e-07
11 Linear |R| = 1.774148e-07
12 Linear |R| = 1.508766e-07
13 Linear |R| = 1.214361e-07
14 Linear |R| = 1.009707e-07
15 Linear |R| = 9.148193e-08
16 Linear |R| = 8.608036e-08
17 Linear |R| = 7.997930e-08
18 Linear |R| = 7.004223e-08
19 Linear |R| = 5.671891e-08
20 Linear |R| = 4.909039e-08
21 Linear |R| = 4.690188e-08
22 Linear |R| = 4.309895e-08
23 Linear |R| = 3.325854e-08
24 Linear |R| = 2.375529e-08
25 Linear |R| = 1.690025e-08
26 Linear |R| = 1.237871e-08
27 Linear |R| = 8.720643e-09
28 Linear |R| = 5.961891e-09
29 Linear |R| = 4.283073e-09
30 Linear |R| = 3.126338e-09
31 Linear |R| = 2.185008e-09
32 Linear |R| = 1.411854e-09
2 Nonlinear |R| = 1.411854e-09
SNES Object: 384 MPI processes
type: newtonls
maximum iterations=200, maximum function evaluations=10000
tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
total number of linear solver iterations=61
total number of function evaluations=66
norm schedule ALWAYS
SNESLineSearch Object: 384 MPI processes
type: bt
interpolation: cubic
alpha=1.000000e-04
maxstep=1.000000e+08, minlambda=1.000000e-12
tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
maximum iterations=40
KSP Object: 384 MPI processes
type: gmres
GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=100, initial guess is zero
tolerances: relative=0.001, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 384 MPI processes
type: asm
Additive Schwarz: total subdomain blocks = 384, amount of overlap = 1
Additive Schwarz: restriction/interpolation type - RESTRICT
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (sub_) 1 MPI processes
type: ilu
ILU: out-of-place factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 1., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=20493, cols=20493
package used to perform factorization: petsc
total: nonzeros=1270950, allocated nonzeros=1270950
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: () 1 MPI processes
type: seqaij
rows=20493, cols=20493
total: nonzeros=1270950, allocated nonzeros=1270950
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix followed by preconditioner matrix:
Mat Object: 384 MPI processes
type: mffd
rows=3020875, cols=3020875
Matrix-free approximation:
err=1.49012e-08 (relative error in function evaluation)
Using wp compute h routine
Does not compute normU
Mat Object: () 384 MPI processes
type: mpiaij
rows=3020875, cols=3020875
total: nonzeros=215671710, allocated nonzeros=241731750
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Solve Converged!
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r4i0n1 with 384 processors, by kongf Tue Mar 14 16:28:04 2017
Using Petsc Release Version 3.7.5, unknown
Max Max/Min Avg Total
Time (sec): 4.387e+02 1.00001 4.387e+02
Objects: 1.279e+03 1.00000 1.279e+03
Flops: 4.230e+09 1.99161 2.946e+09 1.131e+12
Flops/sec: 9.642e+06 1.99162 6.716e+06 2.579e+09
MPI Messages: 2.935e+05 4.95428 1.810e+05 6.951e+07
MPI Message Lengths: 3.105e+09 3.16103 1.072e+04 7.449e+11
MPI Reductions: 5.022e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.3875e+02 100.0% 1.1314e+12 100.0% 6.951e+07 100.0% 1.072e+04 100.0% 5.022e+04 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot 20 1.0 3.2134e-03 2.4 4.53e+05 2.3 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 37601
VecMDot 839 1.0 6.7209e-01 1.2 3.52e+08 2.3 0.0e+00 0.0e+00 8.4e+02 0 8 0 0 2 0 8 0 0 2 139634
VecNorm 1802 1.0 6.7932e+00 2.5 4.08e+07 2.3 0.0e+00 0.0e+00 1.8e+03 1 1 0 0 4 1 1 0 0 4 1603
VecScale 3877 1.0 1.0508e-01 1.4 1.34e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 439546
VecCopy 4153 1.0 7.2803e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 5493 1.0 5.1735e-01 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 5365 1.0 4.0282e-01 2.3 3.01e+08 1.4 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 251646
VecWAXPY 884 1.0 5.5227e-02 3.5 1.97e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 95341
VecMAXPY 864 1.0 1.7126e-01 2.6 3.71e+08 2.3 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 577621
VecAssemblyBegin 15491 1.0 1.3738e+02 3.0 0.00e+00 0.0 8.9e+06 1.8e+04 4.6e+04 28 0 13 22 93 28 0 13 22 93 0
VecAssemblyEnd 15491 1.0 7.9072e-0128.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 13390 1.0 2.5097e+00 3.6 0.00e+00 0.0 5.9e+07 8.4e+03 2.8e+01 0 0 85 67 0 0 0 85 67 0 0
VecScatterEnd 13362 1.0 5.7428e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecReduceArith 55 1.0 1.2808e-03 2.2 1.25e+06 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 259431
VecReduceComm 25 1.0 5.5003e-02 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 2.5e+01 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 864 1.0 4.4664e+00 3.5 2.93e+07 2.3 0.0e+00 0.0e+00 8.6e+02 1 1 0 0 2 1 1 0 0 2 1753
MatMult MF 859 1.0 3.1339e+02 1.0 4.12e+08 1.4 5.7e+07 9.6e+03 4.2e+04 71 12 81 73 83 71 12 81 73 83 439
MatMult 859 1.0 3.1340e+02 1.0 4.12e+08 1.4 5.7e+07 9.6e+03 4.2e+04 71 12 81 73 83 71 12 81 73 83 439
MatSolve 864 1.0 2.1255e+00 2.0 1.83e+09 2.1 0.0e+00 0.0e+00 0.0e+00 0 43 0 0 0 0 43 0 0 0 226791
MatLUFactorNum 25 1.0 1.0920e+00 2.4 1.20e+09 2.5 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 267745
MatILUFactorSym 13 1.0 1.0606e-01 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 150 1.0 2.0643e+00 1.2 0.00e+00 0.0 1.7e+05 1.7e+05 2.0e+02 0 0 0 4 0 0 0 0 4 0 0
MatAssemblyEnd 150 1.0 4.3198e+00 1.1 0.00e+00 0.0 1.9e+04 1.1e+03 2.1e+02 1 0 0 0 0 1 0 0 0 0 0
MatGetRowIJ 13 1.0 1.3113e-0513.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 25 1.0 4.4022e+00 2.8 0.00e+00 0.0 5.9e+05 8.4e+04 7.5e+01 1 0 1 7 0 1 0 1 7 0 0
MatGetOrdering 13 1.0 1.7283e-0217.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 13 1.0 2.0244e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 29 1.0 5.0908e-02 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 52 2.0 5.5351e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 0 0 0 0 0 0 0
SNESSolve 13 1.0 3.7214e+02 1.0 4.21e+09 2.0 6.6e+07 1.0e+04 4.8e+04 85100 95 92 95 85100 95 92 95 3026
SNESFunctionEval 897 1.0 3.2606e+02 1.0 3.62e+08 1.3 5.9e+07 9.6e+03 4.3e+04 74 11 85 76 85 74 11 85 76 85 384
SNESJacobianEval 25 1.0 3.4770e+01 1.0 1.95e+07 1.4 2.3e+06 2.3e+04 1.9e+03 8 1 3 7 4 8 1 3 7 4 195
SNESLineSearch 25 1.0 1.8090e+01 1.0 2.57e+07 1.4 3.1e+06 1.0e+04 2.3e+03 4 1 4 4 5 4 1 4 4 5 475
BuildTwoSided 25 1.0 4.6378e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 25 1.0 2.7061e-04 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 25 1.0 4.6412e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 25 1.0 8.1301e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 839 1.0 8.0119e-01 1.2 7.03e+08 2.3 0.0e+00 0.0e+00 8.4e+02 0 17 0 0 2 0 17 0 0 2 234277
KSPSetUp 50 1.0 3.0220e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 25 1.0 3.1444e+02 1.0 4.16e+09 2.0 6.0e+07 9.9e+03 4.3e+04 72 98 86 80 85 72 98 86 80 85 3526
PCSetUp 50 1.0 5.4896e+00 2.4 1.20e+09 2.5 7.1e+05 7.0e+04 1.8e+02 1 26 1 7 0 1 26 1 7 0 53260
PCSetUpOnBlocks 25 1.0 1.1928e+00 2.4 1.20e+09 2.5 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 245124
PCApply 864 1.0 2.4803e+00 2.0 1.83e+09 2.1 4.1e+06 4.4e+03 0.0e+00 0 43 6 2 0 0 43 6 2 0 194354
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 740 740 732012968 0.
Vector Scatter 76 76 1212680 0.
Index Set 176 176 4673716 0.
IS L to G Mapping 33 33 3228828 0.
MatMFFD 13 13 10088 0.
Matrix 45 45 364469360 0.
SNES 13 13 17316 0.
SNESLineSearch 13 13 12896 0.
DMSNES 13 13 8632 0.
Distributed Mesh 13 13 60320 0.
Star Forest Bipartite Graph 51 51 43248 0.
Discrete System 13 13 11232 0.
Krylov Solver 26 26 2223520 0.
DMKSP interface 13 13 8424 0.
Preconditioner 26 26 25688 0.
Viewer 15 13 10816 0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 2.08554e-06
#PETSc Option Table entries:
--n-threads=1
-i treat-cube_transient.i
-ksp_gmres_restart 100
-log_view
-pc_hypre_boomeramg_max_iter 4
-pc_hypre_boomeramg_strong_threshold 0.7
-pc_hypre_boomeramg_tol 1.0e-6
-pc_hypre_type boomeramg
-pc_type asm
-snes_mf_operator
-snes_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --download-hypre=1 --with-ssl=0 --with-debugging=no --with-pic=1 --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 -CC=mpicc -CXX=mpicxx -FC=mpif90 -F77=mpif77 -F90=mpif90 -CFLAGS="-fPIC -fopenmp" -CXXFLAGS="-fPIC -fopenmp" -FFLAGS="-fPIC -fopenmp" -FCFLAGS="-fPIC -fopenmp" -F90FLAGS="-fPIC -fopenmp" -F77FLAGS="-fPIC -fopenmp" PETSC_DIR=/home/kongf/workhome/projects/petsc -download-cmake=1
-----------------------------------------
Libraries compiled on Tue Feb 7 16:47:41 2017 on falcon1
Machine characteristics: Linux-3.0.101-84.1.11909.0.PTF-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /home/kongf/workhome/projects/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -fopenmp -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -fopenmp -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lsuperlu_dist -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lHYPRE -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -lmpichcxx -lstdc++ -lscalapack -lflapack -lfblas -lX11 -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -ldl -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -lmpich -lopa -lmpl -lgomp -lgcc_s -lpthread -ldl
-----------------------------------------
-------------- next part --------------
Time Step 10, time = 0.1
dt = 0.01
0 Nonlinear |R| = 2.004778e-03
0 Linear |R| = 2.004778e-03
1 Linear |R| = 4.440581e-04
2 Linear |R| = 1.283930e-04
3 Linear |R| = 9.874954e-05
4 Linear |R| = 6.589984e-05
5 Linear |R| = 4.483411e-05
6 Linear |R| = 2.787575e-05
7 Linear |R| = 1.435839e-05
8 Linear |R| = 8.720579e-06
9 Linear |R| = 3.704796e-06
10 Linear |R| = 2.317054e-06
11 Linear |R| = 9.060942e-07
1 Nonlinear |R| = 9.060942e-07
0 Linear |R| = 9.060942e-07
1 Linear |R| = 6.874101e-07
2 Linear |R| = 3.052995e-07
3 Linear |R| = 1.728171e-07
4 Linear |R| = 7.805237e-08
5 Linear |R| = 5.011253e-08
6 Linear |R| = 2.903814e-08
7 Linear |R| = 2.421108e-08
8 Linear |R| = 1.594860e-08
9 Linear |R| = 1.116189e-08
10 Linear |R| = 4.372907e-09
11 Linear |R| = 1.575997e-09
12 Linear |R| = 5.765413e-10
2 Nonlinear |R| = 5.765413e-10
SNES Object: 384 MPI processes
type: newtonls
maximum iterations=200, maximum function evaluations=10000
tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
total number of linear solver iterations=23
total number of function evaluations=28
norm schedule ALWAYS
SNESLineSearch Object: 384 MPI processes
type: bt
interpolation: cubic
alpha=1.000000e-04
maxstep=1.000000e+08, minlambda=1.000000e-12
tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
maximum iterations=40
KSP Object: 384 MPI processes
type: gmres
GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=100, initial guess is zero
tolerances: relative=0.001, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 384 MPI processes
type: gamg
MG: type is MULTIPLICATIVE, levels=2 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
GAMG specific options
Threshold for dropping small values from graph 0.
AGG specific options
Symmetric graph true
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 384 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 384 MPI processes
type: bjacobi
block Jacobi: number of blocks = 384
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5., needed 1.31367
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=37, cols=37
package used to perform factorization: petsc
total: nonzeros=913, allocated nonzeros=913
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=37, cols=37
total: nonzeros=695, allocated nonzeros=695
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 384 MPI processes
type: mpiaij
rows=18145, cols=18145
total: nonzeros=1709115, allocated nonzeros=1709115
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 384 MPI processes
type: chebyshev
Chebyshev: eigenvalue estimates: min = 0.138116, max = 1.51927
Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 384 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 384 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: () 384 MPI processes
type: mpiaij
rows=3020875, cols=3020875
total: nonzeros=215671710, allocated nonzeros=241731750
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix followed by preconditioner matrix:
Mat Object: 384 MPI processes
type: mffd
rows=3020875, cols=3020875
Matrix-free approximation:
err=1.49012e-08 (relative error in function evaluation)
Using wp compute h routine
Does not compute normU
Mat Object: () 384 MPI processes
type: mpiaij
rows=3020875, cols=3020875
total: nonzeros=215671710, allocated nonzeros=241731750
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Solve Converged!
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r4i4n2 with 384 processors, by kongf Fri Apr 7 13:36:35 2017
Using Petsc Release Version 3.7.5, unknown
Max Max/Min Avg Total
Time (sec): 2.266e+03 1.00001 2.266e+03
Objects: 6.020e+03 1.00000 6.020e+03
Flops: 1.064e+10 2.27050 7.337e+09 2.817e+12
Flops/sec: 4.695e+06 2.27050 3.237e+06 1.243e+09
MPI Messages: 3.459e+05 5.11666 2.112e+05 8.111e+07
MPI Message Lengths: 3.248e+09 3.35280 9.453e+03 7.667e+11
MPI Reductions: 4.610e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.2663e+03 100.0% 2.8172e+12 100.0% 8.111e+07 100.0% 9.453e+03 100.0% 4.610e+04 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot 20 1.0 6.1171e-01 1.6 4.53e+05 2.3 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 198
VecMDot 1091 1.0 3.4823e+01 1.7 1.05e+08 2.3 0.0e+00 0.0e+00 1.1e+03 1 1 0 0 2 1 1 0 0 2 803
VecNorm 1943 1.0 6.9656e+01 1.6 3.66e+07 2.3 0.0e+00 0.0e+00 1.9e+03 3 0 0 0 4 3 0 0 0 4 140
VecScale 2928 1.0 1.1091e-01 2.8 7.24e+07 1.4 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 219463
VecCopy 3086 1.0 6.0201e-01 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 7168 1.0 4.2314e-01 7.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 3263 1.0 3.7908e-01 4.1 1.59e+08 1.4 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 138504
VecAYPX 4112 1.0 1.1982e-01 4.2 3.59e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 80071
VecAXPBYCZ 2056 1.0 7.5538e-02 3.3 7.18e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 254030
VecWAXPY 743 1.0 7.8864e-02 4.9 1.65e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 55963
VecMAXPY 1196 1.0 7.9660e-02 3.3 1.23e+08 2.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 411137
VecAssemblyBegin 12333 1.0 1.1090e+03 1.2 0.00e+00 0.0 7.6e+06 1.9e+04 3.7e+04 48 0 9 19 80 48 0 9 19 80 0
VecAssemblyEnd 12333 1.0 4.2957e-0124.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 440 1.0 2.2301e-02 5.7 3.12e+06 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37433
VecScatterBegin 13638 1.0 2.3693e+00 4.9 0.00e+00 0.0 6.4e+07 5.6e+03 2.8e+01 0 0 79 46 0 0 0 79 46 0 0
VecScatterEnd 13610 1.0 2.1648e+0213.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecSetRandom 40 1.0 4.5372e-02 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 55 1.0 1.3552e-03 2.7 1.25e+06 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 245191
VecReduceComm 25 1.0 2.3911e+00 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.5e+01 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 1196 1.0 2.8596e+01 1.1 2.95e+07 2.3 0.0e+00 0.0e+00 1.2e+03 1 0 0 0 3 1 0 0 0 3 275
MatMult MF 718 1.0 1.4078e+03 1.0 2.00e+08 1.4 4.2e+07 8.2e+03 3.2e+04 62 2 52 45 69 62 2 52 45 69 46
MatMult 4195 1.0 1.4272e+03 1.0 3.33e+09 2.2 5.8e+07 6.6e+03 3.2e+04 63 32 72 50 69 63 32 72 50 69 627
MatMultAdd 514 1.0 9.7981e+0016.1 3.84e+07 2.4 2.0e+06 1.3e+02 0.0e+00 0 0 2 0 0 0 0 2 0 0 995
MatMultTranspose 514 1.0 6.0183e+0019.9 3.84e+07 2.4 2.0e+06 1.3e+02 0.0e+00 0 0 2 0 0 0 0 2 0 0 1620
MatSolve 316 1.3 1.7905e-0219.7 1.76e+06 4.6 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 18236
MatSOR 3524 1.0 6.6987e+00 3.9 2.50e+09 2.6 0.0e+00 0.0e+00 0.0e+00 0 23 0 0 0 0 23 0 0 0 97291
MatLUFactorSym 25 1.0 1.7944e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 25 1.0 2.2082e-03 6.0 2.10e+0610.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 136111
MatConvert 40 1.0 2.6915e-01 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 120 1.0 1.0204e+0022.5 3.86e+07 2.3 1.9e+05 2.9e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 10018
MatResidual 514 1.0 5.3226e+01 1.1 4.35e+08 2.3 3.7e+06 4.2e+03 1.1e+03 2 4 5 2 2 2 4 5 2 2 2165
MatAssemblyBegin 1010 1.0 6.0257e+01 2.2 0.00e+00 0.0 1.7e+06 3.5e+04 8.4e+02 2 0 2 8 2 2 0 2 8 2 0
MatAssemblyEnd 1010 1.0 7.7316e+01 1.0 0.00e+00 0.0 2.5e+06 4.6e+02 2.1e+03 3 0 3 0 5 3 0 3 0 5 0
MatGetRow 1078194 2.3 2.4485e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 25 1.2 3.7956e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrix 30 1.0 1.6949e+01 1.0 0.00e+00 0.0 1.2e+05 2.8e+02 5.1e+02 1 0 0 0 1 1 0 0 0 1 0
MatGetOrdering 25 1.2 1.8878e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 40 1.0 1.5944e+01 1.1 0.00e+00 0.0 2.6e+06 2.3e+03 3.0e+02 1 0 3 1 1 1 0 3 1 1 0
MatZeroEntries 69 1.0 7.3145e-02 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 90 1.4 1.1229e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.8e+01 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 40 1.0 3.4301e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatTranspose 20 1.0 1.2561e+01 1.1 0.00e+00 0.0 7.1e+05 2.0e+04 2.4e+02 1 0 1 2 1 1 0 1 2 1 0
MatMatMult 40 1.0 2.6365e+01 1.0 3.56e+07 2.3 1.2e+06 1.4e+03 6.4e+02 1 0 1 0 1 1 0 1 0 1 358
MatMatMultSym 40 1.0 2.3430e+01 1.0 0.00e+00 0.0 9.8e+05 1.1e+03 5.6e+02 1 0 1 0 1 1 0 1 0 1 0
MatMatMultNum 40 1.0 2.9809e+00 1.1 3.56e+07 2.3 1.9e+05 2.9e+03 8.0e+01 0 0 0 0 0 0 0 0 0 0 3170
MatPtAP 40 1.0 3.1763e+01 1.0 2.59e+08 2.3 2.7e+06 2.6e+03 6.8e+02 1 2 3 1 1 1 2 3 1 1 2012
MatPtAPSymbolic 40 1.0 1.7240e+01 1.1 0.00e+00 0.0 1.2e+06 4.6e+03 2.8e+02 1 0 1 1 1 1 0 1 1 1 0
MatPtAPNumeric 40 1.0 1.5004e+01 1.1 2.59e+08 2.3 1.5e+06 1.0e+03 4.0e+02 1 2 2 0 1 1 2 2 0 1 4259
MatTrnMatMult 25 1.0 1.1522e+02 1.0 4.05e+09 2.3 7.5e+05 2.6e+05 4.8e+02 5 37 1 25 1 5 37 1 25 1 9105
MatTrnMatMultSym 25 1.0 7.3735e+01 1.0 0.00e+00 0.0 6.3e+05 1.0e+05 4.2e+02 3 0 1 8 1 3 0 1 8 1 0
MatTrnMatMultNum 25 1.0 4.1508e+01 1.0 4.05e+09 2.3 1.2e+05 1.1e+06 5.0e+01 2 37 0 17 0 2 37 0 17 0 25275
MatGetLocalMat 170 1.0 6.0506e-01 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 120 1.0 3.7906e+00 5.3 0.00e+00 0.0 1.3e+06 5.0e+03 0.0e+00 0 0 2 1 0 0 0 2 1 0 0
SNESSolve 13 1.0 1.9975e+03 1.0 1.06e+10 2.3 7.8e+07 9.1e+03 4.3e+04 88100 96 92 94 88100 96 92 94 1408
SNESFunctionEval 756 1.0 1.4539e+03 1.0 1.62e+08 1.4 4.4e+07 8.3e+03 3.3e+04 64 2 55 48 71 64 2 55 48 71 38
SNESJacobianEval 25 1.0 1.0415e+02 1.0 1.95e+07 1.4 2.3e+06 2.3e+04 1.9e+03 5 0 3 7 4 5 0 3 7 4 65
SNESLineSearch 25 1.0 1.0113e+02 1.0 2.57e+07 1.4 3.1e+06 1.0e+04 2.3e+03 4 0 4 4 5 4 0 4 4 5 85
BuildTwoSided 85 1.0 5.0838e+00 1.5 0.00e+00 0.0 1.5e+05 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 85 1.0 3.2002e-02 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFBcastBegin 382 1.0 3.1338e+00 1.4 0.00e+00 0.0 2.6e+06 2.3e+03 0.0e+00 0 0 3 1 0 0 0 3 1 0 0
SFBcastEnd 382 1.0 5.2611e+00 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 45 1.0 2.5858e+00 1.5 0.00e+00 0.0 2.4e+05 1.8e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 45 1.0 3.6487e-01253.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 1091 1.0 3.4858e+01 1.7 2.09e+08 2.3 0.0e+00 0.0e+00 1.1e+03 1 2 0 0 2 1 2 0 0 2 1604
KSPSetUp 195 1.0 2.9202e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+01 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 25 1.0 1.7661e+03 1.0 1.06e+10 2.3 7.2e+07 8.6e+03 3.9e+04 78 99 88 80 84 78 99 88 80 84 1582
PCGAMGGraph_AGG 40 1.0 3.5930e+01 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 7.6e+02 2 0 2 4 2 2 0 2 4 2 263
PCGAMGCoarse_AGG 40 1.0 1.4450e+02 1.0 4.05e+09 2.3 4.0e+06 5.1e+04 1.2e+03 6 37 5 27 3 6 37 5 27 3 7260
PCGAMGProl_AGG 40 1.0 3.2209e+01 1.0 0.00e+00 0.0 9.8e+05 2.9e+03 9.6e+02 1 0 1 0 2 1 0 1 0 2 0
PCGAMGPOpt_AGG 40 1.0 6.3251e+01 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 1.9e+03 3 4 4 1 4 3 4 4 1 4 1987
GAMG: createProl 40 1.0 2.7631e+02 1.0 4.56e+09 2.3 9.6e+06 2.5e+04 4.8e+03 12 42 12 32 10 12 42 12 32 10 4286
Graph 80 1.0 3.5926e+01 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 7.6e+02 2 0 2 4 2 2 0 2 4 2 263
MIS/Agg 40 1.0 1.5945e+01 1.1 0.00e+00 0.0 2.6e+06 2.3e+03 3.0e+02 1 0 3 1 1 1 0 3 1 1 0
SA: col data 40 1.0 1.3401e+01 1.1 0.00e+00 0.0 4.2e+05 6.1e+03 4.0e+02 1 0 1 0 1 1 0 1 0 1 0
SA: frmProl0 40 1.0 1.4033e+01 1.1 0.00e+00 0.0 5.6e+05 4.6e+02 4.0e+02 1 0 1 0 1 1 0 1 0 1 0
SA: smooth 40 1.0 6.3251e+01 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 1.9e+03 3 4 4 1 4 3 4 4 1 4 1987
GAMG: partLevel 40 1.0 5.8738e+01 1.0 2.59e+08 2.3 2.9e+06 2.5e+03 1.5e+03 3 2 4 1 3 3 2 4 1 3 1088
repartition 35 1.0 3.3741e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+01 0 0 0 0 0 0 0 0 0 0 0
Invert-Sort 15 1.0 2.7445e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 0 0 0 0 0 0 0
Move A 15 1.0 9.3221e+00 1.0 0.00e+00 0.0 6.6e+04 4.9e+02 2.7e+02 0 0 0 0 1 0 0 0 0 1 0
Move P 15 1.0 8.7196e+00 1.0 0.00e+00 0.0 5.7e+04 3.6e+01 2.7e+02 0 0 0 0 1 0 0 0 0 1 0
PCSetUp 50 1.0 3.4248e+02 1.0 4.81e+09 2.3 1.2e+07 2.0e+04 6.5e+03 15 44 15 33 14 15 44 15 33 14 3645
PCSetUpOnBlocks 316 1.0 2.1314e-02 6.3 2.10e+0610.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14102
PCApply 316 1.0 7.8870e+02 1.0 5.52e+09 2.4 4.0e+07 4.4e+03 1.7e+04 34 52 49 23 37 34 52 49 23 37 1863
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 2951 2951 828338752 0.
Vector Scatter 353 353 367264 0.
Index Set 833 833 6198336 0.
IS L to G Mapping 33 33 3228828 0.
MatMFFD 13 13 10088 0.
Matrix 1334 1334 3083683516 0.
Matrix Coarsen 40 40 25120 0.
SNES 13 13 17316 0.
SNESLineSearch 13 13 12896 0.
DMSNES 13 13 8632 0.
Distributed Mesh 13 13 60320 0.
Star Forest Bipartite Graph 111 111 94128 0.
Discrete System 13 13 11232 0.
Krylov Solver 123 123 4660776 0.
DMKSP interface 13 13 8424 0.
Preconditioner 123 123 117692 0.
PetscRandom 13 13 8294 0.
Viewer 15 13 10816 0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.0217308
Average time for zero size MPI_Send(): 0.000133693
#PETSc Option Table entries:
--n-threads=1
-i treat-cube_transient.i
-ksp_gmres_restart 100
-log_view
-pc_gamg_sym_graph true
-pc_hypre_boomeramg_max_iter 4
-pc_hypre_boomeramg_strong_threshold 0.7
-pc_hypre_boomeramg_tol 1.0e-6
-pc_hypre_type boomeramg
-pc_mg_levels 2
-pc_type gamg
-pc_use_amat false
-snes_mf_operator
-snes_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --download-hypre=1 --with-ssl=0 --with-debugging=no --with-pic=1 --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 -CC=mpicc -CXX=mpicxx -FC=mpif90 -F77=mpif77 -F90=mpif90 -CFLAGS="-fPIC -fopenmp" -CXXFLAGS="-fPIC -fopenmp" -FFLAGS="-fPIC -fopenmp" -FCFLAGS="-fPIC -fopenmp" -F90FLAGS="-fPIC -fopenmp" -F77FLAGS="-fPIC -fopenmp" PETSC_DIR=/home/kongf/workhome/projects/petsc -download-cmake=1
-----------------------------------------
Libraries compiled on Tue Feb 7 16:47:41 2017 on falcon1
Machine characteristics: Linux-3.0.101-84.1.11909.0.PTF-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /home/kongf/workhome/projects/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -fopenmp -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -fopenmp -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lsuperlu_dist -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lHYPRE -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -lmpichcxx -lstdc++ -lscalapack -lflapack -lfblas -lX11 -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -ldl -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -lmpich -lopa -lmpl -lgomp -lgcc_s -lpthread -ldl
-----------------------------------------