[petsc-users] KSP changes for successive solver
Michele Rosso
mrosso at uci.edu
Fri Jul 24 14:44:30 CDT 2015
Barry,
I attached ksp_view and log_summary for two different setups:
1) Plain MG on 5 levels + LU at the coarse level (files ending in mg5)
2) Plain MG on 5 levels + custom PC + LU at the coarse level (files
ending in mg7)
The custom PC works on a subset of processes, thus allowing the use of two
more levels of MG, for a total of 7.
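For reference, the coarse-level options for the two setups (as listed in the attached logs) were roughly:

1) -pc_type mg -pc_mg_levels 5 -pc_mg_galerkin
   -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package superlu_dist

2) -pc_type mg -pc_mg_levels 5 -pc_mg_galerkin
   -mg_coarse_pc_type dmdarepart -mg_coarse_pc_dmdarepart_factor 64
   -mg_coarse_dmdarepart_pc_type mg -mg_coarse_dmdarepart_pc_mg_levels 2
   -mg_coarse_dmdarepart_pc_mg_galerkin
   -mg_coarse_dmdarepart_mg_coarse_pc_type lu
   -mg_coarse_dmdarepart_mg_coarse_pc_factor_mat_solver_package superlu_dist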
Case 1) is extremely slow (~20 sec per solve) and converges in 21 iterations.
Case 2) is way faster (~0.25 sec per solve) and converges in 29 iterations.
Thanks for your help!
Michele
On Fri, 2015-07-24 at 13:56 -0500, Barry Smith wrote:
> The coarse problem for the PCMG (geometric multigrid) is
>
> Mat Object: 8192 MPI processes
> type: mpiaij
> rows=8192, cols=8192
>
> then it tries to solve it with algebraic multigrid on 8192 processes (which is completely insane). A lot of the time is spent in setting up the algebraic multigrid (not surprisingly).
>
> 8192 is kind of small to parallelize. Please run the same code but with the default coarse grid problem instead of PCGAMG and send us the -log_summary again
>
> Barry
>
> > On Jul 24, 2015, at 1:35 PM, Michele Rosso <mrosso at uci.edu> wrote:
> >
> > Hi Mark and Barry,
> >
> > I am sorry for my late reply: it was a busy week!
> > I ran a test case for a larger problem with as many levels of MG as I could (i.e., 5) and GAMG as the PC at the coarse level. I attached the output of -info (after grepping for "gamg"), ksp_view and log_summary.
> > The solve takes about 2 seconds on 8192 cores, which is way too much. The number of iterations to convergence is 24.
> > I hope there is a way to speed it up.
> >
> > Thanks,
> > Michele
> >
> >
> > On Fri, 2015-07-17 at 09:38 -0400, Mark Adams wrote:
> >>
> >>
> >> On Thu, Jul 16, 2015 at 8:18 PM, Michele Rosso <mrosso at uci.edu> wrote:
> >> Barry,
> >>
> >> thank you very much for the detailed answer. I tried what you suggested and it works.
> >> So far I have tried it only on a small system, but the final goal is to use it for very large runs. How does PCGAMG compare to PCMG as far as performance and scalability are concerned?
> >> Also, could you help me tune the GAMG part (my current setup is in the attached ksp_view.txt file)?
> >>
> >>
> >>
> >> I am going to add this to the document today but you can run with -info. This is very noisy so you might want to do the next step at run time. Then grep on GAMG. This will be about 20 lines. Send that to us and we can go from there.
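> >>
> >> For example, something along these lines (the launcher and executable names here are just placeholders for however you normally run):
> >>
> >>     mpiexec -n 8192 ./your_app <usual options> -info 2>&1 | grep GAMG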
> >>
> >>
> >> Mark
> >>
> >>
> >>
> >>
> >> I also tried to use superlu_dist for the LU decomposition on mg_coarse_mg_coarse_sub_:
> >> -mg_coarse_mg_coarse_sub_pc_type lu
> >> -mg_coarse_mg_coarse_sub_pc_factor_mat_solver_package superlu_dist
> >>
> >> but I got an error:
> >>
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> ****** Error in MC64A/AD. INFO(1) = -2
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >> symbfact() error returns 0
> >>
> >>
> >> Thank you,
> >> Michele
> >>
> >>
> >> On Thu, 2015-07-16 at 18:07 -0500, Barry Smith wrote:
> >>>
> >>> > On Jul 16, 2015, at 5:42 PM, Michele Rosso <mrosso at uci.edu> wrote:
> >>> >
> >>> > Barry,
> >>> >
> >>> > thanks for your reply. So if I want it fixed, I will have to use the master branch, correct?
> >>>
> >>>
> >>> Yes, or edit mg.c and remove the offending lines of code (easy enough).
> >>>
> >>> >
> >>> > On a side note, what I am trying to achieve is to be able to use as many levels of MG as I want, despite the limitation imposed by the local number of grid nodes.
> >>>
> >>>
> >>> I assume you are talking about use with DMDA? There is no generic limitation in PETSc's multigrid; it is only the way the DMDA code figures out the interpolation that imposes a restriction.
> >>>
> >>>
> >>> > So far I am using a borrowed code that implements a PC that creates a subcommunicator and performs MG on it.
> >>> > While reading the documentation I found out that PCMGSetLevels takes an optional array of communicators. How does this work?
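> >>> > (For reference, the signature in question is PCMGSetLevels(PC pc, PetscInt levels, MPI_Comm *comms), with the comms array documented as optional.)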
> >>>
> >>>
> >>> It doesn't work. It was an idea that never got pursued.
> >>>
> >>>
> >>> > Can I simply define my matrix and rhs on the fine grid as I normally would (I do not use kspsetoperators and kspsetrhs) and have KSP take care of it by using the correct communicator for each level?
> >>>
> >>>
> >>> No.
> >>>
> >>> You can use the PCMG geometric multigrid with DMDA for as many levels as it works and then use PCGAMG as the coarse grid solver. PCGAMG automatically uses fewer processes for the coarse level matrices and vectors. You could do this all from the command line without writing code.
> >>>
> >>> For example, if your code uses a DMDA and calls KSPSetDM(), use something like -da_refine 3 -pc_type mg -pc_mg_galerkin -mg_coarse_pc_type gamg -ksp_view
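> >>>
> >>> A minimal sketch of that usage (assuming a 3D DMDA; ComputeMatrix/ComputeRHS are placeholders for the user's own routines and the grid size is arbitrary):
> >>>
> >>> #include <petscksp.h>
> >>>
> >>> extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*); /* fills the operator */
> >>> extern PetscErrorCode ComputeRHS(KSP,Vec,void*);        /* fills the right-hand side */
> >>>
> >>> int main(int argc,char **argv)
> >>> {
> >>>   DM  da;
> >>>   KSP ksp;
> >>>   PetscInitialize(&argc,&argv,NULL,NULL);
> >>>   /* coarse DMDA; -da_refine 3 generates the finer levels at run time */
> >>>   DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
> >>>                DMDA_STENCIL_STAR,9,9,9,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
> >>>                1,1,NULL,NULL,NULL,&da);
> >>>   KSPCreate(PETSC_COMM_WORLD,&ksp);
> >>>   KSPSetDM(ksp,da);                               /* KSP/PCMG build the hierarchy from the DMDA */
> >>>   KSPSetComputeOperators(ksp,ComputeMatrix,NULL);
> >>>   KSPSetComputeRHS(ksp,ComputeRHS,NULL);
> >>>   KSPSetFromOptions(ksp);                         /* picks up -pc_type mg -mg_coarse_pc_type gamg ... */
> >>>   KSPSolve(ksp,NULL,NULL);
> >>>   KSPDestroy(&ksp);
> >>>   DMDestroy(&da);
> >>>   PetscFinalize();
> >>>   return 0;
> >>> }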
> >>>
> >>>
> >>>
> >>> Barry
> >>>
> >>>
> >>>
> >>> >
> >>> > Thanks,
> >>> > Michele
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > On Thu, 2015-07-16 at 17:30 -0500, Barry Smith wrote:
> >>> >> Michel,
> >>> >>
> >>> >> This is a very annoying feature that has been fixed in master
> >>> >> http://www.mcs.anl.gov/petsc/developers/index.html
> >>> >> I would like to have changed it in maint but Jed would have a shit-fit :-) since it changes behavior.
> >>> >>
> >>> >> Barry
> >>> >>
> >>> >>
> >>> >> > On Jul 16, 2015, at 4:53 PM, Michele Rosso <mrosso at uci.edu> wrote:
> >>> >> >
> >>> >> > Hi,
> >>> >> >
> >>> >> > I am performing a series of solves inside a loop. The matrix for each solve changes, but not enough to justify a rebuild of the PC at each solve.
> >>> >> > Therefore I am using KSPSetReusePreconditioner to avoid rebuilding unless necessary. The solver is CG + MG with a custom PC at the coarse level.
> >>> >> > If KSP is not updated each time, everything works as it is supposed to.
> >>> >> > When instead I allow the default PETSc behavior, i.e. updating the PC every time the matrix changes, the coarse-level KSP, initially set to PREONLY, is changed into GMRES
> >>> >> > after the first solve. I am not sure where the problem lies (my PC or PETSc), so I would like to have your opinion on this.
> >>> >> > I attached the ksp_view for the two successive solves and the options stack.
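> >>> >> > The reuse pattern is essentially this (a simplified sketch, not the actual code; the operator update happens elsewhere in the loop):
> >>> >> >
> >>> >> >   KSPSetReusePreconditioner(ksp, PETSC_TRUE);  /* keep the existing PC across solves */
> >>> >> >   for (n = 0; n < nsteps; n++) {
> >>> >> >     /* ... matrix coefficients change slightly here ... */
> >>> >> >     KSPSolve(ksp, b, x);
> >>> >> >   }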
> >>> >> >
> >>> >> > Thanks for your help,
> >>> >> > Michel
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > <ksp_view.txt><petsc_options.txt>
> >>> >>
> >>> >>
> >>> >>
> >>> >
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> > <info.txt><ksp_view.txt><log_gamg.txt>
>
-------------- next part --------------
KSP Object: 8192 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-09, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8192 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8192 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0, needed 0
Factored matrix follows:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
package used to perform factorization: superlu_dist
total: nonzeros=0, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
SuperLU_DIST run parameters:
Process grid nprow 128 x npcol 64
Equilibrate matrix TRUE
Matrix input mode 1
Replace tiny pivots TRUE
Use iterative refinement FALSE
Processors in row 128 col partition 64
Row permutation LargeDiag
Column permutation METIS_AT_PLUS_A
Parallel symbolic factorization FALSE
Repeated factorization SamePattern_SameRowPerm
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=65536, cols=65536
total: nonzeros=448512, allocated nonzeros=448512
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=524288, cols=524288
total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=4194304, cols=4194304
total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
-------------- next part --------------
KSP Object: 8192 MPI processes
type: cg
maximum iterations=10000
tolerances: relative=1e-09, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8192 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8192 MPI processes
type: dmdarepart
DMDARepart: parent comm size reduction factor = 64
DMDARepart: subcomm_size = 128
KSP Object: (mg_coarse_dmdarepart_) 128 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_dmdarepart_) 128 MPI processes
type: mg
MG: type is MULTIPLICATIVE, levels=2 cycles=v
Cycles per PCApply=1
Using Galerkin computed coarse grid matrices
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_dmdarepart_mg_coarse_) 128 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_dmdarepart_mg_coarse_) 128 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0, needed 0
Factored matrix follows:
Mat Object: 128 MPI processes
type: mpiaij
rows=1024, cols=1024
package used to perform factorization: superlu_dist
total: nonzeros=0, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
SuperLU_DIST run parameters:
Process grid nprow 16 x npcol 8
Equilibrate matrix TRUE
Matrix input mode 1
Replace tiny pivots TRUE
Use iterative refinement FALSE
Processors in row 16 col partition 8
Row permutation LargeDiag
Column permutation METIS_AT_PLUS_A
Parallel symbolic factorization FALSE
Repeated factorization SamePattern_SameRowPerm
linear system matrix = precond matrix:
Mat Object: 128 MPI processes
type: mpiaij
rows=1024, cols=1024
total: nonzeros=6528, allocated nonzeros=6528
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_coarse_dmdarepart_mg_levels_1_) 128 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_coarse_dmdarepart_mg_levels_1_) 128 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 128 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 128 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=8192, cols=8192
total: nonzeros=54784, allocated nonzeros=54784
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=65536, cols=65536
total: nonzeros=448512, allocated nonzeros=448512
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=524288, cols=524288
total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=4194304, cols=4194304
total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8192 MPI processes
type: richardson
Richardson: damping factor=1
maximum iterations=2
tolerances: relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using nonzero initial guess
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8192 MPI processes
type: sor
SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 8192 MPI processes
type: mpiaij
rows=33554432, cols=33554432
total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
total number of mallocs used during MatSetValues calls =0
has attached null space
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx named p with 8192 processors, by mrosso Fri Jul 24 14:11:55 2015
Using Petsc Development GIT revision: v3.6-233-g4936542 GIT Date: 2015-07-17 10:15:47 -0500
Max Max/Min Avg Total
Time (sec): 7.565e+01 1.00002 7.565e+01
Objects: 7.230e+02 1.00000 7.230e+02
Flops: 5.717e+07 1.01632 5.707e+07 4.675e+11
Flops/sec: 7.557e+05 1.01634 7.544e+05 6.180e+09
MPI Messages: 9.084e+03 2.00000 8.611e+03 7.054e+07
MPI Message Lengths: 6.835e+06 2.00000 7.524e+02 5.307e+10
MPI Reductions: 1.000e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 7.5651e+01 100.0% 4.6755e+11 100.0% 7.054e+07 100.0% 7.524e+02 100.0% 9.990e+02 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecTDot 174 1.0 1.8118e-01 1.9 1.43e+06 1.0 0.0e+00 0.0e+00 1.7e+02 0 2 0 0 17 0 2 0 0 17 64440
VecNorm 94 1.0 6.4223e-02 2.1 7.70e+05 1.0 0.0e+00 0.0e+00 9.4e+01 0 1 0 0 9 0 1 0 0 9 98224
VecScale 787 1.0 1.0910e-03 1.6 1.48e+05 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1059301
VecCopy 179 1.0 1.0858e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1240 1.0 1.4889e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 522 1.0 5.7485e-03 1.2 4.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 6093896
VecAYPX 695 1.0 5.3260e-03 1.4 2.17e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 3335289
VecAssemblyBegin 4 1.0 1.3018e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 4 1.0 1.6499e-0428.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 2182 1.0 2.2002e-02 2.1 0.00e+00 0.0 6.9e+07 7.6e+02 0.0e+00 0 0 98 99 0 0 0 98 99 0 0
VecScatterEnd 2182 1.0 5.0710e+0074.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
MatMult 699 1.0 2.3855e+0031.0 2.40e+07 1.0 3.3e+07 1.4e+03 0.0e+00 0 42 46 84 0 0 42 46 84 0 82105
MatMultAdd 348 1.0 5.8677e-03 1.6 8.14e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1136883
MatMultTranspose 352 1.0 5.7197e-03 1.2 8.24e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1179718
MatSolve 87 1.0 5.8730e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 72 0 0 0 0 72 0 0 0 0 0
MatSOR 870 1.0 5.0801e+0055.5 2.27e+07 1.0 3.6e+07 2.2e+02 0.0e+00 4 40 52 15 0 4 40 52 15 0 36617
MatLUFactorSym 1 1.0 9.5398e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 1.4040e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 19 0 0 0 0 19 0 0 0 0 0
MatResidual 348 1.0 4.1076e-02 1.8 5.70e+06 1.0 1.6e+07 6.8e+02 0.0e+00 0 10 23 21 0 0 10 23 21 0 1133130
MatAssemblyBegin 21 1.0 2.5973e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.6e+01 0 0 0 0 3 0 0 0 0 3 0
MatAssemblyEnd 21 1.0 5.4194e-02 2.0 0.00e+00 0.0 4.7e+05 1.4e+02 7.2e+01 0 0 1 0 7 0 0 1 0 7 0
MatGetRowIJ 1 1.0 5.6028e-0558.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.2708e-04 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 35 1.0 4.3098e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.5e+01 0 0 0 0 4 0 0 0 0 4 0
MatPtAP 4 1.0 6.8662e-02 1.0 1.03e+05 1.0 9.3e+05 2.9e+02 6.8e+01 0 0 1 1 7 0 0 1 1 7 12233
MatPtAPSymbolic 4 1.0 5.3361e-02 1.0 0.00e+00 0.0 5.6e+05 4.5e+02 2.8e+01 0 0 1 0 3 0 0 1 0 3 0
MatPtAPNumeric 4 1.0 1.6402e-02 1.1 1.03e+05 1.0 3.7e+05 4.4e+01 4.0e+01 0 0 1 0 4 0 0 1 0 4 51212
MatGetLocalMat 4 1.0 2.6742e-0269.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 4 1.0 1.5030e-03 2.6 0.00e+00 0.0 5.6e+05 4.5e+02 0.0e+00 0 0 1 0 0 0 0 1 0 0 0
MatGetSymTrans 8 1.0 1.9407e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 9 1.0 5.1131e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 4 1.0 7.3904e+01 1.0 5.72e+07 1.0 7.0e+07 7.5e+02 9.1e+02 98100100100 91 98100100100 91 6325
PCSetUp 4 1.0 1.4206e+01 1.0 1.73e+05 1.0 1.3e+06 2.2e+02 2.0e+02 19 0 2 1 20 19 0 2 1 20 100
PCApply 87 1.0 5.9362e+01 1.0 4.79e+07 1.0 6.5e+07 6.8e+02 3.5e+02 78 84 92 83 35 78 84 92 83 35 6596
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 592 592 2160472 0
Vector Scatter 14 13 18512 0
Matrix 38 38 976248 0
Matrix Null Space 1 1 584 0
Distributed Mesh 5 4 19808 0
Star Forest Bipartite Graph 10 8 6720 0
Discrete System 5 4 3360 0
Index Set 32 32 51488 0
IS L to G Mapping 5 4 6020 0
Krylov Solver 7 7 8608 0
DMKSP interface 4 4 2560 0
Preconditioner 7 7 6968 0
Viewer 3 1 752 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 4.22001e-05
Average time for zero size MPI_Send(): 1.56337e-06
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_gamg.txt
-mg_coarse_ksp_type preonly
-mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_pc_type lu
-mg_levels_ksp_type richardson
-pc_dmdarepart_log
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a --COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native -mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------
Using C compiler: cc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native -mtune=native ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include -I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE -lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx named p with 8192 processors, by mrosso Fri Jul 24 14:33:06 2015
Using Petsc Development GIT revision: v3.6-233-g4936542 GIT Date: 2015-07-17 10:15:47 -0500
Max Max/Min Avg Total
Time (sec): 3.447e+00 1.00038 3.446e+00
Objects: 1.368e+03 1.28935 1.066e+03
Flops: 7.647e+07 1.02006 7.608e+07 6.232e+11
Flops/sec: 2.219e+07 1.02020 2.207e+07 1.808e+11
MPI Messages: 2.096e+04 3.38688 1.201e+04 9.840e+07
MPI Message Lengths: 9.104e+06 2.00024 7.189e+02 7.074e+10
MPI Reductions: 1.416e+03 1.08506
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.1206e+00 90.5% 6.2314e+11 100.0% 9.376e+07 95.3% 7.181e+02 99.9% 1.261e+03 89.0%
1: PCRprt_SetUpMat: 2.5313e-02 0.7% 6.5418e+05 0.0% 6.123e+05 0.6% 5.931e-02 0.0% 4.425e+01 3.1%
2: PCRprt_Apply: 3.0039e-01 8.7% 8.8424e+07 0.0% 4.029e+06 4.1% 6.738e-01 0.1% 9.062e-01 0.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecTDot 232 1.0 4.3392e-02 2.6 1.90e+06 1.0 0.0e+00 0.0e+00 2.3e+02 1 2 0 0 16 1 2 0 0 18 358757
VecNorm 123 1.0 1.6137e-02 2.0 1.01e+06 1.0 0.0e+00 0.0e+00 1.2e+02 0 1 0 0 9 0 1 0 0 10 511516
VecScale 1048 1.0 1.1351e-03 1.5 1.92e+05 1.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1318105
VecCopy 121 1.0 1.2727e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1647 1.0 1.6043e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 696 1.0 7.1111e-03 1.4 5.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 6568316
VecAYPX 927 1.0 4.7853e-03 1.4 2.90e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 4961251
VecAssemblyBegin 4 1.0 1.2280e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 4 1.0 1.6284e-0434.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 2907 1.0 2.7515e-02 2.1 0.00e+00 0.0 9.2e+07 7.6e+02 0.0e+00 1 0 94 99 0 1 0 98 99 0 0
VecScatterEnd 2907 1.0 1.5621e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 4 0 0 0 0 0
MatMult 931 1.0 2.1213e-01 2.2 3.19e+07 1.0 4.3e+07 1.4e+03 0.0e+00 5 42 44 84 0 5 42 46 84 0 1228981
MatMultAdd 464 1.0 4.5297e-03 1.1 1.09e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1963600
MatMultTranspose 468 1.0 7.2241e-03 1.2 1.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1241849
MatSOR 1160 1.0 1.4814e-01 1.2 3.03e+07 1.0 4.9e+07 2.2e+02 0.0e+00 4 40 49 15 0 4 40 52 15 0 1673981
MatResidual 464 1.0 5.4564e-02 1.8 7.60e+06 1.0 2.2e+07 6.8e+02 0.0e+00 1 10 22 21 0 1 10 23 21 0 1137383
MatAssemblyBegin 26 1.0 2.9964e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01 1 0 0 0 3 1 0 0 0 3 0
MatAssemblyEnd 26 1.0 3.6304e-02 1.0 0.00e+00 0.0 4.8e+05 1.3e+02 8.0e+01 1 0 0 0 6 1 0 1 0 6 0
MatView 50 1.7 5.7154e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 2 0 0 0 2 2 0 0 0 2 0
MatPtAP 8 1.0 4.8214e-02 1.0 2.06e+05 1.0 1.1e+06 3.5e+02 7.6e+01 1 0 1 1 5 2 0 1 1 6 34843
MatPtAPSymbolic 4 1.0 2.7914e-02 1.1 0.00e+00 0.0 5.6e+05 4.5e+02 2.8e+01 1 0 1 0 2 1 0 1 0 2 0
MatPtAPNumeric 8 1.0 2.1734e-02 1.1 2.06e+05 1.0 5.6e+05 2.6e+02 4.8e+01 1 0 1 0 3 1 0 1 0 4 77294
MatGetLocalMat 8 1.0 6.5875e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 8 1.0 1.9593e-03 2.6 0.00e+00 0.0 7.5e+05 5.1e+02 0.0e+00 0 0 1 1 0 0 0 1 1 0 0
MatGetSymTrans 8 1.0 1.4830e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 14 1.0 6.4659e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 4 1.0 9.5956e-01 1.0 7.65e+07 1.0 9.8e+07 7.2e+02 1.2e+03 28100100100 86 31100105100 97 649356
PCSetUp 4 1.0 1.7332e-01 1.0 2.76e+05 1.0 2.2e+06 1.9e+02 2.8e+02 5 0 2 1 20 5 0 2 1 22 13014
PCApply 116 1.0 7.0218e-01 1.0 6.42e+07 1.0 9.1e+07 6.5e+02 4.6e+02 20 84 92 83 33 22 84 97 83 37 743519
--- Event Stage 1: PCRprt_SetUpMat
VecSet 3 1.5 1.0014e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 10 1.2 4.3280e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.1e+00 0 0 0 0 0 8 0 0 0 9 0
MatAssemblyEnd 10 1.2 8.4145e-03 1.1 0.00e+00 0.0 1.9e+05 4.2e+00 1.6e+01 0 0 0 0 1 30 0 31 13 36 0
MatGetRow 192 0.0 4.4584e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 2 1.0 1.0426e-02 2.3 0.00e+00 0.0 8.1e+04 2.3e+01 6.0e+00 0 0 0 0 0 23 0 13 32 14 0
MatZeroEntries 1 0.0 6.9141e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 2 1.0 1.8841e-02 1.0 8.40e+01 2.6 5.3e+05 7.4e+00 3.4e+01 1 0 1 0 2 74100 87 67 77 35
MatPtAPSymbolic 2 1.0 9.2332e-03 1.1 0.00e+00 0.0 3.3e+05 7.0e+00 1.4e+01 0 0 0 0 1 35 0 54 40 32 0
MatPtAPNumeric 2 1.0 1.0050e-02 1.1 8.40e+01 2.6 2.0e+05 7.9e+00 2.0e+01 0 0 0 0 1 39100 33 28 45 65
MatGetLocalMat 2 1.0 5.9128e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 2 1.0 5.0616e-04 3.8 0.00e+00 0.0 2.8e+05 5.3e+00 0.0e+00 0 0 0 0 0 1 0 46 26 0 0
MatGetSymTrans 4 1.0 1.0729e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 2: PCRprt_Apply
VecScale 348 0.0 2.4199e-04 0.0 3.34e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 4 0 0 0 13989
VecCopy 116 0.0 6.5565e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1049 3.0 3.4976e-04 6.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 116 0.0 8.7500e-05 0.0 7.42e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 10860
VecScatterBegin 1161 2.5 1.2123e-0240.8 0.00e+00 0.0 4.0e+06 1.6e+01 0.0e+00 0 0 4 0 0 0 0100100 0 0
VecScatterEnd 1161 2.5 3.0874e-0110.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 98 0 0 0 0 0
MatMult 232 2.0 9.2895e-0368.7 9.67e+04834.0 1.0e+06 1.6e+01 0.0e+00 0 0 1 0 0 1 15 25 25 0 1469
MatMultAdd 116 0.0 3.1829e-04 0.0 1.48e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 5971
MatMultTranspose 233 2.0 1.1170e-0233.1 1.52e+0465.6 9.4e+05 8.0e+00 0.0e+00 0 0 1 0 0 1 4 23 11 0 342
MatSolve 116 0.0 1.6799e-01 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatSOR 232 0.0 1.7143e-02 0.0 5.50e+05 0.0 2.1e+05 1.3e+02 0.0e+00 0 0 0 0 0 0 77 5 41 0 3947
MatLUFactorSym 1 0.0 4.6492e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 0.0 6.0585e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatResidual 116 0.0 4.7536e-03 0.0 1.04e+05 0.0 7.1e+04 1.3e+02 0.0e+00 0 0 0 0 0 0 14 2 14 0 2674
MatAssemblyBegin 5 0.0 4.3392e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 9.4e-02 0 0 0 0 0 0 0 0 0 10 0
MatAssemblyEnd 5 0.0 8.8215e-04 0.0 0.00e+00 0.0 1.2e+03 1.0e+01 2.5e-01 0 0 0 0 0 0 0 0 0 28 0
MatGetRowIJ 1 0.0 4.0531e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 0.0 2.7895e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 2 0.0 1.5361e-03 0.0 2.82e+03 0.0 3.6e+03 6.7e+01 3.0e-01 0 0 0 0 0 0 0 0 0 33 221
MatPtAPSymbolic 1 0.0 6.6018e-04 0.0 0.00e+00 0.0 1.8e+03 8.5e+01 1.1e-01 0 0 0 0 0 0 0 0 0 12 0
MatPtAPNumeric 2 0.0 8.8406e-04 0.0 2.82e+03 0.0 1.8e+03 4.9e+01 1.9e-01 0 0 0 0 0 0 0 0 0 21 385
MatGetLocalMat 2 0.0 3.2187e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 2 0.0 1.9097e-04 0.0 0.00e+00 0.0 2.4e+03 9.6e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSymTrans 2 0.0 4.0531e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 6 0.0 1.2183e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.1e-02 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 116 0.0 2.7114e-01 0.0 6.87e+05 0.0 2.9e+05 1.3e+02 9.1e-01 0 0 0 0 0 1 96 7 55100 312
PCSetUp 2 0.0 6.5762e-02 0.0 3.78e+03 0.0 4.9e+03 5.3e+01 9.1e-01 0 0 0 0 0 0 1 0 0100 7
PCApply 116 0.0 2.0491e-01 0.0 6.83e+05 0.0 2.8e+05 1.3e+02 0.0e+00 0 0 0 0 0 1 95 7 54 0 411
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 778 787 2743704 0
Vector Scatter 18 21 27616 0
Matrix 38 52 1034136 0
Matrix Null Space 1 1 584 0
Distributed Mesh 7 7 34664 0
Star Forest Bipartite Graph 14 14 11760 0
Discrete System 7 7 5880 0
Index Set 36 38 56544 0
IS L to G Mapping 7 7 8480 0
Krylov Solver 11 10 12240 0
DMKSP interface 4 5 3200 0
Preconditioner 11 10 10056 0
Viewer 8 6 4512 0
--- Event Stage 1: PCRprt_SetUpMat
Vector 6 5 7840 0
Vector Scatter 3 2 2128 0
Matrix 15 12 43656 0
Index Set 10 10 7896 0
--- Event Stage 2: PCRprt_Apply
Vector 364 356 685152 0
Vector Scatter 3 0 0 0
Matrix 11 0 0 0
Distributed Mesh 1 0 0 0
Star Forest Bipartite Graph 2 0 0 0
Discrete System 1 0 0 0
Index Set 10 8 6304 0
IS L to G Mapping 1 0 0 0
Krylov Solver 0 1 1136 0
DMKSP interface 1 0 0 0
Preconditioner 0 1 984 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 5.24044e-05
Average time for zero size MPI_Send(): 2.16223e-05
#PETSc Option Table entries:
-finput input.txt
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_mg7.txt
-mg_coarse_dmdarepart_ksp_constant_null_space
-mg_coarse_dmdarepart_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_ksp_type preonly
-mg_coarse_dmdarepart_mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_dmdarepart_mg_coarse_pc_type lu
-mg_coarse_dmdarepart_mg_levels_ksp_type richardson
-mg_coarse_dmdarepart_pc_mg_galerkin
-mg_coarse_dmdarepart_pc_mg_levels 2
-mg_coarse_dmdarepart_pc_type mg
-mg_coarse_ksp_type preonly
-mg_coarse_pc_dmdarepart_factor 64
-mg_coarse_pc_type dmdarepart
-mg_levels_ksp_type richardson
-options_left
-pc_dmdarepart_log
-pc_dmdarepart_monitor
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok --with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a --COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native -mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------
Using C compiler: cc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native -mtune=native ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include -I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE -lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl
-----------------------------------------