[petsc-users] superlinear scale-up with hypre
Barry Smith
bsmith at mcs.anl.gov
Wed Mar 10 13:29:46 CST 2010
Christian,
The multiply, the triangular solves, and the preconditioner
application are all getting super-linear speedup. My guess is that
this is due to cache effects: since the working set on each process is
smaller, more of it stays in the cache more of the time, so the run time
depends less on the time for memory access, hence the superlinear speedup.
If you use a nonzero initial guess, the stopping criterion for the
Krylov solvers is, by default, a reduction in the 2-norm of the
residual RELATIVE to the RIGHT HAND SIDE, not the initial residual.
Hence it converges "sooner than you expect". You can use the option
-ksp_converged_use_initial_residual_norm to have the decrease measured
relative to the initial residual instead, but I think the default is
best for time-dependent problems. If you use a zero initial guess, I
cannot explain why it seems to converge "early". You can run with
-ksp_converged_reason to have it print why it stops, or in the debugger
put a breakpoint in KSPDefaultConverged() to see what is going on
with the test.
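To make this concrete, here is a minimal sketch of the default test I
described (this is NOT the actual PETSc source; the function name and the
sample norms are made up for illustration):

    #include <stdio.h>

    /* Sketch of the default convergence test: declare convergence once the
       residual 2-norm drops below max(rtol * reference, abstol).  By default
       the reference is ||b||; with -ksp_converged_use_initial_residual_norm
       it is the initial residual norm ||r0|| instead. */
    static int default_test(double rnorm, double reference,
                            double rtol, double abstol)
    {
      double ttol = rtol * reference;
      if (ttol < abstol) ttol = abstol;
      return rnorm <= ttol;   /* 1 = converged */
    }

    int main(void)
    {
      double rnorm = 1e-10;   /* current residual norm (made up) */
      double bnorm = 1.0;     /* ||b||  (made up) */
      double r0    = 1e-3;    /* ||r0|| for a good nonzero initial guess (made up) */
      printf("relative to ||b|| : %d\n", default_test(rnorm, bnorm, 1e-9, 1e-50));
      printf("relative to ||r0||: %d\n", default_test(rnorm, r0,    1e-9, 1e-50));
      return 0;
    }

With a good nonzero initial guess, ||r0|| can be much smaller than ||b||, so
the default test is satisfied earlier than a test relative to ||r0|| would be.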
Barry
On Mar 10, 2010, at 1:03 PM, Christian Klettner wrote:
Dear Barry,
Below is the performance on 32 and 64 cores respectively. I ran my case
for 19 time steps, and in each time step there are 4 parabolic equations
to be solved (Step 1 (u,v) and Step 3 (u,v)) and 1 elliptic equation
(Step 2). This is why there are 95 KSPSolves.
The biggest difference I can see is in KSPSolve, but I'm guessing this is
made up of other functions?
Also, as you can see, I set "-poeq_ksp_rtol 0.000000001" for the Poisson
solve; however, when I print it out it says
Residual norms for poeq_ solve.
0 KSP Residual norm 7.862045205096e-02
1 KSP Residual norm 1.833734529269e-02
2 KSP Residual norm 9.243822053526e-04
3 KSP Residual norm 1.534786635844e-04
4 KSP Residual norm 2.032435231176e-05
5 KSP Residual norm 3.201182258546e-06
so the tolerance has not been reached. Should I set the tolerance with a
different command?
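As a sanity check on those numbers I computed the relative decrease with a
tiny throwaway program (nothing from ex115, just the two norms printed by the
monitor above):

    #include <stdio.h>

    int main(void)
    {
      /* residual norms copied from the -poeq_ksp_monitor output above */
      double r0 = 7.862045205096e-02;
      double r5 = 3.201182258546e-06;
      printf("r5/r0 = %g (requested rtol was 1e-9)\n", r5 / r0);
      return 0;
    }

which gives a decrease of about 4e-5 relative to the initial residual, far
from the requested 1e-9.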
Thanks for any advice,
Christian
> ************************************************************************************************************************
> ***    WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document    ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex115 on a linux-gnu named node-c47 with 32 processors, by ucemckl Wed Mar 10 02:12:45 2010
> Using Petsc Release Version 3.0.0, Patch 10, Tue Nov 24 16:38:09 CST 2009
>
> Max Max/Min Avg Total
> Time (sec): 5.424e+02 1.00012 5.423e+02
> Objects: 2.860e+02 1.00000 2.860e+02
> Flops: 1.675e+10 1.02726 1.635e+10 5.232e+11
> Flops/sec: 3.088e+07 1.02726 3.015e+07 9.647e+08
> MPI Messages: 3.603e+03 2.00278 3.447e+03 1.103e+05
> MPI Message Lengths: 8.272e+06 1.90365 2.285e+03 2.520e+08
> MPI Reductions: 4.236e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of
> length N
> --> 2N flops
> and VecAXPY() for complex vectors of
> length N
> --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 5.4232e+02 100.0%  5.2317e+11 100.0%  1.103e+05 100.0%  2.285e+03      100.0%  4.056e+03  95.8%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all
> processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with
> PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase         %F - percent flops in this phase
> %M - percent messages in this phase     %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMin 19 1.0 9.5495e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecDot 1362 1.0 1.0272e+01 1.4 1.38e+09 1.0 0.0e+00 0.0e+00 1.4e+03 2 8 0 0 32 2 8 0 0 34 4212
> VecMDot 101 1.0 1.3028e+00 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 1.0e+02 0 2 0 0 2 0 2 0 0 2 8241
> VecNorm 972 1.0 1.0458e+01 1.6 9.88e+08 1.0 0.0e+00 0.0e+00 9.7e+02 1 6 0 0 23 1 6 0 0 24 2952
> VecScale 139 1.0 4.4759e-01 1.1 7.07e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4932
> VecCopy 133 1.0 6.7746e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 1136 1.0 4.2686e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecAXPY 1666 1.0 1.0439e+01 1.0 1.69e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 10 0 0 0 2 10 0 0 0 5069
> VecAYPX 681 1.0 4.1510e+00 1.1 6.92e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 5211
> VecAXPBYCZ 38 1.0 3.5104e-01 1.1 7.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6877
> VecMAXPY 120 1.0 1.7512e+00 1.0 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 7963
> VecAssemblyBegin 290 1.0 1.4337e+0164.9 0.00e+00 0.0 3.6e+03 1.0e+03 8.7e+02 2 0 3 1 21 2 0 3 1 21 0
> VecAssemblyEnd 290 1.0 8.1372e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecPointwiseMult 280 1.0 2.5121e+00 1.1 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1770
> VecScatterBegin 1373 1.0 5.1618e-02 1.7 0.00e+00 0.0 7.7e+04 1.3e+03 0.0e+00 0 0 70 40 0 0 0 70 40 0 0
> VecScatterEnd 1373 1.0 6.2953e-0118.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecNormalize 120 1.0 1.1371e+00 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 1.2e+02 0 1 0 0 3 0 1 0 0 3 5028
> MatMult 1048 1.0 5.6495e+01 1.1 6.86e+09 1.0 6.5e+04 1.3e+03 0.0e+00 10 41 59 34 0 10 41 59 34 0 3793
> MatMultTranspose 57 1.0 3.4194e+00 1.1 4.02e+08 1.0 3.5e+03 1.3e+03 0.0e+00 1 2 3 2 0 1 2 3 2 0 3673
> MatSolve 553 1.0 4.6169e+01 1.1 3.62e+09 1.0 0.0e+00 0.0e+00 0.0e+00 8 22 0 0 0 8 22 0 0 0 2448
> MatLUFactorNum 2 1.0 7.9745e-01 1.2 2.78e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1088
> MatILUFactorSym 2 1.0 2.7597e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatCopy 133 1.0 4.7596e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> MatConvert 27 1.0 1.7435e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 263 1.0 1.3145e+0132.9 0.00e+00 0.0 2.4e+04 3.7e+03 5.3e+02 2 0 22 36 12 2 0 22 36 13 0
> MatAssemblyEnd 263 1.0 9.1696e+00 1.0 0.00e+00 0.0 2.5e+02 3.3e+02 6.6e+01 2 0 0 0 2 2 0 0 0 2 0
> MatGetRow 901474 1.5 2.9092e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 4 1.0 5.0068e-06 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 2 1.0 7.2280e-02 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 160 1.0 3.0731e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> KSPGMRESOrthog 101 1.0 2.6510e+00 1.0 6.87e+08 1.0 0.0e+00 0.0e+00 1.0e+02 0 4 0 0 2 0 4 0 0 2 8100
> KSPSetup 78 1.0 1.4449e-01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 95 1.0 3.0155e+02 1.0 1.49e+10 1.0 5.4e+04 1.3e+03 2.4e+03 56 89 49 28 58 56 89 49 28 60 1540
> PCSetUp 6 1.0 6.2894e+00 1.0 2.78e+07 1.0 0.0e+00 0.0e+00 6.0e+00 1 0 0 0 0 1 0 0 0 0 138
> PCSetUpOnBlocks 57 1.0 1.0523e+00 1.2 2.78e+07 1.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 824
> PCApply 972 1.0 2.1798e+02 1.0 3.76e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 22 0 0 0 40 22 0 0 0 539
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants'
> Mem.
>
> --- Event Stage 0: Main Stage
>
> Application Order 4 4 142960400 0
> Index Set 42 42 11937496 0
> IS L to G Mapping 18 18 39700456 0
> Vec 131 131 335147648 0
> Vec Scatter 31 31 26412 0
> Matrix 47 47 1003139256 0
> Krylov Solver 6 6 22376 0
> Preconditioner 6 6 4256 0
> Viewer 1 1 544 0
> ========================================================================================================================
> Average time to get PetscTime(): 2.86102e-07
> Average time for MPI_Barrier(): 1.27792e-05
> Average time for zero size MPI_Send(): 1.71363e-06
> #PETSc Option Table entries:
> -log_summary
> -moeq_ksp_rtol 0.000000001
> -moeq_ksp_type cg
> -moeq_pc_type jacobi
> -poeq_ksp_monitor
> -poeq_ksp_rtol 0.000000001
> -poeq_ksp_type gmres
> -poeq_pc_hypre_type boomeramg
> -poeq_pc_type hypre
> -ueq_ksp_rtol 0.000000001
> -ueq_ksp_type cg
> -veq_ksp_rtol 0.000000001
> -veq_ksp_type cg
> #End o PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Fri Jan 29 15:15:03 2010
> Configure options: --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpiCC
> --with-blas-lapack-dir=/cvos/shared/apps/intel/mkl/10.0.2.018/lib/
> em64t/
> --download-triangle --download-hypre --with-debugging=0 COPTFLAGS="
> -03
> -ffast-math -finline-functions" CXXOPTFLAGS=" -03 -ffast-math
> -finline-functions" --with-shared=0
> -----------------------------------------
> Libraries compiled on Fri Jan 29 15:17:56 GMT 2010 on login01
> Machine characteristics: Linux login01 2.6.9-89.el4_lustre.
> 1.6.7.2ddn1 #11
> SMP Wed Sep 9 18:48:21 CEST 2009 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /shared/home/ucemckl/petsc-3.0.0-p10
> Using PETSc arch: linux-gnu-c-opt
> -----------------------------------------
> Using C compiler: mpicc
> Using Fortran compiler: mpif90 -O
> -----------------------------------------
> Using include paths:
> -I/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/include
> -I/shared/home/ucemckl/petsc-3.0.0-p10/include
> -I/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/include
> -I/usr/X11R6/include
> ------------------------------------------
> Using C linker: mpicc
> Using Fortran linker: mpif90 -O
> Using libraries:
> -Wl,-rpath,/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib
> -L/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib -lpetscts
> -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
> -Wl,-rpath,/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib
> -L/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib -ltriangle
> -L/usr/X11R6/lib64 -lX11 -lHYPRE -lstdc++
> -Wl,-rpath,/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t
> -L/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t -lmkl_lapack -lmkl
> -lguide -lpthread -lnsl -laio -lrt -lPEPCF90
> -L/cvos/shared/apps/infinipath/2.1/mpi/lib64 -ldl -lmpich
> -L/cvos/shared/apps/intel/cce/10.1.008/lib
> -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -limf -lsvml -lipgo -lirc -
> lgcc_s
> -lirc_s -lmpichf90nc -lmpichabiglue_intel9
> -L/cvos/shared/apps/intel/fce/10.1.008/lib -lifport -lifcore -lm -lm
> -lstdc++ -lstdc++ -ldl -lmpich -limf -lsvml -lipgo -lirc -lgcc_s -
> lirc_s
> -ldl
> ------------------------------------------
>
>
> ////////////////////////////////////////////////////////////////////////
> /////////////////////////////////////////////////////////////////////////
> ////////////////////////////////////////////////////////////////////////
>
> ************************************************************************************************************************
> ***    WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document    ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex115 on a linux-gnu named node-f56 with 64 processors, by ucemckl Wed Mar 10 04:33:32 2010
> Using Petsc Release Version 3.0.0, Patch 10, Tue Nov 24 16:38:09 CST 2009
>
> Max Max/Min Avg Total
> Time (sec): 2.394e+02 1.00022 2.394e+02
> Objects: 2.860e+02 1.00000 2.860e+02
> Flops: 8.606e+09 1.04191 8.283e+09 5.301e+11
> Flops/sec: 3.595e+07 1.04196 3.461e+07 2.215e+09
> MPI Messages: 3.627e+03 1.98414 3.565e+03 2.282e+05
> MPI Message Lengths: 7.563e+06 1.99911 2.009e+03 4.584e+08
> MPI Reductions: 4.269e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of
> length N
> --> 2N flops
> and VecAXPY() for complex vectors of
> length N
> --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 2.3936e+02 100.0%  5.3013e+11 100.0%  2.282e+05 100.0%  2.009e+03      100.0%  4.089e+03  95.8%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all
> processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with
> PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase         %F - percent flops in this phase
> %M - percent messages in this phase     %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMin 19 1.0 4.7353e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecDot 1380 1.0 5.3245e+00 1.7 7.11e+08 1.0 0.0e+00 0.0e+00 1.4e+03 2 8 0 0 32 2 8 0 0 34 8224
> VecMDot 104 1.0 6.9024e-01 1.0 1.84e+08 1.0 0.0e+00 0.0e+00 1.0e+02 0 2 0 0 2 0 2 0 0 3 16458
> VecNorm 984 1.0 5.8349e+00 1.7 5.07e+08 1.0 0.0e+00 0.0e+00 9.8e+02 2 6 0 0 23 2 6 0 0 24 5351
> VecScale 142 1.0 1.5187e-01 1.7 3.66e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14835
> VecCopy 133 1.0 3.9400e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 1148 1.0 2.0722e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecAXPY 1684 1.0 5.1021e+00 1.1 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 10 0 0 0 2 10 0 0 0 10473
> VecAYPX 690 1.0 1.9134e+00 1.1 3.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 11443
> VecAXPBYCZ 38 1.0 1.7525e-01 1.1 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 13761
> VecMAXPY 123 1.0 8.9613e-01 1.1 2.38e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 16359
> VecAssemblyBegin 290 1.0 6.6559e+0015.4 0.00e+00 0.0 7.3e+03 1.0e+03 8.7e+02 2 0 3 2 20 2 0 3 2 21 0
> VecAssemblyEnd 290 1.0 1.5714e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecPointwiseMult 280 1.0 1.2558e+00 1.1 7.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 3538
> VecScatterBegin 1385 1.0 4.7455e-02 1.8 0.00e+00 0.0 1.6e+05 1.3e+03 0.0e+00 0 0 69 45 0 0 0 69 45 0 0
> VecScatterEnd 1385 1.0 4.8537e-0115.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecNormalize 123 1.0 6.2763e-01 1.1 9.50e+07 1.0 0.0e+00 0.0e+00 1.2e+02 0 1 0 0 3 0 1 0 0 3 9328
> MatMult 1060 1.0 2.4949e+01 1.1 3.51e+09 1.0 1.3e+05 1.3e+03 0.0e+00 10 41 59 38 0 10 41 59 38 0 8678
> MatMultTranspose 57 1.0 1.4921e+00 1.2 2.04e+08 1.0 7.2e+03 1.3e+03 0.0e+00 1 2 3 2 0 1 2 3 2 0 8409
> MatSolve 562 1.0 2.1214e+01 1.1 1.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 8 22 0 0 0 8 22 0 0 0 5409
> MatLUFactorNum 2 1.0 3.7373e-01 1.2 1.41e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2320
> MatILUFactorSym 2 1.0 1.2428e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatCopy 133 1.0 2.3860e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> MatConvert 27 1.0 8.3217e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 263 1.0 8.3536e+0040.7 0.00e+00 0.0 5.0e+04 3.7e+03 5.3e+02 3 0 22 40 12 3 0 22 40 13 0
> MatAssemblyEnd 263 1.0 4.4723e+00 1.1 0.00e+00 0.0 5.0e+02 3.3e+02 6.6e+01 2 0 0 0 2 2 0 0 0 2 0
> MatGetRow 453796 1.5 1.8176e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 4 1.0 5.0068e-06 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 2 1.0 3.0140e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 160 1.0 1.5786e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> KSPGMRESOrthog 104 1.0 1.3677e+00 1.0 3.69e+08 1.0 0.0e+00 0.0e+00 1.0e+02 1 4 0 0 2 1 4 0 0 3 16612
> KSPSetup 78 1.0 4.9393e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 95 1.0 1.3637e+02 1.0 7.65e+09 1.0 1.1e+05 1.3e+03 2.5e+03 57 89 49 32 58 57 89 49 32 61 3457
> PCSetUp 6 1.0 2.7957e+00 1.0 1.41e+07 1.0 0.0e+00 0.0e+00 6.0e+00 1 0 0 0 0 1 0 0 0 0 310
> PCSetUpOnBlocks 57 1.0 5.0076e-01 1.2 1.41e+07 1.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 1732
> PCApply 984 1.0 9.8020e+01 1.0 1.93e+09 1.0 0.0e+00 0.0e+00 0.0e+00 41 22 0 0 0 41 22 0 0 0 1216
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants'
> Mem.
>
> --- Event Stage 0: Main Stage
>
> Application Order 4 4 134876056 0
> Index Set 42 42 5979736 0
> IS L to G Mapping 18 18 19841256 0
> Vec 131 131 167538256 0
> Vec Scatter 31 31 26412 0
> Matrix 47 47 501115544 0
> Krylov Solver 6 6 22376 0
> Preconditioner 6 6 4256 0
> Viewer 1 1 544 0
> ========================================================================================================================
> Average time to get PetscTime(): 1.90735e-07
> Average time for MPI_Barrier(): 1.35899e-05
> Average time for zero size MPI_Send(): 1.79559e-06
> #PETSc Option Table entries:
> -log_summary
> -moeq_ksp_rtol 0.000000001
> -moeq_ksp_type cg
> -moeq_pc_type jacobi
> -poeq_ksp_monitor
> -poeq_ksp_rtol 0.000000001
> -poeq_ksp_type gmres
> -poeq_pc_hypre_type boomeramg
> -poeq_pc_type hypre
> -ueq_ksp_rtol 0.000000001