[petsc-users] superlinear scale-up with hypre
Christian Klettner
christian.klettner at ucl.ac.uk
Wed Mar 10 13:03:16 CST 2010
Dear Barry,
Below is the performance output on 32 and 64 cores respectively. I ran my case
for 19 time steps, and in each time step there are 4 parabolic equations
to be solved (Step 1 (u,v) and Step 3 (u,v)) and 1 elliptic equation (Step
2). This is why there are 95 KSPSolves (19 x 5).
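(Schematically, one time step looks like the sketch below; the KSP and Vec
names are placeholders rather than the actual code in ex115, and I reuse the
Step 1 solvers for Step 3 purely for brevity.)

  #include "petscksp.h"

  /* Sketch only: placeholder KSP/Vec names, not the real ex115 code. */
  PetscErrorCode DoOneTimeStep(KSP kspU, KSP kspV, KSP kspPoisson,
                               Vec bu, Vec bv, Vec bp,
                               Vec u, Vec v, Vec p)
  {
    PetscErrorCode ierr;
    PetscFunctionBegin;
    ierr = KSPSolve(kspU, bu, u);CHKERRQ(ierr);        /* Step 1: parabolic, u */
    ierr = KSPSolve(kspV, bv, v);CHKERRQ(ierr);        /* Step 1: parabolic, v */
    ierr = KSPSolve(kspPoisson, bp, p);CHKERRQ(ierr);  /* Step 2: elliptic pressure Poisson */
    ierr = KSPSolve(kspU, bu, u);CHKERRQ(ierr);        /* Step 3: parabolic, u */
    ierr = KSPSolve(kspV, bv, v);CHKERRQ(ierr);        /* Step 3: parabolic, v */
    PetscFunctionReturn(0);
  }
  /* 19 time steps x 5 solves per step = 95 KSPSolve events in the logs below. */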
The biggest difference I can see is in KSPSolve, but I'm guessing this is
itself made up of other logged events (MatMult, PCApply, etc.)?
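(For reference, this throwaway program just recomputes the 32 -> 64 core
ratios from the max times in the two -log_summary tables below; anything
above 2.0 on twice the cores is what I mean by superlinear.)

  #include <stdio.h>

  /* Max times taken verbatim from the two -log_summary tables below. */
  int main(void)
  {
    const char  *name[] = {"Total time", "KSPSolve", "PCApply"};
    const double t32[]  = {5.424e+02, 3.0155e+02, 2.1798e+02};
    const double t64[]  = {2.394e+02, 1.3637e+02, 9.8020e+01};
    for (int i = 0; i < 3; i++)
      printf("%-10s  32 cores: %8.2f s  64 cores: %8.2f s  speedup: %.2f\n",
             name[i], t32[i], t64[i], t32[i] / t64[i]);
    return 0;
  }

That gives roughly 2.27 for the total time, 2.21 for KSPSolve and 2.22 for
PCApply, all better than the ideal factor of 2.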
Also, as you can see, I set "-poeq_ksp_rtol 0.000000001" for the Poisson
solve; however, when I print out the residuals it says
Residual norms for poeq_ solve.
0 KSP Residual norm 7.862045205096e-02
1 KSP Residual norm 1.833734529269e-02
2 KSP Residual norm 9.243822053526e-04
3 KSP Residual norm 1.534786635844e-04
4 KSP Residual norm 2.032435231176e-05
5 KSP Residual norm 3.201182258546e-06
so the tolerance has not been reached. Should I be setting the tolerance with
a different command?
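(For context, this is roughly how I expect the prefixed tolerance to be picked
up; the names below are placeholders and not the exact ex115 code. My
understanding is that -poeq_ksp_rtol only takes effect if the KSP has the
"poeq_" prefix and KSPSetFromOptions() is called, and that the alternative is
to hard-code it with KSPSetTolerances().)

  #include "petscksp.h"

  /* Sketch with placeholder names (kspP, A); PETSc 3.0-style KSPSetOperators. */
  PetscErrorCode SetupPoissonKSP(MPI_Comm comm, Mat A, KSP *kspP)
  {
    PetscErrorCode ierr;
    PetscFunctionBegin;
    ierr = KSPCreate(comm, kspP);CHKERRQ(ierr);
    ierr = KSPSetOperators(*kspP, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPSetOptionsPrefix(*kspP, "poeq_");CHKERRQ(ierr);
    /* Alternative to -poeq_ksp_rtol: set the relative tolerance in the code. */
    ierr = KSPSetTolerances(*kspP, 1.0e-9, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);CHKERRQ(ierr);
    /* Must come after the prefix is set so the -poeq_* options are read
       (and can override the call above). */
    ierr = KSPSetFromOptions(*kspP);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

I will also try adding -poeq_ksp_converged_reason to see whether the solve is
stopping on the relative tolerance or on something else.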
Thanks for any advice,
Christian
************************************************************************************************************************
***    WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document    ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex115 on a linux-gnu named node-c47 with 32 processors, by ucemckl Wed Mar 10 02:12:45 2010
Using Petsc Release Version 3.0.0, Patch 10, Tue Nov 24 16:38:09 CST 2009
Max Max/Min Avg Total
Time (sec): 5.424e+02 1.00012 5.423e+02
Objects: 2.860e+02 1.00000 2.860e+02
Flops: 1.675e+10 1.02726 1.635e+10 5.232e+11
Flops/sec: 3.088e+07 1.02726 3.015e+07 9.647e+08
MPI Messages: 3.603e+03 2.00278 3.447e+03 1.103e+05
MPI Message Lengths: 8.272e+06 1.90365 2.285e+03 2.520e+08
MPI Reductions: 4.236e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 5.4232e+02 100.0%  5.2317e+11 100.0%  1.103e+05 100.0%  2.285e+03      100.0%  4.056e+03  95.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)      Flops                            --- Global ---  --- Stage ---   Total
                   Max Ratio  Max      Ratio   Max  Ratio  Mess   Avg len Reduct %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMin                 19 1.0 9.5495e-02  2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecDot               1362 1.0 1.0272e+01  1.4 1.38e+09 1.0 0.0e+00 0.0e+00 1.4e+03  2  8  0  0 32   2  8  0  0 34  4212
VecMDot               101 1.0 1.3028e+00  1.0 3.44e+08 1.0 0.0e+00 0.0e+00 1.0e+02  0  2  0  0  2   0  2  0  0  2  8241
VecNorm               972 1.0 1.0458e+01  1.6 9.88e+08 1.0 0.0e+00 0.0e+00 9.7e+02  1  6  0  0 23   1  6  0  0 24  2952
VecScale              139 1.0 4.4759e-01  1.1 7.07e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4932
VecCopy               133 1.0 6.7746e-01  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               1136 1.0 4.2686e+00  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY              1666 1.0 1.0439e+01  1.0 1.69e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2 10  0  0  0   2 10  0  0  0  5069
VecAYPX               681 1.0 4.1510e+00  1.1 6.92e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0  5211
VecAXPBYCZ             38 1.0 3.5104e-01  1.1 7.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  6877
VecMAXPY              120 1.0 1.7512e+00  1.0 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  7963
VecAssemblyBegin      290 1.0 1.4337e+01 64.9 0.00e+00 0.0 3.6e+03 1.0e+03 8.7e+02  2  0  3  1 21   2  0  3  1 21     0
VecAssemblyEnd        290 1.0 8.1372e-04  1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult      280 1.0 2.5121e+00  1.1 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1770
VecScatterBegin      1373 1.0 5.1618e-02  1.7 0.00e+00 0.0 7.7e+04 1.3e+03 0.0e+00  0  0 70 40  0   0  0 70 40  0     0
VecScatterEnd        1373 1.0 6.2953e-01 18.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize          120 1.0 1.1371e+00  1.0 1.83e+08 1.0 0.0e+00 0.0e+00 1.2e+02  0  1  0  0  3   0  1  0  0  3  5028
MatMult              1048 1.0 5.6495e+01  1.1 6.86e+09 1.0 6.5e+04 1.3e+03 0.0e+00 10 41 59 34  0  10 41 59 34  0  3793
MatMultTranspose       57 1.0 3.4194e+00  1.1 4.02e+08 1.0 3.5e+03 1.3e+03 0.0e+00  1  2  3  2  0   1  2  3  2  0  3673
MatSolve              553 1.0 4.6169e+01  1.1 3.62e+09 1.0 0.0e+00 0.0e+00 0.0e+00  8 22  0  0  0   8 22  0  0  0  2448
MatLUFactorNum          2 1.0 7.9745e-01  1.2 2.78e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1088
MatILUFactorSym         2 1.0 2.7597e-01  1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCopy               133 1.0 4.7596e+00  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatConvert             27 1.0 1.7435e+00  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      263 1.0 1.3145e+01 32.9 0.00e+00 0.0 2.4e+04 3.7e+03 5.3e+02  2  0 22 36 12   2  0 22 36 13     0
MatAssemblyEnd        263 1.0 9.1696e+00  1.0 0.00e+00 0.0 2.5e+02 3.3e+02 6.6e+01  2  0  0  0  2   2  0  0  0  2     0
MatGetRow          901474 1.5 2.9092e-01  1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ             4 1.0 5.0068e-06  2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering          2 1.0 7.2280e-02  3.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        160 1.0 3.0731e+00  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
KSPGMRESOrthog        101 1.0 2.6510e+00  1.0 6.87e+08 1.0 0.0e+00 0.0e+00 1.0e+02  0  4  0  0  2   0  4  0  0  2  8100
KSPSetup               78 1.0 1.4449e-01  2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               95 1.0 3.0155e+02  1.0 1.49e+10 1.0 5.4e+04 1.3e+03 2.4e+03 56 89 49 28 58  56 89 49 28 60  1540
PCSetUp                 6 1.0 6.2894e+00  1.0 2.78e+07 1.0 0.0e+00 0.0e+00 6.0e+00  1  0  0  0  0   1  0  0  0  0   138
PCSetUpOnBlocks        57 1.0 1.0523e+00  1.2 2.78e+07 1.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0   824
PCApply               972 1.0 2.1798e+02  1.0 3.76e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 22  0  0  0  40 22  0  0  0   539
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
--- Event Stage 0: Main Stage
Application Order 4 4 142960400 0
Index Set 42 42 11937496 0
IS L to G Mapping 18 18 39700456 0
Vec 131 131 335147648 0
Vec Scatter 31 31 26412 0
Matrix 47 47 1003139256 0
Krylov Solver 6 6 22376 0
Preconditioner 6 6 4256 0
Viewer 1 1 544 0
========================================================================================================================
Average time to get PetscTime(): 2.86102e-07
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 1.71363e-06
#PETSc Option Table entries:
-log_summary
-moeq_ksp_rtol 0.000000001
-moeq_ksp_type cg
-moeq_pc_type jacobi
-poeq_ksp_monitor
-poeq_ksp_rtol 0.000000001
-poeq_ksp_type gmres
-poeq_pc_hypre_type boomeramg
-poeq_pc_type hypre
-ueq_ksp_rtol 0.000000001
-ueq_ksp_type cg
-veq_ksp_rtol 0.000000001
-veq_ksp_type cg
#End o PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Fri Jan 29 15:15:03 2010
Configure options: --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpiCC
--with-blas-lapack-dir=/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t/
--download-triangle --download-hypre --with-debugging=0 COPTFLAGS=" -03
-ffast-math -finline-functions" CXXOPTFLAGS=" -03 -ffast-math
-finline-functions" --with-shared=0
-----------------------------------------
Libraries compiled on Fri Jan 29 15:17:56 GMT 2010 on login01
Machine characteristics: Linux login01 2.6.9-89.el4_lustre.1.6.7.2ddn1 #11
SMP Wed Sep 9 18:48:21 CEST 2009 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /shared/home/ucemckl/petsc-3.0.0-p10
Using PETSc arch: linux-gnu-c-opt
-----------------------------------------
Using C compiler: mpicc
Using Fortran compiler: mpif90 -O
-----------------------------------------
Using include paths:
-I/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/include
-I/shared/home/ucemckl/petsc-3.0.0-p10/include
-I/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/include
-I/usr/X11R6/include
------------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90 -O
Using libraries:
-Wl,-rpath,/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib
-L/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib -lpetscts
-lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
-Wl,-rpath,/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib
-L/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib -ltriangle
-L/usr/X11R6/lib64 -lX11 -lHYPRE -lstdc++
-Wl,-rpath,/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t
-L/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t -lmkl_lapack -lmkl
-lguide -lpthread -lnsl -laio -lrt -lPEPCF90
-L/cvos/shared/apps/infinipath/2.1/mpi/lib64 -ldl -lmpich
-L/cvos/shared/apps/intel/cce/10.1.008/lib
-L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -limf -lsvml -lipgo -lirc -lgcc_s
-lirc_s -lmpichf90nc -lmpichabiglue_intel9
-L/cvos/shared/apps/intel/fce/10.1.008/lib -lifport -lifcore -lm -lm
-lstdc++ -lstdc++ -ldl -lmpich -limf -lsvml -lipgo -lirc -lgcc_s -lirc_s
-ldl
------------------------------------------
//////////////////////////////////  64-core run below  //////////////////////////////////
************************************************************************************************************************
***    WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document    ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex115 on a linux-gnu named node-f56 with 64 processors, by ucemckl Wed Mar 10 04:33:32 2010
Using Petsc Release Version 3.0.0, Patch 10, Tue Nov 24 16:38:09 CST 2009
Max Max/Min Avg Total
Time (sec): 2.394e+02 1.00022 2.394e+02
Objects: 2.860e+02 1.00000 2.860e+02
Flops: 8.606e+09 1.04191 8.283e+09 5.301e+11
Flops/sec: 3.595e+07 1.04196 3.461e+07 2.215e+09
MPI Messages: 3.627e+03 1.98414 3.565e+03 2.282e+05
MPI Message Lengths: 7.563e+06 1.99911 2.009e+03 4.584e+08
MPI Reductions: 4.269e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.3936e+02 100.0%  5.3013e+11 100.0%  2.282e+05 100.0%  2.009e+03      100.0%  4.089e+03  95.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)      Flops                            --- Global ---  --- Stage ---   Total
                   Max Ratio  Max      Ratio   Max  Ratio  Mess   Avg len Reduct %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMin                 19 1.0 4.7353e-02  1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecDot               1380 1.0 5.3245e+00  1.7 7.11e+08 1.0 0.0e+00 0.0e+00 1.4e+03  2  8  0  0 32   2  8  0  0 34  8224
VecMDot               104 1.0 6.9024e-01  1.0 1.84e+08 1.0 0.0e+00 0.0e+00 1.0e+02  0  2  0  0  2   0  2  0  0  3 16458
VecNorm               984 1.0 5.8349e+00  1.7 5.07e+08 1.0 0.0e+00 0.0e+00 9.8e+02  2  6  0  0 23   2  6  0  0 24  5351
VecScale              142 1.0 1.5187e-01  1.7 3.66e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 14835
VecCopy               133 1.0 3.9400e-01  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               1148 1.0 2.0722e+00  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY              1684 1.0 5.1021e+00  1.1 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 10  0  0  0   2 10  0  0  0 10473
VecAYPX               690 1.0 1.9134e+00  1.1 3.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0 11443
VecAXPBYCZ             38 1.0 1.7525e-01  1.1 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 13761
VecMAXPY              123 1.0 8.9613e-01  1.1 2.38e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0 16359
VecAssemblyBegin      290 1.0 6.6559e+00 15.4 0.00e+00 0.0 7.3e+03 1.0e+03 8.7e+02  2  0  3  2 20   2  0  3  2 21     0
VecAssemblyEnd        290 1.0 1.5714e-03  2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult      280 1.0 1.2558e+00  1.1 7.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  3538
VecScatterBegin      1385 1.0 4.7455e-02  1.8 0.00e+00 0.0 1.6e+05 1.3e+03 0.0e+00  0  0 69 45  0   0  0 69 45  0     0
VecScatterEnd        1385 1.0 4.8537e-01 15.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize          123 1.0 6.2763e-01  1.1 9.50e+07 1.0 0.0e+00 0.0e+00 1.2e+02  0  1  0  0  3   0  1  0  0  3  9328
MatMult              1060 1.0 2.4949e+01  1.1 3.51e+09 1.0 1.3e+05 1.3e+03 0.0e+00 10 41 59 38  0  10 41 59 38  0  8678
MatMultTranspose       57 1.0 1.4921e+00  1.2 2.04e+08 1.0 7.2e+03 1.3e+03 0.0e+00  1  2  3  2  0   1  2  3  2  0  8409
MatSolve              562 1.0 2.1214e+01  1.1 1.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00  8 22  0  0  0   8 22  0  0  0  5409
MatLUFactorNum          2 1.0 3.7373e-01  1.2 1.41e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2320
MatILUFactorSym         2 1.0 1.2428e-01  1.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCopy               133 1.0 2.3860e+00  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatConvert             27 1.0 8.3217e-01  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      263 1.0 8.3536e+00 40.7 0.00e+00 0.0 5.0e+04 3.7e+03 5.3e+02  3  0 22 40 12   3  0 22 40 13     0
MatAssemblyEnd        263 1.0 4.4723e+00  1.1 0.00e+00 0.0 5.0e+02 3.3e+02 6.6e+01  2  0  0  0  2   2  0  0  0  2     0
MatGetRow          453796 1.5 1.8176e-01  1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ             4 1.0 5.0068e-06  2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering          2 1.0 3.0140e-02  2.7 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        160 1.0 1.5786e+00  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
KSPGMRESOrthog        104 1.0 1.3677e+00  1.0 3.69e+08 1.0 0.0e+00 0.0e+00 1.0e+02  1  4  0  0  2   1  4  0  0  3 16612
KSPSetup               78 1.0 4.9393e-02  1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               95 1.0 1.3637e+02  1.0 7.65e+09 1.0 1.1e+05 1.3e+03 2.5e+03 57 89 49 32 58  57 89 49 32 61  3457
PCSetUp                 6 1.0 2.7957e+00  1.0 1.41e+07 1.0 0.0e+00 0.0e+00 6.0e+00  1  0  0  0  0   1  0  0  0  0   310
PCSetUpOnBlocks        57 1.0 5.0076e-01  1.2 1.41e+07 1.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0  1732
PCApply               984 1.0 9.8020e+01  1.0 1.93e+09 1.0 0.0e+00 0.0e+00 0.0e+00 41 22  0  0  0  41 22  0  0  0  1216
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
--- Event Stage 0: Main Stage
Application Order 4 4 134876056 0
Index Set 42 42 5979736 0
IS L to G Mapping 18 18 19841256 0
Vec 131 131 167538256 0
Vec Scatter 31 31 26412 0
Matrix 47 47 501115544 0
Krylov Solver 6 6 22376 0
Preconditioner 6 6 4256 0
Viewer 1 1 544 0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 1.35899e-05
Average time for zero size MPI_Send(): 1.79559e-06
#PETSc Option Table entries:
-log_summary
-moeq_ksp_rtol 0.000000001
-moeq_ksp_type cg
-moeq_pc_type jacobi
-poeq_ksp_monitor
-poeq_ksp_rtol 0.000000001
-poeq_ksp_type gmres
-poeq_pc_hypre_type boomeramg
-poeq_pc_type hypre
-ueq_ksp_rtol 0.000000001
-ueq_ksp_type cg
-veq_ksp_rtol 0.000000001
-veq_ksp_type cg
#End o PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Fri Jan 29 15:15:03 2010
Configure options: --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpiCC
--with-blas-lapack-dir=/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t/
--download-triangle --download-hypre --with-debugging=0 COPTFLAGS=" -03
-ffast-math -finline-functions" CXXOPTFLAGS=" -03 -ffast-math
-finline-functions" --with-shared=0
-----------------------------------------
>
> Cannot really say without more information about what is taking
> time on 32 cores and 256 cores.
>
> If you run 32 core and 256 core with -log_summary (also
> --with-debugging=0 ./configure version of PETSc) we'll be able to see
> where the time is being spent and so if it makes sense.
>
> Barry
>
> On Mar 8, 2010, at 1:09 PM, Christian Klettner wrote:
>
>> Dear PETSc,
>> I am using a fractional step method to solve the Navier-Stokes equation
>> which is composed of three steps. I have to solve a Poisson equation for
>> pressure in Step 2 and I use the GMRES solver with Hypre's BoomerAMG for
>> preconditioning. I have tested for strong scaling using a fixed problem
>> size of 16 million degrees of freedom and varied the number of cores from
>> 32 to 256. I have found superlinear speedup up until this number of cores.
>> Is there a reason why BoomerAMG exhibits this kind of behaviour?
>>
>> Best regards,
>> Christian
>>
>
>