[petsc-users] superlinear scale-up with hypre

Christian Klettner christian.klettner at ucl.ac.uk
Wed Mar 10 13:03:16 CST 2010


Dear Barry,

Below is the performance on 32 and 64 cores respectively. I run my case
for 19 time steps, and in each time step there are four parabolic equations
to be solved (Step 1 (u,v) and Step 3 (u,v)) and one elliptic equation
(Step 2), which is why there are 95 KSPSolves.
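
For context, each equation is solved with its own KSP carrying an options
prefix, which is how the -ueq_/-veq_/-moeq_/-poeq_ entries in the option
tables below attach to the right solver. A rough sketch of that setup (not
my exact code; the matrix and variable names are illustrative only, and
error checking is omitted):

#include "petscksp.h"

/* One KSP per equation; the options prefix routes the command-line
 * options below (-ueq_..., -poeq_..., etc.) to the corresponding solver. */
PetscErrorCode create_solvers(Mat Au, Mat Ap, KSP *ksp_u, KSP *ksp_p)
{
  /* parabolic (velocity) solve: picks up -ueq_ksp_type cg, -ueq_ksp_rtol ... */
  KSPCreate(PETSC_COMM_WORLD, ksp_u);
  KSPSetOptionsPrefix(*ksp_u, "ueq_");
  KSPSetOperators(*ksp_u, Au, Au, SAME_NONZERO_PATTERN);  /* PETSc 3.0 signature */
  KSPSetFromOptions(*ksp_u);

  /* elliptic pressure Poisson solve: picks up -poeq_ksp_type gmres,
     -poeq_pc_type hypre, -poeq_pc_hypre_type boomeramg, -poeq_ksp_rtol ... */
  KSPCreate(PETSC_COMM_WORLD, ksp_p);
  KSPSetOptionsPrefix(*ksp_p, "poeq_");
  KSPSetOperators(*ksp_p, Ap, Ap, SAME_NONZERO_PATTERN);
  KSPSetFromOptions(*ksp_p);
  return 0;
}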
The biggest difference I can see is in KSPSolve, but I am guessing that
time is made up of other logged functions (MatMult, PCApply, etc.)?
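(For concreteness: the KSPSolve max time drops from 3.0155e+02 s on 32
cores to 1.3637e+02 s on 64 cores, a factor of about 2.2 for doubling the
core count, and the total run time drops from 5.42e+02 s to 2.39e+02 s, a
factor of about 2.3.)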
Also, as you can see, I set "-poeq_ksp_rtol 0.000000001" for the Poisson
solve; however, the residual monitor prints

 Residual norms for poeq_ solve.
  0 KSP Residual norm 7.862045205096e-02
  1 KSP Residual norm 1.833734529269e-02
  2 KSP Residual norm 9.243822053526e-04
  3 KSP Residual norm 1.534786635844e-04
  4 KSP Residual norm 2.032435231176e-05
  5 KSP Residual norm 3.201182258546e-06

so the relative tolerance of 1e-9 has not been reached. Should I be setting
the tolerance with a different command?
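
For what it's worth, I could also set and check the tolerance directly in
code rather than through the option; a minimal sketch of what I have in
mind (ksp_press is just an illustrative name for the poeq_-prefixed KSP,
error checking omitted):

#include "petscksp.h"

/* Sketch: enforce rtol = 1e-9 on the pressure solve and report what the
 * convergence test decided. */
PetscErrorCode solve_pressure(KSP ksp_press, Vec rhs, Vec p)
{
  KSPConvergedReason reason;
  PetscInt           its;
  PetscReal          rnorm;

  /* arguments: rtol, abstol, dtol, maxits; PETSC_DEFAULT leaves the others unchanged */
  KSPSetTolerances(ksp_press, 1.0e-9, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);

  KSPSolve(ksp_press, rhs, p);

  KSPGetConvergedReason(ksp_press, &reason);   /* e.g. KSP_CONVERGED_RTOL */
  KSPGetIterationNumber(ksp_press, &its);
  KSPGetResidualNorm(ksp_press, &rnorm);
  PetscPrintf(PETSC_COMM_WORLD, "poeq_ solve: reason %d after %d its, rnorm %g\n",
              (int)reason, (int)its, (double)rnorm);
  return 0;
}

I think adding -poeq_ksp_converged_reason on the command line would print
the same information without any code changes.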

Thanks for any advice,
Christian


************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary:
----------------------------------------------

./ex115 on a linux-gnu named node-c47 with 32 processors, by ucemckl Wed
Mar 10 02:12:45 2010
Using Petsc Release Version 3.0.0, Patch 10, Tue Nov 24 16:38:09 CST 2009

                         Max       Max/Min        Avg      Total
Time (sec):           5.424e+02      1.00012   5.423e+02
Objects:              2.860e+02      1.00000   2.860e+02
Flops:                1.675e+10      1.02726   1.635e+10  5.232e+11
Flops/sec:            3.088e+07      1.02726   3.015e+07  9.647e+08
MPI Messages:         3.603e+03      2.00278   3.447e+03  1.103e+05
MPI Message Lengths:  8.272e+06      1.90365   2.285e+03  2.520e+08
MPI Reductions:       4.236e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N
--> 2N flops
                            and VecAXPY() for complex vectors of length N
--> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts  
%Total     Avg         %Total   counts   %Total
 0:      Main Stage: 5.4232e+02 100.0%  5.2317e+11 100.0%  1.103e+05
100.0%  2.285e+03      100.0%  4.056e+03  95.8%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this
phase
      %M - percent messages in this phase     %L - percent message lengths
in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                      
      --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMin                19 1.0 9.5495e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecDot              1362 1.0 1.0272e+01 1.4 1.38e+09 1.0 0.0e+00 0.0e+00
1.4e+03  2  8  0  0 32   2  8  0  0 34  4212
VecMDot              101 1.0 1.3028e+00 1.0 3.44e+08 1.0 0.0e+00 0.0e+00
1.0e+02  0  2  0  0  2   0  2  0  0  2  8241
VecNorm              972 1.0 1.0458e+01 1.6 9.88e+08 1.0 0.0e+00 0.0e+00
9.7e+02  1  6  0  0 23   1  6  0  0 24  2952
VecScale             139 1.0 4.4759e-01 1.1 7.07e+07 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0  4932
VecCopy              133 1.0 6.7746e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1136 1.0 4.2686e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             1666 1.0 1.0439e+01 1.0 1.69e+09 1.0 0.0e+00 0.0e+00
0.0e+00  2 10  0  0  0   2 10  0  0  0  5069
VecAYPX              681 1.0 4.1510e+00 1.1 6.92e+08 1.0 0.0e+00 0.0e+00
0.0e+00  1  4  0  0  0   1  4  0  0  0  5211
VecAXPBYCZ            38 1.0 3.5104e-01 1.1 7.73e+07 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0  6877
VecMAXPY             120 1.0 1.7512e+00 1.0 4.46e+08 1.0 0.0e+00 0.0e+00
0.0e+00  0  3  0  0  0   0  3  0  0  0  7963
VecAssemblyBegin     290 1.0 1.4337e+0164.9 0.00e+00 0.0 3.6e+03 1.0e+03
8.7e+02  2  0  3  1 21   2  0  3  1 21     0
VecAssemblyEnd       290 1.0 8.1372e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult     280 1.0 2.5121e+00 1.1 1.42e+08 1.0 0.0e+00 0.0e+00
0.0e+00  0  1  0  0  0   0  1  0  0  0  1770
VecScatterBegin     1373 1.0 5.1618e-02 1.7 0.00e+00 0.0 7.7e+04 1.3e+03
0.0e+00  0  0 70 40  0   0  0 70 40  0     0
VecScatterEnd       1373 1.0 6.2953e-0118.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize         120 1.0 1.1371e+00 1.0 1.83e+08 1.0 0.0e+00 0.0e+00
1.2e+02  0  1  0  0  3   0  1  0  0  3  5028
MatMult             1048 1.0 5.6495e+01 1.1 6.86e+09 1.0 6.5e+04 1.3e+03
0.0e+00 10 41 59 34  0  10 41 59 34  0  3793
MatMultTranspose      57 1.0 3.4194e+00 1.1 4.02e+08 1.0 3.5e+03 1.3e+03
0.0e+00  1  2  3  2  0   1  2  3  2  0  3673
MatSolve             553 1.0 4.6169e+01 1.1 3.62e+09 1.0 0.0e+00 0.0e+00
0.0e+00  8 22  0  0  0   8 22  0  0  0  2448
MatLUFactorNum         2 1.0 7.9745e-01 1.2 2.78e+07 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0  1088
MatILUFactorSym        2 1.0 2.7597e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCopy              133 1.0 4.7596e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatConvert            27 1.0 1.7435e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin     263 1.0 1.3145e+0132.9 0.00e+00 0.0 2.4e+04 3.7e+03
5.3e+02  2  0 22 36 12   2  0 22 36 13     0
MatAssemblyEnd       263 1.0 9.1696e+00 1.0 0.00e+00 0.0 2.5e+02 3.3e+02
6.6e+01  2  0  0  0  2   2  0  0  0  2     0
MatGetRow         901474 1.5 2.9092e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            4 1.0 5.0068e-06 2.6 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         2 1.0 7.2280e-02 3.2 0.00e+00 0.0 0.0e+00 0.0e+00
4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries       160 1.0 3.0731e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
KSPGMRESOrthog       101 1.0 2.6510e+00 1.0 6.87e+08 1.0 0.0e+00 0.0e+00
1.0e+02  0  4  0  0  2   0  4  0  0  2  8100
KSPSetup              78 1.0 1.4449e-01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              95 1.0 3.0155e+02 1.0 1.49e+10 1.0 5.4e+04 1.3e+03
2.4e+03 56 89 49 28 58  56 89 49 28 60  1540
PCSetUp                6 1.0 6.2894e+00 1.0 2.78e+07 1.0 0.0e+00 0.0e+00
6.0e+00  1  0  0  0  0   1  0  0  0  0   138
PCSetUpOnBlocks       57 1.0 1.0523e+00 1.2 2.78e+07 1.0 0.0e+00 0.0e+00
6.0e+00  0  0  0  0  0   0  0  0  0  0   824
PCApply              972 1.0 2.1798e+02 1.0 3.76e+09 1.0 0.0e+00 0.0e+00
0.0e+00 40 22  0  0  0  40 22  0  0  0   539
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

   Application Order     4              4  142960400     0
           Index Set    42             42   11937496     0
   IS L to G Mapping    18             18   39700456     0
                 Vec   131            131  335147648     0
         Vec Scatter    31             31      26412     0
              Matrix    47             47  1003139256     0
       Krylov Solver     6              6      22376     0
      Preconditioner     6              6       4256     0
              Viewer     1              1        544     0
========================================================================================================================
Average time to get PetscTime(): 2.86102e-07
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 1.71363e-06
#PETSc Option Table entries:
-log_summary
-moeq_ksp_rtol 0.000000001
-moeq_ksp_type cg
-moeq_pc_type jacobi
-poeq_ksp_monitor
-poeq_ksp_rtol 0.000000001
-poeq_ksp_type gmres
-poeq_pc_hypre_type boomeramg
-poeq_pc_type hypre
-ueq_ksp_rtol 0.000000001
-ueq_ksp_type cg
-veq_ksp_rtol 0.000000001
-veq_ksp_type cg
#End o PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Fri Jan 29 15:15:03 2010
Configure options: --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpiCC
--with-blas-lapack-dir=/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t/
--download-triangle --download-hypre --with-debugging=0 COPTFLAGS=" -03
-ffast-math -finline-functions" CXXOPTFLAGS=" -03 -ffast-math
-finline-functions" --with-shared=0
-----------------------------------------
Libraries compiled on Fri Jan 29 15:17:56 GMT 2010 on login01
Machine characteristics: Linux login01 2.6.9-89.el4_lustre.1.6.7.2ddn1 #11
SMP Wed Sep 9 18:48:21 CEST 2009 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /shared/home/ucemckl/petsc-3.0.0-p10
Using PETSc arch: linux-gnu-c-opt
-----------------------------------------
Using C compiler: mpicc
Using Fortran compiler: mpif90 -O
-----------------------------------------
Using include paths:
-I/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/include
-I/shared/home/ucemckl/petsc-3.0.0-p10/include
-I/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/include
-I/usr/X11R6/include
------------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90 -O
Using libraries:
-Wl,-rpath,/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib
-L/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib -lpetscts
-lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc       
-Wl,-rpath,/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib
-L/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib -ltriangle
-L/usr/X11R6/lib64 -lX11 -lHYPRE -lstdc++
-Wl,-rpath,/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t
-L/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t -lmkl_lapack -lmkl
-lguide -lpthread -lnsl -laio -lrt -lPEPCF90
-L/cvos/shared/apps/infinipath/2.1/mpi/lib64 -ldl -lmpich
-L/cvos/shared/apps/intel/cce/10.1.008/lib
-L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -limf -lsvml -lipgo -lirc -lgcc_s
-lirc_s -lmpichf90nc -lmpichabiglue_intel9
-L/cvos/shared/apps/intel/fce/10.1.008/lib -lifport -lifcore -lm -lm
-lstdc++ -lstdc++ -ldl -lmpich -limf -lsvml -lipgo -lirc -lgcc_s -lirc_s
-ldl
------------------------------------------


//////////////////////////////////////////////////////////////////////////
//////////////////////////  64-core run below  ////////////////////////////
//////////////////////////////////////////////////////////////////////////

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary:
----------------------------------------------

./ex115 on a linux-gnu named node-f56 with 64 processors, by ucemckl Wed
Mar 10 04:33:32 2010
Using Petsc Release Version 3.0.0, Patch 10, Tue Nov 24 16:38:09 CST 2009

                         Max       Max/Min        Avg      Total
Time (sec):           2.394e+02      1.00022   2.394e+02
Objects:              2.860e+02      1.00000   2.860e+02
Flops:                8.606e+09      1.04191   8.283e+09  5.301e+11
Flops/sec:            3.595e+07      1.04196   3.461e+07  2.215e+09
MPI Messages:         3.627e+03      1.98414   3.565e+03  2.282e+05
MPI Message Lengths:  7.563e+06      1.99911   2.009e+03  4.584e+08
MPI Reductions:       4.269e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N
--> 2N flops
                            and VecAXPY() for complex vectors of length N
--> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts  
%Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.3936e+02 100.0%  5.3013e+11 100.0%  2.282e+05
100.0%  2.009e+03      100.0%  4.089e+03  95.8%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this
phase
      %M - percent messages in this phase     %L - percent message lengths
in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                      
      --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMin                19 1.0 4.7353e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecDot              1380 1.0 5.3245e+00 1.7 7.11e+08 1.0 0.0e+00 0.0e+00
1.4e+03  2  8  0  0 32   2  8  0  0 34  8224
VecMDot              104 1.0 6.9024e-01 1.0 1.84e+08 1.0 0.0e+00 0.0e+00
1.0e+02  0  2  0  0  2   0  2  0  0  3 16458
VecNorm              984 1.0 5.8349e+00 1.7 5.07e+08 1.0 0.0e+00 0.0e+00
9.8e+02  2  6  0  0 23   2  6  0  0 24  5351
VecScale             142 1.0 1.5187e-01 1.7 3.66e+07 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 14835
VecCopy              133 1.0 3.9400e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1148 1.0 2.0722e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             1684 1.0 5.1021e+00 1.1 8.67e+08 1.0 0.0e+00 0.0e+00
0.0e+00  2 10  0  0  0   2 10  0  0  0 10473
VecAYPX              690 1.0 1.9134e+00 1.1 3.55e+08 1.0 0.0e+00 0.0e+00
0.0e+00  1  4  0  0  0   1  4  0  0  0 11443
VecAXPBYCZ            38 1.0 1.7525e-01 1.1 3.91e+07 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 13761
VecMAXPY             123 1.0 8.9613e-01 1.1 2.38e+08 1.0 0.0e+00 0.0e+00
0.0e+00  0  3  0  0  0   0  3  0  0  0 16359
VecAssemblyBegin     290 1.0 6.6559e+0015.4 0.00e+00 0.0 7.3e+03 1.0e+03
8.7e+02  2  0  3  2 20   2  0  3  2 21     0
VecAssemblyEnd       290 1.0 1.5714e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult     280 1.0 1.2558e+00 1.1 7.21e+07 1.0 0.0e+00 0.0e+00
0.0e+00  0  1  0  0  0   0  1  0  0  0  3538
VecScatterBegin     1385 1.0 4.7455e-02 1.8 0.00e+00 0.0 1.6e+05 1.3e+03
0.0e+00  0  0 69 45  0   0  0 69 45  0     0
VecScatterEnd       1385 1.0 4.8537e-0115.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize         123 1.0 6.2763e-01 1.1 9.50e+07 1.0 0.0e+00 0.0e+00
1.2e+02  0  1  0  0  3   0  1  0  0  3  9328
MatMult             1060 1.0 2.4949e+01 1.1 3.51e+09 1.0 1.3e+05 1.3e+03
0.0e+00 10 41 59 38  0  10 41 59 38  0  8678
MatMultTranspose      57 1.0 1.4921e+00 1.2 2.04e+08 1.0 7.2e+03 1.3e+03
0.0e+00  1  2  3  2  0   1  2  3  2  0  8409
MatSolve             562 1.0 2.1214e+01 1.1 1.86e+09 1.0 0.0e+00 0.0e+00
0.0e+00  8 22  0  0  0   8 22  0  0  0  5409
MatLUFactorNum         2 1.0 3.7373e-01 1.2 1.41e+07 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0  2320
MatILUFactorSym        2 1.0 1.2428e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCopy              133 1.0 2.3860e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatConvert            27 1.0 8.3217e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin     263 1.0 8.3536e+0040.7 0.00e+00 0.0 5.0e+04 3.7e+03
5.3e+02  3  0 22 40 12   3  0 22 40 13     0
MatAssemblyEnd       263 1.0 4.4723e+00 1.1 0.00e+00 0.0 5.0e+02 3.3e+02
6.6e+01  2  0  0  0  2   2  0  0  0  2     0
MatGetRow         453796 1.5 1.8176e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            4 1.0 5.0068e-06 2.6 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         2 1.0 3.0140e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00
4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries       160 1.0 1.5786e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
KSPGMRESOrthog       104 1.0 1.3677e+00 1.0 3.69e+08 1.0 0.0e+00 0.0e+00
1.0e+02  1  4  0  0  2   1  4  0  0  3 16612
KSPSetup              78 1.0 4.9393e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              95 1.0 1.3637e+02 1.0 7.65e+09 1.0 1.1e+05 1.3e+03
2.5e+03 57 89 49 32 58  57 89 49 32 61  3457
PCSetUp                6 1.0 2.7957e+00 1.0 1.41e+07 1.0 0.0e+00 0.0e+00
6.0e+00  1  0  0  0  0   1  0  0  0  0   310
PCSetUpOnBlocks       57 1.0 5.0076e-01 1.2 1.41e+07 1.0 0.0e+00 0.0e+00
6.0e+00  0  0  0  0  0   0  0  0  0  0  1732
PCApply              984 1.0 9.8020e+01 1.0 1.93e+09 1.0 0.0e+00 0.0e+00
0.0e+00 41 22  0  0  0  41 22  0  0  0  1216
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

   Application Order     4              4  134876056     0
           Index Set    42             42    5979736     0
   IS L to G Mapping    18             18   19841256     0
                 Vec   131            131  167538256     0
         Vec Scatter    31             31      26412     0
              Matrix    47             47  501115544     0
       Krylov Solver     6              6      22376     0
      Preconditioner     6              6       4256     0
              Viewer     1              1        544     0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 1.35899e-05
Average time for zero size MPI_Send(): 1.79559e-06
#PETSc Option Table entries:
-log_summary
-moeq_ksp_rtol 0.000000001
-moeq_ksp_type cg
-moeq_pc_type jacobi
-poeq_ksp_monitor
-poeq_ksp_rtol 0.000000001
-poeq_ksp_type gmres
-poeq_pc_hypre_type boomeramg
-poeq_pc_type hypre
-ueq_ksp_rtol 0.000000001
-ueq_ksp_type cg
-veq_ksp_rtol 0.000000001
-veq_ksp_type cg
#End o PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Fri Jan 29 15:15:03 2010
Configure options: --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpiCC
--with-blas-lapack-dir=/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t/
--download-triangle --download-hypre --with-debugging=0 COPTFLAGS=" -03
-ffast-math -finline-functions" CXXOPTFLAGS=" -03 -ffast-math
-finline-functions" --with-shared=0
-----------------------------------------







>
>     Cannot really say without more information about what is taking
> time on 32 cores and 256 cores.
>
>     If you run the 32-core and 256-core cases with -log_summary (and a
> --with-debugging=0 ./configure version of PETSc) we'll be able to see
> where the time is being spent and whether it makes sense.
>
>     Barry
>
> On Mar 8, 2010, at 1:09 PM, Christian Klettner wrote:
>
>> Dear PETSc,
>> I am using a fractional step method, composed of three steps, to solve
>> the Navier-Stokes equations. In Step 2 I have to solve a Poisson
>> equation for the pressure, and I use the GMRES solver with Hypre's
>> BoomerAMG for preconditioning. I have tested strong scaling using a
>> fixed problem size of 16 million degrees of freedom while varying the
>> number of cores from 32 to 256, and I have found superlinear speedup
>> up to this number of cores.
>> Is there a reason why BoomerAMG exhibits this kind of behaviour?
>>
>> Best regards,
>> Christian
>>
>
>



