Problems with multiplication scaling
Christian Klettner
christian.klettner at ucl.ac.uk
Sun Jun 14 11:23:48 CDT 2009
Dear PETSc Team,

I have used Hypre's BoomerAMG to cut the iteration count in solving a
Poisson-type equation (i.e. Ax=b). The sparse matrix arises from a finite
element discretization of the Navier-Stokes equations. However, the
performance was very poor, so I checked the matrix-vector multiplication
routine in my code. Below are the results for 1000 multiplications with a
250,000x250,000 matrix. The time for the multiplications only goes from
15.8 seconds to ~11 seconds when changing from 4 to 8 cores. The ratios
indicate that the load balancing is good, so I was wondering whether this
is related to how I have configured PETSc, or whether it is my machine.
I am using a machine with two quad-core 2.3 GHz Opterons (Shanghai).
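For reference, the 1000 multiplications reported as the MatMult event in the
logs below are simply repeated MatMult() calls on the assembled matrix. A
minimal sketch of such a benchmark (not my actual ex4 source; written against
the PETSc 3.0-era C API, with a 1-D tridiagonal matrix standing in for the
FEM Poisson matrix) would be:

  /* Sketch only: assemble a parallel AIJ matrix and time 1000 MatMult()
     calls under -log_summary.  Uses the PETSc 3.0-era calls, e.g.
     VecDestroy(x) rather than the later VecDestroy(&x). */
  #include "petscmat.h"

  int main(int argc, char **argv)
  {
    Mat            A;
    Vec            x, y;
    PetscInt       i, n = 250000, Istart, Iend;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);

    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);
    ierr = MatMPIAIJSetPreallocation(A, 3, PETSC_NULL, 1, PETSC_NULL);CHKERRQ(ierr);
    ierr = MatSeqAIJSetPreallocation(A, 3, PETSC_NULL);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
    for (i = Istart; i < Iend; i++) {
      /* simple tridiagonal stencil as a stand-in for the FEM Poisson matrix */
      if (i > 0)   { ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
      if (i < n-1) { ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
      ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = MatGetVecs(A, &x, &y);CHKERRQ(ierr);  /* MatCreateVecs() in later PETSc */
    ierr = VecSet(x, 1.0);CHKERRQ(ierr);

    /* These 1000 products are what the MatMult line in -log_summary
       aggregates; PetscLogStagePush()/PetscLogStagePop() could be used
       to isolate them in a separate stage. */
    for (i = 0; i < 1000; i++) {
      ierr = MatMult(A, x, y);CHKERRQ(ierr);     /* y = A*x */
    }

    ierr = VecDestroy(x);CHKERRQ(ierr);
    ierr = VecDestroy(y);CHKERRQ(ierr);
    ierr = MatDestroy(A);CHKERRQ(ierr);
    ierr = PetscFinalize();CHKERRQ(ierr);
    return 0;
  }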
Best regards,
Christian Klettner
************************************************************************************************************************
***                   WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                   ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex4 on a linux-gnu named christian-desktop with 4 processors, by christian Sun Jun 14 16:48:24 2009
Using Petsc Release Version 3.0.0, Patch 4, Fri Mar 6 14:46:08 CST 2009
                         Max       Max/Min        Avg      Total
Time (sec):           1.974e+01      1.00119   1.973e+01
Objects:              1.080e+02      1.00000   1.080e+02
Flops:                8.078e+08      1.00163   8.070e+08  3.228e+09
Flops/sec:            4.095e+07      1.00232   4.090e+07  1.636e+08
Memory:               1.090e+08      1.00942              4.345e+08
MPI Messages:         2.071e+03      2.00000   1.553e+03  6.213e+03
MPI Message Lengths:  2.237e+06      2.00000   1.080e+03  6.712e+06
MPI Reductions:       7.250e+01      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.9730e+01 100.0%  3.2281e+09 100.0%  6.213e+03 100.0%  1.080e+03      100.0%  2.120e+02  73.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
   %T - percent time in this phase          %F - percent flops in this phase
   %M - percent messages in this phase      %L - percent message lengths in this phase
   %R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option, #
# To get timing results run config/configure.py #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet                 5 1.0 1.2703e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyBegin       3 1.0 2.9233e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  3   0  0  0  0  4     0
VecAssemblyEnd         3 1.0 2.2650e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     1003 1.0 1.8717e-01 4.1 0.00e+00 0.0 6.0e+03 1.1e+03 0.0e+00  1  0 97 95  0   1  0 97 95  0     0
VecScatterEnd       1003 1.0 5.3403e+00 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 20  0  0  0  0  20  0  0  0  0     0
MatMult             1000 1.0 1.5877e+01 1.0 8.08e+08 1.0 6.0e+03 1.1e+03 0.0e+00 80 100 97 95 0  80 100 97 95 0    203
MatAssemblyBegin       7 1.0 3.6728e-01 1.9 0.00e+00 0.0 6.3e+01 5.0e+03 1.4e+01  1  0  1  5  5   1  0  1  5  7     0
MatAssemblyEnd         7 1.0 8.6817e-01 1.2 0.00e+00 0.0 8.4e+01 2.7e+02 7.0e+01  4  0  1  0 24   4  0  1  0 33     0
MatZeroEntries         7 1.0 5.7693e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type          Creations   Destructions   Memory  Descendants' Mem.
--- Event Stage 0: Main Stage
Application Order            2              0        0                 0
Index Set                   30             30    18476                 0
IS L to G Mapping           10              0        0                 0
Vec                         30              7     9128                 0
Vec Scatter                 15              0        0                 0
Matrix                      21              0        0                 0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 5.89848e-05
Average time for zero size MPI_Send(): 6.80089e-05
#PETSc Option Table entries:
-log_summary output1
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Fri Jun 12 16:59:30 2009
Configure options: --with-cc="gcc -fPIC" --download-mpich=1
--download-f-blas-lapack --download-triangle --download-parmetis
--with-hypre=1 --download-hypre=1 --with-shared=0
-----------------------------------------
Libraries compiled on Fri Jun 12 17:11:54 BST 2009 on christian-desktop
Machine characteristics: Linux christian-desktop 2.6.27-7-generic #1 SMP
Fri Oct 24 06:40:41 UTC 2008 x86_64 GNU/Linux
Using PETSc directory: /home/christian/Desktop/petsc-3.0.0-p4
Using PETSc arch: linux-gnu-c-debug
-----------------------------------------
Using C compiler:
/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpicc -Wall
-Wwrite-strings -Wno-strict-aliasing -g3
Using Fortran compiler:
/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpif90 -Wall
-Wno-unused-variable -g
-----------------------------------------
Using include paths:
-I/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/include
-I/home/christian/Desktop/petsc-3.0.0-p4/include
-I/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/include
------------------------------------------
Using C linker:
/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpicc -Wall
-Wwrite-strings -Wno-strict-aliasing -g3
Using Fortran linker:
/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpif90 -Wall
-Wno-unused-variable -g
Using libraries:
-Wl,-rpath,/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
-L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib -lpetscts
-lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
-Wl,-rpath,/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
-L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib -ltriangle
-lparmetis -lmetis -lHYPRE -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl
-lrt -L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
-L/usr/lib/gcc/x86_64-linux-gnu/4.3.2 -L/lib -ldl -lmpich -lpthread -lrt
-lgcc_s -lmpichf90 -lgfortranbegin -lgfortran -lm
-L/usr/lib/gcc/x86_64-linux-gnu -lm -lmpichcxx -lstdc++ -lmpichcxx
-lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------
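Regarding the debug-build warning in the summary above: rebuilding PETSc with
the same configure options reported there plus --with-debugging=no gives an
optimized library. A sketch only, assuming a new, arbitrarily named PETSC_ARCH
so the existing debug build is kept:

  cd /home/christian/Desktop/petsc-3.0.0-p4
  ./config/configure.py PETSC_ARCH=linux-gnu-c-opt --with-debugging=no \
      --with-cc="gcc -fPIC" --download-mpich=1 --download-f-blas-lapack \
      --download-triangle --download-parmetis --with-hypre=1 --download-hypre=1 \
      --with-shared=0
  make PETSC_DIR=/home/christian/Desktop/petsc-3.0.0-p4 PETSC_ARCH=linux-gnu-c-opt all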
************************************************************************************************************************
***                   WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                   ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex4 on a linux-gnu named christian-desktop with 8 processors, by christian Sun Jun 14 17:13:40 2009
Using Petsc Release Version 3.0.0, Patch 4, Fri Mar 6 14:46:08 CST 2009
                         Max       Max/Min        Avg      Total
Time (sec):           1.452e+01      1.01190   1.443e+01
Objects:              1.080e+02      1.00000   1.080e+02
Flops:                3.739e+08      1.00373   3.731e+08  2.985e+09
Flops/sec:            2.599e+07      1.01190   2.585e+07  2.068e+08
Memory:               5.157e+07      1.01231              4.117e+08
MPI Messages:         2.071e+03      2.00000   1.812e+03  1.450e+04
MPI Message Lengths:  2.388e+06      2.00000   1.153e+03  1.672e+07
MPI Reductions:       3.625e+01      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.4431e+01 100.0%  2.9847e+09 100.0%  1.450e+04 100.0%  1.153e+03      100.0%  2.120e+02  73.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
   %T - percent time in this phase          %F - percent flops in this phase
   %M - percent messages in this phase      %L - percent message lengths in this phase
   %R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option, #
# To get timing results run config/configure.py #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet                 5 1.0 6.1178e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyBegin       3 1.0 7.7400e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  3   0  0  0  0  4     0
VecAssemblyEnd         3 1.0 4.1008e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     1003 1.0 1.0858e-01 2.9 0.00e+00 0.0 1.4e+04 1.1e+03 0.0e+00  1  0 97 95  0   1  0 97 95  0     0
VecScatterEnd       1003 1.0 5.3962e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 33  0  0  0  0  33  0  0  0  0     0
MatMult             1000 1.0 1.1430e+01 1.0 3.74e+08 1.0 1.4e+04 1.1e+03 0.0e+00 79 100 97 95 0  79 100 97 95 0    261
MatAssemblyBegin       7 1.0 4.6307e-01 1.8 0.00e+00 0.0 1.5e+02 5.3e+03 1.4e+01  3  0  1  5  5   3  0  1  5  7     0
MatAssemblyEnd         7 1.0 6.9013e-01 1.3 0.00e+00 0.0 2.0e+02 2.8e+02 7.0e+01  4  0  1  0 24   4  0  1  0 33     0
MatZeroEntries         7 1.0 2.7971e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type          Creations   Destructions   Memory  Descendants' Mem.
--- Event Stage 0: Main Stage
Application Order            2              0        0                 0
Index Set                   30             30    18476                 0
IS L to G Mapping           10              0        0                 0
Vec                         30              7     9128                 0
Vec Scatter                 15              0        0                 0
Matrix                      21              0        0                 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000419807
Average time for zero size MPI_Send(): 0.000115991
#PETSc Option Table entries:
-log_summary output18
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Fri Jun 12 16:59:30 2009
Configure options: --with-cc="gcc -fPIC" --download-mpich=1
--download-f-blas-lapack --download-triangle --download-parmetis
--with-hypre=1 --download-hypre=1 --with-shared=0
-----------------------------------------
Libraries compiled on Fri Jun 12 17:11:54 BST 2009 on christian-desktop
Machine characteristics: Linux christian-desktop 2.6.27-7-generic #1 SMP
Fri Oct 24 06:40:41 UTC 2008 x86_64 GNU/Linux
Using PETSc directory: /home/christian/Desktop/petsc-3.0.0-p4
Using PETSc arch: linux-gnu-c-debug
-----------------------------------------
Using C compiler:
/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpicc -Wall
-Wwrite-strings -Wno-strict-aliasing -g3
Using Fortran compiler:
/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpif90 -Wall
-Wno-unused-variable -g
-----------------------------------------
Using include paths:
-I/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/include
-I/home/christian/Desktop/petsc-3.0.0-p4/include
-I/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/include
------------------------------------------
Using C linker:
/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpicc -Wall
-Wwrite-strings -Wno-strict-aliasing -g3
Using Fortran linker:
/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpif90 -Wall
-Wno-unused-variable -g
Using libraries:
-Wl,-rpath,/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
-L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib -lpetscts
-lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
-Wl,-rpath,/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
-L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib -ltriangle
-lparmetis -lmetis -lHYPRE -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl
-lrt -L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
-L/usr/lib/gcc/x86_64-linux-gnu/4.3.2 -L/lib -ldl -lmpich -lpthread -lrt
-lgcc_s -lmpichf90 -lgfortranbegin -lgfortran -lm
-L/usr/lib/gcc/x86_64-linux-gnu -lm -lmpichcxx -lstdc++ -lmpichcxx
-lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------