Problems with multiplication scaling

Matthew Knepley knepley at gmail.com
Sun Jun 14 15:01:28 CDT 2009


A matvec (sparse matrix-vector product) is a memory-bandwidth-limited
operation, so adding more cores that share the same memory bus will not
usually make it go much faster. Hardware manufacturers don't tell you this
stuff.
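
To make that concrete, here is a rough back-of-the-envelope sketch in plain
C. The 12 GB/s node bandwidth is an assumed, illustrative figure (not a
measurement of your Opteron), and the byte count covers matrix data only,
ignoring vector and row-pointer traffic. It computes the bandwidth-limited
cap for an AIJ MatMult and compares it with the MatMult rates in your logs:

/* Roofline-style sketch of why MatMult does not scale with core count.
 * Figures marked "assumed" are illustrative, not measured on this box.
 *
 * An AIJ (CSR) MatMult in double precision streams at least 12 bytes per
 * stored nonzero (8-byte value + 4-byte column index) and performs 2 flops
 * per nonzero (one multiply, one add). All cores on a node share the same
 * memory bus, so the flop rate is capped at roughly
 *     node_bandwidth * (2 flops / 12 bytes)
 * regardless of how many cores are used.
 */
#include <stdio.h>

int main(void)
{
  const double node_bw      = 12.0e9; /* assumed node memory bandwidth, bytes/s */
  const double bytes_per_nz = 12.0;   /* matrix data only; vectors add more     */
  const double flops_per_nz = 2.0;    /* multiply + add                         */

  /* Bandwidth-limited cap for the whole node, in Mflop/s */
  double cap = node_bw / bytes_per_nz * flops_per_nz / 1.0e6;

  /* MatMult figures taken from the -log_summary output below
     (1000 products of the 250,000x250,000 matrix)                  */
  double t4 = 15.877, f4 = 0.808e9 * 4.0; /* 4 processes: time, total flops */
  double t8 = 11.430, f8 = 0.374e9 * 8.0; /* 8 processes                    */

  printf("bandwidth cap (whole node) : %6.0f Mflop/s\n", cap);
  printf("achieved, 4 processes      : %6.0f Mflop/s\n", f4 / t4 / 1.0e6);
  printf("achieved, 8 processes      : %6.0f Mflop/s\n", f8 / t8 / 1.0e6);
  printf("observed 4->8 speedup      : %4.2f  (bandwidth model: ~1)\n",
         t4 / t8);
  return 0;
}

Under this model the cap is a property of the node's memory system, not of
the core count, so doubling the processes buys only the ~1.4x you observed,
not 2x.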

  Matt

On Sun, Jun 14, 2009 at 11:23 AM, Christian Klettner <
christian.klettner at ucl.ac.uk> wrote:

> Dear PETSc Team,
> I have used Hypre's BoomerAMG to cut the iteration count in solving a
> Poisson-type equation (i.e. Ax=b). The sparse matrix arises from a finite
> element discretization of the Navier-Stokes equations. However, the
> performance was very poor, so I checked the matrix-vector multiplication
> routine in my code. Below are the results for 1000 matrix-vector products
> with a 250,000x250,000 matrix. The time for the multiplications goes from
> 15.8 seconds to ~11 seconds when going from 4 to 8 cores. The ratios
> indicate good load balancing, so I was wondering whether this is to do
> with how I configured PETSc, or with my machine.
> I am using a 2x quad core 2.3GHz Opteron (Shanghai).
> Best regards,
> Christian Klettner
>
>
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
> -fCourier9' to print this document            ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./ex4 on a linux-gnu named christian-desktop with 4 processors, by
> christian Sun Jun 14 16:48:24 2009
> Using Petsc Release Version 3.0.0, Patch 4, Fri Mar  6 14:46:08 CST 2009
>
>                         Max       Max/Min        Avg      Total
> Time (sec):           1.974e+01      1.00119   1.973e+01
> Objects:              1.080e+02      1.00000   1.080e+02
> Flops:                8.078e+08      1.00163   8.070e+08  3.228e+09
> Flops/sec:            4.095e+07      1.00232   4.090e+07  1.636e+08
> Memory:               1.090e+08      1.00942              4.345e+08
> MPI Messages:         2.071e+03      2.00000   1.553e+03  6.213e+03
> MPI Message Lengths:  2.237e+06      2.00000   1.080e+03  6.712e+06
> MPI Reductions:       7.250e+01      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
>                            e.g., VecAXPY() for real vectors of length N
> --> 2N flops
>                            and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
> ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total     Avg     %Total   counts
> %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 1.9730e+01 100.0%  3.2281e+09 100.0%  6.213e+03
> 100.0%  1.080e+03      100.0%  2.120e+02  73.1%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
>   Count: number of times phase was executed
>   Time and Flops: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>   Mess: number of messages sent
>   Avg. len: average message length
>   Reduct: number of global reductions
>   Global: entire computation
>   Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
>      %T - percent time in this phase         %F - percent flops in this
> phase
>      %M - percent messages in this phase     %L - percent message lengths
> in this phase
>      %R - percent reductions in this phase
>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
>
>
>      ##########################################################
>      #                                                        #
>      #                          WARNING!!!                    #
>      #                                                        #
>      #   This code was compiled with a debugging option,      #
>      #   To get timing results run config/configure.py        #
>      #   using --with-debugging=no, the performance will      #
>      #   be generally two or three times faster.              #
>      #                                                        #
>      ##########################################################
>
>
> Event                Count      Time (sec)     Flops
>      --- Global ---  --- Stage ---   Total
>                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecSet                 5 1.0 1.2703e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyBegin       3 1.0 2.9233e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 9.0e+00  0  0  0  0  3   0  0  0  0  4     0
> VecAssemblyEnd         3 1.0 2.2650e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin     1003 1.0 1.8717e-01 4.1 0.00e+00 0.0 6.0e+03 1.1e+03
> 0.0e+00  1  0 97 95  0   1  0 97 95  0     0
> VecScatterEnd       1003 1.0 5.3403e+00 2.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 20  0  0  0  0  20  0  0  0  0     0
> MatMult             1000 1.0 1.5877e+01 1.0 8.08e+08 1.0 6.0e+03 1.1e+03
> 0.0e+00 80100 97 95  0  80100 97 95  0   203
> MatAssemblyBegin       7 1.0 3.6728e-01 1.9 0.00e+00 0.0 6.3e+01 5.0e+03
> 1.4e+01  1  0  1  5  5   1  0  1  5  7     0
> MatAssemblyEnd         7 1.0 8.6817e-01 1.2 0.00e+00 0.0 8.4e+01 2.7e+02
> 7.0e+01  4  0  1  0 24   4  0  1  0 33     0
> MatZeroEntries         7 1.0 5.7693e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory  Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
>   Application Order     2              0          0     0
>           Index Set    30             30      18476     0
>   IS L to G Mapping    10              0          0     0
>                 Vec    30              7       9128     0
>         Vec Scatter    15              0          0     0
>              Matrix    21              0          0     0
>
> ========================================================================================================================
> Average time to get PetscTime(): 2.14577e-07
> Average time for MPI_Barrier(): 5.89848e-05
> Average time for zero size MPI_Send(): 6.80089e-05
> #PETSc Option Table entries:
> -log_summary output1
> #End o PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Fri Jun 12 16:59:30 2009
> Configure options: --with-cc="gcc -fPIC" --download-mpich=1
> --download-f-blas-lapack --download-triangle --download-parmetis
> --with-hypre=1 --download-hypre=1 --with-shared=0
> -----------------------------------------
> Libraries compiled on Fri Jun 12 17:11:54 BST 2009 on christian-desktop
> Machine characteristics: Linux christian-desktop 2.6.27-7-generic #1 SMP
> Fri Oct 24 06:40:41 UTC 2008 x86_64 GNU/Linux
> Using PETSc directory: /home/christian/Desktop/petsc-3.0.0-p4
> Using PETSc arch: linux-gnu-c-debug
> -----------------------------------------
> Using C compiler:
> /home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpicc -Wall
> -Wwrite-strings -Wno-strict-aliasing -g3
> Using Fortran compiler:
> /home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpif90 -Wall
> -Wno-unused-variable -g
> -----------------------------------------
> Using include paths:
> -I/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/include
> -I/home/christian/Desktop/petsc-3.0.0-p4/include
> -I/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/include
> ------------------------------------------
> Using C linker:
> /home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpicc -Wall
> -Wwrite-strings -Wno-strict-aliasing -g3
> Using Fortran linker:
> /home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpif90 -Wall
> -Wno-unused-variable -g
> Using libraries:
> -Wl,-rpath,/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
> -L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib -lpetscts
> -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
> -Wl,-rpath,/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
> -L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib -ltriangle
> -lparmetis -lmetis -lHYPRE -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl
> -lrt -L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
> -L/usr/lib/gcc/x86_64-linux-gnu/4.3.2 -L/lib -ldl -lmpich -lpthread -lrt
> -lgcc_s -lmpichf90 -lgfortranbegin -lgfortran -lm
> -L/usr/lib/gcc/x86_64-linux-gnu -lm -lmpichcxx -lstdc++ -lmpichcxx
> -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
> ------------------------------------------
>
>
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
> -fCourier9' to print this document            ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./ex4 on a linux-gnu named christian-desktop with 8 processors, by
> christian Sun Jun 14 17:13:40 2009
> Using Petsc Release Version 3.0.0, Patch 4, Fri Mar  6 14:46:08 CST 2009
>
>                         Max       Max/Min        Avg      Total
> Time (sec):           1.452e+01      1.01190   1.443e+01
> Objects:              1.080e+02      1.00000   1.080e+02
> Flops:                3.739e+08      1.00373   3.731e+08  2.985e+09
> Flops/sec:            2.599e+07      1.01190   2.585e+07  2.068e+08
> Memory:               5.157e+07      1.01231              4.117e+08
> MPI Messages:         2.071e+03      2.00000   1.812e+03  1.450e+04
> MPI Message Lengths:  2.388e+06      2.00000   1.153e+03  1.672e+07
> MPI Reductions:       3.625e+01      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
>                            e.g., VecAXPY() for real vectors of length N
> --> 2N flops
>                            and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
> ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total     Avg     %Total   counts
> %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 1.4431e+01 100.0%  2.9847e+09 100.0%  1.450e+04
> 100.0%  1.153e+03      100.0%  2.120e+02  73.1%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
>   Count: number of times phase was executed
>   Time and Flops: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>   Mess: number of messages sent
>   Avg. len: average message length
>   Reduct: number of global reductions
>   Global: entire computation
>   Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
>      %T - percent time in this phase         %F - percent flops in this
> phase
>      %M - percent messages in this phase     %L - percent message lengths
> in this phase
>      %R - percent reductions in this phase
>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
>
>
>      ##########################################################
>      #                                                        #
>      #                          WARNING!!!                    #
>      #                                                        #
>      #   This code was compiled with a debugging option,      #
>      #   To get timing results run config/configure.py        #
>      #   using --with-debugging=no, the performance will      #
>      #   be generally two or three times faster.              #
>      #                                                        #
>      ##########################################################
>
>
> Event                Count      Time (sec)     Flops
>      --- Global ---  --- Stage ---   Total
>                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecSet                 5 1.0 6.1178e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyBegin       3 1.0 7.7400e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 9.0e+00  0  0  0  0  3   0  0  0  0  4     0
> VecAssemblyEnd         3 1.0 4.1008e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin     1003 1.0 1.0858e-01 2.9 0.00e+00 0.0 1.4e+04 1.1e+03
> 0.0e+00  1  0 97 95  0   1  0 97 95  0     0
> VecScatterEnd       1003 1.0 5.3962e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 33  0  0  0  0  33  0  0  0  0     0
> MatMult             1000 1.0 1.1430e+01 1.0 3.74e+08 1.0 1.4e+04 1.1e+03
> 0.0e+00 79100 97 95  0  79100 97 95  0   261
> MatAssemblyBegin       7 1.0 4.6307e-01 1.8 0.00e+00 0.0 1.5e+02 5.3e+03
> 1.4e+01  3  0  1  5  5   3  0  1  5  7     0
> MatAssemblyEnd         7 1.0 6.9013e-01 1.3 0.00e+00 0.0 2.0e+02 2.8e+02
> 7.0e+01  4  0  1  0 24   4  0  1  0 33     0
> MatZeroEntries         7 1.0 2.7971e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory  Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
>   Application Order     2              0          0     0
>           Index Set    30             30      18476     0
>   IS L to G Mapping    10              0          0     0
>                 Vec    30              7       9128     0
>         Vec Scatter    15              0          0     0
>              Matrix    21              0          0     0
>
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> Average time for MPI_Barrier(): 0.000419807
> Average time for zero size MPI_Send(): 0.000115991
> #PETSc Option Table entries:
> -log_summary output18
> #End o PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Fri Jun 12 16:59:30 2009
> Configure options: --with-cc="gcc -fPIC" --download-mpich=1
> --download-f-blas-lapack --download-triangle --download-parmetis
> --with-hypre=1 --download-hypre=1 --with-shared=0
> -----------------------------------------
> Libraries compiled on Fri Jun 12 17:11:54 BST 2009 on christian-desktop
> Machine characteristics: Linux christian-desktop 2.6.27-7-generic #1 SMP
> Fri Oct 24 06:40:41 UTC 2008 x86_64 GNU/Linux
> Using PETSc directory: /home/christian/Desktop/petsc-3.0.0-p4
> Using PETSc arch: linux-gnu-c-debug
> -----------------------------------------
> Using C compiler:
> /home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpicc -Wall
> -Wwrite-strings -Wno-strict-aliasing -g3
> Using Fortran compiler:
> /home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpif90 -Wall
> -Wno-unused-variable -g
> -----------------------------------------
> Using include paths:
> -I/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/include
> -I/home/christian/Desktop/petsc-3.0.0-p4/include
> -I/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/include
> ------------------------------------------
> Using C linker:
> /home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpicc -Wall
> -Wwrite-strings -Wno-strict-aliasing -g3
> Using Fortran linker:
> /home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/bin/mpif90 -Wall
> -Wno-unused-variable -g
> Using libraries:
> -Wl,-rpath,/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
> -L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib -lpetscts
> -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
> -Wl,-rpath,/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
> -L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib -ltriangle
> -lparmetis -lmetis -lHYPRE -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl
> -lrt -L/home/christian/Desktop/petsc-3.0.0-p4/linux-gnu-c-debug/lib
> -L/usr/lib/gcc/x86_64-linux-gnu/4.3.2 -L/lib -ldl -lmpich -lpthread -lrt
> -lgcc_s -lmpichf90 -lgfortranbegin -lgfortran -lm
> -L/usr/lib/gcc/x86_64-linux-gnu -lm -lmpichcxx -lstdc++ -lmpichcxx
> -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
> ------------------------------------------
>
>
>
>
>


-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener