Analysis of performance of parallel code as processors increase
Barry Smith
bsmith at mcs.anl.gov
Fri Jun 6 21:23:15 CDT 2008
You are not using hypre; you are using block Jacobi with ILU on the
blocks. The number of iterations grows from around 4000 to around 5000
in going from 4 to 8 processes, which is why you do not see a great
speedup.
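
(For reference, a minimal sketch of how hypre's BoomerAMG could be
selected explicitly and the iteration count checked from the code.
This is illustrative user code following the PETSc 2.3.3 C interface,
not the poster's actual program; CHKERRQ() error checking is omitted.)

    #include "petscksp.h"

    /* Sketch: explicitly request hypre BoomerAMG instead of the default
       block Jacobi/ILU, and print the iteration count per solve. */
    void solve_with_boomeramg(KSP ksp, Mat A, Vec b, Vec x)
    {
      PC       pc;
      PetscInt its;

      KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);
      KSPGetPC(ksp, &pc);
      PCSetType(pc, PCHYPRE);            /* the "hypre" preconditioner */
      PCHYPRESetType(pc, "boomeramg");
      KSPSetFromOptions(ksp);            /* run-time options may still override */
      KSPSolve(ksp, b, x);
      KSPGetIterationNumber(ksp, &its);
      PetscPrintf(PETSC_COMM_WORLD, "KSP iterations: %d\n", its);
    }

The same switch can be made purely from the command line with
-pc_type hypre -pc_hypre_type boomeramg, and -ksp_monitor (or
-ksp_converged_reason) shows how the iteration counts behave per solve.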
Barry
On Jun 6, 2008, at 8:07 PM, Ben Tay wrote:
> Hi,
>
> I have written a parallel code using PETSc and Hypre. I found that
> going from 1 to 4 processors gives an almost 4 times speedup. However,
> going from 4 to 8 processors only improves performance by a factor of
> 1.2-1.5 instead of 2.
>
> Is the slowdown due to the matrix not being large enough? Currently I
> am using 600x2160 for the benchmark. Even when I increase the size to
> 900x3240 or 1200x2160, the performance gain is still small. Is it
> possible to use -log_summary to find out what is wrong? I have
> attached the log files for the 4- and 8-processor runs; I found that
> some events, such as VecScatterEnd, VecNorm and MatAssemblyBegin, have
> much higher ratios. Does that indicate something? Another strange
> thing is that MatAssemblyBegin for the 4-processor run has a much
> higher ratio than for the 8-processor run. I thought there should be
> less communication in the 4-processor case, so the ratio should be
> lower. Does it mean there is some communication problem at that point?
>
> Thank you very much.
>
> Regards
>
>
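(A side note on narrowing this down with -log_summary: wrapping the
assembly phase and the solve phase in separate user-defined logging
stages makes each show up as its own table in the output. The sketch
below is illustrative user code, not the poster's; the stage names are
made up, and the exact PetscLogStageRegister() calling sequence should
be checked against the installed PETSc release, since it has changed
between versions. The PreLoadBegin() warning printed in the logs
essentially means that one-time setup costs are mixed into the first
timings; repeating the timed phase gives cleaner numbers.)

    #include "petscksp.h"

    /* Sketch: report assembly and the linear solve as separate
       -log_summary stages.  Check PetscLogStageRegister()'s argument
       order against the local man pages. */
    void assemble_and_solve(Mat A, Vec b, Vec x, KSP ksp)
    {
      PetscLogStage assembly, solve;

      PetscLogStageRegister(&assembly, "Assembly");
      PetscLogStageRegister(&solve, "KSP Solve");

      PetscLogStagePush(assembly);
      /* ... MatSetValues()/VecSetValues() calls, then
         MatAssemblyBegin/End() and VecAssemblyBegin/End() ... */
      PetscLogStagePop();

      PetscLogStagePush(solve);
      KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);
      KSPSolve(ksp, b, x);
      PetscLogStagePop();
    }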
> ************************************************************************************************************************
> ***          WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document              ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./a.out on a atlas3-mp named atlas3-c43 with 4 processors, by g0306332 Fri Jun 6 17:29:26 2008
> Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           1.750e+03      1.00043   1.750e+03
> Objects:              4.200e+01      1.00000   4.200e+01
> Flops:                6.961e+10      1.00074   6.959e+10  2.784e+11
> Flops/sec:            3.980e+07      1.00117   3.978e+07  1.591e+08
> MPI Messages:         8.168e+03      2.00000   6.126e+03  2.450e+04
> MPI Message Lengths:  5.525e+07      2.00000   6.764e+03  1.658e+08
> MPI Reductions:       3.203e+03      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>                           and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 1.7495e+03 100.0%  2.7837e+11 100.0%  2.450e+04 100.0%  6.764e+03      100.0%  1.281e+04 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops/sec: Max - maximum over all processors
>                        Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was run without the PreLoadBegin() #
> # macros. To get timing results we always recommend #
> # preloading. otherwise timing numbers may be #
> # meaningless. #
> ##########################################################
>
>
> Event                Count      Time (sec)       Flops/sec                           --- Global ---   --- Stage ---    Total
>                    Max Ratio  Max       Ratio   Max      Ratio  Mess   Avg len  Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult           4082 1.0 8.2037e+01 1.5  4.67e+08 1.5  2.4e+04 6.8e+03 0.0e+00  4 37 100 100  0   4 37 100 100  0  1240
> MatSolve          1976 1.0 1.3250e+02 1.5  2.52e+08 1.5  0.0e+00 0.0e+00 0.0e+00  6 31   0   0  0   6 31   0   0  0   655
> MatLUFactorNum     300 1.0 3.8260e+01 1.2  2.07e+08 1.2  0.0e+00 0.0e+00 0.0e+00  2  9   0   0  0   2  9   0   0  0   668
> MatILUFactorSym      1 1.0 2.2550e-01 2.7  0.00e+00 0.0  0.0e+00 0.0e+00 1.0e+00  0  0   0   0  0   0  0   0   0  0     0
> MatConvert           1 1.0 2.9182e-01 1.0  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> MatAssemblyBegin   301 1.0 1.0776e+02 1228.9  0.00e+00 0.0  0.0e+00 0.0e+00 6.0e+02  4  0   0   0  5   4  0   0   0  5     0
> MatAssemblyEnd     301 1.0 9.6146e+00 1.1  0.00e+00 0.0  1.2e+01 3.6e+03 3.1e+02  1  0   0   0  2   1  0   0   0  2     0
> MatGetRow       324000 1.0 1.2161e-01 1.4  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> MatGetRowIJ          3 1.0 5.0068e-06 1.3  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> MatGetOrdering       1 1.0 2.1279e-02 2.3  0.00e+00 0.0  0.0e+00 0.0e+00 2.0e+00  0  0   0   0  0   0  0   0   0  0     0
> KSPSetup           601 1.0 2.5108e-02 1.2  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> KSPSolve           600 1.0 1.2353e+03 1.0  5.64e+07 1.0  2.4e+04 6.8e+03 8.3e+03 71 100 100 100 65  71 100 100 100 65   225
> PCSetUp            601 1.0 4.0116e+01 1.2  1.96e+08 1.2  0.0e+00 0.0e+00 5.0e+00  2  9   0   0  0   2  9   0   0  0   637
> PCSetUpOnBlocks    300 1.0 3.8513e+01 1.2  2.06e+08 1.2  0.0e+00 0.0e+00 3.0e+00  2  9   0   0  0   2  9   0   0  0   664
> PCApply           4682 1.0 1.0566e+03 1.0  2.12e+07 1.0  0.0e+00 0.0e+00 0.0e+00 59 31   0   0  0  59 31   0   0  0    82
> VecDot            4812 1.0 8.2762e+00 1.1  4.00e+08 1.1  0.0e+00 0.0e+00 4.8e+03  0  4   0   0 38   0  4   0   0 38  1507
> VecNorm           3479 1.0 9.2739e+01 8.3  3.15e+08 8.3  0.0e+00 0.0e+00 3.5e+03  4  5   0   0 27   4  5   0   0 27   152
> VecCopy            900 1.0 2.0819e+00 1.4  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> VecSet            5882 1.0 9.4626e+00 1.5  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> VecAXPY           5585 1.0 1.5397e+01 1.5  4.67e+08 1.5  0.0e+00 0.0e+00 0.0e+00  1  7   0   0  0   1  7   0   0  0  1273
> VecAYPX           2879 1.0 1.0303e+01 1.6  4.45e+08 1.6  0.0e+00 0.0e+00 0.0e+00  0  4   0   0  0   0  4   0   0  0  1146
> VecWAXPY          2406 1.0 7.7902e+00 1.6  3.14e+08 1.6  0.0e+00 0.0e+00 0.0e+00  0  2   0   0  0   0  2   0   0  0   801
> VecAssemblyBegin  1200 1.0 8.4259e+00 3.8  0.00e+00 0.0  0.0e+00 0.0e+00 3.6e+03  0  0   0   0 28   0  0   0   0 28     0
> VecAssemblyEnd    1200 1.0 2.4173e-03 1.3  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> VecScatterBegin   4082 1.0 1.2512e-01 1.5  0.00e+00 0.0  2.4e+04 6.8e+03 0.0e+00  0  0 100 100  0   0  0 100 100  0     0
> VecScatterEnd     4082 1.0 2.0954e+01 53.3  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory   Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
> Matrix 7 7 321241092 0
> Krylov Solver 3 3 8 0
> Preconditioner 3 3 528 0
> Index Set 7 7 7785600 0
> Vec 20 20 46685344 0
> Vec Scatter 2 2 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 1.90735e-07
> Average time for MPI_Barrier(): 1.45912e-05
> Average time for zero size MPI_Send(): 7.27177e-06
> OptionTable: -log_summary test4_600
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Tue Jan 8 22:22:08 2008
> Configure options: --with-memcmp-ok --sizeof_char=1 --
> sizeof_void_p=8 --sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --
> sizeof_long_long=8 --sizeof_float=4 --sizeof_double=8 --
> bits_per_byte=8 --sizeof_MPI_Comm=4 --sizeof_MPI_Fint=4 --with-
> vendor-compilers=intel --with-x=0 --with-hypre-dir=/home/enduser/
> g0306332/lib/hypre --with-debugging=0 --with-batch=1 --with-mpi-
> shared=0 --with-mpi-include=/usr/local/topspin/mpi/mpich/include --
> with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a --with-
> mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun --with-blas-lapack-
> dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0
> -----------------------------------------
> Libraries compiled on Tue Jan 8 22:34:13 SGT 2008 on atlas3-c01
> Machine characteristics: Linux atlas3-c01 2.6.9-42.ELsmp #1 SMP Wed
> Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /nfs/home/enduser/g0306332/petsc-2.3.3-p8
> Using PETSc arch: atlas3-mpi
> -----------------------------------------
> Using C compiler: mpicc -fPIC -O
> Using Fortran compiler: mpif90 -I. -fPIC -O
> -----------------------------------------
> Using include paths: -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8 -I/
> nfs/home/enduser/g0306332/petsc-2.3.3-p8/bmake/atlas3-mpi -I/nfs/
> home/enduser/g0306332/petsc-2.3.3-p8/include -I/home/enduser/
> g0306332/lib/hypre/include -I/usr/local/topspin/mpi/mpich/include
> ------------------------------------------
> Using C linker: mpicc -fPIC -O
> Using Fortran linker: mpif90 -I. -fPIC -O
> Using libraries: -Wl,-rpath,/nfs/home/enduser/g0306332/petsc-2.3.3-
> p8/lib/atlas3-mpi -L/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/
> atlas3-mpi -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -
> lpetscvec -lpetsc -Wl,-rpath,/home/enduser/g0306332/lib/hypre/
> lib -L/home/enduser/g0306332/lib/hypre/lib -lHYPRE -Wl,-rpath,/opt/
> mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/
> opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-
> linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard -Wl,-rpath,/
> opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-
> rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-rpath,/usr/local/
> topspin/mpi/mpich/lib -L/usr/local/topspin/mpi/mpich/lib -lmpich -
> Wl,-rpath,/opt/intel/cmkl/8.1.1/lib/em64t -L/opt/intel/cmkl/8.1.1/
> lib/em64t -lmkl_lapack -lmkl_em64t -lguide -lpthread -Wl,-rpath,/usr/
> local/ofed/lib64 -L/usr/local/ofed/lib64 -Wl,-rpath,/opt/mvapich/
> 0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich -libverbs -
> libumad -lpthread -lrt -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/opt/
> intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/
> 3.4.6/ -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/
> lib64 -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -
> lmpichf90nc -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/
> local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/
> usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-
> rpath,/opt/intel/fce/9.1.045/lib -L/opt/intel/fce/9.1.045/lib -
> lifport -lifcore -lm -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-
> rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -
> Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/
> lib64 -lm -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/
> local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/
> usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc+
> + -lcxaguard -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/
> local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/
> usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-
> rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -
> Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard -Wl,-
> rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -
> Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/mvapich/
> 0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich -Wl,-
> rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -libverbs -
> libumad -lpthread -lrt -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/opt/
> intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/
> 3.4.6/ -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/
> lib64 -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -ldl -lc
> ------------------------------------------
> ************************************************************************************************************************
> ***          WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document              ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./a.out on a atlas3-mp named atlas3-c18 with 8 processors, by g0306332 Fri Jun 6 17:23:25 2008
> Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           1.140e+03      1.00019   1.140e+03
> Objects:              4.200e+01      1.00000   4.200e+01
> Flops:                4.620e+10      1.00158   4.619e+10  3.695e+11
> Flops/sec:            4.053e+07      1.00177   4.051e+07  3.241e+08
> MPI Messages:         9.954e+03      2.00000   8.710e+03  6.968e+04
> MPI Message Lengths:  7.224e+07      2.00000   7.257e+03  5.057e+08
> MPI Reductions:       1.716e+03      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>                           and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 1.1402e+03 100.0%  3.6953e+11 100.0%  6.968e+04 100.0%  7.257e+03      100.0%  1.372e+04 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops/sec: Max - maximum over all processors
>                        Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was run without the PreLoadBegin() #
> # macros. To get timing results we always recommend #
> # preloading. otherwise timing numbers may be #
> # meaningless. #
> ##########################################################
>
>
> Event                Count      Time (sec)       Flops/sec                           --- Global ---   --- Stage ---    Total
>                    Max Ratio  Max       Ratio   Max      Ratio  Mess   Avg len  Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult           4975 1.0 7.8154e+01 1.9  4.19e+08 1.9  7.0e+04 7.3e+03 0.0e+00  5 38 100 100  0   5 38 100 100  0  1798
> MatSolve          2855 1.0 1.0870e+02 1.8  2.57e+08 1.8  0.0e+00 0.0e+00 0.0e+00  7 34   0   0  0   7 34   0   0  0  1153
> MatLUFactorNum     300 1.0 2.3238e+01 1.5  2.07e+08 1.5  0.0e+00 0.0e+00 0.0e+00  2  7   0   0  0   2  7   0   0  0  1099
> MatILUFactorSym      1 1.0 6.1973e-02 1.5  0.00e+00 0.0  0.0e+00 0.0e+00 1.0e+00  0  0   0   0  0   0  0   0   0  0     0
> MatConvert           1 1.0 1.4168e-01 1.0  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> MatAssemblyBegin   301 1.0 6.9683e+01 8.6  0.00e+00 0.0  0.0e+00 0.0e+00 6.0e+02  4  0   0   0  4   4  0   0   0  4     0
> MatAssemblyEnd     301 1.0 6.2247e+00 1.2  0.00e+00 0.0  2.8e+01 3.6e+03 3.1e+02  0  0   0   0  2   0  0   0   0  2     0
> MatGetRow       162000 1.0 6.0330e-02 1.4  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> MatGetRowIJ          3 1.0 9.0599e-06 3.2  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> MatGetOrdering       1 1.0 5.6710e-03 1.4  0.00e+00 0.0  0.0e+00 0.0e+00 2.0e+00  0  0   0   0  0   0  0   0   0  0     0
> KSPSetup           601 1.0 1.5631e-02 1.1  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> KSPSolve           600 1.0 8.1668e+02 1.0  5.66e+07 1.0  7.0e+04 7.3e+03 9.2e+03 72 100 100 100 67  72 100 100 100 67   452
> PCSetUp            601 1.0 2.4372e+01 1.5  1.93e+08 1.5  0.0e+00 0.0e+00 5.0e+00  2  7   0   0  0   2  7   0   0  0  1048
> PCSetUpOnBlocks    300 1.0 2.3303e+01 1.5  2.07e+08 1.5  0.0e+00 0.0e+00 3.0e+00  2  7   0   0  0   2  7   0   0  0  1096
> PCApply           5575 1.0 6.5344e+02 1.1  2.57e+07 1.1  0.0e+00 0.0e+00 0.0e+00 55 34   0   0  0  55 34   0   0  0   192
> VecDot            4840 1.0 6.8932e+00 1.3  3.07e+08 1.3  0.0e+00 0.0e+00 4.8e+03  1  3   0   0 35   1  3   0   0 35  1820
> VecNorm           4365 1.0 1.2250e+02 3.6  6.82e+07 3.6  0.0e+00 0.0e+00 4.4e+03  8  5   0   0 32   8  5   0   0 32   153
> VecCopy            900 1.0 1.4297e+00 1.8  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> VecSet            6775 1.0 8.1405e+00 1.8  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> VecAXPY           6485 1.0 1.0003e+01 1.9  5.73e+08 1.9  0.0e+00 0.0e+00 0.0e+00  1  7   0   0  0   1  7   0   0  0  2420
> VecAYPX           3765 1.0 7.8289e+00 2.0  5.17e+08 2.0  0.0e+00 0.0e+00 0.0e+00  0  4   0   0  0   0  4   0   0  0  2092
> VecWAXPY          2420 1.0 3.8504e+00 1.9  3.80e+08 1.9  0.0e+00 0.0e+00 0.0e+00  0  2   0   0  0   0  2   0   0  0  1629
> VecAssemblyBegin  1200 1.0 9.2808e+00 3.4  0.00e+00 0.0  0.0e+00 0.0e+00 3.6e+03  1  0   0   0 26   1  0   0   0 26     0
> VecAssemblyEnd    1200 1.0 2.3313e-03 1.3  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
> VecScatterBegin   4975 1.0 2.2727e-01 2.6  0.00e+00 0.0  7.0e+04 7.3e+03 0.0e+00  0  0 100 100  0   0  0 100 100  0     0
> VecScatterEnd     4975 1.0 2.7557e+01 68.1  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00  1  0   0   0  0   1  0   0   0  0     0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory   Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
> Matrix 7 7 160595412 0
> Krylov Solver 3 3 8 0
> Preconditioner 3 3 528 0
> Index Set 7 7 3897600 0
> Vec 20 20 23357344 0
> Vec Scatter 2 2 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 1.19209e-07
> Average time for MPI_Barrier(): 2.10285e-05
> Average time for zero size MPI_Send(): 7.59959e-06
> OptionTable: -log_summary test8_600
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Tue Jan 8 22:22:08 2008
> Configure options: --with-memcmp-ok --sizeof_char=1 --
> sizeof_void_p=8 --sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --
> sizeof_long_long=8 --sizeof_float=4 --sizeof_double=8 --
> bits_per_byte=8 --sizeof_MPI_Comm=4 --sizeof_MPI_Fint=4 --with-
> vendor-compilers=intel --with-x=0 --with-hypre-dir=/home/enduser/
> g0306332/lib/hypre --with-debugging=0 --with-batch=1 --with-mpi-
> shared=0 --with-mpi-include=/usr/local/topspin/mpi/mpich/include --
> with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a --with-
> mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun --with-blas-lapack-
> dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0
> -----------------------------------------
> Libraries compiled on Tue Jan 8 22:34:13 SGT 2008 on atlas3-c01
> Machine characteristics: Linux atlas3-c01 2.6.9-42.ELsmp #1 SMP Wed
> Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /nfs/home/enduser/g0306332/petsc-2.3.3-p8
> Using PETSc arch: atlas3-mpi
> -----------------------------------------
> Using C compiler: mpicc -fPIC -O
> Using Fortran compiler: mpif90 -I. -fPIC -O
> -----------------------------------------
> Using include paths: -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8 -I/
> nfs/home/enduser/g0306332/petsc-2.3.3-p8/bmake/atlas3-mpi -I/nfs/
> home/enduser/g0306332/petsc-2.3.3-p8/include -I/home/enduser/
> g0306332/lib/hypre/include -I/usr/local/topspin/mpi/mpich/include
> ------------------------------------------
> Using C linker: mpicc -fPIC -O
> Using Fortran linker: mpif90 -I. -fPIC -O
> Using libraries: -Wl,-rpath,/nfs/home/enduser/g0306332/petsc-2.3.3-
> p8/lib/atlas3-mpi -L/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/
> atlas3-mpi -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -
> lpetscvec -lpetsc -Wl,-rpath,/home/enduser/g0306332/lib/hypre/
> lib -L/home/enduser/g0306332/lib/hypre/lib -lHYPRE -Wl,-rpath,/opt/
> mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/
> opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-
> linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard -Wl,-rpath,/
> opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-
> rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-rpath,/usr/local/
> topspin/mpi/mpich/lib -L/usr/local/topspin/mpi/mpich/lib -lmpich -
> Wl,-rpath,/opt/intel/cmkl/8.1.1/lib/em64t -L/opt/intel/cmkl/8.1.1/
> lib/em64t -lmkl_lapack -lmkl_em64t -lguide -lpthread -Wl,-rpath,/usr/
> local/ofed/lib64 -L/usr/local/ofed/lib64 -Wl,-rpath,/opt/mvapich/
> 0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich -libverbs -
> libumad -lpthread -lrt -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/opt/
> intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/
> 3.4.6/ -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/
> lib64 -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -
> lmpichf90nc -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/
> local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/
> usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-
> rpath,/opt/intel/fce/9.1.045/lib -L/opt/intel/fce/9.1.045/lib -
> lifport -lifcore -lm -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-
> rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -
> Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/
> lib64 -lm -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/
> local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/
> usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc+
> + -lcxaguard -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/
> local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/
> usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-
> rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -
> Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard -Wl,-
> rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -
> Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/mvapich/
> 0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich -Wl,-
> rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -libverbs -
> libumad -lpthread -lrt -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/opt/
> intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/
> 3.4.6/ -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/
> lib64 -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -ldl -lc
> ------------------------------------------