Analysis of performance of parallel code as processors increase
Barry Smith
bsmith at mcs.anl.gov
Tue Jun 10 15:37:47 CDT 2008
On Jun 10, 2008, at 9:21 AM, Ben Tay wrote:
> Hi Barry,
>
> I found that when I use hypre, it is about twice as slow. I guess
> hypre does not work well with the linearised momentum eqn. I tried
> to use PCILU and PCICC and I got the error:
   hypre has several preconditioners; run with -pc_type hypre -help to
get a list. You select the hypre preconditioner with -pc_hypre_type
followed by one of "pilut", "parasails", "boomeramg", or "euclid".
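   For example, a minimal sketch of selecting BoomerAMG from the source
code instead of the command line (assuming a KSP object named ksp that
you have already created; the runtime equivalent is
-pc_type hypre -pc_hypre_type boomeramg):

      PC             pc;
      PetscErrorCode ierr;
      ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);               /* preconditioner attached to your solver */
      ierr = PCSetType(pc,PCHYPRE);CHKERRQ(ierr);           /* hand the preconditioning to hypre */
      ierr = PCHYPRESetType(pc,"boomeramg");CHKERRQ(ierr);  /* or "pilut", "parasails", "euclid" */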
>
>
> No support for this operation for this object type!
> [1]PETSC ERROR: Matrix type mpiaij symbolic ICC!
   PETSc does not have its own parallel ICC or ILU, hence the error for
the MPIAIJ matrix.
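   The usual parallel alternatives are block Jacobi or ASM with ILU/ICC
on each process's block (this is PETSc's parallel default, equivalent
to -pc_type bjacobi -sub_pc_type ilu), or hypre's euclid for a truly
parallel ILU. A rough sketch of requesting block Jacobi explicitly,
again assuming your existing ksp:

      PC             pc;
      PetscErrorCode ierr;
      ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
      ierr = PCSetType(pc,PCBJACOBI);CHKERRQ(ierr);     /* ILU(0) is the default on each block */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);      /* so -sub_ksp_type / -sub_pc_type still apply */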
>
>
> PCASM performs even worse. It seems like block Jacobi is still the
> best. Where did you find the number of iterations? Are you saying that
> if I increase the number of processors, the iteration count must go down?
   With block Jacobi and ASM the number of iterations will INCREASE
with more processes. Depending on the problem they may increase a tiny
bit or they may increase A LOT.
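   If you want to see the iteration counts directly, run with
-ksp_monitor or -ksp_converged_reason, or query them in code; a minimal
sketch (assuming your existing ksp, right-hand side b and solution x):

      PetscInt its;
      ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
      ierr = KSPGetIterationNumber(ksp,&its);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD,"KSP iterations: %D\n",its);CHKERRQ(ierr);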
>
>
> Btw, I'm using the Richardson solver. Other combinations such as bcgs +
> hypre are much worse.
   Adding any Krylov method such as bcgs should almost always decrease
the number of iterations needed, and usually the time, compared with
Richardson alone.
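   For example, a minimal sketch of switching your existing ksp from
Richardson to BiCGStab (the runtime equivalent is -ksp_type bcgs):

      ierr = KSPSetType(ksp,KSPBCGS);CHKERRQ(ierr);     /* BiCGStab instead of Richardson */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);      /* keep command-line overrides working */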
>
>
> Does it mean there are some other problems present and hence my code
> does not scale properly?
   There is no way to tell that from this information alone.
>
>
> Thank you very much.
>
> Regards
>
>
> Barry Smith wrote:
>>
>> You are not using hypre, you are using block Jacobi with ILU on
>> the blocks.
>>
>> The number of iterations goes from around 4000 to around 5000 in
>> going from 4 to 8 processes; this is why you do not see such a great
>> speedup.
>>
>> Barry
>>
>> On Jun 6, 2008, at 8:07 PM, Ben Tay wrote:
>>
>>> Hi,
>>>
>>> I have coded in parallel using PETSc and Hypre. I found that going
>>> from 1 to 4 processors gives almost a 4-times speedup. However, going
>>> from 4 to 8 processors only improves performance by a factor of
>>> 1.2-1.5 instead of 2.
>>>
>>> Is the poor speedup due to the matrix not being large enough?
>>> Currently I am using 600x2160 for the benchmark. Even when I increase
>>> the matrix size to 900x3240 or 1200x2160, the performance gain is
>>> still not much. Is it possible to use -log_summary to find out the
>>> problem? I have attached the log files for the 4- and 8-processor
>>> runs. I found that some events, such as VecScatterEnd, VecNorm and
>>> MatAssemblyBegin, have much higher ratios. Does that indicate
>>> something? Another strange thing is that MatAssemblyBegin for the
>>> 4-processor run has a much higher ratio than the 8-processor run. I
>>> thought there should be less communication in the 4-processor case,
>>> and so the ratio should be lower. Does it mean there is some
>>> communication problem at that point?
>>>
>>> Thank you very much.
>>>
>>> Regards
>>>
>>>
>>> ************************************************************************************************************************
>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use
>>> 'enscript -r -fCourier9' to print this document ***
>>> ************************************************************************************************************************
>>>
>>> ---------------------------------------------- PETSc Performance
>>> Summary: ----------------------------------------------
>>>
>>> ./a.out on a atlas3-mp named atlas3-c43 with 4 processors, by
>>> g0306332 Fri Jun 6 17:29:26 2008
>>> Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40
>>> CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
>>>
>>> Max Max/Min Avg Total
>>> Time (sec): 1.750e+03 1.00043 1.750e+03
>>> Objects: 4.200e+01 1.00000 4.200e+01
>>> Flops: 6.961e+10 1.00074 6.959e+10 2.784e+11
>>> Flops/sec: 3.980e+07 1.00117 3.978e+07 1.591e+08
>>> MPI Messages: 8.168e+03 2.00000 6.126e+03 2.450e+04
>>> MPI Message Lengths: 5.525e+07 2.00000 6.764e+03 1.658e+08
>>> MPI Reductions: 3.203e+03 1.00000
>>>
>>> Flop counting convention: 1 flop = 1 real number operation of type
>>> (multiply/divide/add/subtract)
>>> e.g., VecAXPY() for real vectors of
>>> length N --> 2N flops
>>> and VecAXPY() for complex vectors of
>>> length N --> 8N flops
>>>
>>> Summary of Stages: ----- Time ------ ----- Flops ----- ---
>>> Messages --- -- Message Lengths -- -- Reductions --
>>> Avg %Total Avg %Total counts
>>> %Total Avg %Total counts %Total
>>> 0: Main Stage: 1.7495e+03 100.0%  2.7837e+11 100.0%  2.450e+04 100.0%  6.764e+03 100.0%  1.281e+04 100.0%
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>> See the 'Profiling' chapter of the users' manual for details on
>>> interpreting output.
>>> Phase summary info:
>>> Count: number of times phase was executed
>>> Time and Flops/sec: Max - maximum over all processors
>>> Ratio - ratio of maximum to minimum over all
>>> processors
>>> Mess: number of messages sent
>>> Avg. len: average message length
>>> Reduct: number of global reductions
>>> Global: entire computation
>>> Stage: stages of a computation. Set stages with
>>> PetscLogStagePush() and PetscLogStagePop().
>>> %T - percent time in this phase %F - percent flops in
>>> this phase
>>> %M - percent messages in this phase %L - percent message
>>> lengths in this phase
>>> %R - percent reductions in this phase
>>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max
>>> time over all processors)
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>> ##########################################################
>>> # #
>>> # WARNING!!! #
>>> # #
>>> # This code was run without the PreLoadBegin() #
>>> # macros. To get timing results we always recommend #
>>> # preloading. otherwise timing numbers may be #
>>> # meaningless. #
>>> ##########################################################
>>>
>>>
>>> Event Count Time (sec) Flops/
>>> sec --- Global --- --- Stage --- Total
>>> Max Ratio Max Ratio Max Ratio Mess Avg
>>> len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> MatMult 4082 1.0 8.2037e+01 1.5 4.67e+08 1.5 2.4e+04
>>> 6.8e+03 0.0e+00 4 37100100 0 4 37100100 0 1240
>>> MatSolve 1976 1.0 1.3250e+02 1.5 2.52e+08 1.5 0.0e+00
>>> 0.0e+00 0.0e+00 6 31 0 0 0 6 31 0 0 0 655
>>> MatLUFactorNum 300 1.0 3.8260e+01 1.2 2.07e+08 1.2 0.0e+00
>>> 0.0e+00 0.0e+00 2 9 0 0 0 2 9 0 0 0 668
>>> MatILUFactorSym 1 1.0 2.2550e-01 2.7 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatConvert 1 1.0 2.9182e-01 1.0 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatAssemblyBegin 301 1.0 1.0776e+02 1228.9 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 6.0e+02 4 0 0 0 5 4 0 0 0 5 0
>>> MatAssemblyEnd 301 1.0 9.6146e+00 1.1 0.00e+00 0.0 1.2e+01
>>> 3.6e+03 3.1e+02 1 0 0 0 2 1 0 0 0 2 0
>>> MatGetRow 324000 1.0 1.2161e-01 1.4 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatGetRowIJ 3 1.0 5.0068e-06 1.3 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatGetOrdering 1 1.0 2.1279e-02 2.3 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> KSPSetup 601 1.0 2.5108e-02 1.2 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> KSPSolve 600 1.0 1.2353e+03 1.0 5.64e+07 1.0 2.4e+04
>>> 6.8e+03 8.3e+03 71100100100 65 71100100100 65 225
>>> PCSetUp 601 1.0 4.0116e+01 1.2 1.96e+08 1.2 0.0e+00
>>> 0.0e+00 5.0e+00 2 9 0 0 0 2 9 0 0 0 637
>>> PCSetUpOnBlocks 300 1.0 3.8513e+01 1.2 2.06e+08 1.2 0.0e+00
>>> 0.0e+00 3.0e+00 2 9 0 0 0 2 9 0 0 0 664
>>> PCApply 4682 1.0 1.0566e+03 1.0 2.12e+07 1.0 0.0e+00
>>> 0.0e+00 0.0e+00 59 31 0 0 0 59 31 0 0 0 82
>>> VecDot 4812 1.0 8.2762e+00 1.1 4.00e+08 1.1 0.0e+00
>>> 0.0e+00 4.8e+03 0 4 0 0 38 0 4 0 0 38 1507
>>> VecNorm 3479 1.0 9.2739e+01 8.3 3.15e+08 8.3 0.0e+00
>>> 0.0e+00 3.5e+03 4 5 0 0 27 4 5 0 0 27 152
>>> VecCopy 900 1.0 2.0819e+00 1.4 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecSet 5882 1.0 9.4626e+00 1.5 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecAXPY 5585 1.0 1.5397e+01 1.5 4.67e+08 1.5 0.0e+00
>>> 0.0e+00 0.0e+00 1 7 0 0 0 1 7 0 0 0 1273
>>> VecAYPX 2879 1.0 1.0303e+01 1.6 4.45e+08 1.6 0.0e+00
>>> 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 1146
>>> VecWAXPY 2406 1.0 7.7902e+00 1.6 3.14e+08 1.6 0.0e+00
>>> 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 801
>>> VecAssemblyBegin 1200 1.0 8.4259e+00 3.8 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 3.6e+03 0 0 0 0 28 0 0 0 0 28 0
>>> VecAssemblyEnd 1200 1.0 2.4173e-03 1.3 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecScatterBegin 4082 1.0 1.2512e-01 1.5 0.00e+00 0.0 2.4e+04
>>> 6.8e+03 0.0e+00 0 0100100 0 0 0100100 0 0
>>> VecScatterEnd 4082 1.0 2.0954e+01 53.3 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> Memory usage is given in bytes:
>>>
>>> Object Type Creations Destructions Memory
>>> Descendants' Mem.
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> Matrix 7 7 321241092 0
>>> Krylov Solver 3 3 8 0
>>> Preconditioner 3 3 528 0
>>> Index Set 7 7 7785600 0
>>> Vec 20 20 46685344 0
>>> Vec Scatter 2 2 0 0
>>> ========================================================================================================================
>>> Average time to get PetscTime(): 1.90735e-07
>>> Average time for MPI_Barrier(): 1.45912e-05
>>> Average time for zero size MPI_Send(): 7.27177e-06
>>> OptionTable: -log_summary test4_600
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>>> sizeof(PetscScalar) 8
>>> Configure run at: Tue Jan 8 22:22:08 2008
>>> Configure options: --with-memcmp-ok --sizeof_char=1 --
>>> sizeof_void_p=8 --sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --
>>> sizeof_long_long=8 --sizeof_float=4 --sizeof_double=8 --
>>> bits_per_byte=8 --sizeof_MPI_Comm=4 --sizeof_MPI_Fint=4 --with-
>>> vendor-compilers=intel --with-x=0 --with-hypre-dir=/home/enduser/
>>> g0306332/lib/hypre --with-debugging=0 --with-batch=1 --with-mpi-
>>> shared=0 --with-mpi-include=/usr/local/topspin/mpi/mpich/include --
>>> with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a --with-
>>> mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun --with-blas-lapack-
>>> dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0
>>> -----------------------------------------
>>> Libraries compiled on Tue Jan 8 22:34:13 SGT 2008 on atlas3-c01
>>> Machine characteristics: Linux atlas3-c01 2.6.9-42.ELsmp #1 SMP
>>> Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
>>> Using PETSc directory: /nfs/home/enduser/g0306332/petsc-2.3.3-p8
>>> Using PETSc arch: atlas3-mpi
>>> -----------------------------------------
>>> Using C compiler: mpicc -fPIC -O
>>> Using Fortran compiler: mpif90 -I. -fPIC -O
>>> -----------------------------------------
>>> Using include paths: -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8 -
>>> I/nfs/home/enduser/g0306332/petsc-2.3.3-p8/bmake/atlas3-mpi -I/nfs/
>>> home/enduser/g0306332/petsc-2.3.3-p8/include -I/home/enduser/
>>> g0306332/lib/hypre/include -I/usr/local/topspin/mpi/mpich/include
>>> ------------------------------------------
>>> Using C linker: mpicc -fPIC -O
>>> Using Fortran linker: mpif90 -I. -fPIC -O
>>> Using libraries: -Wl,-rpath,/nfs/home/enduser/g0306332/petsc-2.3.3-
>>> p8/lib/atlas3-mpi -L/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/
>>> atlas3-mpi -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -
>>> lpetscvec -lpetsc -Wl,-rpath,/home/enduser/g0306332/lib/
>>> hypre/lib -L/home/enduser/g0306332/lib/hypre/lib -lHYPRE -Wl,-
>>> rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64
>>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/
>>> x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -
>>> lcxaguard -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/
>>> local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/
>>> usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-
>>> rpath,/usr/local/topspin/mpi/mpich/lib -L/usr/local/topspin/mpi/
>>> mpich/lib -lmpich -Wl,-rpath,/opt/intel/cmkl/8.1.1/lib/em64t -L/
>>> opt/intel/cmkl/8.1.1/lib/em64t -lmkl_lapack -lmkl_em64t -lguide -
>>> lpthread -Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -
>>> Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/
>>> lib -ldl -lmpich -libverbs -libumad -lpthread -lrt -Wl,-rpath,/opt/
>>> intel/cce/9.1.049/lib -L/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/
>>> lib/gcc/x86_64-redhat-linux/3.4.6/ -L/usr/lib/gcc/x86_64-redhat-
>>> linux/3.4.6/ -Wl,-rpath,/usr/lib64 -L/usr/lib64 -lsvml -limf -
>>> lipgo -lirc -lgcc_s -lirc_s -lmpichf90nc -Wl,-rpath,/opt/mvapich/
>>> 0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/
>>> intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/
>>> 3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/intel/fce/9.1.045/lib
>>> -L/opt/intel/fce/9.1.045/lib -lifport -lifcore -lm -Wl,-rpath,/opt/
>>> mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-
>>> rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
>>> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lm -Wl,-rpath,/opt/
>>> mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-
>>> rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
>>> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard -Wl,-
>>> rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64
>>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/
>>> x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/
>>> mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-
>>> rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
>>> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard -Wl,-
>>> rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64
>>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/
>>> x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/
>>> mvapich/0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich -
>>> Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -libverbs -
>>> libumad -lpthread -lrt -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/
>>> opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-
>>> linux/3.4.6/ -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/
>>> usr/lib64 -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -
>>> ldl -lc
>>> ------------------------------------------
>>> ************************************************************************************************************************
>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use
>>> 'enscript -r -fCourier9' to print this document ***
>>> ************************************************************************************************************************
>>>
>>> ---------------------------------------------- PETSc Performance
>>> Summary: ----------------------------------------------
>>>
>>> ./a.out on a atlas3-mp named atlas3-c18 with 8 processors, by
>>> g0306332 Fri Jun 6 17:23:25 2008
>>> Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40
>>> CST 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
>>>
>>> Max Max/Min Avg Total
>>> Time (sec): 1.140e+03 1.00019 1.140e+03
>>> Objects: 4.200e+01 1.00000 4.200e+01
>>> Flops: 4.620e+10 1.00158 4.619e+10 3.695e+11
>>> Flops/sec: 4.053e+07 1.00177 4.051e+07 3.241e+08
>>> MPI Messages: 9.954e+03 2.00000 8.710e+03 6.968e+04
>>> MPI Message Lengths: 7.224e+07 2.00000 7.257e+03 5.057e+08
>>> MPI Reductions: 1.716e+03 1.00000
>>>
>>> Flop counting convention: 1 flop = 1 real number operation of type
>>> (multiply/divide/add/subtract)
>>> e.g., VecAXPY() for real vectors of
>>> length N --> 2N flops
>>> and VecAXPY() for complex vectors of
>>> length N --> 8N flops
>>>
>>> Summary of Stages: ----- Time ------ ----- Flops ----- ---
>>> Messages --- -- Message Lengths -- -- Reductions --
>>> Avg %Total Avg %Total counts
>>> %Total Avg %Total counts %Total
>>> 0: Main Stage: 1.1402e+03 100.0%  3.6953e+11 100.0%  6.968e+04 100.0%  7.257e+03 100.0%  1.372e+04 100.0%
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>> See the 'Profiling' chapter of the users' manual for details on
>>> interpreting output.
>>> Phase summary info:
>>> Count: number of times phase was executed
>>> Time and Flops/sec: Max - maximum over all processors
>>> Ratio - ratio of maximum to minimum over all
>>> processors
>>> Mess: number of messages sent
>>> Avg. len: average message length
>>> Reduct: number of global reductions
>>> Global: entire computation
>>> Stage: stages of a computation. Set stages with
>>> PetscLogStagePush() and PetscLogStagePop().
>>> %T - percent time in this phase %F - percent flops in
>>> this phase
>>> %M - percent messages in this phase %L - percent message
>>> lengths in this phase
>>> %R - percent reductions in this phase
>>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max
>>> time over all processors)
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>> ##########################################################
>>> # #
>>> # WARNING!!! #
>>> # #
>>> # This code was run without the PreLoadBegin() #
>>> # macros. To get timing results we always recommend #
>>> # preloading. otherwise timing numbers may be #
>>> # meaningless. #
>>> ##########################################################
>>>
>>>
>>> Event Count Time (sec) Flops/
>>> sec --- Global --- --- Stage --- Total
>>> Max Ratio Max Ratio Max Ratio Mess Avg
>>> len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> MatMult 4975 1.0 7.8154e+01 1.9 4.19e+08 1.9 7.0e+04
>>> 7.3e+03 0.0e+00 5 38100100 0 5 38100100 0 1798
>>> MatSolve 2855 1.0 1.0870e+02 1.8 2.57e+08 1.8 0.0e+00
>>> 0.0e+00 0.0e+00 7 34 0 0 0 7 34 0 0 0 1153
>>> MatLUFactorNum 300 1.0 2.3238e+01 1.5 2.07e+08 1.5 0.0e+00
>>> 0.0e+00 0.0e+00 2 7 0 0 0 2 7 0 0 0 1099
>>> MatILUFactorSym 1 1.0 6.1973e-02 1.5 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatConvert 1 1.0 1.4168e-01 1.0 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatAssemblyBegin 301 1.0 6.9683e+01 8.6 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 6.0e+02 4 0 0 0 4 4 0 0 0 4 0
>>> MatAssemblyEnd 301 1.0 6.2247e+00 1.2 0.00e+00 0.0 2.8e+01
>>> 3.6e+03 3.1e+02 0 0 0 0 2 0 0 0 0 2 0
>>> MatGetRow 162000 1.0 6.0330e-02 1.4 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatGetRowIJ 3 1.0 9.0599e-06 3.2 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatGetOrdering 1 1.0 5.6710e-03 1.4 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> KSPSetup 601 1.0 1.5631e-02 1.1 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> KSPSolve 600 1.0 8.1668e+02 1.0 5.66e+07 1.0 7.0e+04
>>> 7.3e+03 9.2e+03 72100100100 67 72100100100 67 452
>>> PCSetUp 601 1.0 2.4372e+01 1.5 1.93e+08 1.5 0.0e+00
>>> 0.0e+00 5.0e+00 2 7 0 0 0 2 7 0 0 0 1048
>>> PCSetUpOnBlocks 300 1.0 2.3303e+01 1.5 2.07e+08 1.5 0.0e+00
>>> 0.0e+00 3.0e+00 2 7 0 0 0 2 7 0 0 0 1096
>>> PCApply 5575 1.0 6.5344e+02 1.1 2.57e+07 1.1 0.0e+00
>>> 0.0e+00 0.0e+00 55 34 0 0 0 55 34 0 0 0 192
>>> VecDot 4840 1.0 6.8932e+00 1.3 3.07e+08 1.3 0.0e+00
>>> 0.0e+00 4.8e+03 1 3 0 0 35 1 3 0 0 35 1820
>>> VecNorm 4365 1.0 1.2250e+02 3.6 6.82e+07 3.6 0.0e+00
>>> 0.0e+00 4.4e+03 8 5 0 0 32 8 5 0 0 32 153
>>> VecCopy 900 1.0 1.4297e+00 1.8 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecSet 6775 1.0 8.1405e+00 1.8 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecAXPY 6485 1.0 1.0003e+01 1.9 5.73e+08 1.9 0.0e+00
>>> 0.0e+00 0.0e+00 1 7 0 0 0 1 7 0 0 0 2420
>>> VecAYPX 3765 1.0 7.8289e+00 2.0 5.17e+08 2.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 2092
>>> VecWAXPY 2420 1.0 3.8504e+00 1.9 3.80e+08 1.9 0.0e+00
>>> 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1629
>>> VecAssemblyBegin 1200 1.0 9.2808e+00 3.4 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 3.6e+03 1 0 0 0 26 1 0 0 0 26 0
>>> VecAssemblyEnd 1200 1.0 2.3313e-03 1.3 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> VecScatterBegin 4975 1.0 2.2727e-01 2.6 0.00e+00 0.0 7.0e+04
>>> 7.3e+03 0.0e+00 0 0100100 0 0 0100100 0 0
>>> VecScatterEnd 4975 1.0 2.7557e+01 68.1 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> Memory usage is given in bytes:
>>>
>>> Object Type Creations Destructions Memory
>>> Descendants' Mem.
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> Matrix 7 7 160595412 0
>>> Krylov Solver 3 3 8 0
>>> Preconditioner 3 3 528 0
>>> Index Set 7 7 3897600 0
>>> Vec 20 20 23357344 0
>>> Vec Scatter 2 2 0 0
>>> ========================================================================================================================
>>> Average time to get PetscTime(): 1.19209e-07
>>> Average time for MPI_Barrier(): 2.10285e-05
>>> Average time for zero size MPI_Send(): 7.59959e-06
>>> OptionTable: -log_summary test8_600
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>>> sizeof(PetscScalar) 8
>>> Configure run at: Tue Jan 8 22:22:08 2008
>>> Configure options: --with-memcmp-ok --sizeof_char=1 --
>>> sizeof_void_p=8 --sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --
>>> sizeof_long_long=8 --sizeof_float=4 --sizeof_double=8 --
>>> bits_per_byte=8 --sizeof_MPI_Comm=4 --sizeof_MPI_Fint=4 --with-
>>> vendor-compilers=intel --with-x=0 --with-hypre-dir=/home/enduser/
>>> g0306332/lib/hypre --with-debugging=0 --with-batch=1 --with-mpi-
>>> shared=0 --with-mpi-include=/usr/local/topspin/mpi/mpich/include --
>>> with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a --with-
>>> mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun --with-blas-lapack-
>>> dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0
>>> -----------------------------------------
>>> Libraries compiled on Tue Jan 8 22:34:13 SGT 2008 on atlas3-c01
>>> Machine characteristics: Linux atlas3-c01 2.6.9-42.ELsmp #1 SMP
>>> Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
>>> Using PETSc directory: /nfs/home/enduser/g0306332/petsc-2.3.3-p8
>>> Using PETSc arch: atlas3-mpi
>>> -----------------------------------------
>>> Using C compiler: mpicc -fPIC -O
>>> Using Fortran compiler: mpif90 -I. -fPIC -O
>>> -----------------------------------------
>>> Using include paths: -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8 -
>>> I/nfs/home/enduser/g0306332/petsc-2.3.3-p8/bmake/atlas3-mpi -I/nfs/
>>> home/enduser/g0306332/petsc-2.3.3-p8/include -I/home/enduser/
>>> g0306332/lib/hypre/include -I/usr/local/topspin/mpi/mpich/include
>>> ------------------------------------------
>>> Using C linker: mpicc -fPIC -O
>>> Using Fortran linker: mpif90 -I. -fPIC -O
>>> Using libraries: -Wl,-rpath,/nfs/home/enduser/g0306332/petsc-2.3.3-
>>> p8/lib/atlas3-mpi -L/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/
>>> atlas3-mpi -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -
>>> lpetscvec -lpetsc -Wl,-rpath,/home/enduser/g0306332/lib/
>>> hypre/lib -L/home/enduser/g0306332/lib/hypre/lib -lHYPRE -Wl,-
>>> rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64
>>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/
>>> x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -
>>> lcxaguard -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/
>>> local/ofed/lib64 -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/
>>> usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-
>>> rpath,/usr/local/topspin/mpi/mpich/lib -L/usr/local/topspin/mpi/
>>> mpich/lib -lmpich -Wl,-rpath,/opt/intel/cmkl/8.1.1/lib/em64t -L/
>>> opt/intel/cmkl/8.1.1/lib/em64t -lmkl_lapack -lmkl_em64t -lguide -
>>> lpthread -Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -
>>> Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/
>>> lib -ldl -lmpich -libverbs -libumad -lpthread -lrt -Wl,-rpath,/opt/
>>> intel/cce/9.1.049/lib -L/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/
>>> lib/gcc/x86_64-redhat-linux/3.4.6/ -L/usr/lib/gcc/x86_64-redhat-
>>> linux/3.4.6/ -Wl,-rpath,/usr/lib64 -L/usr/lib64 -lsvml -limf -
>>> lipgo -lirc -lgcc_s -lirc_s -lmpichf90nc -Wl,-rpath,/opt/mvapich/
>>> 0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-rpath,/opt/
>>> intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/
>>> 3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/intel/fce/9.1.045/lib
>>> -L/opt/intel/fce/9.1.045/lib -lifport -lifcore -lm -Wl,-rpath,/opt/
>>> mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-
>>> rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
>>> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lm -Wl,-rpath,/opt/
>>> mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-
>>> rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
>>> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard -Wl,-
>>> rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64
>>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/
>>> x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/
>>> mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64 -Wl,-
>>> rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-
>>> redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard -Wl,-
>>> rpath,/opt/mvapich/0.9.9/gen2/lib -Wl,-rpath,/usr/local/ofed/lib64
>>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/
>>> x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/
>>> mvapich/0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich -
>>> Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -libverbs -
>>> libumad -lpthread -lrt -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/
>>> opt/intel/cce/9.1.049/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-
>>> linux/3.4.6/ -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/
>>> usr/lib64 -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -
>>> ldl -lc
>>> ------------------------------------------
>>
>>
>
>