Analysis of performance of parallel code as processors increase

Ben Tay zonexo at gmail.com
Tue Jun 10 09:21:17 CDT 2008


Hi Barry,

I found that when I use hypre, the solve is about twice as slow. I guess hypre 
does not work well with the linearised momentum equation. I tried to use 
PCILU and PCICC and I got the error:

No support for this operation for this object type!
[1]PETSC ERROR: Matrix type mpiaij  symbolic ICC!
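
If I understand the error correctly, ILU and ICC are serial factorizations in 
PETSc, so with an MPIAIJ matrix they have to be applied to the per-process 
diagonal blocks (through block Jacobi or additive Schwarz) rather than to the 
global matrix. A sketch of the runtime options, assuming the standard option 
names:

   -pc_type bjacobi -sub_pc_type ilu
   -pc_type asm     -sub_pc_type icc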

PCASM performs even worse. It seems like block Jacobi is still the best. 
Where did you find the number of iterations? Are you saying that if I 
increase the number of processors, the iteration count must go down?
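
I suppose I can also print the count myself after each solve; a minimal 
sketch in C, assuming the solver object is called ksp:

   PetscErrorCode ierr;
   PetscInt       its;
   ierr = KSPGetIterationNumber(ksp,&its);CHKERRQ(ierr);
   ierr = PetscPrintf(PETSC_COMM_WORLD,"KSP iterations: %d\n",its);CHKERRQ(ierr);

or simply run with -ksp_converged_reason or -ksp_monitor.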

By the way, I'm using the Richardson solver. Other combinations, such as 
bcgs + hypre, are much worse.
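
For reference, the combinations can be selected at run time with options along 
these lines (the exact settings in my runs may differ slightly; the hypre line 
assumes BoomerAMG):

   -ksp_type richardson -pc_type bjacobi -sub_pc_type ilu
   -ksp_type bcgs       -pc_type hypre   -pc_hypre_type boomeramg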

Does this mean there are some other problems present, and that is why my 
code does not scale properly?

Thank you very much.

Regards


Barry Smith wrote:
>
>    You are not using hypre; you are using block Jacobi with ILU on the 
> blocks.
>
>    The number of iterations goes from around 4000 to around 5000 in going 
> from 4 to 8 processes; this is why you do not see such a great speedup.
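>
>    (Roughly: with twice the processes but about 5000/4000 = 1.25 times as 
> many iterations, the best speedup you could expect is around 2/1.25 = 1.6, 
> before any extra communication cost is counted.)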
>
>    Barry
>
> On Jun 6, 2008, at 8:07 PM, Ben Tay wrote:
>
>> Hi,
>>
>> I have written a parallel code using PETSc and Hypre. I found that going 
>> from 1 to 4 processors gives an almost 4-times speedup. However, going 
>> from 4 to 8 processors only improves performance by a factor of 1.2-1.5 
>> instead of 2.
>>
>> Is the slowdown due to the matrix not being large enough? Currently I am 
>> using 600x2160 for the benchmark. Even when I increase the matrix size to 
>> 900x3240 or 1200x2160, the performance does not improve by much. Is it 
>> possible to use -log_summary to find the problem? I have attached the log 
>> files for the 4- and 8-processor runs; I found that some events, such as 
>> VecScatterEnd, VecNorm and MatAssemblyBegin, have much higher ratios. Does 
>> that indicate something? Another strange thing is that MatAssemblyBegin 
>> for the 4-processor run has a much higher ratio than the 8-processor run. 
>> I thought there should be less communication in the 4-processor case, so 
>> the ratio should be lower. Does it mean there is some communication 
>> problem at that point?
>>
>> Thank you very much.
>>
>> Regards
>>
>>
>> ************************************************************************************************************************ 
>>
>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript 
>> -r -fCourier9' to print this document            ***
>> ************************************************************************************************************************ 
>>
>>
>> ---------------------------------------------- PETSc Performance 
>> Summary: ----------------------------------------------
>>
>> ./a.out on a atlas3-mp named atlas3-c43 with 4 processors, by 
>> g0306332 Fri Jun  6 17:29:26 2008
>> Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 
>> 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
>>
>>                         Max       Max/Min        Avg      Total
>> Time (sec):           1.750e+03      1.00043   1.750e+03
>> Objects:              4.200e+01      1.00000   4.200e+01
>> Flops:                6.961e+10      1.00074   6.959e+10  2.784e+11
>> Flops/sec:            3.980e+07      1.00117   3.978e+07  1.591e+08
>> MPI Messages:         8.168e+03      2.00000   6.126e+03  2.450e+04
>> MPI Message Lengths:  5.525e+07      2.00000   6.764e+03  1.658e+08
>> MPI Reductions:       3.203e+03      1.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type 
>> (multiply/divide/add/subtract)
>>                            e.g., VecAXPY() for real vectors of length 
>> N --> 2N flops
>>                            and VecAXPY() for complex vectors of 
>> length N --> 8N flops
>>
>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- 
>> Messages ---  -- Message Lengths --  -- Reductions --
>>                        Avg     %Total     Avg     %Total   counts   
>> %Total     Avg         %Total   counts   %Total
>> 0:      Main Stage: 1.7495e+03 100.0%  2.7837e+11 100.0%  2.450e+04 
>> 100.0%  6.764e+03      100.0%  1.281e+04 100.0%
>>
>> ------------------------------------------------------------------------------------------------------------------------ 
>>
>> See the 'Profiling' chapter of the users' manual for details on 
>> interpreting output.
>> Phase summary info:
>>   Count: number of times phase was executed
>>   Time and Flops/sec: Max - maximum over all processors
>>                       Ratio - ratio of maximum to minimum over all 
>> processors
>>   Mess: number of messages sent
>>   Avg. len: average message length
>>   Reduct: number of global reductions
>>   Global: entire computation
>>   Stage: stages of a computation. Set stages with PetscLogStagePush() 
>> and PetscLogStagePop().
>>      %T - percent time in this phase         %F - percent flops in 
>> this phase
>>      %M - percent messages in this phase     %L - percent message 
>> lengths in this phase
>>      %R - percent reductions in this phase
>>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time 
>> over all processors)
>> ------------------------------------------------------------------------------------------------------------------------ 
>>
>>
>>
>>      ##########################################################
>>      #                                                        #
>>      #                          WARNING!!!                    #
>>      #                                                        #
>>      #   This code was run without the PreLoadBegin()         #
>>      #   macros. To get timing results we always recommend    #
>>      #   preloading. otherwise timing numbers may be          #
>>      #   meaningless.                                         #
>>      ##########################################################
>>
>>
>> Event                Count      Time (sec)     
>> Flops/sec                         --- Global ---  --- Stage ---   Total
>>                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg 
>> len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------ 
>>
>>
>> --- Event Stage 0: Main Stage
>>
>> MatMult             4082 1.0 8.2037e+01 1.5 4.67e+08 1.5 2.4e+04 
>> 6.8e+03 0.0e+00  4 37100100  0   4 37100100  0  1240
>> MatSolve            1976 1.0 1.3250e+02 1.5 2.52e+08 1.5 0.0e+00 
>> 0.0e+00 0.0e+00  6 31  0  0  0   6 31  0  0  0   655
>> MatLUFactorNum       300 1.0 3.8260e+01 1.2 2.07e+08 1.2 0.0e+00 
>> 0.0e+00 0.0e+00  2  9  0  0  0   2  9  0  0  0   668
>> MatILUFactorSym        1 1.0 2.2550e-01 2.7 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatConvert             1 1.0 2.9182e-01 1.0 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAssemblyBegin     301 1.0 1.0776e+021228.9 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 6.0e+02  4  0  0  0  5   4  0  0  0  5     0
>> MatAssemblyEnd       301 1.0 9.6146e+00 1.1 0.00e+00 0.0 1.2e+01 
>> 3.6e+03 3.1e+02  1  0  0  0  2   1  0  0  0  2     0
>> MatGetRow         324000 1.0 1.2161e-01 1.4 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetRowIJ            3 1.0 5.0068e-06 1.3 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetOrdering         1 1.0 2.1279e-02 2.3 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSetup             601 1.0 2.5108e-02 1.2 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve             600 1.0 1.2353e+03 1.0 5.64e+07 1.0 2.4e+04 
>> 6.8e+03 8.3e+03 71100100100 65  71100100100 65   225
>> PCSetUp              601 1.0 4.0116e+01 1.2 1.96e+08 1.2 0.0e+00 
>> 0.0e+00 5.0e+00  2  9  0  0  0   2  9  0  0  0   637
>> PCSetUpOnBlocks      300 1.0 3.8513e+01 1.2 2.06e+08 1.2 0.0e+00 
>> 0.0e+00 3.0e+00  2  9  0  0  0   2  9  0  0  0   664
>> PCApply             4682 1.0 1.0566e+03 1.0 2.12e+07 1.0 0.0e+00 
>> 0.0e+00 0.0e+00 59 31  0  0  0  59 31  0  0  0    82
>> VecDot              4812 1.0 8.2762e+00 1.1 4.00e+08 1.1 0.0e+00 
>> 0.0e+00 4.8e+03  0  4  0  0 38   0  4  0  0 38  1507
>> VecNorm             3479 1.0 9.2739e+01 8.3 3.15e+08 8.3 0.0e+00 
>> 0.0e+00 3.5e+03  4  5  0  0 27   4  5  0  0 27   152
>> VecCopy              900 1.0 2.0819e+00 1.4 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecSet              5882 1.0 9.4626e+00 1.5 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAXPY             5585 1.0 1.5397e+01 1.5 4.67e+08 1.5 0.0e+00 
>> 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0  1273
>> VecAYPX             2879 1.0 1.0303e+01 1.6 4.45e+08 1.6 0.0e+00 
>> 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0  1146
>> VecWAXPY            2406 1.0 7.7902e+00 1.6 3.14e+08 1.6 0.0e+00 
>> 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   801
>> VecAssemblyBegin    1200 1.0 8.4259e+00 3.8 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 3.6e+03  0  0  0  0 28   0  0  0  0 28     0
>> VecAssemblyEnd      1200 1.0 2.4173e-03 1.3 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecScatterBegin     4082 1.0 1.2512e-01 1.5 0.00e+00 0.0 2.4e+04 
>> 6.8e+03 0.0e+00  0  0100100  0   0  0100100  0     0
>> VecScatterEnd       4082 1.0 2.0954e+0153.3 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> ------------------------------------------------------------------------------------------------------------------------ 
>>
>>
>> Memory usage is given in bytes:
>>
>> Object Type          Creations   Destructions   Memory  Descendants' 
>> Mem.
>>
>> --- Event Stage 0: Main Stage
>>
>>              Matrix     7              7  321241092     0
>>       Krylov Solver     3              3          8     0
>>      Preconditioner     3              3        528     0
>>           Index Set     7              7    7785600     0
>>                 Vec    20             20   46685344     0
>>         Vec Scatter     2              2          0     0
>> ======================================================================================================================== 
>>
>> Average time to get PetscTime(): 1.90735e-07
>> Average time for MPI_Barrier(): 1.45912e-05
>> Average time for zero size MPI_Send(): 7.27177e-06
>> OptionTable: -log_summary test4_600
>> Compiled without FORTRAN kernels
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
>> sizeof(PetscScalar) 8
>> Configure run at: Tue Jan  8 22:22:08 2008
>> Configure options: --with-memcmp-ok --sizeof_char=1 --sizeof_void_p=8 
>> --sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --sizeof_long_long=8 
>> --sizeof_float=4 --sizeof_double=8 --bits_per_byte=8 
>> --sizeof_MPI_Comm=4 --sizeof_MPI_Fint=4 --with-vendor-compilers=intel 
>> --with-x=0 --with-hypre-dir=/home/enduser/g0306332/lib/hypre 
>> --with-debugging=0 --with-batch=1 --with-mpi-shared=0 
>> --with-mpi-include=/usr/local/topspin/mpi/mpich/include 
>> --with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a 
>> --with-mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun 
>> --with-blas-lapack-dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0
>> -----------------------------------------
>> Libraries compiled on Tue Jan  8 22:34:13 SGT 2008 on atlas3-c01
>> Machine characteristics: Linux atlas3-c01 2.6.9-42.ELsmp #1 SMP Wed 
>> Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
>> Using PETSc directory: /nfs/home/enduser/g0306332/petsc-2.3.3-p8
>> Using PETSc arch: atlas3-mpi
>> -----------------------------------------
>> Using C compiler: mpicc -fPIC -O
>> Using Fortran compiler: mpif90 -I. -fPIC -O
>> -----------------------------------------
>> Using include paths: -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8 
>> -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8/bmake/atlas3-mpi 
>> -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8/include 
>> -I/home/enduser/g0306332/lib/hypre/include 
>> -I/usr/local/topspin/mpi/mpich/include
>> ------------------------------------------
>> Using C linker: mpicc -fPIC -O
>> Using Fortran linker: mpif90 -I. -fPIC -O
>> Using libraries: 
>> -Wl,-rpath,/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/atlas3-mpi 
>> -L/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/atlas3-mpi -lpetscts 
>> -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc        
>> -Wl,-rpath,/home/enduser/g0306332/lib/hypre/lib 
>> -L/home/enduser/g0306332/lib/hypre/lib -lHYPRE 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -Wl,-rpath,/usr/local/topspin/mpi/mpich/lib 
>> -L/usr/local/topspin/mpi/mpich/lib -lmpich 
>> -Wl,-rpath,/opt/intel/cmkl/8.1.1/lib/em64t 
>> -L/opt/intel/cmkl/8.1.1/lib/em64t -lmkl_lapack -lmkl_em64t -lguide 
>> -lpthread -Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib 
>> -ldl -lmpich -libverbs -libumad -lpthread -lrt 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 
>> -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -lmpichf90nc 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/intel/fce/9.1.045/lib 
>> -L/opt/intel/fce/9.1.045/lib -lifport -lifcore -lm 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -lm -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich 
>> -Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -libverbs 
>> -libumad -lpthread -lrt -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -L/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 
>> -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -ldl -lc
>> ------------------------------------------
>> ************************************************************************************************************************ 
>>
>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript 
>> -r -fCourier9' to print this document            ***
>> ************************************************************************************************************************ 
>>
>>
>> ---------------------------------------------- PETSc Performance 
>> Summary: ----------------------------------------------
>>
>> ./a.out on a atlas3-mp named atlas3-c18 with 8 processors, by 
>> g0306332 Fri Jun  6 17:23:25 2008
>> Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 
>> 2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
>>
>>                         Max       Max/Min        Avg      Total
>> Time (sec):           1.140e+03      1.00019   1.140e+03
>> Objects:              4.200e+01      1.00000   4.200e+01
>> Flops:                4.620e+10      1.00158   4.619e+10  3.695e+11
>> Flops/sec:            4.053e+07      1.00177   4.051e+07  3.241e+08
>> MPI Messages:         9.954e+03      2.00000   8.710e+03  6.968e+04
>> MPI Message Lengths:  7.224e+07      2.00000   7.257e+03  5.057e+08
>> MPI Reductions:       1.716e+03      1.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type 
>> (multiply/divide/add/subtract)
>>                            e.g., VecAXPY() for real vectors of length 
>> N --> 2N flops
>>                            and VecAXPY() for complex vectors of 
>> length N --> 8N flops
>>
>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- 
>> Messages ---  -- Message Lengths --  -- Reductions --
>>                        Avg     %Total     Avg     %Total   counts   
>> %Total     Avg         %Total   counts   %Total
>> 0:      Main Stage: 1.1402e+03 100.0%  3.6953e+11 100.0%  6.968e+04 
>> 100.0%  7.257e+03      100.0%  1.372e+04 100.0%
>>
>> ------------------------------------------------------------------------------------------------------------------------ 
>>
>> See the 'Profiling' chapter of the users' manual for details on 
>> interpreting output.
>> Phase summary info:
>>   Count: number of times phase was executed
>>   Time and Flops/sec: Max - maximum over all processors
>>                       Ratio - ratio of maximum to minimum over all 
>> processors
>>   Mess: number of messages sent
>>   Avg. len: average message length
>>   Reduct: number of global reductions
>>   Global: entire computation
>>   Stage: stages of a computation. Set stages with PetscLogStagePush() 
>> and PetscLogStagePop().
>>      %T - percent time in this phase         %F - percent flops in 
>> this phase
>>      %M - percent messages in this phase     %L - percent message 
>> lengths in this phase
>>      %R - percent reductions in this phase
>>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time 
>> over all processors)
>> ------------------------------------------------------------------------------------------------------------------------ 
>>
>>
>>
>>      ##########################################################
>>      #                                                        #
>>      #                          WARNING!!!                    #
>>      #                                                        #
>>      #   This code was run without the PreLoadBegin()         #
>>      #   macros. To get timing results we always recommend    #
>>      #   preloading. otherwise timing numbers may be          #
>>      #   meaningless.                                         #
>>      ##########################################################
>>
>>
>> Event                Count      Time (sec)     
>> Flops/sec                         --- Global ---  --- Stage ---   Total
>>                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg 
>> len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------ 
>>
>>
>> --- Event Stage 0: Main Stage
>>
>> MatMult             4975 1.0 7.8154e+01 1.9 4.19e+08 1.9 7.0e+04 
>> 7.3e+03 0.0e+00  5 38100100  0   5 38100100  0  1798
>> MatSolve            2855 1.0 1.0870e+02 1.8 2.57e+08 1.8 0.0e+00 
>> 0.0e+00 0.0e+00  7 34  0  0  0   7 34  0  0  0  1153
>> MatLUFactorNum       300 1.0 2.3238e+01 1.5 2.07e+08 1.5 0.0e+00 
>> 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0  1099
>> MatILUFactorSym        1 1.0 6.1973e-02 1.5 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatConvert             1 1.0 1.4168e-01 1.0 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAssemblyBegin     301 1.0 6.9683e+01 8.6 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 6.0e+02  4  0  0  0  4   4  0  0  0  4     0
>> MatAssemblyEnd       301 1.0 6.2247e+00 1.2 0.00e+00 0.0 2.8e+01 
>> 3.6e+03 3.1e+02  0  0  0  0  2   0  0  0  0  2     0
>> MatGetRow         162000 1.0 6.0330e-02 1.4 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetRowIJ            3 1.0 9.0599e-06 3.2 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetOrdering         1 1.0 5.6710e-03 1.4 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSetup             601 1.0 1.5631e-02 1.1 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve             600 1.0 8.1668e+02 1.0 5.66e+07 1.0 7.0e+04 
>> 7.3e+03 9.2e+03 72100100100 67  72100100100 67   452
>> PCSetUp              601 1.0 2.4372e+01 1.5 1.93e+08 1.5 0.0e+00 
>> 0.0e+00 5.0e+00  2  7  0  0  0   2  7  0  0  0  1048
>> PCSetUpOnBlocks      300 1.0 2.3303e+01 1.5 2.07e+08 1.5 0.0e+00 
>> 0.0e+00 3.0e+00  2  7  0  0  0   2  7  0  0  0  1096
>> PCApply             5575 1.0 6.5344e+02 1.1 2.57e+07 1.1 0.0e+00 
>> 0.0e+00 0.0e+00 55 34  0  0  0  55 34  0  0  0   192
>> VecDot              4840 1.0 6.8932e+00 1.3 3.07e+08 1.3 0.0e+00 
>> 0.0e+00 4.8e+03  1  3  0  0 35   1  3  0  0 35  1820
>> VecNorm             4365 1.0 1.2250e+02 3.6 6.82e+07 3.6 0.0e+00 
>> 0.0e+00 4.4e+03  8  5  0  0 32   8  5  0  0 32   153
>> VecCopy              900 1.0 1.4297e+00 1.8 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecSet              6775 1.0 8.1405e+00 1.8 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAXPY             6485 1.0 1.0003e+01 1.9 5.73e+08 1.9 0.0e+00 
>> 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0  2420
>> VecAYPX             3765 1.0 7.8289e+00 2.0 5.17e+08 2.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0  2092
>> VecWAXPY            2420 1.0 3.8504e+00 1.9 3.80e+08 1.9 0.0e+00 
>> 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1629
>> VecAssemblyBegin    1200 1.0 9.2808e+00 3.4 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 3.6e+03  1  0  0  0 26   1  0  0  0 26     0
>> VecAssemblyEnd      1200 1.0 2.3313e-03 1.3 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecScatterBegin     4975 1.0 2.2727e-01 2.6 0.00e+00 0.0 7.0e+04 
>> 7.3e+03 0.0e+00  0  0100100  0   0  0100100  0     0
>> VecScatterEnd       4975 1.0 2.7557e+0168.1 0.00e+00 0.0 0.0e+00 
>> 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> ------------------------------------------------------------------------------------------------------------------------ 
>>
>>
>> Memory usage is given in bytes:
>>
>> Object Type          Creations   Destructions   Memory  Descendants' 
>> Mem.
>>
>> --- Event Stage 0: Main Stage
>>
>>              Matrix     7              7  160595412     0
>>       Krylov Solver     3              3          8     0
>>      Preconditioner     3              3        528     0
>>           Index Set     7              7    3897600     0
>>                 Vec    20             20   23357344     0
>>         Vec Scatter     2              2          0     0
>> ======================================================================================================================== 
>>
>> Average time to get PetscTime(): 1.19209e-07
>> Average time for MPI_Barrier(): 2.10285e-05
>> Average time for zero size MPI_Send(): 7.59959e-06
>> OptionTable: -log_summary test8_600
>> Compiled without FORTRAN kernels
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
>> sizeof(PetscScalar) 8
>> Configure run at: Tue Jan  8 22:22:08 2008
>> Configure options: --with-memcmp-ok --sizeof_char=1 --sizeof_void_p=8 
>> --sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --sizeof_long_long=8 
>> --sizeof_float=4 --sizeof_double=8 --bits_per_byte=8 
>> --sizeof_MPI_Comm=4 --sizeof_MPI_Fint=4 --with-vendor-compilers=intel 
>> --with-x=0 --with-hypre-dir=/home/enduser/g0306332/lib/hypre 
>> --with-debugging=0 --with-batch=1 --with-mpi-shared=0 
>> --with-mpi-include=/usr/local/topspin/mpi/mpich/include 
>> --with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a 
>> --with-mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun 
>> --with-blas-lapack-dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0
>> -----------------------------------------
>> Libraries compiled on Tue Jan  8 22:34:13 SGT 2008 on atlas3-c01
>> Machine characteristics: Linux atlas3-c01 2.6.9-42.ELsmp #1 SMP Wed 
>> Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
>> Using PETSc directory: /nfs/home/enduser/g0306332/petsc-2.3.3-p8
>> Using PETSc arch: atlas3-mpi
>> -----------------------------------------
>> Using C compiler: mpicc -fPIC -O
>> Using Fortran compiler: mpif90 -I. -fPIC -O
>> -----------------------------------------
>> Using include paths: -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8 
>> -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8/bmake/atlas3-mpi 
>> -I/nfs/home/enduser/g0306332/petsc-2.3.3-p8/include 
>> -I/home/enduser/g0306332/lib/hypre/include 
>> -I/usr/local/topspin/mpi/mpich/include
>> ------------------------------------------
>> Using C linker: mpicc -fPIC -O
>> Using Fortran linker: mpif90 -I. -fPIC -O
>> Using libraries: 
>> -Wl,-rpath,/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/atlas3-mpi 
>> -L/nfs/home/enduser/g0306332/petsc-2.3.3-p8/lib/atlas3-mpi -lpetscts 
>> -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc        
>> -Wl,-rpath,/home/enduser/g0306332/lib/hypre/lib 
>> -L/home/enduser/g0306332/lib/hypre/lib -lHYPRE 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -Wl,-rpath,/usr/local/topspin/mpi/mpich/lib 
>> -L/usr/local/topspin/mpi/mpich/lib -lmpich 
>> -Wl,-rpath,/opt/intel/cmkl/8.1.1/lib/em64t 
>> -L/opt/intel/cmkl/8.1.1/lib/em64t -lmkl_lapack -lmkl_em64t -lguide 
>> -lpthread -Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib -L/opt/mvapich/0.9.9/gen2/lib 
>> -ldl -lmpich -libverbs -libumad -lpthread -lrt 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib -L/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 
>> -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -lmpichf90nc 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/intel/fce/9.1.045/lib 
>> -L/opt/intel/fce/9.1.045/lib -lifport -lifcore -lm 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -lm -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -lstdc++ -lcxaguard 
>> -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -Wl,-rpath,/usr/local/ofed/lib64 
>> -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -Wl,-rpath,/usr/lib64 -Wl,-rpath,/opt/mvapich/0.9.9/gen2/lib 
>> -L/opt/mvapich/0.9.9/gen2/lib -ldl -lmpich 
>> -Wl,-rpath,/usr/local/ofed/lib64 -L/usr/local/ofed/lib64 -libverbs 
>> -libumad -lpthread -lrt -Wl,-rpath,/opt/intel/cce/9.1.049/lib 
>> -L/opt/intel/cce/9.1.049/lib 
>> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ 
>> -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/ -Wl,-rpath,/usr/lib64 
>> -L/usr/lib64 -lsvml -limf -lipgo -lirc -lgcc_s -lirc_s -ldl -lc
>> ------------------------------------------
>
>