[petsc-users] Speedup problem when using OpenMP?
    Jed Brown
    jedbrown at mcs.anl.gov
    Mon Nov  4 18:32:33 CST 2013

Danyang Su <danyang.su at gmail.com> writes:
> Hi All,
>
> I have tested the same example under Ubuntu 12.04 x64. The PETSc-dev 
> version is up to date (GIT Date: 2013-11-01 14:59:20 -0500) and the 
> installation went smoothly without any errors. The speedup of the MPI 
> version scales linearly, but the OpenMP version shows no speedup. *Judging 
> from the CPU usage, the program still runs in one thread when using OpenMP.*
>
> The commands to run the test are as follows:
>
> openmp
> ./ex2f -threadcomm_type openmp -threadcomm_nthreads 4 -m 1000 -n 1000 
> -log_summary log_ex2f_1000x1000_ubuntu1204_omp_p4.log
>
> mpi
> mpiexec -n 4 ./ex2f -m 1000 -n 1000 -log_summary 
> log_ex2f_1000x1000_ubuntu1204_mpi_p4.log
>
> This problem is puzzling to me. Can anybody confirm whether the KSP 
> solver is parallelized in the OpenMP version?
>
> Thanks and regards,
>
> Danyang
>
> On 31/10/2013 4:54 PM, Danyang Su wrote:
>> Hi All,
>>
>> I have a question on the speedup of PETSc when using OpenMP. I can get 
>> good speedup when using MPI, but no speedup when using OpenMP.
>> The example is ex2f with m=100 and n=100. The number of available 
>> processors is 16 (32 threads) and the OS is Windows Server 2012. The 
>> log files for 4 and 8 processors are attached.
>>
>> The commands I used to run with 4 processors are as follows:
>> Run using MPI
>> mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -log_summary 
>> log_100x100_mpi_p4.log
>>
>> Run using OpenMP
>> Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 
>> -m 100 -n 100 -log_summary log_100x100_openmp_p4.log
>>
>> The PETSc used for this test is PETSc for Windows 
>> http://www.mic-tc.ch/downloads/PETScForWindows.zip, but I guess this 
>> is not the cause, because the same problem exists when I use 
>> PETSc-dev in Cygwin. I don't know whether this problem exists on Linux. 
>> Could anybody help to test it?
>>
>> Thanks and regards,
>>
>> Danyang
>
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex2f on a linux-gnu-omp-opt named dsu-pc with 1 processor, by root Mon Nov  4 15:35:47 2013
> With 4 threads per MPI_Comm
> Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7  GIT Date: 2013-11-01 14:59:20 -0500
>
>                          Max       Max/Min        Avg      Total 
> Time (sec):           2.376e+02      1.00000   2.376e+02
> Objects:              4.500e+01      1.00000   4.500e+01
> Flops:                2.203e+11      1.00000   2.203e+11  2.203e+11
> Flops/sec:            9.271e+08      1.00000   9.271e+08  9.271e+08
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       0.000e+00      0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
>  0:      Main Stage: 2.3759e+02 100.0%  2.2028e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %f - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult             2657 1.0 4.1715e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 18 11  0  0  0  18 11  0  0  0   573
> MatSolve            2657 1.0 6.4028e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   373
> MatLUFactorNum         1 1.0 1.1149e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    99
> MatILUFactorSym        1 1.0 8.2365e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin       1 1.0 7.8678e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         1 1.0 9.1023e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            1 1.0 1.0014e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 1.2122e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecMDot             2571 1.0 5.1144e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 0.0e+00 22 36  0  0  0  22 36  0  0  0  1555
> VecNorm             2658 1.0 5.4516e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   975
> VecScale            2657 1.0 3.8631e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   688
> VecCopy               86 1.0 2.2233e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                88 1.0 1.1501e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY              172 1.0 4.4589e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   771
> VecMAXPY            2657 1.0 6.9213e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38  0  0  0  29 38  0  0  0  1223
> VecNormalize        2657 1.0 9.3968e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  4  0  0  0   4  4  0  0  0   848
> KSPGMRESOrthog      2571 1.0 1.1630e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 0.0e+00 49 72  0  0  0  49 72  0  0  0  1367
> KSPSetUp               1 1.0 2.8520e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.3699e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 0.0e+00100100  0  0  0 100100  0  0  0   929
> PCSetUp                1 1.0 2.0609e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    53
> PCApply             2657 1.0 6.4088e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   373
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>               Matrix     2              2    151957404     0
>               Vector    37             37    296057424     0
>        Krylov Solver     1              1        18368     0
>       Preconditioner     1              1          984     0
>            Index Set     3              3      4002304     0
>               Viewer     1              0            0     0
> ========================================================================================================================
> Average time to get PetscTime(): 6.50883e-06
> #PETSc Option Table entries:
> -log_summary log_1000x1000_omp_p4.log
> -m 1000
> -n 1000
> -threadcomm_nthreads 4
> -threadcomm_type openmp
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Mon Nov  4 15:22:12 2013
> Configure options: PETSC_ARCH=linux-gnu-omp-opt --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack --with-mpi=0 --with-openmp --with-debugging=0
Add --with-threadcomm --with-pthreadclasses to the configuration.  Using
--with-openmp on its own doesn't turn on threadcomm.  Use all three
flags for now and compare -threadcomm_type openmp to -threadcomm_type
pthread.
http://www.mcs.anl.gov/petsc/documentation/installation.html#threads
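
A minimal sketch of what that could look like, assuming the remaining configure options from the quoted log are kept as they are. The PETSC_ARCH name and the log file names are placeholders chosen for illustration, not taken from the original run. The two added configure flags and the two -threadcomm_type values are the ones named above.

```shell
# Reconfigure with threadcomm enabled (run from the PETSc source directory).
# PETSC_ARCH is a placeholder name; the other options are copied from the quoted log.
./configure PETSC_ARCH=linux-gnu-threads-opt \
  --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack \
  --with-mpi=0 --with-debugging=0 \
  --with-openmp --with-threadcomm --with-pthreadclasses

# After rebuilding PETSc and ex2f, run the same problem with each thread backend.
# The log file names are placeholders.
./ex2f -threadcomm_type openmp -threadcomm_nthreads 4 -m 1000 -n 1000 \
  -log_summary log_omp_p4.log
./ex2f -threadcomm_type pthread -threadcomm_nthreads 4 -m 1000 -n 1000 \
  -log_summary log_pthread_p4.log
```

If threads are actually being used, the per-event times (MatMult, VecMDot, VecMAXPY) would be expected to differ between the 1-thread and 4-thread logs. In the two OpenMP logs quoted here they are essentially identical.
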
> -----------------------------------------
> Libraries compiled on Mon Nov  4 15:22:12 2013 on dsu-pc 
> Machine characteristics: Linux-3.2.0-55-generic-x86_64-with-Ubuntu-12.04-precise
> Using PETSc directory: /home/dsu/petsc
> Using PETSc arch: linux-gnu-omp-opt
> -----------------------------------------
>
> Using C compiler: gcc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fopenmp  ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: gfortran  -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O  -fopenmp  ${FOPTFLAGS} ${FFLAGS} 
> -----------------------------------------
>
> Using include paths: -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include/mpiuni
> -----------------------------------------
>
> Using C linker: gcc
> Using Fortran linker: gfortran
> Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm -lstdc++ -ldl -lgcc_s -ldl 
> -----------------------------------------
>
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex2f on a linux-gnu-omp-opt named dsu-pc with 1 processor, by root Mon Nov  4 15:31:30 2013
> Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7  GIT Date: 2013-11-01 14:59:20 -0500
>
>                          Max       Max/Min        Avg      Total 
> Time (sec):           2.388e+02      1.00000   2.388e+02
> Objects:              4.500e+01      1.00000   4.500e+01
> Flops:                2.203e+11      1.00000   2.203e+11  2.203e+11
> Flops/sec:            9.224e+08      1.00000   9.224e+08  9.224e+08
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       0.000e+00      0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
>  0:      Main Stage: 2.3881e+02 100.0%  2.2028e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %f - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult             2657 1.0 4.0429e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 17 11  0  0  0  17 11  0  0  0   591
> MatSolve            2657 1.0 6.3888e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   374
> MatLUFactorNum         1 1.0 1.2874e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    85
> MatILUFactorSym        1 1.0 1.3501e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin       1 1.0 8.1062e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         1 1.0 6.8491e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            1 1.0 1.1921e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 1.3066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecMDot             2571 1.0 5.2507e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 0.0e+00 22 36  0  0  0  22 36  0  0  0  1514
> VecNorm             2658 1.0 5.4426e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   977
> VecScale            2657 1.0 3.8871e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   684
> VecCopy               86 1.0 1.9921e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                88 1.0 1.0965e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY              172 1.0 4.0171e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   856
> VecMAXPY            2657 1.0 7.0096e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38  0  0  0  29 38  0  0  0  1208
> VecNormalize        2657 1.0 9.4060e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  4  0  0  0   4  4  0  0  0   847
> KSPGMRESOrthog      2571 1.0 1.1847e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 0.0e+00 50 72  0  0  0  50 72  0  0  0  1342
> KSPSetUp               1 1.0 3.7805e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.3820e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 0.0e+00100100  0  0  0 100100  0  0  0   925
> PCSetUp                1 1.0 2.7698e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    40
> PCApply             2657 1.0 6.3946e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   374
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>               Matrix     2              2    151957404     0
>               Vector    37             37    296057424     0
>        Krylov Solver     1              1        18368     0
>       Preconditioner     1              1          984     0
>            Index Set     3              3      4002304     0
>               Viewer     1              0            0     0
> ========================================================================================================================
> Average time to get PetscTime(): 8.51154e-06
> #PETSc Option Table entries:
> -log_summary log_1000x1000_omp_p1.log
> -m 1000
> -n 1000
> -threadcomm_nthreads 1
> -threadcomm_type openmp
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Mon Nov  4 15:22:12 2013
> Configure options: PETSC_ARCH=linux-gnu-omp-opt --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack --with-mpi=0 --with-openmp --with-debugging=0
> -----------------------------------------
> Libraries compiled on Mon Nov  4 15:22:12 2013 on dsu-pc 
> Machine characteristics: Linux-3.2.0-55-generic-x86_64-with-Ubuntu-12.04-precise
> Using PETSc directory: /home/dsu/petsc
> Using PETSc arch: linux-gnu-omp-opt
> -----------------------------------------
>
> Using C compiler: gcc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fopenmp  ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: gfortran  -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O  -fopenmp  ${FOPTFLAGS} ${FFLAGS} 
> -----------------------------------------
>
> Using include paths: -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include/mpiuni
> -----------------------------------------
>
> Using C linker: gcc
> Using Fortran linker: gfortran
> Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm -lstdc++ -ldl -lgcc_s -ldl 
> -----------------------------------------
>
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex2f on a linux-gnu-opt named dsu-pc with 4 processors, by root Mon Nov  4 16:10:24 2013
> Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7  GIT Date: 2013-11-01 14:59:20 -0500
>
>                          Max       Max/Min        Avg      Total 
> Time (sec):           5.364e+01      1.00045   5.362e+01
> Objects:              5.600e+01      1.00000   5.600e+01
> Flops:                2.837e+10      1.00010   2.837e+10  1.135e+11
> Flops/sec:            5.291e+08      1.00054   5.290e+08  2.116e+09
> MPI Messages:         2.744e+03      2.00000   2.058e+03  8.232e+03
> MPI Message Lengths:  2.193e+07      2.00000   7.991e+03  6.578e+07
> MPI Reductions:       2.720e+03      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
>  0:      Main Stage: 5.3623e+01 100.0%  1.1347e+11 100.0%  8.232e+03 100.0%  7.991e+03      100.0%  2.719e+03 100.0% 
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %f - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult             1370 1.0 8.7882e+00 1.0 3.08e+09 1.0 8.2e+03 8.0e+03 0.0e+00 16 11100100  0  16 11100100  0  1402
> MatSolve            1370 1.0 9.0304e+00 1.0 3.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 17 11  0  0  0  17 11  0  0  0  1362
> MatLUFactorNum         1 1.0 3.3336e-02 1.0 2.74e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   329
> MatILUFactorSym        1 1.0 7.1875e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin       1 1.0 7.2212e-02 41.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         1 1.0 5.4802e-02 1.0 0.00e+00 0.0 1.2e+01 2.0e+03 9.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            1 1.0 1.2875e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 4.8881e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecMDot             1325 1.0 1.4754e+01 1.0 1.02e+10 1.0 0.0e+00 0.0e+00 1.3e+03 27 36  0  0 49  27 36  0  0 49  2776
> VecNorm             1371 1.0 1.9989e+00 1.1 6.86e+08 1.0 0.0e+00 0.0e+00 1.4e+03  4  2  0  0 50   4  2  0  0 50  1372
> VecScale            1370 1.0 4.9844e-01 1.1 3.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  2749
> VecCopy               45 1.0 4.4863e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              1418 1.0 6.2273e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> VecAXPY               90 1.0 1.0165e-01 1.0 4.50e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1771
> VecMAXPY            1370 1.0 1.5635e+01 1.0 1.09e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38  0  0  0  29 38  0  0  0  2789
> VecScatterBegin     1370 1.0 1.6159e-01 1.8 0.00e+00 0.0 8.2e+03 8.0e+03 0.0e+00  0  0100100  0   0  0100100  0     0
> VecScatterEnd       1370 1.0 9.6929e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> VecNormalize        1370 1.0 2.5033e+00 1.1 1.03e+09 1.0 0.0e+00 0.0e+00 1.4e+03  5  4  0  0 50   5  4  0  0 50  1642
> KSPGMRESOrthog      1325 1.0 2.9419e+01 1.0 2.05e+10 1.0 0.0e+00 0.0e+00 1.3e+03 54 72  0  0 49  54 72  0  0 49  2784
> KSPSetUp               2 1.0 2.1291e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 5.2989e+01 1.0 2.84e+10 1.0 8.2e+03 8.0e+03 2.7e+03 99100100100 99  99100100100 99  2141
> PCSetUp                2 1.0 1.4600e-01 1.1 2.74e+06 1.0 0.0e+00 0.0e+00 5.0e+00  0  0  0  0  0   0  0  0  0  0    75
> PCSetUpOnBlocks        1 1.0 1.1017e-01 1.0 2.74e+06 1.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0   100
> PCApply             1370 1.0 9.7092e+00 1.0 3.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 11  0  0  0  18 11  0  0  0  1267
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>               Matrix     4              4     56984588     0
>               Vector    41             41     74071440     0
>       Vector Scatter     1              1         1060     0
>            Index Set     5              5      1007832     0
>        Krylov Solver     2              2        19520     0
>       Preconditioner     2              2         1864     0
>               Viewer     1              0            0     0
> ========================================================================================================================
> Average time to get PetscTime(): 6.19888e-06
> Average time for MPI_Barrier(): 0.000529623
> Average time for zero size MPI_Send(): 0.000117242
> #PETSc Option Table entries:
> -log_summary log_1000x1000_mpi_p4.log
> -m 1000
> -n 1000
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Mon Nov  4 14:29:26 2013
> Configure options: PETSC_ARCH=linux-gnu-opt --with-cc=gcc --with-fc=gfortran --with-debugging=0 --download-f-blas-lapack --download-mpich
> -----------------------------------------
> Libraries compiled on Mon Nov  4 14:29:26 2013 on dsu-pc 
> Machine characteristics: Linux-3.2.0-41-generic-x86_64-with-Ubuntu-12.04-precise
> Using PETSc directory: /home/dsu/petsc
> Using PETSc arch: linux-gnu-opt
> -----------------------------------------
>
> Using C compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpif90  -fPIC  -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O  ${FOPTFLAGS} ${FFLAGS} 
> -----------------------------------------
>
> Using include paths: -I/home/dsu/petsc/linux-gnu-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-opt/include
> -----------------------------------------
>
> Using C linker: /home/dsu/petsc/linux-gnu-opt/bin/mpicc
> Using Fortran linker: /home/dsu/petsc/linux-gnu-opt/bin/mpif90
> Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl 
> -----------------------------------------
>
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex2f on a linux-gnu-opt named dsu-pc with 1 processor, by root Mon Nov  4 16:14:37 2013
> Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7  GIT Date: 2013-11-01 14:59:20 -0500
>
>                          Max       Max/Min        Avg      Total 
> Time (sec):           2.295e+02      1.00000   2.295e+02
> Objects:              4.500e+01      1.00000   4.500e+01
> Flops:                2.203e+11      1.00000   2.203e+11  2.203e+11
> Flops/sec:            9.597e+08      1.00000   9.597e+08  9.597e+08
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       5.236e+03      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
>  0:      Main Stage: 2.2953e+02 100.0%  2.2028e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  5.235e+03 100.0% 
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %f - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult             2657 1.0 4.0388e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 18 11  0  0  0  18 11  0  0  0   592
> MatSolve            2657 1.0 6.1962e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   386
> MatLUFactorNum         1 1.0 1.2718e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    86
> MatILUFactorSym        1 1.0 9.5901e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin       1 1.0 1.2159e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         1 1.0 6.3241e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            1 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 1.2885e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecMDot             2571 1.0 4.9771e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 2.6e+03 22 36  0  0 49  22 36  0  0 49  1598
> VecNorm             2658 1.0 5.2489e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 2.7e+03  2  2  0  0 51   2  2  0  0 51  1013
> VecScale            2657 1.0 3.5420e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   750
> VecCopy               86 1.0 2.0908e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                88 1.0 1.1408e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY              172 1.0 4.3620e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   789
> VecMAXPY            2657 1.0 6.6513e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38  0  0  0  29 38  0  0  0  1273
> VecNormalize        2657 1.0 8.8659e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 2.7e+03  4  4  0  0 51   4  4  0  0 51   899
> KSPGMRESOrthog      2571 1.0 1.1234e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 2.6e+03 49 72  0  0 49  49 72  0  0 49  1416
> KSPSetUp               1 1.0 2.9065e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.2896e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 5.2e+03100100  0  0100 100100  0  0100   962
> PCSetUp                1 1.0 2.3610e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0    47
> PCApply             2657 1.0 6.2019e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   385
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>               Matrix     2              2    151957404     0
>               Vector    37             37    296057424     0
>        Krylov Solver     1              1        18368     0
>       Preconditioner     1              1          984     0
>            Index Set     3              3      4002304     0
>               Viewer     1              0            0     0
> ========================================================================================================================
> Average time to get PetscTime(): 5.81741e-06
> #PETSc Option Table entries:
> -log_summary log_1000x1000_mpi_p1.log
> -m 1000
> -n 1000
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Mon Nov  4 14:29:26 2013
> Configure options: PETSC_ARCH=linux-gnu-opt --with-cc=gcc --with-fc=gfortran --with-debugging=0 --download-f-blas-lapack --download-mpich
> -----------------------------------------
> Libraries compiled on Mon Nov  4 14:29:26 2013 on dsu-pc 
> Machine characteristics: Linux-3.2.0-41-generic-x86_64-with-Ubuntu-12.04-precise
> Using PETSc directory: /home/dsu/petsc
> Using PETSc arch: linux-gnu-opt
> -----------------------------------------
>
> Using C compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpif90  -fPIC  -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O  ${FOPTFLAGS} ${FFLAGS} 
> -----------------------------------------
>
> Using include paths: -I/home/dsu/petsc/linux-gnu-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-opt/include
> -----------------------------------------
>
> Using C linker: /home/dsu/petsc/linux-gnu-opt/bin/mpicc
> Using Fortran linker: /home/dsu/petsc/linux-gnu-opt/bin/mpif90
> Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl 
> -----------------------------------------
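
For reference when reading the quoted log: the Mflop/s column can be reproduced from the Flops and Time columns. The sketch below is a minimal check in Python, not part of PETSc. The function name `mflops` and the `EVENTS` table are made up for illustration, and the numbers are copied from the single-process log above (with one process, the sum of flops equals the max). It assumes a scale factor of 1e-6, since that is the value that matches the logged figures.

```python
# Rows copied from the quoted log: (event, flops, time in seconds, logged Mflop/s).
EVENTS = [
    ("MatMult",  2.39e10, 4.0388e+01, 592),
    ("MatSolve", 2.39e10, 6.1962e+01, 386),
    ("VecMDot",  7.95e10, 4.9771e+01, 1598),
    ("VecMAXPY", 8.47e10, 6.6513e+01, 1273),
    ("KSPSolve", 2.20e11, 2.2896e+02, 962),
]


def mflops(total_flops, max_time_sec):
    """Rate in Mflop/s: 1e-6 * (sum of flops) / (max time over processes)."""
    return 1e-6 * total_flops / max_time_sec


if __name__ == "__main__":
    for name, flops, seconds, logged in EVENTS:
        rate = mflops(flops, seconds)
        # The log prints flops with only three significant digits, so small
        # differences from the logged rate are expected.
        print(f"{name:10s} computed {rate:8.1f} Mflop/s, logged {logged}")
```

The recomputed rates agree with the logged column to well under 1%. The remaining gap comes from the rounded flop counts printed in the table.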