[petsc-users] Speedup problem when using OpenMP?
Jed Brown
jedbrown at mcs.anl.gov
Mon Nov 4 18:32:33 CST 2013
Danyang Su <danyang.su at gmail.com> writes:
> Hi All,
>
> I have tested the same example under Ubuntu 12.04 x64. The PETSc-dev
> version is up to date (GIT Date: 2013-11-01 14:59:20 -0500) and the
> installation went smoothly, without any errors. The speedup of the MPI
> version scales linearly, but the speedup of the OpenMP version does not
> change at all. Judging from the CPU usage, the program still runs in a
> single thread when OpenMP is used.
>
> The commands to run the test are as follows:
>
> OpenMP:
> ./ex2f -threadcomm_type openmp -threadcomm_nthreads 4 -m 1000 -n 1000
> -log_summary log_ex2f_1000x1000_ubuntu1204_omp_p4.log
>
> MPI:
> mpiexec -n 4 ./ex2f -m 1000 -n 1000 -log_summary
> log_ex2f_1000x1000_ubuntu1204_mpi_p4.log
>
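A quick, PETSc-agnostic way to check whether any worker threads are actually
busy is to watch per-thread CPU usage while the OpenMP run above is going
(just a sketch; the pgrep pattern assumes the binary is named ex2f):

   # list the threads of the running ex2f and their individual CPU usage
   top -H -p $(pgrep -n ex2f)

If only a single thread ever shows significant CPU, the threadcomm OpenMP
backend is not being engaged.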
> This problem is quite puzzling to me. Can anybody confirm whether the KSP
> solver is actually parallelized in the OpenMP version?
>
> Thanks and regards,
>
> Danyang
>
> On 31/10/2013 4:54 PM, Danyang Su wrote:
>> Hi All,
>>
>> I have a question about the speedup of PETSc when using OpenMP. I can get
>> good speedup when using MPI, but no speedup when using OpenMP.
>> The example is ex2f with m=100 and n=100. The machine has 16 cores
>> (32 hardware threads) available, and the OS is Windows Server 2012. The
>> log files for runs on 4 and 8 processors are attached.
>>
>> The commands I used to run with 4 processors are as follows:
>> Run using MPI
>> mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -log_summary
>> log_100x100_mpi_p4.log
>>
>> Run using OpenMP
>> Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4
>> -m 100 -n 100 -log_summary log_100x100_openmp_p4.log
>>
>> The PETSc build used for this test is PETSc for Windows
>> (http://www.mic-tc.ch/downloads/PETScForWindows.zip), but I doubt that is
>> the cause, because the same problem occurs when I use PETSc-dev under
>> Cygwin. I don't know whether this problem also exists on Linux; would
>> anybody be willing to help test it?
>>
>> Thanks and regards,
>>
>> Danyang
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex2f on a linux-gnu-omp-opt named dsu-pc with 1 processor, by root Mon Nov 4 15:35:47 2013
> With 4 threads per MPI_Comm
> Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500
>
> Max Max/Min Avg Total
> Time (sec): 2.376e+02 1.00000 2.376e+02
> Objects: 4.500e+01 1.00000 4.500e+01
> Flops: 2.203e+11 1.00000 2.203e+11 2.203e+11
> Flops/sec: 9.271e+08 1.00000 9.271e+08 9.271e+08
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 2.3759e+02 100.0% 2.2028e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %f - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 2657 1.0 4.1715e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 573
> MatSolve 2657 1.0 6.4028e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 373
> MatLUFactorNum 1 1.0 1.1149e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 99
> MatILUFactorSym 1 1.0 8.2365e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 1 1.0 7.8678e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 9.1023e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 1 1.0 1.0014e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 1.2122e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecMDot 2571 1.0 5.1144e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 0.0e+00 22 36 0 0 0 22 36 0 0 0 1555
> VecNorm 2658 1.0 5.4516e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 975
> VecScale 2657 1.0 3.8631e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 688
> VecCopy 86 1.0 2.2233e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 88 1.0 1.1501e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 172 1.0 4.4589e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 771
> VecMAXPY 2657 1.0 6.9213e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38 0 0 0 29 38 0 0 0 1223
> VecNormalize 2657 1.0 9.3968e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 848
> KSPGMRESOrthog 2571 1.0 1.1630e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 0.0e+00 49 72 0 0 0 49 72 0 0 0 1367
> KSPSetUp 1 1.0 2.8520e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1 1.0 2.3699e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 0.0e+00100100 0 0 0 100100 0 0 0 929
> PCSetUp 1 1.0 2.0609e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 53
> PCApply 2657 1.0 6.4088e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 373
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 2 2 151957404 0
> Vector 37 37 296057424 0
> Krylov Solver 1 1 18368 0
> Preconditioner 1 1 984 0
> Index Set 3 3 4002304 0
> Viewer 1 0 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 6.50883e-06
> #PETSc Option Table entries:
> -log_summary log_1000x1000_omp_p4.log
> -m 1000
> -n 1000
> -threadcomm_nthreads 4
> -threadcomm_type openmp
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Mon Nov 4 15:22:12 2013
> Configure options: PETSC_ARCH=linux-gnu-omp-opt --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack --with-mpi=0 --with-openmp --with-debugging=0
Add --with-threadcomm --with-pthreadclasses to the configuration. Using
--with-openmp on its own doesn't turn on threadcomm. Use all three
flags for now and compare -threadcomm_type openmp to -threadcomm_type
pthread.
http://www.mcs.anl.gov/petsc/documentation/installation.html#threads
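To make that concrete, here is a minimal sketch of the reconfigure-and-compare
cycle, reusing your existing options (the PETSC_ARCH name and the log file
names below are just placeholders):

   ./configure PETSC_ARCH=linux-gnu-threads-opt --with-cc=gcc --with-fc=gfortran \
       --download-f-blas-lapack --with-mpi=0 --with-debugging=0 \
       --with-openmp --with-threadcomm --with-pthreadclasses
   make PETSC_DIR=/home/dsu/petsc PETSC_ARCH=linux-gnu-threads-opt all

   # same problem, once per threadcomm type, then compare the -log_summary output
   ./ex2f -threadcomm_type openmp  -threadcomm_nthreads 4 -m 1000 -n 1000 -log_summary omp_p4.log
   ./ex2f -threadcomm_type pthread -threadcomm_nthreads 4 -m 1000 -n 1000 -log_summary pthread_p4.log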
> -----------------------------------------
> Libraries compiled on Mon Nov 4 15:22:12 2013 on dsu-pc
> Machine characteristics: Linux-3.2.0-55-generic-x86_64-with-Ubuntu-12.04-precise
> Using PETSc directory: /home/dsu/petsc
> Using PETSc arch: linux-gnu-omp-opt
> -----------------------------------------
>
> Using C compiler: gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fopenmp ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: gfortran -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O -fopenmp ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include/mpiuni
> -----------------------------------------
>
> Using C linker: gcc
> Using Fortran linker: gfortran
> Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm -lstdc++ -ldl -lgcc_s -ldl
> -----------------------------------------
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex2f on a linux-gnu-omp-opt named dsu-pc with 1 processor, by root Mon Nov 4 15:31:30 2013
> Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500
>
> Max Max/Min Avg Total
> Time (sec): 2.388e+02 1.00000 2.388e+02
> Objects: 4.500e+01 1.00000 4.500e+01
> Flops: 2.203e+11 1.00000 2.203e+11 2.203e+11
> Flops/sec: 9.224e+08 1.00000 9.224e+08 9.224e+08
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 2.3881e+02 100.0% 2.2028e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %f - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 2657 1.0 4.0429e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 17 11 0 0 0 17 11 0 0 0 591
> MatSolve 2657 1.0 6.3888e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 374
> MatLUFactorNum 1 1.0 1.2874e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 85
> MatILUFactorSym 1 1.0 1.3501e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 1 1.0 8.1062e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 6.8491e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 1 1.0 1.1921e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 1.3066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecMDot 2571 1.0 5.2507e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 0.0e+00 22 36 0 0 0 22 36 0 0 0 1514
> VecNorm 2658 1.0 5.4426e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 977
> VecScale 2657 1.0 3.8871e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 684
> VecCopy 86 1.0 1.9921e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 88 1.0 1.0965e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 172 1.0 4.0171e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 856
> VecMAXPY 2657 1.0 7.0096e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38 0 0 0 29 38 0 0 0 1208
> VecNormalize 2657 1.0 9.4060e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 847
> KSPGMRESOrthog 2571 1.0 1.1847e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 0.0e+00 50 72 0 0 0 50 72 0 0 0 1342
> KSPSetUp 1 1.0 3.7805e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1 1.0 2.3820e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 0.0e+00100100 0 0 0 100100 0 0 0 925
> PCSetUp 1 1.0 2.7698e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 40
> PCApply 2657 1.0 6.3946e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 374
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 2 2 151957404 0
> Vector 37 37 296057424 0
> Krylov Solver 1 1 18368 0
> Preconditioner 1 1 984 0
> Index Set 3 3 4002304 0
> Viewer 1 0 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 8.51154e-06
> #PETSc Option Table entries:
> -log_summary log_1000x1000_omp_p1.log
> -m 1000
> -n 1000
> -threadcomm_nthreads 1
> -threadcomm_type openmp
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Mon Nov 4 15:22:12 2013
> Configure options: PETSC_ARCH=linux-gnu-omp-opt --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack --with-mpi=0 --with-openmp --with-debugging=0
> -----------------------------------------
> Libraries compiled on Mon Nov 4 15:22:12 2013 on dsu-pc
> Machine characteristics: Linux-3.2.0-55-generic-x86_64-with-Ubuntu-12.04-precise
> Using PETSc directory: /home/dsu/petsc
> Using PETSc arch: linux-gnu-omp-opt
> -----------------------------------------
>
> Using C compiler: gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fopenmp ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: gfortran -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O -fopenmp ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include/mpiuni
> -----------------------------------------
>
> Using C linker: gcc
> Using Fortran linker: gfortran
> Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm -lstdc++ -ldl -lgcc_s -ldl
> -----------------------------------------
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex2f on a linux-gnu-opt named dsu-pc with 4 processors, by root Mon Nov 4 16:10:24 2013
> Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500
>
> Max Max/Min Avg Total
> Time (sec): 5.364e+01 1.00045 5.362e+01
> Objects: 5.600e+01 1.00000 5.600e+01
> Flops: 2.837e+10 1.00010 2.837e+10 1.135e+11
> Flops/sec: 5.291e+08 1.00054 5.290e+08 2.116e+09
> MPI Messages: 2.744e+03 2.00000 2.058e+03 8.232e+03
> MPI Message Lengths: 2.193e+07 2.00000 7.991e+03 6.578e+07
> MPI Reductions: 2.720e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 5.3623e+01 100.0% 1.1347e+11 100.0% 8.232e+03 100.0% 7.991e+03 100.0% 2.719e+03 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %f - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 1370 1.0 8.7882e+00 1.0 3.08e+09 1.0 8.2e+03 8.0e+03 0.0e+00 16 11100100 0 16 11100100 0 1402
> MatSolve 1370 1.0 9.0304e+00 1.0 3.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 17 11 0 0 0 17 11 0 0 0 1362
> MatLUFactorNum 1 1.0 3.3336e-02 1.0 2.74e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 329
> MatILUFactorSym 1 1.0 7.1875e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 1 1.0 7.2212e-0241.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 5.4802e-02 1.0 0.00e+00 0.0 1.2e+01 2.0e+03 9.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 1 1.0 1.2875e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 4.8881e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecMDot 1325 1.0 1.4754e+01 1.0 1.02e+10 1.0 0.0e+00 0.0e+00 1.3e+03 27 36 0 0 49 27 36 0 0 49 2776
> VecNorm 1371 1.0 1.9989e+00 1.1 6.86e+08 1.0 0.0e+00 0.0e+00 1.4e+03 4 2 0 0 50 4 2 0 0 50 1372
> VecScale 1370 1.0 4.9844e-01 1.1 3.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2749
> VecCopy 45 1.0 4.4863e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 1418 1.0 6.2273e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecAXPY 90 1.0 1.0165e-01 1.0 4.50e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1771
> VecMAXPY 1370 1.0 1.5635e+01 1.0 1.09e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38 0 0 0 29 38 0 0 0 2789
> VecScatterBegin 1370 1.0 1.6159e-01 1.8 0.00e+00 0.0 8.2e+03 8.0e+03 0.0e+00 0 0100100 0 0 0100100 0 0
> VecScatterEnd 1370 1.0 9.6929e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> VecNormalize 1370 1.0 2.5033e+00 1.1 1.03e+09 1.0 0.0e+00 0.0e+00 1.4e+03 5 4 0 0 50 5 4 0 0 50 1642
> KSPGMRESOrthog 1325 1.0 2.9419e+01 1.0 2.05e+10 1.0 0.0e+00 0.0e+00 1.3e+03 54 72 0 0 49 54 72 0 0 49 2784
> KSPSetUp 2 1.0 2.1291e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1 1.0 5.2989e+01 1.0 2.84e+10 1.0 8.2e+03 8.0e+03 2.7e+03 99100100100 99 99100100100 99 2141
> PCSetUp 2 1.0 1.4600e-01 1.1 2.74e+06 1.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 75
> PCSetUpOnBlocks 1 1.0 1.1017e-01 1.0 2.74e+06 1.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 100
> PCApply 1370 1.0 9.7092e+00 1.0 3.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 1267
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 4 4 56984588 0
> Vector 41 41 74071440 0
> Vector Scatter 1 1 1060 0
> Index Set 5 5 1007832 0
> Krylov Solver 2 2 19520 0
> Preconditioner 2 2 1864 0
> Viewer 1 0 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 6.19888e-06
> Average time for MPI_Barrier(): 0.000529623
> Average time for zero size MPI_Send(): 0.000117242
> #PETSc Option Table entries:
> -log_summary log_1000x1000_mpi_p4.log
> -m 1000
> -n 1000
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Mon Nov 4 14:29:26 2013
> Configure options: PETSC_ARCH=linux-gnu-opt --with-cc=gcc --with-fc=gfortran --with-debugging=0 --download-f-blas-lapack --download-mpich
> -----------------------------------------
> Libraries compiled on Mon Nov 4 14:29:26 2013 on dsu-pc
> Machine characteristics: Linux-3.2.0-41-generic-x86_64-with-Ubuntu-12.04-precise
> Using PETSc directory: /home/dsu/petsc
> Using PETSc arch: linux-gnu-opt
> -----------------------------------------
>
> Using C compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/dsu/petsc/linux-gnu-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-opt/include
> -----------------------------------------
>
> Using C linker: /home/dsu/petsc/linux-gnu-opt/bin/mpicc
> Using Fortran linker: /home/dsu/petsc/linux-gnu-opt/bin/mpif90
> Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl
> -----------------------------------------
>
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex2f on a linux-gnu-opt named dsu-pc with 1 processor, by root Mon Nov 4 16:14:37 2013
> Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500
>
> Max Max/Min Avg Total
> Time (sec): 2.295e+02 1.00000 2.295e+02
> Objects: 4.500e+01 1.00000 4.500e+01
> Flops: 2.203e+11 1.00000 2.203e+11 2.203e+11
> Flops/sec: 9.597e+08 1.00000 9.597e+08 9.597e+08
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 5.236e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 2.2953e+02 100.0% 2.2028e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 5.235e+03 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %f - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 2657 1.0 4.0388e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 592
> MatSolve 2657 1.0 6.1962e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 386
> MatLUFactorNum 1 1.0 1.2718e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 86
> MatILUFactorSym 1 1.0 9.5901e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 1 1.0 1.2159e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 6.3241e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 1 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 1.2885e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecMDot 2571 1.0 4.9771e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 2.6e+03 22 36 0 0 49 22 36 0 0 49 1598
> VecNorm 2658 1.0 5.2489e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 2.7e+03 2 2 0 0 51 2 2 0 0 51 1013
> VecScale 2657 1.0 3.5420e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 750
> VecCopy 86 1.0 2.0908e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 88 1.0 1.1408e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 172 1.0 4.3620e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 789
> VecMAXPY 2657 1.0 6.6513e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38 0 0 0 29 38 0 0 0 1273
> VecNormalize 2657 1.0 8.8659e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 2.7e+03 4 4 0 0 51 4 4 0 0 51 899
> KSPGMRESOrthog 2571 1.0 1.1234e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 2.6e+03 49 72 0 0 49 49 72 0 0 49 1416
> KSPSetUp 1 1.0 2.9065e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1 1.0 2.2896e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 5.2e+03100100 0 0100 100100 0 0100 962
> PCSetUp 1 1.0 2.3610e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 47
> PCApply 2657 1.0 6.2019e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 385
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 2 2 151957404 0
> Vector 37 37 296057424 0
> Krylov Solver 1 1 18368 0
> Preconditioner 1 1 984 0
> Index Set 3 3 4002304 0
> Viewer 1 0 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 5.81741e-06
> #PETSc Option Table entries:
> -log_summary log_1000x1000_mpi_p1.log
> -m 1000
> -n 1000
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Mon Nov 4 14:29:26 2013
> Configure options: PETSC_ARCH=linux-gnu-opt --with-cc=gcc --with-fc=gfortran --with-debugging=0 --download-f-blas-lapack --download-mpich
> -----------------------------------------
> Libraries compiled on Mon Nov 4 14:29:26 2013 on dsu-pc
> Machine characteristics: Linux-3.2.0-41-generic-x86_64-with-Ubuntu-12.04-precise
> Using PETSc directory: /home/dsu/petsc
> Using PETSc arch: linux-gnu-opt
> -----------------------------------------
>
> Using C compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/dsu/petsc/linux-gnu-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-opt/include
> -----------------------------------------
>
> Using C linker: /home/dsu/petsc/linux-gnu-opt/bin/mpicc
> Using Fortran linker: /home/dsu/petsc/linux-gnu-opt/bin/mpif90
> Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl
> -----------------------------------------
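For reference, pulling the KSPSolve timings out of the logs above: the MPI
build goes from 2.2896e+02 s on 1 process to 5.2989e+01 s on 4 processes
(a speedup of about 4.3), while the OpenMP build goes from 2.3820e+02 s with
1 thread to 2.3699e+02 s with 4 threads (a speedup of essentially 1.0). That
is the signature of the threads never being used, consistent with threadcomm
not being enabled in this build.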