[petsc-dev] OpenMP in PETSc when calling from Fortran?
Barry Smith
bsmith at mcs.anl.gov
Wed Mar 6 15:38:22 CST 2013
I don't see any options for turning on the threads here?
#PETSc Option Table entries:
-ksp_type bcgs
-log_summary
-pc_type lu
#End of PETSc Option Table entries
From http://www.mcs.anl.gov/petsc/features/threads.html
• The three important run-time options for using threads are:
• -threadcomm_nthreads <nthreads>: Sets the number of threads
• -threadcomm_affinities <list_of_affinities>: Sets the core affinities of threads
• -threadcomm_type <nothread,pthread,openmp>: Threading model (OpenMP, pthread, nothread)
• Run with -help to see the available options with threads.
• A few tutorial examples are located at $PETSC_DIR/src/sys/threadcomm/examples/tutorials
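For example, adding the threading options to your existing run line would look something like this (just a sketch; four threads is an arbitrary illustrative count, and ./run is the executable name taken from your log below):

  ./run -threadcomm_type openmp -threadcomm_nthreads 4 -ksp_type bcgs -pc_type lu -log_summary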
Also, LU is a direct solver that is not threaded, so using threads for this exact run will not help (much) at all. The threads will only show a useful speedup for iterative methods.
Barry
As time goes by we hope to have more extensive thread support in more routines, but things like factorization and solve are difficult, so outside help would be very useful.
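If you want to see what the threads can currently do, a sketch of a comparison run with an iterative preconditioner (jacobi is only an illustration and may not be a good preconditioner for your problem; the thread count is again arbitrary) would be:

  ./run -ksp_type bcgs -pc_type jacobi -threadcomm_type openmp -threadcomm_nthreads 4 -log_summary

Any speedup should then show up in the Vec operations and MatMult rows of -log_summary rather than in MatLUFactorNum.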
On Mar 6, 2013, at 3:39 AM, Åsmund Ervik <Asmund.Ervik at sintef.no> wrote:
> Hi again,
>
> On 01 March 2013 20:06, Jed Brown wrote:
>>
>> Matrix and vector operations are probably running in parallel, but probably
>> not the operations that are taking time. Always send -log_summary if you
>> have a performance question.
>>
>
> I don't think they are running in parallel. When I analyze my code in
> Intel Vtune Amplifier, the only routines running in parallel are my own
> OpenMP ones. Indeed, if I comment out my OpenMP pragmas and recompile my
> code, it never uses more than one thread.
>
> -log_summary is shown below; this is using -pc_type lu -ksp_type bcgs.
> The fastest PC for my cases is usually BoomerAMG from HYPRE, so I used
> LU instead here in order to limit the test to PETSc only. The summary
> agrees with Vtune that MatLUFactorNumeric is the most time-consuming
> routine; in general it seems that the PC is always the most time-consuming.
>
> Any advice on how to get OpenMP working?
>
> Regards,
> Åsmund
>
>
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./run on a arch-linux2-c-opt named vsl161 with 1 processor, by asmunder Wed Mar 6 10:14:55 2013
> Using Petsc Development HG revision: 58cc6199509f1642f637843f1ca468283bf5ced9 HG Date: Wed Jan 30 00:39:35 2013 -0600
>
> Max Max/Min Avg Total
> Time (sec): 4.446e+02 1.00000 4.446e+02
> Objects: 2.017e+03 1.00000 2.017e+03
> Flops: 3.919e+11 1.00000 3.919e+11 3.919e+11
> Flops/sec: 8.815e+08 1.00000 8.815e+08 8.815e+08
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 2.818e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.4460e+02 100.0% 3.9191e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 2.817e+03 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %f - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecDot 802 1.0 9.2811e-02 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2117
> VecDotNorm2 401 1.0 7.1333e-02 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 4.0e+02 0 0 0 0 14 0 0 0 0 14 2755
> VecNorm 1203 1.0 7.8265e-02 1.0 2.95e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3766
> VecCopy 802 1.0 1.1754e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 1211 1.0 9.9961e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 401 1.0 4.5847e-02 1.0 9.82e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2143
> VecAXPBYCZ 802 1.0 1.3489e-01 1.0 3.93e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2913
> VecWAXPY 802 1.0 1.2292e-01 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1599
> VecAssemblyBegin 802 1.0 2.4509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 802 1.0 6.7234e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatMult 1203 1.0 1.1513e+00 1.0 1.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1149
> MatSolve 1604 1.0 1.4714e+01 1.0 2.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00 3 5 0 0 0 3 5 0 0 0 1405
> MatLUFactorSym 401 1.0 4.0197e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03 9 0 0 0 43 9 0 0 0 43 0
> MatLUFactorNum 401 1.0 2.3728e+02 1.0 3.69e+11 1.0 0.0e+00 0.0e+00 0.0e+00 53 94 0 0 0 53 94 0 0 0 1553
> MatAssemblyBegin 401 1.0 1.7977e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 401 1.0 3.1975e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 401 1.0 9.1545e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 401 1.0 2.0361e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+02 5 0 0 0 28 5 0 0 0 28 0
> KSPSetUp 401 1.0 4.1821e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 401 1.0 3.1511e+02 1.0 3.92e+11 1.0 0.0e+00 0.0e+00 2.8e+03 71100 0 0100 71100 0 0100 1244
> PCSetUp 401 1.0 2.9844e+02 1.0 3.69e+11 1.0 0.0e+00 0.0e+00 2.0e+03 67 94 0 0 71 67 94 0 0 71 1235
> PCApply 1604 1.0 1.4717e+01 1.0 2.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00 3 5 0 0 0 3 5 0 0 0 1405
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Vector 409 409 401422048 0
> Matrix 402 402 31321054412 0
> Krylov Solver 1 1 1128 0
> Preconditioner 1 1 1152 0
> Index Set 1203 1203 393903904 0
> Viewer 1 0 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> #PETSc Option Table entries:
> -ksp_type bcgs
> -log_summary
> -pc_type lu
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Fri Mar 1 12:53:06 2013
> Configure options: --with-pthreadclasses --with-openmp
> --with-debugging=0 --with-shared-libraries=1 --download-mpich
> --download-hypre --with-boost-dir=/usr COPTFLAGS=-O3 FOPTFLAGS=-O3
> -----------------------------------------
> Libraries compiled on Fri Mar 1 12:53:06 2013 on vsl161
> Machine characteristics: Linux-3.7.9-1-ARCH-x86_64-with-glibc2.2.5
> Using PETSc directory: /opt/petsc/petsc-dev-install
> Using PETSc arch: arch-linux2-c-opt
> -----------------------------------------
>
> Using C compiler:
> /opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpicc -fPIC -Wall
> -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -fopenmp
> ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler:
> /opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpif90 -fPIC -Wall
> -Wno-unused-variable -Wno-unused-dummy-argument -O3 -fopenmp
> ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths:
> -I/opt/petsc/petsc-dev-install/arch-linux2-c-opt/include
> -I/opt/petsc/petsc-dev-install/include
> -I/opt/petsc/petsc-dev-install/include
> -I/opt/petsc/petsc-dev-install/arch-linux2-c-opt/include -I/usr/include
> -----------------------------------------
>
> Using C linker: /opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpicc
> Using Fortran linker:
> /opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpif90
> Using libraries:
> -Wl,-rpath,/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib
> -L/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib -lpetsc
> -Wl,-rpath,/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib
> -L/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib -lHYPRE
> -Wl,-rpath,/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.2
> -L/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.2
> -Wl,-rpath,/opt/intel/composer_xe_2013.1.117/compiler/lib/intel64
> -L/opt/intel/composer_xe_2013.1.117/compiler/lib/intel64
> -Wl,-rpath,/opt/intel/composer_xe_2013.1.117/ipp/lib/intel64
> -L/opt/intel/composer_xe_2013.1.117/ipp/lib/intel64
> -Wl,-rpath,/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64
> -L/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64
> -Wl,-rpath,/opt/intel/composer_xe_2013.1.117/tbb/lib/intel64
> -L/opt/intel/composer_xe_2013.1.117/tbb/lib/intel64 -lmpichcxx -lstdc++
> -llapack -lblas -lX11 -lpthread -lm -lmpichf90 -lgfortran -lm -lgfortran
> -lm -lquadmath -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt
> -lgcc_s -ldl
> -----------------------------------------
>
>
>
>
>
>