[petsc-dev] SNES ex19 not using GPU despite passing the options
Mani Chandra
mc0710 at gmail.com
Mon Jan 13 21:28:44 CST 2014
Hi,
I tried to run SNES ex19 with the following options but it does not seem to
use my GPU. See the attached log summary. Am I interpreting the log
summary wrong? I don't see any CUSP calls to copy data from the CPU to the
GPU.
./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 300 -da_grid_y 300 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0
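My understanding is that these options should amount to something like the following minimal sketch, i.e. CUSP-backed Vec/Mat types being selected so MatMult, VecAXPY, etc. run on the GPU (this is just an illustration of the type names I am assuming, with a made-up size, not the actual ex19 code, and it needs a --with-cusp build):

    /* Sketch only: explicitly request the CUSP types that
       -da_vec_type mpicusp / -da_mat_type mpiaijcusp should select. */
    #include <petscvec.h>
    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Vec            x;
      Mat            A;
      PetscInt       n = 300*300;   /* illustrative size only */
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

      ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
      ierr = VecSetSizes(x, PETSC_DECIDE, n);CHKERRQ(ierr);
      ierr = VecSetType(x, VECMPICUSP);CHKERRQ(ierr);     /* GPU vector (CUSP) */

      ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
      ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
      ierr = MatSetType(A, MATMPIAIJCUSP);CHKERRQ(ierr);  /* GPU matrix (CUSP) */

      ierr = VecDestroy(&x);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return 0;
    }

If the CUSP types were actually in use, I would expect GPU copy events (e.g. VecCUSPCopyTo/From) to show up in the log below, but I don't see any.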
I get the following output when I pass -cuda_show_devices:
CUDA device 0: Quadro FX 1800M
Cheers,
Mani
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex19 on a arch-linux2-c-debug named Deathstar with 1 processor, by mc Mon Jan 13 18:23:03 2014
Using Petsc Development GIT revision: v3.4.3-2308-gdf38795 GIT Date: 2014-01-13 15:21:37 -0600
Max Max/Min Avg Total
Time (sec): 1.711e+02 1.00000 1.711e+02
Objects: 9.300e+01 1.00000 9.300e+01
Flops: 3.868e+11 1.00000 3.868e+11 3.868e+11
Flops/sec: 2.260e+09 1.00000 2.260e+09 2.260e+09
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 1.250e+02 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.7111e+02 100.0% 3.8676e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 1.240e+02 99.2%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
SNESSolve 1 1.0 1.7075e+02 1.0 3.87e+11 1.0 0.0e+00 0.0e+00 1.0e+02100100 0 0 81 100100 0 0 81 2265
SNESFunctionEval 1 1.0 4.6270e-03 1.0 7.56e+06 1.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 2 0 0 0 0 2 1634
SNESJacobianEval 1 1.0 4.2646e-01 1.0 1.74e+08 1.0 0.0e+00 0.0e+00 3.1e+01 0 0 0 0 25 0 0 0 0 25 408
VecMDot 10000 1.0 3.3502e+01 1.0 1.12e+11 1.0 0.0e+00 0.0e+00 0.0e+00 20 29 0 0 0 20 29 0 0 0 3329
VecNorm 10335 1.0 3.0934e+00 1.0 7.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 2406
VecScale 10334 1.0 1.7706e+00 1.0 3.72e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2101
VecCopy 10688 1.0 2.7695e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecSet 379 1.0 7.1906e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 687 1.0 2.3886e-01 1.0 4.95e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2071
VecMAXPY 10334 1.0 2.6452e+01 1.0 1.19e+11 1.0 0.0e+00 0.0e+00 0.0e+00 15 31 0 0 0 15 31 0 0 0 4488
VecScatterBegin 22 1.0 2.0290e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 10334 1.0 4.8779e+00 1.0 1.12e+10 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 2288
MatMult 10333 1.0 1.0230e+02 1.0 1.45e+11 1.0 0.0e+00 0.0e+00 0.0e+00 60 37 0 0 0 60 37 0 0 0 1414
MatAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 2.0108e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 1 1.0 4.2181e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorCreate 1 1.0 9.2912e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 2 0 0 0 0 2 0
MatFDColorSetUp 1 1.0 2.7306e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.5e+01 0 0 0 0 20 0 0 0 0 20 0
MatFDColorApply 1 1.0 1.4697e-01 1.0 1.74e+08 1.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 2 0 0 0 0 2 1183
MatFDColorFunc 21 1.0 9.5552e-02 1.0 1.59e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1662
KSPGMRESOrthog 10000 1.0 5.8308e+01 1.0 2.23e+11 1.0 0.0e+00 0.0e+00 0.0e+00 34 58 0 0 0 34 58 0 0 0 3825
KSPSetUp 1 1.0 2.3439e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 8 0 0 0 0 8 0
KSPSolve 1 1.0 1.7032e+02 1.0 3.87e+11 1.0 0.0e+00 0.0e+00 6.8e+01100100 0 0 54 100100 0 0 55 2270
PCSetUp 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 10334 1.0 2.6863e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
SNES 1 1 1280 0
SNESLineSearch 1 1 820 0
DMSNES 1 1 680 0
Vector 48 48 64871552 0
Vector Scatter 3 3 1956 0
Matrix 1 1 63209092 0
Matrix FD Coloring 1 1 145133788 0
Distributed Mesh 1 1 364968 0
Bipartite Graph 2 2 1632 0
Index Set 27 27 1820936 0
IS L to G Mapping 3 3 1788 0
Krylov Solver 1 1 10164 0
DMKSP interface 1 1 664 0
Preconditioner 1 1 808 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 0
#PETSc Option Table entries:
-cuda_set_device 0
-cusp_synchronize
-da_grid_x 300
-da_grid_y 300
-da_mat_type mpiaijcusp
-da_vec_type mpicusp
-dmmg_nlevels 1
-log_summary
-mat_no_inode
-pc_type none
-preload off
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with single precision PetscScalar and PetscReal
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 4 sizeof(PetscInt) 4
Configure run at: Mon Jan 13 17:26:35 2014
Configure options: --prefix=/home/mc/Downloads/petsc_float_optimized/ --with-clean=1 --with-precision=single --with-cuda=1 --with-cuda-only=0 --with-cusp=1 --with-thrust=1 --with-opencl=1 --with-debugging=0 COPTFLAGS="-O3 -march=native" --with-cusp-dir=/home/mc/Downloads/cusplibrary --with-cuda-dir=/opt/cuda --download-txpetscgpu=yes
-----------------------------------------
Libraries compiled on Mon Jan 13 17:26:35 2014 on Deathstar
Machine characteristics: Linux-3.12.7-1-ARCH-x86_64-with-glibc2.2.5
Using PETSc directory: /home/mc/Downloads/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/mc/Downloads/petsc/arch-linux2-c-debug/include -I/home/mc/Downloads/petsc/include -I/home/mc/Downloads/petsc/include -I/home/mc/Downloads/petsc/arch-linux2-c-debug/include -I/opt/cuda/include -I/home/mc/Downloads/cusplibrary/ -I/home/mc/Downloads/cusplibrary/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/mc/Downloads/petsc/arch-linux2-c-debug/lib -L/home/mc/Downloads/petsc/arch-linux2-c-debug/lib -lpetsc -llapack -lblas -lX11 -lpthread -Wl,-rpath,/opt/cuda/lib64 -L/opt/cuda/lib64 -lcufft -lcublas -lcudart -lcusparse -lOpenCL -lm -Wl,-rpath,/usr/lib/openmpi -L/usr/lib/openmpi -Wl,-rpath,/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -L/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -Wl,-rpath,/opt/intel/composerxe/compiler/lib/intel64 -L/opt/intel/composerxe/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composerxe/ipp/lib/intel64 -L/opt/intel/composerxe/ipp/lib/intel64 -Wl,-rpath,/opt/intel/composerxe/mkl/lib/intel64 -L/opt/intel/composerxe/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composerxe/tbb/lib/intel64/gcc4.4 -L/opt/intel/composerxe/tbb/lib/intel64/gcc4.4 -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl
-----------------------------------------