[petsc-dev] SNES ex19 not using GPU despite passing the options

Mani Chandra mc0710 at gmail.com
Mon Jan 13 21:28:44 CST 2014


Hi,

I tried to run SNES ex19 with the following options, but it does not seem to
use my GPU. See the attached log summary. Am I interpreting the log summary
wrong? I don't see any CUSP events copying data between the CPU and the GPU
(e.g., no VecCUSPCopyTo/VecCUSPCopyFrom lines in the event table).

./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none
-dmmg_nlevels 1 -da_grid_x 300 -da_grid_y 300 -log_summary -mat_no_inode
-preload off -cusp_synchronize -cuda_set_device 0
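
For reference, the programmatic equivalent of the two -da_*_type options
should look roughly like the sketch below (my own test, not from ex19; it
assumes the 3.4-era DMDA API, i.e. DMSetVecType/DMSetMatType and the
VECCUSP/MATAIJCUSP type names, and a DMDA shaped like the run above):

    #include <petscdmda.h>

    int main(int argc, char **argv)
    {
      DM             da;
      Vec            x;
      VecType        vtype;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      /* Same shape as the ex19 run above: 300x300 grid, 4 fields, star stencil */
      ierr = DMDACreate2d(PETSC_COMM_WORLD, DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE,
                          DMDA_STENCIL_STAR, 300, 300, PETSC_DECIDE, PETSC_DECIDE,
                          4, 1, NULL, NULL, &da);CHKERRQ(ierr);
      ierr = DMSetVecType(da, VECCUSP);CHKERRQ(ierr);    /* in place of -da_vec_type */
      ierr = DMSetMatType(da, MATAIJCUSP);CHKERRQ(ierr); /* in place of -da_mat_type */
      ierr = DMCreateGlobalVector(da, &x);CHKERRQ(ierr);
      ierr = VecGetType(x, &vtype);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD, "DMDA global Vec type: %s\n", vtype);CHKERRQ(ierr);
      ierr = VecDestroy(&x);CHKERRQ(ierr);
      ierr = DMDestroy(&da);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return 0;
    }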

I get the following output when I run with -cuda_show_devices:
CUDA device 0: Quadro FX 1800M
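
To rule out the build itself, a standalone check that the cusp Vec type can
be selected at all might look like this (again my own sketch; note that a
bare Vec reads -vec_type rather than -da_vec_type):

    #include <petscvec.h>

    int main(int argc, char **argv)
    {
      Vec            x;
      VecType        vtype;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
      ierr = VecSetSizes(x, PETSC_DECIDE, 100);CHKERRQ(ierr);
      ierr = VecSetFromOptions(x);CHKERRQ(ierr);  /* should honor -vec_type cusp */
      ierr = VecGetType(x, &vtype);CHKERRQ(ierr); /* expect "seqcusp"/"mpicusp"  */
      ierr = PetscPrintf(PETSC_COMM_WORLD, "Vec type: %s\n", vtype);CHKERRQ(ierr);
      ierr = VecDestroy(&x);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return 0;
    }

If running that with -vec_type cusp still prints a plain seq/mpi type, I
would guess the cusp classes are not being registered in this build at all.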

Cheers,
Mani
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ex19 on a arch-linux2-c-debug named Deathstar with 1 processor, by mc Mon Jan 13 18:23:03 2014
Using Petsc Development GIT revision: v3.4.3-2308-gdf38795  GIT Date: 2014-01-13 15:21:37 -0600

                         Max       Max/Min        Avg      Total
Time (sec):           1.711e+02      1.00000   1.711e+02
Objects:              9.300e+01      1.00000   9.300e+01
Flops:                3.868e+11      1.00000   3.868e+11  3.868e+11
Flops/sec:            2.260e+09      1.00000   2.260e+09  2.260e+09
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       1.250e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.7111e+02 100.0%  3.8676e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  1.240e+02  99.2%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

SNESSolve              1 1.0 1.7075e+02 1.0 3.87e+11 1.0 0.0e+00 0.0e+00 1.0e+02100100  0  0 81 100100  0  0 81  2265
SNESFunctionEval       1 1.0 4.6270e-03 1.0 7.56e+06 1.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  2   0  0  0  0  2  1634
SNESJacobianEval       1 1.0 4.2646e-01 1.0 1.74e+08 1.0 0.0e+00 0.0e+00 3.1e+01  0  0  0  0 25   0  0  0  0 25   408
VecMDot            10000 1.0 3.3502e+01 1.0 1.12e+11 1.0 0.0e+00 0.0e+00 0.0e+00 20 29  0  0  0  20 29  0  0  0  3329
VecNorm            10335 1.0 3.0934e+00 1.0 7.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  2406
VecScale           10334 1.0 1.7706e+00 1.0 3.72e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  2101
VecCopy            10688 1.0 2.7695e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecSet               379 1.0 7.1906e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              687 1.0 2.3886e-01 1.0 4.95e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2071
VecMAXPY           10334 1.0 2.6452e+01 1.0 1.19e+11 1.0 0.0e+00 0.0e+00 0.0e+00 15 31  0  0  0  15 31  0  0  0  4488
VecScatterBegin       22 1.0 2.0290e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize       10334 1.0 4.8779e+00 1.0 1.12e+10 1.0 0.0e+00 0.0e+00 0.0e+00  3  3  0  0  0   3  3  0  0  0  2288
MatMult            10333 1.0 1.0230e+02 1.0 1.45e+11 1.0 0.0e+00 0.0e+00 0.0e+00 60 37  0  0  0  60 37  0  0  0  1414
MatAssemblyBegin       2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 2.0108e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         1 1.0 4.2181e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatFDColorCreate       1 1.0 9.2912e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  2   0  0  0  0  2     0
MatFDColorSetUp        1 1.0 2.7306e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.5e+01  0  0  0  0 20   0  0  0  0 20     0
MatFDColorApply        1 1.0 1.4697e-01 1.0 1.74e+08 1.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  2   0  0  0  0  2  1183
MatFDColorFunc        21 1.0 9.5552e-02 1.0 1.59e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1662
KSPGMRESOrthog     10000 1.0 5.8308e+01 1.0 2.23e+11 1.0 0.0e+00 0.0e+00 0.0e+00 34 58  0  0  0  34 58  0  0  0  3825
KSPSetUp               1 1.0 2.3439e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  8   0  0  0  0  8     0
KSPSolve               1 1.0 1.7032e+02 1.0 3.87e+11 1.0 0.0e+00 0.0e+00 6.8e+01100100  0  0 54 100100  0  0 55  2270
PCSetUp                1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply            10334 1.0 2.6863e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

                SNES     1              1         1280     0
      SNESLineSearch     1              1          820     0
              DMSNES     1              1          680     0
              Vector    48             48     64871552     0
      Vector Scatter     3              3         1956     0
              Matrix     1              1     63209092     0
  Matrix FD Coloring     1              1    145133788     0
    Distributed Mesh     1              1       364968     0
     Bipartite Graph     2              2         1632     0
           Index Set    27             27      1820936     0
   IS L to G Mapping     3              3         1788     0
       Krylov Solver     1              1        10164     0
     DMKSP interface     1              1          664     0
      Preconditioner     1              1          808     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 0
#PETSc Option Table entries:
-cuda_set_device 0
-cusp_synchronize
-da_grid_x 300
-da_grid_y 300
-da_mat_type mpiaijcusp
-da_vec_type mpicusp
-dmmg_nlevels 1
-log_summary
-mat_no_inode
-pc_type none
-preload off
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with single precision PetscScalar and PetscReal
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 4 sizeof(PetscInt) 4
Configure run at: Mon Jan 13 17:26:35 2014
Configure options: --prefix=/home/mc/Downloads/petsc_float_optimized/ --with-clean=1 --with-precision=single --with-cuda=1 --with-cuda-only=0 --with-cusp=1 --with-thrust=1 --with-opencl=1 --with-debugging=0 COPTFLAGS="-O3 -march=native" --with-cusp-dir=/home/mc/Downloads/cusplibrary --with-cuda-dir=/opt/cuda --download-txpetscgpu=yes
-----------------------------------------
Libraries compiled on Mon Jan 13 17:26:35 2014 on Deathstar
Machine characteristics: Linux-3.12.7-1-ARCH-x86_64-with-glibc2.2.5
Using PETSc directory: /home/mc/Downloads/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90  -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O   ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths: -I/home/mc/Downloads/petsc/arch-linux2-c-debug/include -I/home/mc/Downloads/petsc/include -I/home/mc/Downloads/petsc/include -I/home/mc/Downloads/petsc/arch-linux2-c-debug/include -I/opt/cuda/include -I/home/mc/Downloads/cusplibrary/ -I/home/mc/Downloads/cusplibrary/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/mc/Downloads/petsc/arch-linux2-c-debug/lib -L/home/mc/Downloads/petsc/arch-linux2-c-debug/lib -lpetsc -llapack -lblas -lX11 -lpthread -Wl,-rpath,/opt/cuda/lib64 -L/opt/cuda/lib64 -lcufft -lcublas -lcudart -lcusparse -lOpenCL -lm -Wl,-rpath,/usr/lib/openmpi -L/usr/lib/openmpi -Wl,-rpath,/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -L/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 -Wl,-rpath,/opt/intel/composerxe/compiler/lib/intel64 -L/opt/intel/composerxe/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composerxe/ipp/lib/intel64 -L/opt/intel/composerxe/ipp/lib/intel64 -Wl,-rpath,/opt/intel/composerxe/mkl/lib/intel64 -L/opt/intel/composerxe/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composerxe/tbb/lib/intel64/gcc4.4 -L/opt/intel/composerxe/tbb/lib/intel64/gcc4.4 -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl
-----------------------------------------

