[petsc-users] Tweaking my code for CUDA

Manuel Valera mvalera-w at mail.sdsu.edu
Thu Mar 8 12:05:33 CST 2018


Hello all,

I am working on porting a linear solver to GPUs for timing purposes. So far
I've been able to compile and run the CUSP libraries and to build PETSc with
CUSP and ViennaCL support. After the initial runs I noticed some errors; they
differ depending on the flags used, and I would appreciate any help
interpreting them.

The only PETSc objects in this program are the (sparse) Laplacian matrix, the
RHS and X vectors, and a scatter object, so I would say it's safe to set
their types through command-line arguments instead of calling
Mat/VecSetType() in the source code.
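
In case it helps, the object setup follows roughly this pattern (a minimal
standalone C sketch for illustration only, not my actual code; the size n is
made up), where VecSetFromOptions()/MatSetFromOptions() are what let
-vec_type and -mat_type from the command line take effect:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Vec            x, rhs;
  Mat            A;
  PetscErrorCode ierr;
  PetscInt       n = 100;                     /* illustrative size only */

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Leave the Vec type to the options database: -vec_type cuda/cusp/viennacl */
  ierr = VecCreate(PETSC_COMM_WORLD, &x);     CHKERRQ(ierr);
  ierr = VecSetSizes(x, PETSC_DECIDE, n);     CHKERRQ(ierr);
  ierr = VecSetFromOptions(x);                CHKERRQ(ierr);
  ierr = VecDuplicate(x, &rhs);               CHKERRQ(ierr);

  /* Leave the Mat type to the options database: -mat_type aijcusparse/aijviennacl */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);                  CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);                             CHKERRQ(ierr);
  ierr = MatSetUp(A);                                      CHKERRQ(ierr);

  /* ... assemble the Laplacian and RHS, set up the KSP, solve, scatter ... */

  ierr = VecDestroy(&x);   CHKERRQ(ierr);
  ierr = VecDestroy(&rhs); CHKERRQ(ierr);
  ierr = MatDestroy(&A);   CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

With that in place, running e.g. ./gcmSeamount -vec_type cuda -mat_type
aijcusparse should select the GPU types at runtime without touching the code.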

If I use *-vec_type cuda -mat_type aijcusparse* or *-vec_type viennacl
-mat_type aijviennacl* I get the following:


[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: ---------------------  Stack Frames
------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR:       INSTEAD the line number of the start of the function
[0]PETSC ERROR:       is given.
[0]PETSC ERROR: [0] VecSetValues line 847
/home/valera/petsc/src/vec/vec/interface/rvector.c
[0]PETSC ERROR: [0] VecSetType line 36
/home/valera/petsc/src/vec/vec/interface/vecreg.c
[0]PETSC ERROR: [0] VecSetTypeFromOptions_Private line 1230
/home/valera/petsc/src/vec/vec/interface/vector.c
[0]PETSC ERROR: [0] VecSetFromOptions line 1271
/home/valera/petsc/src/vec/vec/interface/vector.c
[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a  GIT
Date: 2018-02-28 10:19:08 -0600
[0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar  8
09:50:51 2018
[0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
--with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
--FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
--with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
--with-vienacl=1 --download-fblaslapack=1 --download-hypre
[0]PETSC ERROR: #5 User provided function() line 0 in  unknown file
--------------------------------------------------------------------------

This looks like an out-of-range memory access. Maybe my vector is too big for
my CUDA device? How do I assess that?
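
For what it's worth, the rough check I have in mind is sketched below (the
helper name is hypothetical and it only counts the raw vector data, assuming
cudaMemGetInfo() is an appropriate way to query free device memory):

#include <petscvec.h>
#include <cuda_runtime.h>
#include <stdio.h>

/* Illustrative only: compare a Vec's approximate data footprint in bytes
   against the free device memory reported by the CUDA runtime. */
PetscErrorCode CheckVecFitsOnDevice(Vec v)
{
  PetscErrorCode ierr;
  PetscInt       n;
  size_t         freeB = 0, totalB = 0, needB;

  PetscFunctionBegin;
  ierr  = VecGetSize(v, &n); CHKERRQ(ierr);
  needB = (size_t)n * sizeof(PetscScalar);   /* ignores any CUSP/cuSPARSE overhead */
  if (cudaMemGetInfo(&freeB, &totalB) != cudaSuccess)
    SETERRQ(PETSC_COMM_SELF, PETSC_ERR_LIB, "cudaMemGetInfo failed");
  printf("Vec needs ~%lu bytes; device reports %lu bytes free of %lu total\n",
         (unsigned long)needB, (unsigned long)freeB, (unsigned long)totalB);
  PetscFunctionReturn(0);
}

Calling something like that on the RHS vector before the solve would at least
tell me whether the raw data fits, but I don't know if that's the right way
to look at it.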


Next, if I use *-vec_type cusp -mat_type aijcusparse* I get something
different and more interesting:


[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR:  Vec is locked read only, argument # 3
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a  GIT
Date: 2018-02-28 10:19:08 -0600
[0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar  8
10:02:19 2018
[0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
--with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
--FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
--with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
--with-vienacl=1 --download-fblaslapack=1 --download-hypre
[0]PETSC ERROR: #48 KSPSolve() line 615 in
/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
 PETSC_SOLVER_ONLY   6.8672990892082453E-005 s
[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[0]PETSC ERROR: Invalid argument
[0]PETSC ERROR: Object (seq) is not seqcusp or mpicusp
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a  GIT
Date: 2018-02-28 10:19:08 -0600
[0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar  8
10:02:19 2018
[0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
--with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
--FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
--with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
--with-vienacl=1 --download-fblaslapack=1 --download-hypre
[0]PETSC ERROR: #49 VecCUSPGetArrayReadWrite() line 1718 in
/home/valera/petsc/src/vec/vec/impls/seq/seqcusp/veccusp2.cu
[0]PETSC ERROR: #50 VecScatterCUSP_StoS() line 269 in
/home/valera/petsc/src/vec/vec/impls/seq/seqcusp/vecscattercusp.cu





It still yields a "solution" to the system, and also prints a log at the end:





./gcmSeamount on a cuda named node50 with 1 processor, by valera Thu Mar  8
10:02:24 2018
Using Petsc Development GIT revision: v3.8.3-1817-g96b6f8a  GIT Date:
2018-02-28 10:19:08 -0600

                         Max       Max/Min        Avg      Total
Time (sec):           4.573e+00      1.00000   4.573e+00
Objects:              8.100e+01      1.00000   8.100e+01
Flop:                 3.492e+07      1.00000   3.492e+07  3.492e+07
Flop/sec:            7.637e+06      1.00000   7.637e+06  7.637e+06
Memory:               2.157e+08      1.00000              2.157e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N
--> 2N flop
                            and VecAXPY() for complex vectors of length N
--> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---
-- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts
 %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 4.5729e+00 100.0%  3.4924e+07 100.0%  0.000e+00
 0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this
phase
      %M - percent messages in this phase     %L - percent message lengths
in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over
all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


Event                Count      Time (sec)     Flop
     --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatLUFactorNum         1 1.0 4.9502e-02 1.0 3.49e+07 1.0 0.0e+00 0.0e+00
0.0e+00  1100  0  0  0   1100  0  0  0   706
MatILUFactorSym        1 1.0 1.9642e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 2.6612e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  6  0  0  0  0   6  0  0  0  0     0
MatGetRowIJ            1 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.7186e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLoad                1 1.0 1.1575e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  3  0  0  0  0   3  0  0  0  0     0
MatView                1 1.0 8.0877e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatCUSPCopyTo          1 1.0 2.4664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  5  0  0  0  0   5  0  0  0  0     0
VecSet                68 1.0 5.1665e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAssemblyBegin      17 1.0 5.2691e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd        17 1.0 4.3631e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       15 1.0 1.5345e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecCUSPCopyFrom        1 1.0 1.1199e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               1 1.0 5.1929e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
PCSetUp                2 1.0 8.6590e-02 1.0 3.49e+07 1.0 0.0e+00 0.0e+00
0.0e+00  2100  0  0  0   2100  0  0  0   403
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              1     52856972     0.
   Matrix Null Space     1              1          608     0.
              Vector    66              3      3414600     0.
      Vector Scatter     1              1          680     0.
              Viewer     3              2         1680     0.
       Krylov Solver     1              0            0     0.
      Preconditioner     2              1          864     0.
           Index Set     4              1          800     0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-ksp_view
-log_view
-mat_type aijcusparse
-matload_block_size 1
-vec_type cusp
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: PETSC_ARCH=cuda --with-cc=mpicc --with-cxx=mpic++
--with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3
--with-shared-libraries=1 --with-debugging=1 --with-cuda=1
--with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
--with-vienacl=1 --download-fblaslapack=1 --download-hypre
-----------------------------------------
Libraries compiled on Mon Mar  5 16:37:18 2018 on node50
Machine characteristics:
Linux-3.10.0-693.17.1.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Using PETSc directory: /home/valera/petsc
Using PETSc arch: cuda
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -O3
Using Fortran compiler: mpifort  -fPIC -Wall -ffree-line-length-0
-Wno-unused-dummy-argument -O3
-----------------------------------------

Using include paths: -I/home/valera/petsc/cuda/include
-I/home/valera/petsc/include -I/home/valera/petsc/include
-I/home/valera/petsc/cuda/include -I/home/valera/cusp/
-I/usr/local/cuda/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpifort
Using libraries: -Wl,-rpath,/home/valera/petsc/cuda/lib
-L/home/valera/petsc/cuda/lib -lpetsc
-Wl,-rpath,/home/valera/petsc/cuda/lib -L/home/valera/petsc/cuda/lib
-Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64
-Wl,-rpath,/usr/lib64/openmpi/lib -L/usr/lib64/openmpi/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5
-L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lHYPRE -lflapack -lfblas -lm
-lcufft -lcublas -lcudart -lcusparse -lX11 -lstdc++ -ldl -lmpi_usempi
-lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath
-lpthread -lstdc++ -ldl
-----------------------------------------



Thanks for your help,

Manuel