[petsc-users] Tweaking my code for CUDA

Matthew Knepley knepley at gmail.com
Sun Mar 11 11:00:02 CDT 2018


On Fri, Mar 9, 2018 at 3:05 AM, Manuel Valera <mvalera-w at mail.sdsu.edu>
wrote:

> Hello all,
>
> I am working on porting a linear solver to GPUs for timing purposes. So
> far I have been able to compile and run the CUSP libraries and to build
> PETSc with CUSP and ViennaCL support. After the initial runs I noticed
> some errors; they differ depending on the flags, and I would appreciate
> any help interpreting them.
>
> The only elements in this program that use PETSc are the sparse Laplacian
> matrix, the RHS and X vectors, and a scatter object, so I would say it is
> safe to set the Mat/Vec types through command-line arguments instead of
> changing the source code.
>
> If I use *-vec_type cuda -mat_type aijcusparse* or *-vec_type viennacl
> -mat_type aijviennacl*, I get the following:
>

These systems do not properly propagate errors. My only advice is to run a
smaller problem and see whether the failure persists.
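
(As an aside: the -vec_type/-mat_type flags are read inside
VecSetFromOptions()/MatSetFromOptions(), so they only affect objects that
make those calls. A minimal sketch of that pattern in C, with a hypothetical
size n and not taken from gcmSeamount:

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Vec            x, b;
      Mat            A;
      PetscInt       n = 100;              /* hypothetical global size */
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

      ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
      ierr = VecSetSizes(x, PETSC_DECIDE, n);CHKERRQ(ierr);
      ierr = VecSetFromOptions(x);CHKERRQ(ierr);  /* honors -vec_type cuda/cusp/viennacl */
      ierr = VecDuplicate(x, &b);CHKERRQ(ierr);

      ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
      ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
      ierr = MatSetFromOptions(A);CHKERRQ(ierr);  /* honors -mat_type aijcusparse/aijviennacl */
      ierr = MatSetUp(A);CHKERRQ(ierr);

      /* ... fill A and b, solve, etc. ... */

      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = VecDestroy(&b);CHKERRQ(ierr);
      ierr = VecDestroy(&x);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

Such a program can then be run as, e.g., ./prog -vec_type cuda -mat_type
aijcusparse with no source changes.)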


> [0]PETSC ERROR: ------------------------------
> ------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> [0]PETSC ERROR:       is given.
> [0]PETSC ERROR: [0] VecSetValues line 847 /home/valera/petsc/src/vec/
> vec/interface/rvector.c
> [0]PETSC ERROR: [0] VecSetType line 36 /home/valera/petsc/src/vec/
> vec/interface/vecreg.c
> [0]PETSC ERROR: [0] VecSetTypeFromOptions_Private line 1230
> /home/valera/petsc/src/vec/vec/interface/vector.c
> [0]PETSC ERROR: [0] VecSetFromOptions line 1271 /home/valera/petsc/src/vec/
> vec/interface/vector.c
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Signal received
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a  GIT
> Date: 2018-02-28 10:19:08 -0600
> [0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar  8
> 09:50:51 2018
> [0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
> --with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
> --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
> [0]PETSC ERROR: #5 User provided function() line 0 in  unknown file
> --------------------------------------------------------------------------
>
> This seems to be an out-of-range memory access. Maybe my vector is too
> big for my CUDA device? How do I assess that?
>
>
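
(On the capacity question: one rough check, outside PETSc and not part of
the original program, is to query the device with cudaMemGetInfo() and
compare the free bytes against the vector's footprint, e.g.

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* Returns 1 if a vector of n_entries PetscScalars (doubles here, per
       sizeof(PetscScalar) 8 in the log below) should fit in free GPU
       memory, 0 if not, -1 on error. */
    int fits_on_gpu(size_t n_entries)
    {
      size_t free_bytes, total_bytes;
      size_t need = n_entries * sizeof(double);

      if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) return -1;
      printf("GPU memory: %zu MB free of %zu MB; vector needs %zu MB\n",
             free_bytes >> 20, total_bytes >> 20, need >> 20);
      return need < free_bytes ? 1 : 0;
    }

fits_on_gpu() is a hypothetical helper name; compile with nvcc or link
against the CUDA runtime.)
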
> Next, if I use *-vec_type cusp -mat_type aijcusparse*, I get something
> different and more interesting:
>

We need to see the entire error message, since it has the stack.

This seems like a logic error, but could definitely be on our end. Here is
how I think about these:

  1) We have nightly test solves, so at least some solver configuration
works

  2) Some vector is marked read-only (this happens for inputs to solvers),
      but something is trying to update it. The full stack will tell me where
      this is happening; the kind of pattern that triggers it is sketched below.
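
A minimal, hypothetical illustration of that situation (not taken from the
poster's code): KSPSolve() read-locks its right-hand side for the duration
of the solve, so a callback that runs inside the solve, such as the monitor
below, cannot modify it.

    #include <petscksp.h>

    /* A deliberately bad monitor: it fetches the right-hand side, which
       KSPSolve() has locked read-only, and tries to change an entry.    */
    static PetscErrorCode BadMonitor(KSP ksp, PetscInt it, PetscReal rnorm, void *ctx)
    {
      Vec            b;
      PetscInt       row = 0;
      PetscScalar    val = 1.0;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = KSPGetRhs(ksp, &b);CHKERRQ(ierr);
      /* Fails with "Object is in wrong state: Vec is locked read only" */
      ierr = VecSetValues(b, 1, &row, &val, INSERT_VALUES);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Registering this with KSPMonitorSet(ksp, BadMonitor, NULL, NULL) and calling
KSPSolve() should reproduce the same "Object is in wrong state" message seen
in the log below.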

  Thanks,

     Matt


> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR:  Vec is locked read only, argument # 3
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a  GIT
> Date: 2018-02-28 10:19:08 -0600
> [0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar  8
> 10:02:19 2018
> [0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
> --with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
> --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
> [0]PETSC ERROR: #48 KSPSolve() line 615 in /home/valera/petsc/src/ksp/
> ksp/interface/itfunc.c
>  PETSC_SOLVER_ONLY   6.8672990892082453E-005 s
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Object (seq) is not seqcusp or mpicusp
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a  GIT
> Date: 2018-02-28 10:19:08 -0600
> [0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar  8
> 10:02:19 2018
> [0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
> --with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
> --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
> [0]PETSC ERROR: #49 VecCUSPGetArrayReadWrite() line 1718 in
> /home/valera/petsc/src/vec/vec/impls/seq/seqcusp/veccusp2.cu
> [0]PETSC ERROR: #50 VecScatterCUSP_StoS() line 269 in
> /home/valera/petsc/src/vec/vec/impls/seq/seqcusp/vecscattercusp.cu
>
>
>
>
>
> And it yields a "solution" to the system and also a log at the end:
>
>
>
>
>
> ./gcmSeamount on a cuda named node50 with 1 processor, by valera Thu Mar
> 8 10:02:24 2018
> Using Petsc Development GIT revision: v3.8.3-1817-g96b6f8a  GIT Date:
> 2018-02-28 10:19:08 -0600
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           4.573e+00      1.00000   4.573e+00
> Objects:              8.100e+01      1.00000   8.100e+01
> Flop:                 3.492e+07      1.00000   3.492e+07  3.492e+07
> Flop/sec:            7.637e+06      1.00000   7.637e+06  7.637e+06
> Memory:               2.157e+08      1.00000              2.157e+08
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       0.000e+00      0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N
> --> 2N flop
>                             and VecAXPY() for complex vectors of length N
> --> 8N flop
>
> Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages
> ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts
>  %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 4.5729e+00 100.0%  3.4924e+07 100.0%  0.000e+00
>  0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flop: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flop in this
> phase
>       %M - percent messages in this phase     %L - percent message lengths
> in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over
> all processors)
> ------------------------------------------------------------
> ------------------------------------------------------------
>
>
>       ##########################################################
>       #                                                        #
>       #                          WARNING!!!                    #
>       #                                                        #
>       #   This code was compiled with a debugging option,      #
>       #   To get timing results run ./configure                #
>       #   using --with-debugging=no, the performance will      #
>       #   be generally two or three times faster.              #
>       #                                                        #
>       ##########################################################
>
>
> Event                Count      Time (sec)     Flop
>      --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------
> ------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatLUFactorNum         1 1.0 4.9502e-02 1.0 3.49e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  1100  0  0  0   1100  0  0  0   706
> MatILUFactorSym        1 1.0 1.9642e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin       2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         2 1.0 2.6612e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  6  0  0  0  0   6  0  0  0  0     0
> MatGetRowIJ            1 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         1 1.0 1.7186e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatLoad                1 1.0 1.1575e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
> MatView                1 1.0 8.0877e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> MatCUSPCopyTo          1 1.0 2.4664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
> VecSet                68 1.0 5.1665e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> VecAssemblyBegin      17 1.0 5.2691e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyEnd        17 1.0 4.3631e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin       15 1.0 1.5345e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecCUSPCopyFrom        1 1.0 1.1199e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetUp               1 1.0 5.1929e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> PCSetUp                2 1.0 8.6590e-02 1.0 3.49e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  2100  0  0  0   2100  0  0  0   403
> ------------------------------------------------------------
> ------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>               Matrix     3              1     52856972     0.
>    Matrix Null Space     1              1          608     0.
>               Vector    66              3      3414600     0.
>       Vector Scatter     1              1          680     0.
>               Viewer     3              2         1680     0.
>        Krylov Solver     1              0            0     0.
>       Preconditioner     2              1          864     0.
>            Index Set     4              1          800     0.
> ============================================================
> ============================================================
> Average time to get PetscTime(): 9.53674e-08
> #PETSc Option Table entries:
> -ksp_view
> -log_view
> -mat_type aijcusparse
> -matload_block_size 1
> -vec_type cusp
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: PETSC_ARCH=cuda --with-cc=mpicc --with-cxx=mpic++
> --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3
> --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
> -----------------------------------------
> Libraries compiled on Mon Mar  5 16:37:18 2018 on node50
> Machine characteristics: Linux-3.10.0-693.17.1.el7.x86_
> 64-x86_64-with-centos-7.2.1511-Core
> Using PETSc directory: /home/valera/petsc
> Using PETSc arch: cuda
> -----------------------------------------
>
> Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing
> -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -O3
> Using Fortran compiler: mpifort  -fPIC -Wall -ffree-line-length-0
> -Wno-unused-dummy-argument -O3
> -----------------------------------------
>
> Using include paths: -I/home/valera/petsc/cuda/include
> -I/home/valera/petsc/include -I/home/valera/petsc/include
> -I/home/valera/petsc/cuda/include -I/home/valera/cusp/
> -I/usr/local/cuda/include
> -----------------------------------------
>
> Using C linker: mpicc
> Using Fortran linker: mpifort
> Using libraries: -Wl,-rpath,/home/valera/petsc/cuda/lib
> -L/home/valera/petsc/cuda/lib -lpetsc -Wl,-rpath,/home/valera/petsc/cuda/lib
> -L/home/valera/petsc/cuda/lib -Wl,-rpath,/usr/local/cuda/lib64
> -L/usr/local/cuda/lib64 -Wl,-rpath,/usr/lib64/openmpi/lib
> -L/usr/lib64/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5
> -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lHYPRE -lflapack -lfblas -lm
> -lcufft -lcublas -lcudart -lcusparse -lX11 -lstdc++ -ldl -lmpi_usempi
> -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath
> -lpthread -lstdc++ -ldl
> -----------------------------------------
>
>
>
> Thanks for your help,
>
> Manuel
>
>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/