[petsc-users] Tweaking my code for CUDA
Matthew Knepley
knepley at gmail.com
Sun Mar 11 11:00:02 CDT 2018
On Fri, Mar 9, 2018 at 3:05 AM, Manuel Valera <mvalera-w at mail.sdsu.edu>
wrote:
> Hello all,
>
> I am working on porting a linear solver to GPUs for timing purposes. So
> far I've been able to compile and run the CUSP libraries and to compile
> PETSc for use with CUSP and ViennaCL. After the initial runs I noticed
> some errors; they differ depending on the flags, and I would appreciate
> any help interpreting them.
>
> The only elements in this program that use PETSc are the Laplacian matrix
> (sparse), the RHS and X vectors, and a scatter object, so I would say it is
> safe to set the types through command-line arguments instead of calling
> Mat/VecSetType() in the source code.
>
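That should work, provided the source creates those objects generically and
calls VecSetFromOptions()/MatSetFromOptions(), which is where -vec_type and
-mat_type are picked up; the stack trace below shows VecSetFromOptions() is in
fact being reached. A minimal sketch of that pattern (not the actual
gcmSeamount code, just the assumed structure):

  #include <petsc.h>

  int main(int argc, char **argv)
  {
    Vec            x;
    Mat            A;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

    /* -vec_type cuda/cusp/viennacl is honored here */
    ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
    ierr = VecSetSizes(x, PETSC_DECIDE, 100);CHKERRQ(ierr);
    ierr = VecSetFromOptions(x);CHKERRQ(ierr);

    /* -mat_type aijcusparse/aijviennacl is honored here */
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 100, 100);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);
    ierr = MatSetUp(A);CHKERRQ(ierr);

    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }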
> If I use *-vec_type cuda -mat_type aijcusparse* or *-vec_type viennacl
> -mat_type aijviennacl* I get the following:
>
These systems do not propagate errors properly. My only advice is to run a
smaller problem and see whether it still fails.
> [0]PETSC ERROR: ------------------------------
> ------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR: INSTEAD the line number of the start of the function
> [0]PETSC ERROR: is given.
> [0]PETSC ERROR: [0] VecSetValues line 847 /home/valera/petsc/src/vec/vec/interface/rvector.c
> [0]PETSC ERROR: [0] VecSetType line 36 /home/valera/petsc/src/vec/vec/interface/vecreg.c
> [0]PETSC ERROR: [0] VecSetTypeFromOptions_Private line 1230 /home/valera/petsc/src/vec/vec/interface/vector.c
> [0]PETSC ERROR: [0] VecSetFromOptions line 1271 /home/valera/petsc/src/vec/vec/interface/vector.c
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Signal received
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a GIT
> Date: 2018-02-28 10:19:08 -0600
> [0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar 8
> 09:50:51 2018
> [0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
> --with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
> --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
> [0]PETSC ERROR: #5 User provided function() line 0 in unknown file
> --------------------------------------------------------------------------
>
> This seems to be an out-of-range memory access. Maybe my vector is too big
> for my CUDA system? How do I assess that?
>
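One rough way to assess it (a sketch that is not from this thread; it assumes
the CUDA runtime is available to include and link, and the vector length is
made up) is to compare the vector's footprint with the free device memory
reported by cudaMemGetInfo(). CUSP/ViennaCL/cuSPARSE allocate extra buffers of
their own, so treat this as a ballpark check only:

  #include <petsc.h>
  #include <cuda_runtime.h>

  int main(int argc, char **argv)
  {
    Vec            x;
    PetscInt       n = 1000000;   /* local length to test; adjust to the real problem */
    size_t         freeB, totalB;
    double         needMB;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
    ierr = VecSetSizes(x, n, PETSC_DECIDE);CHKERRQ(ierr);
    ierr = VecSetFromOptions(x);CHKERRQ(ierr);

    /* bytes needed by the local part of the vector on the device */
    ierr = VecGetLocalSize(x, &n);CHKERRQ(ierr);
    needMB = ((double)n * sizeof(PetscScalar)) / 1048576.0;
    if (cudaMemGetInfo(&freeB, &totalB) != cudaSuccess)
      SETERRQ(PETSC_COMM_SELF, PETSC_ERR_LIB, "cudaMemGetInfo() failed");
    ierr = PetscPrintf(PETSC_COMM_SELF,
                       "Vec needs ~%g MB; device reports %g of %g MB free\n",
                       needMB, (double)freeB/1048576.0, (double)totalB/1048576.0);CHKERRQ(ierr);

    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }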
>
> Next, if I use *-vec_type cusp -mat_type aijcusparse* I get something
> different and more interesting:
>
We need to see the entire error message, since it has the stack.
This seems like a logic error, but it could definitely be on our end. Here is
how I think about these:
1) We have nightly test solves, so at least some solver configuration
works.
2) Some vector is marked read-only (this happens to the input of solvers),
but some piece of code is trying to update it.
The stack will tell me where this is happening.
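For reference, the lock in question works roughly like the minimal sketch
below, assuming the VecLockPush()/VecLockPop() calls available in this PETSc
version; the vector, its size, and the commented write accesses are made up
for illustration:

  #include <petsc.h>

  int main(int argc, char **argv)
  {
    Vec            b;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    ierr = VecCreate(PETSC_COMM_WORLD, &b);CHKERRQ(ierr);
    ierr = VecSetSizes(b, PETSC_DECIDE, 10);CHKERRQ(ierr);
    ierr = VecSetFromOptions(b);CHKERRQ(ierr);

    /* A solver pushes the read-only lock on its input vector ...            */
    ierr = VecLockPush(b);CHKERRQ(ierr);
    /* ... and, in a debugging build like the one configured above, asking
     * for write access here (VecSetValues(), VecGetArray(), a scatter
     * writing into b, ...) is refused with "Vec is locked read only".       */
    ierr = VecLockPop(b);CHKERRQ(ierr);

    ierr = VecDestroy(&b);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }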
Thanks,
Matt
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR: Vec is locked read only, argument # 3
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a GIT
> Date: 2018-02-28 10:19:08 -0600
> [0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar 8
> 10:02:19 2018
> [0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
> --with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
> --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
> [0]PETSC ERROR: #48 KSPSolve() line 615 in /home/valera/petsc/src/ksp/ksp/interface/itfunc.c
> PETSC_SOLVER_ONLY 6.8672990892082453E-005 s
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Object (seq) is not seqcusp or mpicusp
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a GIT
> Date: 2018-02-28 10:19:08 -0600
> [0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar 8
> 10:02:19 2018
> [0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
> --with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
> --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
> [0]PETSC ERROR: #49 VecCUSPGetArrayReadWrite() line 1718 in
> /home/valera/petsc/src/vec/vec/impls/seq/seqcusp/veccusp2.cu
> [0]PETSC ERROR: #50 VecScatterCUSP_StoS() line 269 in
> /home/valera/petsc/src/vec/vec/impls/seq/seqcusp/vecscattercusp.cu
>
>
>
>
>
> And it yields a "solution" to the system and also a log at the end:
>
>
>
>
>
> ./gcmSeamount on a cuda named node50 with 1 processor, by valera Thu Mar
> 8 10:02:24 2018
> Using Petsc Development GIT revision: v3.8.3-1817-g96b6f8a GIT Date:
> 2018-02-28 10:19:08 -0600
>
> Max Max/Min Avg Total
> Time (sec): 4.573e+00 1.00000 4.573e+00
> Objects: 8.100e+01 1.00000 8.100e+01
> Flop: 3.492e+07 1.00000 3.492e+07 3.492e+07
> Flop/sec: 7.637e+06 1.00000 7.637e+06 7.637e+06
> Memory: 2.157e+08 1.00000 2.157e+08
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flop
> and VecAXPY() for complex vectors of length N
> --> 8N flop
>
> Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages
> --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts
> %Total Avg %Total counts %Total
> 0: Main Stage: 4.5729e+00 100.0% 3.4924e+07 100.0% 0.000e+00
> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flop: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase %F - percent flop in this
> phase
> %M - percent messages in this phase %L - percent message lengths
> in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over
> all processors)
> ------------------------------------------------------------
> ------------------------------------------------------------
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was compiled with a debugging option, #
> # To get timing results run ./configure #
> # using --with-debugging=no, the performance will #
> # be generally two or three times faster. #
> # #
> ##########################################################
>
>
> Event Count Time (sec) Flop
> --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len
> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------
> ------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatLUFactorNum 1 1.0 4.9502e-02 1.0 3.49e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00 1100 0 0 0 1100 0 0 0 706
> MatILUFactorSym 1 1.0 1.9642e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 2 1.0 2.6612e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 6 0 0 0 0 6 0 0 0 0 0
> MatGetRowIJ 1 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 1.7186e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatLoad 1 1.0 1.1575e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
> MatView 1 1.0 8.0877e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> MatCUSPCopyTo 1 1.0 2.4664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 5 0 0 0 0 5 0 0 0 0 0
> VecSet 68 1.0 5.1665e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecAssemblyBegin 17 1.0 5.2691e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 17 1.0 4.3631e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecScatterBegin 15 1.0 1.5345e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecCUSPCopyFrom 1 1.0 1.1199e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSetUp 1 1.0 5.1929e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> PCSetUp 2 1.0 8.6590e-02 1.0 3.49e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00 2100 0 0 0 2100 0 0 0 403
> ------------------------------------------------------------
> ------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Matrix 3 1 52856972 0.
> Matrix Null Space 1 1 608 0.
> Vector 66 3 3414600 0.
> Vector Scatter 1 1 680 0.
> Viewer 3 2 1680 0.
> Krylov Solver 1 0 0 0.
> Preconditioner 2 1 864 0.
> Index Set 4 1 800 0.
> ============================================================
> ============================================================
> Average time to get PetscTime(): 9.53674e-08
> #PETSc Option Table entries:
> -ksp_view
> -log_view
> -mat_type aijcusparse
> -matload_block_size 1
> -vec_type cusp
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: PETSC_ARCH=cuda --with-cc=mpicc --with-cxx=mpic++
> --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3
> --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
> -----------------------------------------
> Libraries compiled on Mon Mar 5 16:37:18 2018 on node50
> Machine characteristics: Linux-3.10.0-693.17.1.el7.x86_64-x86_64-with-centos-7.2.1511-Core
> Using PETSc directory: /home/valera/petsc
> Using PETSc arch: cuda
> -----------------------------------------
>
> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing
> -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -O3
> Using Fortran compiler: mpifort -fPIC -Wall -ffree-line-length-0
> -Wno-unused-dummy-argument -O3
> -----------------------------------------
>
> Using include paths: -I/home/valera/petsc/cuda/include
> -I/home/valera/petsc/include -I/home/valera/petsc/include
> -I/home/valera/petsc/cuda/include -I/home/valera/cusp/
> -I/usr/local/cuda/include
> -----------------------------------------
>
> Using C linker: mpicc
> Using Fortran linker: mpifort
> Using libraries: -Wl,-rpath,/home/valera/petsc/cuda/lib
> -L/home/valera/petsc/cuda/lib -lpetsc -Wl,-rpath,/home/valera/petsc/cuda/lib
> -L/home/valera/petsc/cuda/lib -Wl,-rpath,/usr/local/cuda/lib64
> -L/usr/local/cuda/lib64 -Wl,-rpath,/usr/lib64/openmpi/lib
> -L/usr/lib64/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5
> -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lHYPRE -lflapack -lfblas -lm
> -lcufft -lcublas -lcudart -lcusparse -lX11 -lstdc++ -ldl -lmpi_usempi
> -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath
> -lpthread -lstdc++ -ldl
> -----------------------------------------
>
>
>
> Thanks for your help,
>
> Manuel
>
>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/