[petsc-users] Tweaking my code for CUDA

Matthew Knepley knepley at gmail.com
Wed Mar 14 12:39:35 CDT 2018


On Thu, Mar 15, 2018 at 2:27 AM, Manuel Valera <mvalera-w at mail.sdsu.edu>
wrote:

> Ok, thanks Matt. I made a smaller case with only the linear solver and a
> 25x25 matrix. The error I get in this case is:
>

Ah, it appears that not all parts of your problem are taking the type
options. If you want the linear algebra objects to change type, you need to
have

  VecSetFromOptions()  and MatSetFromOptions()

called after you create them, but before sizes are set and data is entered.
However, it should not be possible to have a seq Vec with the seqcusp AXPY
routine set. Something else is wrong...
Did you try a PETSc example, such as SNES ex19, with this?
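
A minimal sketch of that call order in C, under some assumptions: the size
n = 125 and the type printout are illustrative and not taken from your
program. It sets the sizes first, as the PETSc tutorials do; the part that
matters here is that VecSetFromOptions() and MatSetFromOptions() run before
any values are entered.

```c
#include <petscmat.h> /* also pulls in the Vec interface */

int main(int argc, char **argv)
{
  Vec            x;
  Mat            A;
  VecType        vtype;
  MatType        mtype;
  PetscInt       n = 125; /* illustrative global size (assumption) */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Vector: create, size, then let -vec_type choose the implementation */
  ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
  ierr = VecSetSizes(x, PETSC_DECIDE, n);CHKERRQ(ierr);
  ierr = VecSetFromOptions(x);CHKERRQ(ierr);

  /* Matrix: create, size, then let -mat_type choose the implementation */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);

  /* Report the types actually in use, to check the options were applied */
  ierr = VecGetType(x, &vtype);CHKERRQ(ierr);
  ierr = MatGetType(A, &mtype);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Vec type: %s, Mat type: %s\n", vtype, mtype);CHKERRQ(ierr);

  /* Only after this point enter data, e.g. with VecSetValues() and
     MatSetValues() followed by the assembly calls. */

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
```

Run with the same -vec_type and -mat_type options as your solver, the
printed types should reflect them. An object that still reports a plain
type, or an "Option left" warning at exit, points to a creation path that
never calls the SetFromOptions routine.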

  Thanks,

    Matt


> [valera at node50 alone]$ mpirun -n 1 ./linsolve -vec_type cusp -mat_type
> aijcusparse
>  laplacian.petsc !
>  TrivSoln loaded, size:          125 /         125
>  RHS loaded, size:          125 /         125
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Null argument, when expecting valid pointer
> [0]PETSC ERROR: Null Pointer: Parameter # 4
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a  GIT
> Date: 2018-02-28 10:19:08 -0600
> [0]PETSC ERROR: ./linsolve on a cuda named node50 by valera Wed Mar 14
> 10:24:35 2018
> [0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
> --with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
> --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
> [0]PETSC ERROR: #1 VecSetValues() line 851 in /home/valera/petsc/src/vec/vec/interface/rvector.c
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Object (seq) is not seqcusp or mpicusp
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a  GIT
> Date: 2018-02-28 10:19:08 -0600
> [0]PETSC ERROR: ./linsolve on a cuda named node50 by valera Wed Mar 14
> 10:24:35 2018
> [0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
> --with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
> --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
> [0]PETSC ERROR: #2 VecCUSPGetArrayRead() line 1792 in
> /home/valera/petsc/src/vec/vec/impls/seq/seqcusp/veccusp2.cu
> [0]PETSC ERROR: #3 VecAXPY_SeqCUSP() line 314 in
> /home/valera/petsc/src/vec/vec/impls/seq/seqcusp/veccusp2.cu
> [0]PETSC ERROR: #4 VecAXPY() line 612 in /home/valera/petsc/src/vec/vec/interface/rvector.c
> [0]PETSC ERROR: #5 KSPSolve_GCR_cycle() line 60 in
> /home/valera/petsc/src/ksp/ksp/impls/gcr/gcr.c
> [0]PETSC ERROR: #6 KSPSolve_GCR() line 114 in /home/valera/petsc/src/ksp/ksp/impls/gcr/gcr.c
> [0]PETSC ERROR: #7 KSPSolve() line 669 in /home/valera/petsc/src/ksp/ksp/interface/itfunc.c
>  soln maxval:   0.0000000000000000
>  soln minval:   0.0000000000000000
>  Norm:   11.180339887498949
>  Its:           0
> WARNING! There are options you set that were not used!
> WARNING! could be spelling mistake, etc!
> Option left: name:-mat_type value: aijcusparse
> [valera at node50 alone]$
>
>
> I also see that the configure options are not correct, so I guess it is still
> linking against a different PETSc installation. Maybe we can try to make it
> work as it is; I will let you know if I am able to link the correct PETSc
> installation here.
>
> Best,
>
>
>
>
> On Sun, Mar 11, 2018 at 9:00 AM, Matthew Knepley <knepley at gmail.com>
> wrote:
>
>> On Fri, Mar 9, 2018 at 3:05 AM, Manuel Valera <mvalera-w at mail.sdsu.edu>
>> wrote:
>>
>>> Hello all,
>>>
>>> I am working on porting a linear solver to GPUs for timing purposes.
>>> So far I've been able to compile and run the CUSP libraries and compile
>>> PETSc for use with CUSP and ViennaCL. After the initial runs I noticed
>>> some errors; they differ depending on the flags, and I would appreciate
>>> any help interpreting them.
>>>
>>> The only elements in this program that use PETSc are the Laplacian
>>> matrix (sparse), the RHS and X vectors, and a PETSc scatter object, so I
>>> would say it's safe to pass the command-line arguments for the
>>> Mat/VecSetType()s instead of changing the source code.
>>>
>>> If I use *-vec_type cuda -mat_type aijcusparse* or *-vec_type viennacl
>>> -mat_type aijviennacl* I get the following:
>>>
>>
>> These systems do not properly propagate errors. My only advice is to run
>> a smaller problem and see.
>>
>>
>>> [0]PETSC ERROR: ------------------------------
>>> ------------------------------------------
>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>>> probably memory access out of range
>>> [0]PETSC ERROR: Try option -start_in_debugger or
>>> -on_error_attach_debugger
>>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
>>> OS X to find memory corruption errors
>>> [0]PETSC ERROR: likely location of problem given in stack below
>>> [0]PETSC ERROR: ---------------------  Stack Frames
>>> ------------------------------------
>>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not
>>> available,
>>> [0]PETSC ERROR:       INSTEAD the line number of the start of the
>>> function
>>> [0]PETSC ERROR:       is given.
>>> [0]PETSC ERROR: [0] VecSetValues line 847 /home/valera/petsc/src/vec/vec/interface/rvector.c
>>> [0]PETSC ERROR: [0] VecSetType line 36 /home/valera/petsc/src/vec/vec/interface/vecreg.c
>>> [0]PETSC ERROR: [0] VecSetTypeFromOptions_Private line 1230
>>> /home/valera/petsc/src/vec/vec/interface/vector.c
>>> [0]PETSC ERROR: [0] VecSetFromOptions line 1271
>>> /home/valera/petsc/src/vec/vec/interface/vector.c
>>> [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> [0]PETSC ERROR: Signal received
>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>>> for trouble shooting.
>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a
>>> GIT Date: 2018-02-28 10:19:08 -0600
>>> [0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar
>>> 8 09:50:51 2018
>>> [0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
>>> --with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
>>> --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
>>> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
>>> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
>>> [0]PETSC ERROR: #5 User provided function() line 0 in  unknown file
>>> ------------------------------------------------------------
>>> --------------
>>>
>>> This seems to be an out-of-range memory access. Maybe my vector is too big
>>> for my CUDA system? How do I assess that?
>>>
>>>
>>> Next, if I use *-vec_type cusp -mat_type aijcusparse* I get something
>>> different and more interesting:
>>>
>>
>> We need to see the entire error message, since it has the stack.
>>
>> This seems like a logic error, but could definitely be on our end. Here
>> is how I think about these:
>>
>>   1) We have nightly test solves, so at least some solver configuration
>> works
>>
>>   2) Some vector is marked read-only (this happens for input to
>> solvers), but something is trying to update it.
>>       The stack will tell me where this is happening.
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> [0]PETSC ERROR: Object is in wrong state
>>> [0]PETSC ERROR:  Vec is locked read only, argument # 3
>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>>> for trouble shooting.
>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a
>>> GIT Date: 2018-02-28 10:19:08 -0600
>>> [0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar
>>> 8 10:02:19 2018
>>> [0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
>>> --with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
>>> --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
>>> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
>>> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
>>> [0]PETSC ERROR: #48 KSPSolve() line 615 in /home/valera/petsc/src/ksp/ksp/interface/itfunc.c
>>>  PETSC_SOLVER_ONLY   6.8672990892082453E-005 s
>>> [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> [0]PETSC ERROR: Invalid argument
>>> [0]PETSC ERROR: Object (seq) is not seqcusp or mpicusp
>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>>> for trouble shooting.
>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.8.3-1817-g96b6f8a
>>> GIT Date: 2018-02-28 10:19:08 -0600
>>> [0]PETSC ERROR: ./gcmSeamount on a cuda named node50 by valera Thu Mar
>>> 8 10:02:19 2018
>>> [0]PETSC ERROR: Configure options PETSC_ARCH=cuda --with-cc=mpicc
>>> --with-cxx=mpic++ --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3
>>> --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
>>> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
>>> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
>>> [0]PETSC ERROR: #49 VecCUSPGetArrayReadWrite() line 1718 in
>>> /home/valera/petsc/src/vec/vec/impls/seq/seqcusp/veccusp2.cu
>>> [0]PETSC ERROR: #50 VecScatterCUSP_StoS() line 269 in
>>> /home/valera/petsc/src/vec/vec/impls/seq/seqcusp/vecscattercusp.cu
>>>
>>>
>>>
>>>
>>>
>>> And it yields a "solution" to the system and also a log at the end:
>>>
>>>
>>>
>>>
>>>
>>> ./gcmSeamount on a cuda named node50 with 1 processor, by valera Thu
>>> Mar  8 10:02:24 2018
>>> Using Petsc Development GIT revision: v3.8.3-1817-g96b6f8a  GIT Date:
>>> 2018-02-28 10:19:08 -0600
>>>
>>>                          Max       Max/Min        Avg      Total
>>> Time (sec):           4.573e+00      1.00000   4.573e+00
>>> Objects:              8.100e+01      1.00000   8.100e+01
>>> Flop:                 3.492e+07      1.00000   3.492e+07  3.492e+07
>>> Flop/sec:            7.637e+06      1.00000   7.637e+06  7.637e+06
>>> Memory:               2.157e+08      1.00000              2.157e+08
>>> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
>>> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
>>> MPI Reductions:       0.000e+00      0.00000
>>>
>>> Flop counting convention: 1 flop = 1 real number operation of type
>>> (multiply/divide/add/subtract)
>>>                             e.g., VecAXPY() for real vectors of length N
>>> --> 2N flop
>>>                             and VecAXPY() for complex vectors of length
>>> N --> 8N flop
>>>
>>> Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages
>>> ---  -- Message Lengths --  -- Reductions --
>>>                         Avg     %Total     Avg     %Total   counts
>>>  %Total     Avg         %Total   counts   %Total
>>>  0:      Main Stage: 4.5729e+00 100.0%  3.4924e+07 100.0%  0.000e+00
>>>  0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>>
>>> ------------------------------------------------------------
>>> ------------------------------------------------------------
>>> See the 'Profiling' chapter of the users' manual for details on
>>> interpreting output.
>>> Phase summary info:
>>>    Count: number of times phase was executed
>>>    Time and Flop: Max - maximum over all processors
>>>                    Ratio - ratio of maximum to minimum over all
>>> processors
>>>    Mess: number of messages sent
>>>    Avg. len: average message length (bytes)
>>>    Reduct: number of global reductions
>>>    Global: entire computation
>>>    Stage: stages of a computation. Set stages with PetscLogStagePush()
>>> and PetscLogStagePop().
>>>       %T - percent time in this phase         %F - percent flop in this
>>> phase
>>>       %M - percent messages in this phase     %L - percent message
>>> lengths in this phase
>>>       %R - percent reductions in this phase
>>>    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time
>>> over all processors)
>>> ------------------------------------------------------------
>>> ------------------------------------------------------------
>>>
>>>
>>>       ##########################################################
>>>       #                                                        #
>>>       #                          WARNING!!!                    #
>>>       #                                                        #
>>>       #   This code was compiled with a debugging option,      #
>>>       #   To get timing results run ./configure                #
>>>       #   using --with-debugging=no, the performance will      #
>>>       #   be generally two or three times faster.              #
>>>       #                                                        #
>>>       ##########################################################
>>>
>>>
>>> Event                Count      Time (sec)     Flop
>>>        --- Global ---  --- Stage ---   Total
>>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
>>> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>> ------------------------------------------------------------
>>> ------------------------------------------------------------
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> MatLUFactorNum         1 1.0 4.9502e-02 1.0 3.49e+07 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  1100  0  0  0   1100  0  0  0   706
>>> MatILUFactorSym        1 1.0 1.9642e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyBegin       2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyEnd         2 1.0 2.6612e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  6  0  0  0  0   6  0  0  0  0     0
>>> MatGetRowIJ            1 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetOrdering         1 1.0 1.7186e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatLoad                1 1.0 1.1575e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>>> MatView                1 1.0 8.0877e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>> MatCUSPCopyTo          1 1.0 2.4664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
>>> VecSet                68 1.0 5.1665e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> VecAssemblyBegin      17 1.0 5.2691e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAssemblyEnd        17 1.0 4.3631e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecScatterBegin       15 1.0 1.5345e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecCUSPCopyFrom        1 1.0 1.1199e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPSetUp               1 1.0 5.1929e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> PCSetUp                2 1.0 8.6590e-02 1.0 3.49e+07 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  2100  0  0  0   2100  0  0  0   403
>>> ------------------------------------------------------------
>>> ------------------------------------------------------------
>>>
>>> Memory usage is given in bytes:
>>>
>>> Object Type          Creations   Destructions     Memory  Descendants'
>>> Mem.
>>> Reports information only for process 0.
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>>               Matrix     3              1     52856972     0.
>>>    Matrix Null Space     1              1          608     0.
>>>               Vector    66              3      3414600     0.
>>>       Vector Scatter     1              1          680     0.
>>>               Viewer     3              2         1680     0.
>>>        Krylov Solver     1              0            0     0.
>>>       Preconditioner     2              1          864     0.
>>>            Index Set     4              1          800     0.
>>> ============================================================
>>> ============================================================
>>> Average time to get PetscTime(): 9.53674e-08
>>> #PETSc Option Table entries:
>>> -ksp_view
>>> -log_view
>>> -mat_type aijcusparse
>>> -matload_block_size 1
>>> -vec_type cusp
>>> #End of PETSc Option Table entries
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>> Configure options: PETSC_ARCH=cuda --with-cc=mpicc --with-cxx=mpic++
>>> --with-fc=mpifort --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3
>>> --with-shared-libraries=1 --with-debugging=1 --with-cuda=1
>>> --with-cuda-arch=sm_60 --with-cusp=1 --with-cusp-dir=/home/valera/cusp
>>> --with-vienacl=1 --download-fblaslapack=1 --download-hypre
>>> -----------------------------------------
>>> Libraries compiled on Mon Mar  5 16:37:18 2018 on node50
>>> Machine characteristics: Linux-3.10.0-693.17.1.el7.x86_64-x86_64-with-centos-7.2.1511-Core
>>> Using PETSc directory: /home/valera/petsc
>>> Using PETSc arch: cuda
>>> -----------------------------------------
>>>
>>> Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings
>>> -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector
>>> -fvisibility=hidden -O3
>>> Using Fortran compiler: mpifort  -fPIC -Wall -ffree-line-length-0
>>> -Wno-unused-dummy-argument -O3
>>> -----------------------------------------
>>>
>>> Using include paths: -I/home/valera/petsc/cuda/include
>>> -I/home/valera/petsc/include -I/home/valera/petsc/include
>>> -I/home/valera/petsc/cuda/include -I/home/valera/cusp/
>>> -I/usr/local/cuda/include
>>> -----------------------------------------
>>>
>>> Using C linker: mpicc
>>> Using Fortran linker: mpifort
>>> Using libraries: -Wl,-rpath,/home/valera/petsc/cuda/lib
>>> -L/home/valera/petsc/cuda/lib -lpetsc -Wl,-rpath,/home/valera/petsc/cuda/lib
>>> -L/home/valera/petsc/cuda/lib -Wl,-rpath,/usr/local/cuda/lib64
>>> -L/usr/local/cuda/lib64 -Wl,-rpath,/usr/lib64/openmpi/lib
>>> -L/usr/lib64/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5
>>> -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lHYPRE -lflapack -lfblas -lm
>>> -lcufft -lcublas -lcudart -lcusparse -lX11 -lstdc++ -ldl -lmpi_usempi
>>> -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath
>>> -lpthread -lstdc++ -ldl
>>> -----------------------------------------
>>>
>>>
>>>
>>> Thanks for your help,
>>>
>>> Manuel
>>>
>>>
>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/ <http://www.caam.rice.edu/~mk51/>
>>
>
>

