[petsc-users] GPUs, cuda, complex
Randall Mackie
rlmackie862 at gmail.com
Fri Feb 22 16:48:28 CST 2019
Below is more systematic information about my attempts to run complex on a GPU. Thus far I have been unsuccessful, and any pointers/suggestions are appreciated. Next week I will try another machine with a different GPU.
In the meantime, here is what I get:
We have a machine on which we have 1 GPU:
GPU 0: GeForce GTX 1050 Ti
We have installed the latest version of Cuda:
Cuda compilation tools, release 10.0, V10.0.130
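(As a quick sanity check that the driver sees the card, something like the following can be run; this is nothing PETSc-specific, just the standard NVIDIA tool:
nvidia-smi
)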
I have compiled a gfortran version of PETSc with complex scalars:
./configure \
--with-clean=1 \
--with-scalar-type=complex \
--with-debugging=1 \
--with-fortran=1 \
--with-cuda=1 \
--with-cudac=/usr/local/cuda-10.0/bin/nvcc \
--download-mpich=./mpich-3.3b1.tar.gz \
--download-fblaslapack=fblaslapack-3.4.2.tar.gz \
To test, I compiled ksp/ksp/examples/tests/ex32.c with a real version of PETSc.
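(For reference, the example was built in the usual way, roughly as follows; this sketch assumes the legacy per-directory makefile targets, with PETSC_ARCH pointing at whichever build is being tested:
cd ${PETSC_DIR}/src/ksp/ksp/examples/tests
make PETSC_DIR=${PETSC_DIR} PETSC_ARCH=${PETSC_ARCH} ex32
)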
Run on the CPU with the following options, I get this output:
mpirun -np 1 ./ex32 \
-dm_mat_type sbaij \
-pc_type jacobi \
-ksp_monitor_short
# ./cmd_ex32
0 KSP Residual norm 0.076964
1 KSP Residual norm 0.0461195
2 KSP Residual norm 0.0281244
3 KSP Residual norm 0.0182566
4 KSP Residual norm 0.00902356
5 KSP Residual norm 0.00360423
6 KSP Residual norm 0.0010161
7 KSP Residual norm 0.000379953
8 KSP Residual norm 0.000168001
9 KSP Residual norm 0.000100787
10 KSP Residual norm 6.30212e-05
11 KSP Residual norm 3.74954e-05
12 KSP Residual norm 1.30554e-05
13 KSP Residual norm 3.7475e-06
14 KSP Residual norm 1.7272e-06
15 KSP Residual norm 6.5817e-07
When I run on the GPU, I get the following:
${PETSC_DIR}/lib/petsc/bin/petscmpiexec -n 1 ./ex32_real \
-dm_mat_type aijcusparse \
-dm_vec_type cuda \
-pc_type jacobi \
-ksp_monitor_short \
# ./cmd_ex32_cuda
0 KSP Residual norm 0.076964
1 KSP Residual norm 0.0456494
2 KSP Residual norm 0.0277184
3 KSP Residual norm 0.0156403
4 KSP Residual norm 0.0077786
5 KSP Residual norm 0.00211375
6 KSP Residual norm 0.000615725
7 KSP Residual norm 0.000169699
8 KSP Residual norm 6.17985e-05
9 KSP Residual norm 1.57243e-05
10 KSP Residual norm 8.98356e-07
11 KSP Residual norm 1.51097e-07
When run with the complex debug version, I get this:
# ./cmd_ex32_cmplx
0 KSP Residual norm < 1.e-11
(obviously this is not correct, even if it does not bomb out).
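(A further isolation step would be to run the same complex binary with the default, non-cuda vec/mat types, to check whether the complex CPU path itself is fine. Roughly, assuming the complex build of ex32 is named ex32_cmplx:
${PETSC_DIR}/lib/petsc/bin/petscmpiexec -n 1 ./ex32_cmplx \
-pc_type jacobi \
-ksp_monitor_short
I have not included that output here.)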
Running this example under valgrind gives tons of unhelpful messages like this:
0 KSP Residual norm < 1.e-11
==3767== 4 bytes in 1 blocks are possibly lost in loss record 9 of 1,618
==3767== at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==3767== by 0x1A6EE927: ??? (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x1A7A0BE4: ??? (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x1A7ACD4F: ??? (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x1A7AD94A: ??? (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x1A81F045: ??? (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x1A81F301: ??? (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x1A81FAD7: ??? (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x1A82033D: ??? (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x1A69E8EB: ??? (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x1A6A092E: ??? (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x1A5D530B: ??? (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x1A70C5AA: cuDevicePrimaryCtxRetain (in /usr/lib64/libcuda.so.410.48)
==3767== by 0x12376D2F: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==3767== by 0x12376E9B: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==3767== by 0x1237787E: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==3767== by 0x123782E7: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==3767== by 0x1236B43D: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==3767== by 0x1235ADE7: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==3767== by 0x1238C23B: cudaMalloc (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==3767==
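(For reference, the valgrind run above was a standard memcheck invocation, roughly along these lines; the exact command line may have differed slightly:
mpirun -np 1 valgrind --tool=memcheck ./ex32_cmplx \
-dm_mat_type aijcusparse \
-dm_vec_type cuda \
-pc_type jacobi \
-ksp_monitor_short
)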
When I try ex39.c in the same directory (ksp/ksp/examples/tests), under valgrind, I get this:
# ./cmd_ex39
==3684== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==3684== This could cause spurious value errors to appear.
==3684== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3684== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==3684== This could cause spurious value errors to appear.
==3684== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3684== Warning: noted but unhandled ioctl 0x7ff with no size/direction hints.
==3684== This could cause spurious value errors to appear.
==3684== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3684== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
==3684== This could cause spurious value errors to appear.
==3684== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3684== Warning: noted but unhandled ioctl 0x37 with no size/direction hints.
==3684== This could cause spurious value errors to appear.
==3684== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3684== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
==3684== This could cause spurious value errors to appear.
==3684== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3684== Warning: noted but unhandled ioctl 0x19 with no size/direction hints.
==3684== This could cause spurious value errors to appear.
==3684== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3684== Warning: noted but unhandled ioctl 0x21 with no size/direction hints.
==3684== This could cause spurious value errors to appear.
==3684== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3684== Warning: noted but unhandled ioctl 0x1b with no size/direction hints.
==3684== This could cause spurious value errors to appear.
==3684== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3684== Warning: noted but unhandled ioctl 0x42 with no size/direction hints.
==3684== This could cause spurious value errors to appear.
==3684== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3684== Invalid read of size 8
==3684== at 0x50F7C6C: VecCUDAGetArrayRead (veccuda2.cu:1284)
==3684== by 0x50EB8C3: VecAYPX_SeqCUDA (veccuda2.cu:188)
==3684== by 0x52F741C: VecAYPX (rvector.c:750)
==3684== by 0x6701CF5: KSPBuildResidualDefault (iterativ.c:891)
==3684== by 0x66EA04E: KSPBuildResidual (itfunc.c:2142)
==3684== by 0x66FB812: KSPMonitorTrueResidualNorm (iterativ.c:259)
==3684== by 0x66E7272: KSPMonitor (itfunc.c:1716)
==3684== by 0x6673AB4: KSPSolve_BCGS (bcgs.c:64)
==3684== by 0x66DCA48: KSPSolve (itfunc.c:780)
==3684== by 0x402601: main (ex39.c:111)
==3684== Address 0xa0 is not stack'd, malloc'd or (recently) free'd
==3684==
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: INSTEAD the line number of the start of the function
[0]PETSC ERROR: is given.
[0]PETSC ERROR: [0] VecCUDAGetArrayRead line 1283 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
[0]PETSC ERROR: [0] VecAYPX_SeqCUDA line 185 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
[0]PETSC ERROR: [0] VecAYPX line 739 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/interface/rvector.c
[0]PETSC ERROR: [0] KSPBuildResidualDefault line 886 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/iterativ.c
[0]PETSC ERROR: [0] KSPBuildResidual line 2132 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: [0] KSPMonitorTrueResidualNorm line 252 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/iterativ.c
[0]PETSC ERROR: [0] KSPMonitor line 1714 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: [0] KSPSolve_BCGS line 33 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/impls/bcgs/bcgs.c
[0]PETSC ERROR: [0] KSPSolve line 678 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.3, Dec, 18, 2018
[0]PETSC ERROR: ./ex39_cmplx on a linux-gfortran-complex-debug named GPU by root Fri Feb 22 14:43:35 2019
[0]PETSC ERROR: Configure options --with-clean=1 --with-scalar-type=complex --with-debugging=1 --with-fortran=1 --with-cuda=1 --with-cudac=/usr/local/cuda-10.0/bin/nvcc --download-mpich=./mpich-3.3b1.tar.gz --download-fblaslapack=fblaslapack-3.4.2.tar.gz
[0]PETSC ERROR: #1 User provided function() line 0 in unknown file
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
I also tried another example, snes/examples/tutorials/ex19.c, which gives similar messages:
# ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -pc_type none -ksp_type fgmres -snes_monitor_short -snes_rtol 1.e-5
lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
0 SNES Function norm 0.239155
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: INSTEAD the line number of the start of the function
[0]PETSC ERROR: is given.
[0]PETSC ERROR: [0] VecCUDAGetArrayRead line 1283 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
[0]PETSC ERROR: [0] VecAXPY_SeqCUDA line 215 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
[0]PETSC ERROR: [0] VecAXPY line 597 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/interface/rvector.c
[0]PETSC ERROR: [0] MatFDColoringApply_AIJ line 176 /home/everderio/DEV/petsc-3.10.3/src/mat/impls/aij/mpi/fdmpiaij.c
[0]PETSC ERROR: [0] MatFDColoringApply line 614 /home/everderio/DEV/petsc-3.10.3/src/mat/matfd/fdmatrix.c
[0]PETSC ERROR: [0] SNESComputeJacobian_DMDA line 153 /home/everderio/DEV/petsc-3.10.3/src/snes/utils/dmdasnes.c
[0]PETSC ERROR: [0] SNES user Jacobian function line 2555 /home/everderio/DEV/petsc-3.10.3/src/snes/interface/snes.c
[0]PETSC ERROR: [0] SNESComputeJacobian line 2514 /home/everderio/DEV/petsc-3.10.3/src/snes/interface/snes.c
[0]PETSC ERROR: [0] SNESSolve_NEWTONLS line 144 /home/everderio/DEV/petsc-3.10.3/src/snes/impls/ls/ls.c
[0]PETSC ERROR: [0] SNESSolve line 4282 /home/everderio/DEV/petsc-3.10.3/src/snes/interface/snes.c
[0]PETSC ERROR: [0] main line 108 /home/everderio/DEV/petsc-3.10.3/src/snes/examples/tutorials/ex19.c
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.3, Dec, 18, 2018
[0]PETSC ERROR: ./ex19 on a linux-gfortran-complex-debug named GPU by root Fri Feb 22 17:24:34 2019
[0]PETSC ERROR: Configure options --with-clean=1 --with-scalar-type=complex --with-debugging=1 --with-fortran=1 --with-cuda=1 --CUDAFLAGS=-arch=sm_61 --with-cudac=/usr/local/cuda-10.0/bin/nvcc --download-mpich=./mpich-3.3b1.tar.gz --download-fblaslapack=fblaslapack-3.4.2.tar.gz
[0]PETSC ERROR: #1 User provided function() line 0 in unknown file
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=59
:
system msg for write_line failure : Bad file descriptor
Thanks for any help or suggestions about getting complex to work with GPUs.
Randy M.
> On Feb 21, 2019, at 9:53 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
>
> Hmm, ex32 suddenly becomes ex39 (and there is no ex39 in the src/ksp/ksp/examples/tutorials/ directory?). I tried ex32 with those options and it runs, though the -n1/-n2/-n3 options aren't used.
>
> Barry
>
>
>> On Feb 21, 2019, at 6:20 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
>>
>> Hi Barry and Satish,
>>
>> Yes, sorry, I meant -dm_mat_type aijcusparse…
>>
>> Here is an attempt to run ex39 under complex:
>>
>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [0]PETSC ERROR: likely location of problem given in stack below
>> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [0]PETSC ERROR: INSTEAD the line number of the start of the function
>> [0]PETSC ERROR: is given.
>> [0]PETSC ERROR: [0] VecCUDAGetArrayRead line 1283 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
>> [0]PETSC ERROR: [0] VecAYPX_SeqCUDA line 185 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
>> [0]PETSC ERROR: [0] VecAYPX line 739 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/interface/rvector.c
>> [0]PETSC ERROR: [0] KSPBuildResidualDefault line 886 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/iterativ.c
>> [0]PETSC ERROR: [0] KSPBuildResidual line 2132 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
>> [0]PETSC ERROR: [0] KSPMonitorTrueResidualNorm line 252 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/iterativ.c
>> [0]PETSC ERROR: [0] KSPMonitor line 1714 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
>> [0]PETSC ERROR: [0] KSPSolve_BCGS line 33 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/impls/bcgs/bcgs.c
>> [0]PETSC ERROR: [0] KSPSolve line 678 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [0]PETSC ERROR: Signal received
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.10.3, Dec, 18, 2018
>> [0]PETSC ERROR: ./ex39_cmplx on a linux-gfortran-complex-debug named GPU by root Thu Feb 21 19:03:37 2019
>> [0]PETSC ERROR: Configure options --with-clean=1 --with-scalar-type=complex --with-debugging=1 --with-fortran=1 --with-cuda=1 --with-cudac=/usr/local/cuda-10.0/bin/nvcc --download-mpich=./mpich-3.3b1.tar.gz --download-fblaslapack=fblaslapack-3.4.2.tar.gz
>> [0]PETSC ERROR: #1 User provided function() line 0 in unknown file
>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>>
>>
>> I used these options:
>>
>> #!/bin/bash
>>
>> export PETSC_ARCH=linux-gfortran-complex-debug
>>
>> ${PETSC_DIR}/lib/petsc/bin/petscmpiexec -n 1 ./ex39_cmplx \
>> -ksp_type bcgs \
>> -ksp_rtol 1.e-6 \
>> -pc_type jacobi \
>> -ksp_monitor_true_residual \
>> -ksp_converged_reason \
>> -mat_type aijcusparse \
>> -vec_type cuda \
>> -n1 32 \
>> -n2 32 \
>> -n3 32 \
>>
>>
>> My next step was going to try valgrind and see if that turned something up.
>>
>> Thanks, Randy
>>
>>
>>> On Feb 21, 2019, at 2:51 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>>>
>>>
>>> Randy,
>>>
>>> Could you please cut and paste the entire error message you get. It worked for me.
>>>
>>> I assume you mean -dm_mat_type aijcusparse not aijcuda (which doesn't exist).
>>>
>>> Satish,
>>>
>>> It does appear we do not have a nightly test for cuda and complex; could that test be added to the nightly sweeps?
>>>
>>> Thanks
>>>
>>> Barry
>>>
>>>
>>>> On Feb 14, 2019, at 7:33 PM, Randall Mackie via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>>
>>>> We are testing whether or not we can benefit from porting our PETSc code to use GPUS.
>>>> We have installed PETSc following the instructions here:
>>>> https://www.mcs.anl.gov/petsc/features/gpus.html
>>>>
>>>> Using KSP example 32 (ex32.c) to test, with -dm_vec_type cuda and -dm_mat_type aijcuda,
>>>> ex32 runs fine when compiled with a REAL version of PETSc but bombs out when using a COMPLEX version.
>>>>
>>>> Is it possible to run PETSc on GPUs in complex mode?
>>>>
>>>>
>>>> Thanks, Randy M.
>>>
>>
>