[petsc-users] GPUs, cud, complex

Smith, Barry F. bsmith at mcs.anl.gov
Fri Feb 22 23:00:08 CST 2019


   I get this in the debugger (can't make heads or tails out of it, does thrust handle complex in some way we don't understand so doesn't work with our source code?) 

Thread 1 "ex32" received signal SIGSEGV, Segmentation fault.
0x00007f1a3913abd0 in thrust::complex<double>::real (this=0x0) at /usr/local/cuda/include/thrust/complex.h:324
324	  T real() const { return data.x; }
(gdb) b
Breakpoint 1 at 0x7f1a3913abd0: file /usr/local/cuda/include/thrust/complex.h, line 324.
(gdb) bt
#0  0x00007f1a3913abd0 in thrust::complex<double>::real (this=0x0)
    at /usr/local/cuda/include/thrust/complex.h:324
#1  0x00007f1a3913ae48 in thrust::operator==<double, double> (x=..., y=...)
    at /usr/local/cuda/include/thrust/detail/complex/complex.inl:313
#2  0x00007f1a3912f8ff in VecSet_SeqCUDA (xin=0x215c5e0, 
    alpha=<error reading variable: Cannot access memory at address 0x0>)
    at /sandbox/petsc/petsc.next/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:775
#3  0x00007f1a39336ee7 in VecSet (x=0x215c5e0, alpha=0 + 0 * I)
    at /sandbox/petsc/petsc.next/src/vec/vec/interface/rvector.c:547
#4  0x00007f1a392e7110 in VecCreate_SeqCUDA (V=0x215c5e0)
    at /sandbox/petsc/petsc.next/src/vec/vec/impls/seq/seqcuda/veccuda.c:310
#5  0x00007f1a3932bb23 in VecSetType (vec=0x215c5e0, method=0x7f1a3ab28b0b "seqcuda")
    at /sandbox/petsc/petsc.next/src/vec/vec/interface/vecreg.c:51
#6  0x00007f1a39146894 in VecCreate_CUDA (v=0x215c5e0)
    at /sandbox/petsc/petsc.next/src/vec/vec/impls/mpi/mpicuda/mpicuda.cu:190
#7  0x00007f1a3932bb23 in VecSetType (vec=0x215c5e0, method=0x21556b0 "cuda")
    at /sandbox/petsc/petsc.next/src/vec/vec/interface/vecreg.c:51
#8  0x00007f1a39f12dfe in DMCreateGlobalVector_DA (da=0x214ad10, g=0x7ffc8b95a5b8)
    at /sandbox/petsc/petsc.next/src/dm/impls/da/dadist.c:39
#9  0x00007f1a3a2deeca in DMCreateGlobalVector (dm=0x214ad10, vec=0x7ffc8b95a5b8)
    at /sandbox/petsc/petsc.next/src/dm/interface/dm.c:932
#10 0x0000000000401ebc in main (argc=7, argv=0x7ffc8b95a6d8)
---Type <return> to continue, or q <return> to quit---
    at /sandbox/petsc/petsc.next/src/ksp/ksp/examples/tests/ex32.c:48
(gdb) 
(gdb) up
#1  0x00007f1a3913ae48 in thrust::operator==<double, double> (x=..., y=...)
    at /usr/local/cuda/include/thrust/detail/complex/complex.inl:313
313	  return x.real() == y.real() && x.imag() == y.imag();
(gdb) p x
$1 = (const thrust::complex<double> &) @0x0: <error reading variable>
(gdb) p y
$2 = (const thrust::complex<double> &) @0x7ffc8b95a370: {data = {x = 0, y = 0}}
(gdb) up
#2  0x00007f1a3912f8ff in VecSet_SeqCUDA (xin=0x215c5e0, 
    alpha=<error reading variable: Cannot access memory at address 0x0>)
    at /sandbox/petsc/petsc.next/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:775
775	  if (alpha == (PetscScalar)0.0) {
(gdb) p alpha
Cannot access memory at address 0x0
(gdb) up
#3  0x00007f1a39336ee7 in VecSet (x=0x215c5e0, alpha=0 + 0 * I)
    at /sandbox/petsc/petsc.next/src/vec/vec/interface/rvector.c:547
547	  ierr = (*x->ops->set)(x,alpha);CHKERRQ(ierr);
(gdb) p alpha



> On Feb 22, 2019, at 10:53 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
> 
>  I am getting some crashes on our machine frog with complex cuda on those examples, though maybe not at the exact same line numbers.
> 
>  We need more help from our CUDA experts
> 
>   Barry
> 
> 
>> On Feb 22, 2019, at 8:33 AM, Randall Mackie <rlmackie862 at gmail.com> wrote:
>> 
>> Sorry, I’ve tried both ex32 and ex39 in the src/ksp/ksp/examples/tests directory, both would give the same error.
>> I’ll try with valgrind or try another computer with a different GPU.
>> 
>> Thanks for confirming that complex *should* work on GPUs.
>> 
>> Randy M.
>> 
>> 
>> 
>>> On Feb 21, 2019, at 9:53 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>>> 
>>> 
>>> Hmm, ex32 suddenly becomes ex39 (and there is no ex39 in the src/ksp/ksp/examples/tutorials/ directory?) I try ex32 with those options and it runs though the -n1 n2 n3 options aren't used.
>>> 
>>> Barry
>>> 
>>> 
>>>> On Feb 21, 2019, at 6:20 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
>>>> 
>>>> Hi Barry and Satish,
>>>> 
>>>> Yes, sorry, I meant -dm_mat_type_aijcusparse…..
>>>> 
>>>> Here is an attempt to run ex39 under complex:
>>>> 
>>>> 0]PETSC ERROR: ------------------------------------------------------------------------
>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>> [0]PETSC ERROR: likely location of problem given in stack below
>>>> [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>>>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>>> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
>>>> [0]PETSC ERROR:       is given.
>>>> [0]PETSC ERROR: [0] VecCUDAGetArrayRead line 1283 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
>>>> [0]PETSC ERROR: [0] VecAYPX_SeqCUDA line 185 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
>>>> [0]PETSC ERROR: [0] VecAYPX line 739 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/interface/rvector.c
>>>> [0]PETSC ERROR: [0] KSPBuildResidualDefault line 886 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/iterativ.c
>>>> [0]PETSC ERROR: [0] KSPBuildResidual line 2132 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
>>>> [0]PETSC ERROR: [0] KSPMonitorTrueResidualNorm line 252 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/iterativ.c
>>>> [0]PETSC ERROR: [0] KSPMonitor line 1714 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
>>>> [0]PETSC ERROR: [0] KSPSolve_BCGS line 33 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/impls/bcgs/bcgs.c
>>>> [0]PETSC ERROR: [0] KSPSolve line 678 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
>>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> [0]PETSC ERROR: Signal received
>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>> [0]PETSC ERROR: Petsc Release Version 3.10.3, Dec, 18, 2018 
>>>> [0]PETSC ERROR: ./ex39_cmplx on a linux-gfortran-complex-debug named GPU by root Thu Feb 21 19:03:37 2019
>>>> [0]PETSC ERROR: Configure options --with-clean=1 --with-scalar-type=complex --with-debugging=1 --with-fortran=1 --with-cuda=1 --with-cudac=/usr/local/cuda-10.0/bin/nvcc --download-mpich=./mpich-3.3b1.tar.gz --download-fblaslapack=fblaslapack-3.4.2.tar.gz
>>>> [0]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>>>> 
>>>> 
>>>> I used these options:
>>>> 
>>>> #!/bin/bash
>>>> 
>>>> export PETSC_ARCH=linux-gfortran-complex-debug
>>>> 
>>>> ${PETSC_DIR}/lib/petsc/bin/petscmpiexec -n 1 ./ex39_cmplx \
>>>> -ksp_type bcgs \
>>>> -ksp_rtol 1.e-6 \
>>>> -pc_type jacobi \
>>>> -ksp_monitor_true_residual \
>>>> -ksp_converged_reason \
>>>> -mat_type aijcusparse \
>>>> -vec_type cuda \
>>>> -n1 32 \
>>>> -n2 32 \
>>>> -n3 32 \
>>>> 
>>>> 
>>>> My next step was going to try valgrind and see if that turned something up.
>>>> 
>>>> Thanks, Randy
>>>> 
>>>> 
>>>>> On Feb 21, 2019, at 2:51 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>>>>> 
>>>>> 
>>>>> Randy,
>>>>> 
>>>>> Could you please cut and paste the entire error message you get. It worked for me. 
>>>>> 
>>>>> I assume you mean -dm_mat_type aijcusparse  not aijcuda (which doesn't exist).
>>>>> 
>>>>> Satish,
>>>>> 
>>>>>  I does appear we do not have a nightly test for cuda and complex, could that test be added to the nightly sweeps?
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> Barry
>>>>> 
>>>>> 
>>>>>> On Feb 14, 2019, at 7:33 PM, Randall Mackie via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>>>> 
>>>>>> We are testing whether or not we can benefit from porting our PETSc code to use GPUS.
>>>>>> We have installed PETSc following the instructions here:
>>>>>> https://www.mcs.anl.gov/petsc/features/gpus.html
>>>>>> 
>>>>>> Using KSP example 32 (ex32.c) to test, and using -dm_vec_type cuda and -dm_mat_type aijcuda 
>>>>>> then ex32 runs fine when compiled with a REAL version of PETSc but bombs out when using a COMPLEX version.
>>>>>> 
>>>>>> Is it possible to run PETSc on GPUS in complex mode?d
>>>>>> 
>>>>>> 
>>>>>> Thanks, Randy M.
>>>>> 
>>>> 
>>> 
>> 
> 



More information about the petsc-users mailing list