[petsc-users] GPUs, cuda, complex

Karl Rupp rupp at iue.tuwien.ac.at
Sun Feb 24 12:28:45 CST 2019


Hi,

just for information: we've seen some issues with Thrust in recent CUDA 
versions (mostly compilation issues). I don't know whether this is the 
cause of this particular error, though.
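If you want to rule Thrust itself out, a quick smoke test along these lines can be compiled against the same CUDA install (just a sketch; the file name is made up):

// thrust_complex_check.cu (hypothetical): exercises thrust::complex with
// the same operator== that appears in the backtrace below.
#include <thrust/complex.h>
#include <cstdio>

int main(void)
{
  thrust::complex<double> a(1.0, 2.0);
  thrust::complex<double> b(1.0, 2.0);
  printf("equal: %d\n", (int)(a == b));  // expect "equal: 1"
  return 0;
}

If "nvcc -std=c++11 thrust_complex_check.cu" fails to build or the binary misbehaves, the Thrust/CUDA combination is suspect independently of PETSc.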

Best regards,
Karli



On 2/23/19 6:00 AM, Smith, Barry F. via petsc-users wrote:
> 
>     I get this in the debugger (I can't make heads or tails of it; does Thrust handle complex in some way we don't understand, so that it doesn't work with our source code?)
> 
> Thread 1 "ex32" received signal SIGSEGV, Segmentation fault.
> 0x00007f1a3913abd0 in thrust::complex<double>::real (this=0x0) at /usr/local/cuda/include/thrust/complex.h:324
> 324	  T real() const { return data.x; }
> (gdb) b
> Breakpoint 1 at 0x7f1a3913abd0: file /usr/local/cuda/include/thrust/complex.h, line 324.
> (gdb) bt
> #0  0x00007f1a3913abd0 in thrust::complex<double>::real (this=0x0)
>      at /usr/local/cuda/include/thrust/complex.h:324
> #1  0x00007f1a3913ae48 in thrust::operator==<double, double> (x=..., y=...)
>      at /usr/local/cuda/include/thrust/detail/complex/complex.inl:313
> #2  0x00007f1a3912f8ff in VecSet_SeqCUDA (xin=0x215c5e0,
>      alpha=<error reading variable: Cannot access memory at address 0x0>)
>      at /sandbox/petsc/petsc.next/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:775
> #3  0x00007f1a39336ee7 in VecSet (x=0x215c5e0, alpha=0 + 0 * I)
>      at /sandbox/petsc/petsc.next/src/vec/vec/interface/rvector.c:547
> #4  0x00007f1a392e7110 in VecCreate_SeqCUDA (V=0x215c5e0)
>      at /sandbox/petsc/petsc.next/src/vec/vec/impls/seq/seqcuda/veccuda.c:310
> #5  0x00007f1a3932bb23 in VecSetType (vec=0x215c5e0, method=0x7f1a3ab28b0b "seqcuda")
>      at /sandbox/petsc/petsc.next/src/vec/vec/interface/vecreg.c:51
> #6  0x00007f1a39146894 in VecCreate_CUDA (v=0x215c5e0)
>      at /sandbox/petsc/petsc.next/src/vec/vec/impls/mpi/mpicuda/mpicuda.cu:190
> #7  0x00007f1a3932bb23 in VecSetType (vec=0x215c5e0, method=0x21556b0 "cuda")
>      at /sandbox/petsc/petsc.next/src/vec/vec/interface/vecreg.c:51
> #8  0x00007f1a39f12dfe in DMCreateGlobalVector_DA (da=0x214ad10, g=0x7ffc8b95a5b8)
>      at /sandbox/petsc/petsc.next/src/dm/impls/da/dadist.c:39
> #9  0x00007f1a3a2deeca in DMCreateGlobalVector (dm=0x214ad10, vec=0x7ffc8b95a5b8)
>      at /sandbox/petsc/petsc.next/src/dm/interface/dm.c:932
> #10 0x0000000000401ebc in main (argc=7, argv=0x7ffc8b95a6d8)
> ---Type <return> to continue, or q <return> to quit---
>      at /sandbox/petsc/petsc.next/src/ksp/ksp/examples/tests/ex32.c:48
> (gdb)
> (gdb) up
> #1  0x00007f1a3913ae48 in thrust::operator==<double, double> (x=..., y=...)
>      at /usr/local/cuda/include/thrust/detail/complex/complex.inl:313
> 313	  return x.real() == y.real() && x.imag() == y.imag();
> (gdb) p x
> $1 = (const thrust::complex<double> &) @0x0: <error reading variable>
> (gdb) p y
> $2 = (const thrust::complex<double> &) @0x7ffc8b95a370: {data = {x = 0, y = 0}}
> (gdb) up
> #2  0x00007f1a3912f8ff in VecSet_SeqCUDA (xin=0x215c5e0,
>      alpha=<error reading variable: Cannot access memory at address 0x0>)
>      at /sandbox/petsc/petsc.next/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:775
> 775	  if (alpha == (PetscScalar)0.0) {
> (gdb) p alpha
> Cannot access memory at address 0x0
> (gdb) up
> #3  0x00007f1a39336ee7 in VecSet (x=0x215c5e0, alpha=0 + 0 * I)
>      at /sandbox/petsc/petsc.next/src/vec/vec/interface/rvector.c:547
> 547	  ierr = (*x->ops->set)(x,alpha);CHKERRQ(ierr);
> (gdb) p alpha
> 
> 
> 
>> On Feb 22, 2019, at 10:53 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>>
>>
>>   I am getting some crashes on our machine frog with complex cuda on those examples, though maybe not at the exact same line numbers.
>>
>>   We need more help from our CUDA experts
>>
>>    Barry
>>
>>
>>> On Feb 22, 2019, at 8:33 AM, Randall Mackie <rlmackie862 at gmail.com> wrote:
>>>
>>> Sorry, I’ve tried both ex32 and ex39 in the src/ksp/ksp/examples/tests directory; both give the same error.
>>> I’ll try with valgrind or try another computer with a different GPU.
>>>
>>> Thanks for confirming that complex *should* work on GPUs.
>>>
>>> Randy M.
>>>
>>>
>>>
>>>> On Feb 21, 2019, at 9:53 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>>>>
>>>>
>>>> Hmm, ex32 suddenly becomes ex39 (and there is no ex39 in the src/ksp/ksp/examples/tutorials/ directory?). I tried ex32 with those options and it runs, though the -n1/-n2/-n3 options aren't used.
>>>>
>>>> Barry
>>>>
>>>>
>>>>> On Feb 21, 2019, at 6:20 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
>>>>>
>>>>> Hi Barry and Satish,
>>>>>
>>>>> Yes, sorry, I meant -dm_mat_type aijcusparse…
>>>>>
>>>>> Here is an attempt to run ex39 under complex:
>>>>>
>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------
>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>> [0]PETSC ERROR: likely location of problem given in stack below
>>>>> [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>>>>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>>>> [0]PETSC ERROR:       INSTEAD the line number of the start of the function
>>>>> [0]PETSC ERROR:       is given.
>>>>> [0]PETSC ERROR: [0] VecCUDAGetArrayRead line 1283 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
>>>>> [0]PETSC ERROR: [0] VecAYPX_SeqCUDA line 185 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/impls/seq/seqcuda/veccuda2.cu
>>>>> [0]PETSC ERROR: [0] VecAYPX line 739 /home/everderio/DEV/petsc-3.10.3/src/vec/vec/interface/rvector.c
>>>>> [0]PETSC ERROR: [0] KSPBuildResidualDefault line 886 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/iterativ.c
>>>>> [0]PETSC ERROR: [0] KSPBuildResidual line 2132 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
>>>>> [0]PETSC ERROR: [0] KSPMonitorTrueResidualNorm line 252 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/iterativ.c
>>>>> [0]PETSC ERROR: [0] KSPMonitor line 1714 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
>>>>> [0]PETSC ERROR: [0] KSPSolve_BCGS line 33 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/impls/bcgs/bcgs.c
>>>>> [0]PETSC ERROR: [0] KSPSolve line 678 /home/everderio/DEV/petsc-3.10.3/src/ksp/ksp/interface/itfunc.c
>>>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>> [0]PETSC ERROR: Signal received
>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>> [0]PETSC ERROR: Petsc Release Version 3.10.3, Dec, 18, 2018
>>>>> [0]PETSC ERROR: ./ex39_cmplx on a linux-gfortran-complex-debug named GPU by root Thu Feb 21 19:03:37 2019
>>>>> [0]PETSC ERROR: Configure options --with-clean=1 --with-scalar-type=complex --with-debugging=1 --with-fortran=1 --with-cuda=1 --with-cudac=/usr/local/cuda-10.0/bin/nvcc --download-mpich=./mpich-3.3b1.tar.gz --download-fblaslapack=fblaslapack-3.4.2.tar.gz
>>>>> [0]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>>>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>>>>>
>>>>>
>>>>> I used these options:
>>>>>
>>>>> #!/bin/bash
>>>>>
>>>>> export PETSC_ARCH=linux-gfortran-complex-debug
>>>>>
>>>>> ${PETSC_DIR}/lib/petsc/bin/petscmpiexec -n 1 ./ex39_cmplx \
>>>>> -ksp_type bcgs \
>>>>> -ksp_rtol 1.e-6 \
>>>>> -pc_type jacobi \
>>>>> -ksp_monitor_true_residual \
>>>>> -ksp_converged_reason \
>>>>> -mat_type aijcusparse \
>>>>> -vec_type cuda \
>>>>> -n1 32 \
>>>>> -n2 32 \
>>>>> -n3 32 \
>>>>>
>>>>>
>>>>> My next step was going to try valgrind and see if that turned something up.
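
For what it's worth, with a CUDA build I would run cuda-memcheck alongside valgrind; valgrind tends to report a pile of false positives from the CUDA driver, so take its output with a grain of salt. Something like the following, reusing the options from the script above:

valgrind --track-origins=yes ./ex39_cmplx [same options as above]
cuda-memcheck ./ex39_cmplx [same options as above]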
>>>>>
>>>>> Thanks, Randy
>>>>>
>>>>>
>>>>>> On Feb 21, 2019, at 2:51 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>>>>>>
>>>>>>
>>>>>> Randy,
>>>>>>
>>>>>> Could you please cut and paste the entire error message you get? It worked for me.
>>>>>>
>>>>>> I assume you mean -dm_mat_type aijcusparse, not aijcuda (which doesn't exist).
>>>>>>
>>>>>> Satish,
>>>>>>
>>>>>>   It does appear we do not have a nightly test for CUDA and complex; could that test be added to the nightly sweeps?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>
>>>>>>> On Feb 14, 2019, at 7:33 PM, Randall Mackie via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>>>>>
>>>>>>> We are testing whether we can benefit from porting our PETSc code to use GPUs.
>>>>>>> We have installed PETSc following the instructions here:
>>>>>>> https://www.mcs.anl.gov/petsc/features/gpus.html
>>>>>>>
>>>>>>> Using KSP example 32 (ex32.c) to test, with -dm_vec_type cuda and -dm_mat_type aijcuda,
>>>>>>> ex32 runs fine when compiled with a REAL version of PETSc but bombs out when using a COMPLEX version.
>>>>>>>
>>>>>>> Is it possible to run PETSc on GPUs in complex mode?
>>>>>>>
>>>>>>>
>>>>>>> Thanks, Randy M.
>>>>>>
>>>>>
>>>>
>>>
>>
> 
