[petsc-dev] [GPU] src/ksp/ksp/examples/tutorials/ex9 fails with mpirun -np 2

Projet_TRIOU triou at cea.fr
Tue Jul 29 09:58:34 CDT 2014


Hello Karl,

For your information, I understood why my code crashes on the GPU with more than one MPI process (np > 1).

I am solving Ax=b with KSPSetInitialGuessNonzero set to true, so I also need
to fill x with VecSetValues before each solve.

VecSetValues works on the GPU with np=1 but not with np>1. I noticed that
filling x with VecSet worked fine, so when I create x, I now also initialize
it with VecSet (I suppose it copies/creates something on the GPU that
VecSetValues doesn't):

Vec x;
VecCreate(PETSC_COMM_WORLD,&x);
VecSetSizes(x, nb_rows_, PETSC_DECIDE);
VecSetType(x, VECMPICUSP);
VecSet(x,0.0); // Needed on GPU during parallel calculation, else it crashes

// Later, before each solve:
for (int i=0; i<size; i++)
{
  VecSetValues(b, 1, &colonne_globale, &secmem(i), INSERT_VALUES);
  VecSetValues(x, 1, &colonne_globale, &solution(i), INSERT_VALUES);
  colonne_globale++;
}
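
After that loop, the vectors still have to be assembled before they are used.
Roughly, what follows in my case looks like this (only a sketch; ksp is a
placeholder name for my already configured solver object):

// Assemble once all VecSetValues calls are done, before using the vectors
VecAssemblyBegin(b); VecAssemblyEnd(b);
VecAssemblyBegin(x); VecAssemblyEnd(x);

// Reuse the values already stored in x as the initial guess
KSPSetInitialGuessNonzero(ksp, PETSC_TRUE);
KSPSolve(ksp, b, x); // solves Ax = b, x is overwritten with the solution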

Thanks for your help,

Pierre
> Thanks Karl, and sorry for my mistake. Indeed, ex9 runs in parallel with 
> -pc_type none or -pc_type jacobi.
> Good to know that I now have an example with VecSetValues that I can try to 
> replicate in my code.
> I will update the thread as soon as I get back to running more tests on it.
>
> Pierre
>> Hi Pierre,
>>
>> I could reproduce the problem, but I could also verify that the 
>> problem you ran into was due to a currently unsupported block-Jacobi 
>> preconditioner. That is, if you first comment out the KSPSolve call for 
>> the ksp2 object in ex9, recompile ex9, and then run
>>
>> $> mpiexec -np 2 ./ex9 -ksp_monitor -mat_type aijcusp -vec_type cusp 
>> -pc_type none
>>
>> things should work out. This may also be applicable to your case: 
>> give it a try with -pc_type none and let us know whether the problem with 
>> VecSetValues remains. (The use of no preconditioner is certainly not 
>> great, but at least it may give you a first result.)
>>
>> Best regards,
>> Karli
>>
>>
>>
>> On 03/26/2014 06:38 PM, Projet_TRIOU wrote:
>>> Hello all,
>>>
>>> My parallel code is really close to running on the GPU
>>> thanks to PETSc, but I am struggling with the best way
>>> to fill the PETSc vectors so that it runs on both CPU & GPU.
>>>
>>> In the past I was using VecCreateMPIWithArray to
>>> create the vectors, but I had trouble with it on the GPU.
>>>
>>> So I followed the example in src/ksp/ksp/examples/tutorials/ex2.c
>>> and now create the vectors like this:
>>>
>>>    // Build b (the right-hand side)
>>>    ierr = VecCreate(PETSC_COMM_WORLD,&SecondMembrePetsc_); check(ierr);
>>>    ierr = VecSetSizes(SecondMembrePetsc_, nb_rows, nb_rows_tot);
>>> check(ierr);
>>>    ierr = VecSetFromOptions(SecondMembrePetsc_); check(ierr);
>>>    // Build x (the solution)
>>>    ierr = VecDuplicate(SecondMembrePetsc_,&SolutionPetsc_);check(ierr);
>>>
>>> And I fill them with the VecSetValues function. It runs well on CPU and
>>> GPU, but crashes only in parallel on the GPU. If I use VecSet instead of
>>> VecSetValues, it doesn't crash (but of course VecSet is not enough
>>> for me :-)
>>>
>>> I tried to find an example that reproduces the problem for you, and I
>>> think src/ksp/ksp/examples/tutorials/ex9 (it is using VecSetValues)
>>> is a good one.
>>>
>>> Or did I miss something (I also tried VecPlaceArray/VecRestoreArray,
>>> but without success on the GPU)?
>>>
>>> Thanks, and yes you are right, "WARNING: Using GPUs effectively is
>>> difficult!" :-)
>>>
>>> Pierre
>>>
>>> sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials 
>>>
>>>  > ./ex9 -ksp_monitor
>>>    0 KSP Residual norm 6.612932697792e+00
>>>    1 KSP Residual norm 4.261830032389e-01
>>>    2 KSP Residual norm 2.121746090851e-02
>>>    3 KSP Residual norm 1.233779841608e-03
>>>    4 KSP Residual norm 1.265903168531e-05
>>>    0 KSP Residual norm 1.309416176382e-05
>>>    0 KSP Residual norm 1.404919664063e-05
>>>
>>> sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials 
>>>
>>>  > mpiexec -np 2 ./ex9 -ksp_monitor
>>>    0 KSP Residual norm 2.496821857304e+02
>>>    1 KSP Residual norm 4.522206074831e+01
>>>    2 KSP Residual norm 1.959482408314e+01
>>>    3 KSP Residual norm 7.002013703407e+00
>>>    4 KSP Residual norm 2.144105201713e+00
>>>    5 KSP Residual norm 1.780095080270e-01
>>>    6 KSP Residual norm 5.642702243268e-02
>>>    7 KSP Residual norm 6.439343992306e-03
>>>    8 KSP Residual norm 3.012756374415e-04
>>> Norm of error 0.000249108, Iterations 8
>>> Norm of error 0.000715584, Iterations 6
>>>    0 KSP Residual norm 3.422287562824e-04
>>> Norm of error 0.000249108, Iterations 0
>>> Norm of error 0.000192805, Iterations 7
>>>    0 KSP Residual norm 4.140588954098e-04
>>> Norm of error 0.000249108, Iterations 0
>>> Norm of error 0.000109507, Iterations 7
>>>
>>> sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials 
>>>
>>>  > ./ex9 -ksp_monitor -mat_type aijcusp -vec_type cusp
>>>    0 KSP Residual norm 6.612932697792e+00
>>>    1 KSP Residual norm 4.261830032389e-01
>>>    2 KSP Residual norm 2.121746090851e-02
>>>    3 KSP Residual norm 1.233779841608e-03
>>>    4 KSP Residual norm 1.265903168531e-05
>>>    0 KSP Residual norm 1.309416176403e-05
>>>    0 KSP Residual norm 1.404919664088e-05
>>>
>>> sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials 
>>>
>>>  > mpiexec -np 2 ./ex9 -ksp_monitor -mat_type aijcusp -vec_type cusp
>>> [0]PETSC ERROR:
>>> ------------------------------------------------------------------------ 
>>>
>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>>> probably memory access out of range
>>> [0]PETSC ERROR: Try option -start_in_debugger or 
>>> -on_error_attach_debugger
>>> [0]PETSC ERROR: or see
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSC
>>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>>> find memory corruption errors
>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link,
>>> and run
>>> [0]PETSC ERROR: to get more information on the crash.
>>> [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> [0]PETSC ERROR: Signal received
>>> [0]PETSC ERROR: See
>>> http://http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.4.4-3713-g576f62e
>>> GIT Date: 2014-03-23 15:59:15 -0500
>>> [0]PETSC ERROR: ./ex9 on a linux_opt named sitre.intra.cea.fr by triou
>>> Wed Mar 26 18:32:34 2014
>>> [0]PETSC ERROR: Configure options
>>> --prefix=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt 
>>>
>>> --with-single-library --with-shared-libraries=0 --with-debugging=0
>>> --with-errorchecking=1 --COPTFLAGS="   -O3 -fPIC " --CXXOPTFLAGS=" -O3
>>> -fPIC " --FOPTFLAGS="  -O3 -fPIC " --with-fortran=yes --with-clean=1
>>> --download-scalapack=../scalapack-2.0.2.tgz
>>> --download-mumps=../MUMPS_4.10.0-p3.tar.gz --download-superlu_dist=yes
>>> --download-parmetis=../parmetis-4.0.2-p4.tar.gz
>>> --download-metis=../metis-5.0.2-p3.tar.gz
>>> --download-ptscotch=../ptscotch.tar.gz
>>> --download-hypre=../hypre-2.9.1a.tar.gz
>>> --with-valgrind-include=/work/triou/git/petsc-dev/Trio_U/exec/valgrind/include 
>>>
>>> --with-blas-lapack-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBLAPACK 
>>> --with-cuda=1
>>> --with-cuda-dir=/work/triou/git/petsc-dev/Trio_U/exec/cuda5.5
>>> --with-cusp=1
>>> --with-cusp-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/cusplibrary-0.4.0 
>>>
>>> --with-thrust=1 --with-cuda-arch=sm_21 --with-ssl=0
>>> --with-mpi-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBMPI/mpich
>>> --with-x=1
>>> [0]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------ 
>>>
>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>>> probably memory access out of range
>>> [1]PETSC ERROR: Try option -start_in_debugger or 
>>> -on_error_attach_debugger
>>> [1]PETSC ERROR: or see
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[1]PETSC
>>> ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
>>> find memory corruption errors
>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link,
>>> and run
>>> [1]PETSC ERROR: application called MPI_Abort(MPI_COMM_WORLD, 59) - 
>>> process 0
>>> to get more information on the crash.
>>> [1]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> [1]PETSC ERROR: Signal received
>>> [1]PETSC ERROR: See
>>> http://http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>> shooting.
>>> [1]PETSC ERROR: Petsc Development GIT revision: v3.4.4-3713-g576f62e
>>> GIT Date: 2014-03-23 15:59:15 -0500
>>> [1]PETSC ERROR: ./ex9 on a linux_opt named sitre.intra.cea.fr by triou
>>> Wed Mar 26 18:32:34 2014
>>> [1]PETSC ERROR: Configure options
>>> --prefix=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt 
>>>
>>> --with-single-library --with-shared-libraries=0 --with-debugging=0
>>> --with-errorchecking=1 --COPTFLAGS="   -O3 -fPIC " --CXXOPTFLAGS=" -O3
>>> -fPIC " --FOPTFLAGS="  -O3 -fPIC " --with-fortran=yes --with-clean=1
>>> --download-scalapack=../scalapack-2.0.2.tgz
>>> --download-mumps=../MUMPS_4.10.0-p3.tar.gz --download-superlu_dist=yes
>>> --download-parmetis=../parmetis-4.0.2-p4.tar.gz
>>> --download-metis=../metis-5.0.2-p3.tar.gz
>>> --download-ptscotch=../ptscotch.tar.gz
>>> --download-hypre=../hypre-2.9.1a.tar.gz
>>> --with-valgrind-include=/work/triou/git/petsc-dev/Trio_U/exec/valgrind/include 
>>>
>>> --with-blas-lapack-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBLAPACK 
>>> --with-cuda=1
>>> --with-cuda-dir=/work/triou/git/petsc-dev/Trio_U/exec/cuda5.5
>>> --with-cusp=1
>>> --with-cusp-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/cusplibrary-0.4.0 
>>>
>>> --with-thrust=1 --with-cuda-arch=sm_21 --with-ssl=0
>>> --with-mpi-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBMPI/mpich
>>> --with-x=1
>>> [1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1
>>>
>>>
>>>
>>> -- 
>>> *Trio_U support team*
>>> Marthe ROUX (01 69 08 00 02) Saclay
>>> Pierre LEDAC (04 38 78 91 49) Grenoble
>>
>
>
> -- 
> *Trio_U support team*
> Marthe ROUX (01 69 08 00 02) Saclay
> Pierre LEDAC (04 38 78 91 49) Grenoble


-- 
*Trio_U support team*
Marthe ROUX (01 69 08 00 02) Saclay
Pierre LEDAC (04 38 78 91 49) Grenoble