[petsc-dev] [GPU] src/ksp/ksp/examples/tutorials/ex9 fails with mpirun -np 2

Projet_TRIOU triou at cea.fr
Wed Mar 26 12:38:03 CDT 2014


Hello all,

My parallel code is really near to run using GPU device
thanks to PETSc but I am struggling with the best way
to fill the PETSc vectors in order in runs on CPU & GPU.

I was using in the past, VecCreateMPIWithArray, to
create vector but I had trouble with it on GPU.

So I followed the example in the src/ksp/ksp/examples/tutorials/ex2.c
and create now the vectors like this:

   // Build x
   ierr = VecCreate(PETSC_COMM_WORLD,&SecondMembrePetsc_); check(ierr);
   ierr = VecSetSizes(SecondMembrePetsc_, nb_rows, nb_rows_tot); 
check(ierr);
   ierr = VecSetFromOptions(SecondMembrePetsc_); check(ierr);
   // Build b
   ierr = VecDuplicate(SecondMembrePetsc_,&SolutionPetsc_);check(ierr);

And fills it with VecSetValues function. It runs well on CPU and
GPU but crashed only in parallel on GPU. It I use VecSet instead of
VecSetValues, it didn't crash (but of course VecSet is not enough
for me :-)

I tried to find an example to reproduce for you the problem, and I
think src/ksp/ksp/examples/tutorials/ex9 (it is usingVecSetValues,)
is a good one.

Or did I miss something (I also try VecPlaceArray/VecRestoreArray
but without success on GPU) ?

Thanks, and yes you are right,  "WARNING: Using GPUs effectively is 
difficult!" :-)

Pierre

sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials 
 > ./ex9 -ksp_monitor
   0 KSP Residual norm 6.612932697792e+00
   1 KSP Residual norm 4.261830032389e-01
   2 KSP Residual norm 2.121746090851e-02
   3 KSP Residual norm 1.233779841608e-03
   4 KSP Residual norm 1.265903168531e-05
   0 KSP Residual norm 1.309416176382e-05
   0 KSP Residual norm 1.404919664063e-05

sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials 
 > mpiexec -np 2 ./ex9 -ksp_monitor
   0 KSP Residual norm 2.496821857304e+02
   1 KSP Residual norm 4.522206074831e+01
   2 KSP Residual norm 1.959482408314e+01
   3 KSP Residual norm 7.002013703407e+00
   4 KSP Residual norm 2.144105201713e+00
   5 KSP Residual norm 1.780095080270e-01
   6 KSP Residual norm 5.642702243268e-02
   7 KSP Residual norm 6.439343992306e-03
   8 KSP Residual norm 3.012756374415e-04
Norm of error 0.000249108, Iterations 8
Norm of error 0.000715584, Iterations 6
   0 KSP Residual norm 3.422287562824e-04
Norm of error 0.000249108, Iterations 0
Norm of error 0.000192805, Iterations 7
   0 KSP Residual norm 4.140588954098e-04
Norm of error 0.000249108, Iterations 0
Norm of error 0.000109507, Iterations 7

sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials 
 > ./ex9 -ksp_monitor -mat_type aijcusp -vec_type cusp
   0 KSP Residual norm 6.612932697792e+00
   1 KSP Residual norm 4.261830032389e-01
   2 KSP Residual norm 2.121746090851e-02
   3 KSP Residual norm 1.233779841608e-03
   4 KSP Residual norm 1.265903168531e-05
   0 KSP Residual norm 1.309416176403e-05
   0 KSP Residual norm 1.404919664088e-05

sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials 
 > mpiexec -np 2 ./ex9 -ksp_monitor -mat_type aijcusp -vec_type cusp
[0]PETSC ERROR: 
------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see 
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSC 
ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to 
find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, 
and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message 
--------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See 
http://http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble 
shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.4.4-3713-g576f62e  
GIT Date: 2014-03-23 15:59:15 -0500
[0]PETSC ERROR: ./ex9 on a linux_opt named sitre.intra.cea.fr by triou 
Wed Mar 26 18:32:34 2014
[0]PETSC ERROR: Configure options 
--prefix=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt 
--with-single-library --with-shared-libraries=0 --with-debugging=0 
--with-errorchecking=1 --COPTFLAGS="   -O3 -fPIC " --CXXOPTFLAGS=" -O3 
-fPIC " --FOPTFLAGS="  -O3 -fPIC " --with-fortran=yes --with-clean=1 
--download-scalapack=../scalapack-2.0.2.tgz 
--download-mumps=../MUMPS_4.10.0-p3.tar.gz --download-superlu_dist=yes 
--download-parmetis=../parmetis-4.0.2-p4.tar.gz 
--download-metis=../metis-5.0.2-p3.tar.gz 
--download-ptscotch=../ptscotch.tar.gz 
--download-hypre=../hypre-2.9.1a.tar.gz 
--with-valgrind-include=/work/triou/git/petsc-dev/Trio_U/exec/valgrind/include 
--with-blas-lapack-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBLAPACK --with-cuda=1 
--with-cuda-dir=/work/triou/git/petsc-dev/Trio_U/exec/cuda5.5 
--with-cusp=1 
--with-cusp-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/cusplibrary-0.4.0 
--with-thrust=1 --with-cuda-arch=sm_21 --with-ssl=0 
--with-mpi-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBMPI/mpich 
--with-x=1
[0]PETSC ERROR: #1 User provided function() line 0 in  unknown file
[1]PETSC ERROR: 
------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
probably memory access out of range
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see 
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[1]PETSC 
ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to 
find memory corruption errors
[1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, 
and run
[1]PETSC ERROR: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
to get more information on the crash.
[1]PETSC ERROR: --------------------- Error Message 
--------------------------------------------------------------
[1]PETSC ERROR: Signal received
[1]PETSC ERROR: See 
http://http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble 
shooting.
[1]PETSC ERROR: Petsc Development GIT revision: v3.4.4-3713-g576f62e  
GIT Date: 2014-03-23 15:59:15 -0500
[1]PETSC ERROR: ./ex9 on a linux_opt named sitre.intra.cea.fr by triou 
Wed Mar 26 18:32:34 2014
[1]PETSC ERROR: Configure options 
--prefix=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt 
--with-single-library --with-shared-libraries=0 --with-debugging=0 
--with-errorchecking=1 --COPTFLAGS="   -O3 -fPIC " --CXXOPTFLAGS=" -O3 
-fPIC " --FOPTFLAGS="  -O3 -fPIC " --with-fortran=yes --with-clean=1 
--download-scalapack=../scalapack-2.0.2.tgz 
--download-mumps=../MUMPS_4.10.0-p3.tar.gz --download-superlu_dist=yes 
--download-parmetis=../parmetis-4.0.2-p4.tar.gz 
--download-metis=../metis-5.0.2-p3.tar.gz 
--download-ptscotch=../ptscotch.tar.gz 
--download-hypre=../hypre-2.9.1a.tar.gz 
--with-valgrind-include=/work/triou/git/petsc-dev/Trio_U/exec/valgrind/include 
--with-blas-lapack-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBLAPACK --with-cuda=1 
--with-cuda-dir=/work/triou/git/petsc-dev/Trio_U/exec/cuda5.5 
--with-cusp=1 
--with-cusp-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/cusplibrary-0.4.0 
--with-thrust=1 --with-cuda-arch=sm_21 --with-ssl=0 
--with-mpi-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBMPI/mpich 
--with-x=1
[1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1



-- 
*Trio_U support team*
Marthe ROUX (01 69 08 00 02) Saclay
Pierre LEDAC (04 38 78 91 49) Grenoble
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140326/f6c1e883/attachment.html>


More information about the petsc-dev mailing list