[petsc-dev] [GPU] src/ksp/ksp/examples/tutorials/ex9 fails with mpirun -np 2

Projet_TRIOU triou at cea.fr
Tue Jul 29 10:09:32 CDT 2014


Yes Matt, I am calling VecAssemblyBegin/End() after VecSetValues.
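
To be complete, here is the overall pattern I follow (a minimal sketch;
the array names rhs_local and guess_local are placeholders, not my
exact code):

    PetscInt rstart, rend, nlocal;
    VecGetOwnershipRange(b, &rstart, &rend);   /* rows owned by this rank */
    nlocal = rend - rstart;
    for (PetscInt i = 0; i < nlocal; i++) {
      PetscInt    row = rstart + i;
      PetscScalar bv  = rhs_local[i];          /* local RHS entry (placeholder) */
      PetscScalar xv  = guess_local[i];        /* local initial-guess entry (placeholder) */
      VecSetValues(b, 1, &row, &bv, INSERT_VALUES);
      VecSetValues(x, 1, &row, &xv, INSERT_VALUES);
    }
    /* required after VecSetValues(), before the vectors are used */
    VecAssemblyBegin(b); VecAssemblyEnd(b);
    VecAssemblyBegin(x); VecAssemblyEnd(x);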

What gave me this idea is the ex30 example in
src/ksp/ksp/examples/tests/ex30.c,
where a matrix is read, then VecSet is called, and then eventually
VecSetValues
(and it was working on GPU with mpiexec -np 2 ...). I played a lot with
it and discovered that
VecSet was a workaround that avoided the crash in VecSetValues.

I can send my modified ex30.c test to reproduce the bug.
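
Roughly, the relevant part of my modified test looks like this (a
sketch from memory, not the exact file; the binary file name and the
global size n are placeholders):

    Mat         A;
    Vec         b;
    PetscViewer fd;
    PetscViewerBinaryOpen(PETSC_COMM_WORLD, "matrix.dat", FILE_MODE_READ, &fd);
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetFromOptions(A);            /* honors -mat_type aijcusp */
    MatLoad(A, fd);                  /* read the matrix from the file */
    PetscViewerDestroy(&fd);
    VecCreate(PETSC_COMM_WORLD, &b);
    VecSetSizes(b, PETSC_DECIDE, n);
    VecSetFromOptions(b);            /* honors -vec_type cusp */
    VecSet(b, 0.0);                  /* removing this makes VecSetValues crash (GPU, np>1) */
    /* ... then VecSetValues(b, ...) + VecAssemblyBegin/End(b) as usual ... */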

Pierre
> On Tue, Jul 29, 2014 at 9:58 AM, Projet_TRIOU <triou at cea.fr> wrote:
>
>     Hello Karl,
>
>     For your information, I figured out why my code crashes on GPU
>     with np > 1 processes.
>
>     I am solving Ax=b with KSPSetInitialGuessNonzero set to true, so I
>     also need to fill x with VecSetValues before each solve.
>
>     VecSetValues works on GPU with np=1 but not with np>1. I noticed
>     that filling
>     x with VecSet worked fine, so when I create x I now also
>     initialize it with VecSet
>     (I suppose it copies/creates something on the GPU that VecSetValues doesn't):
>
>     Vec x;
>     VecCreate(PETSC_COMM_WORLD,&x);
>     VecSetSizes(x, nb_rows_, PETSC_DECIDE);
>     VecSetType(x, VECMPICUSP);
>     VecSet(x,0.0); // Needed on GPU during parallel calculation, else it crashes
>
>     // Later:
>     for (int i=0; i<size; i++)
>     {
>             VecSetValues(b, 1, &colonne_globale, &secmem(i), INSERT_VALUES);
>             VecSetValues(x, 1, &colonne_globale, &solution(i), INSERT_VALUES);
>             colonne_globale++;
>     }
>
>
> Are you calling VecAssemblyBegin/End() after this? It is required for 
> VecSetValues().
>
>    Matt
>
>     Thanks for your help,
>
>     Pierre
>>     Thanks Karl, and sorry for my mistake. Indeed, ex9 runs in
>>     parallel with -pc_type none or -pc_type jacobi.
>>     Good to know I now have an example with VecSetValues that I can
>>     try to replicate in my code.
>>     I will update the thread as soon as I get back to more tests
>>     on it.
>>
>>     Pierre
>>>     Hi Pierre,
>>>
>>>     I could reproduce the problem, but I could also verify that the
>>>     problem you ran into was due to a currently unsupported
>>>     block-Jacobi preconditioner. That is, if you first comment out
>>>     the KSPSolve for the ksp2 object in ex9, recompile ex9, and then run
>>>
>>>     $> mpiexec -np 2 ./ex9 -ksp_monitor -mat_type aijcusp -vec_type cusp -pc_type none
>>>
>>>     things should work out. This may also apply to your case: give
>>>     it a try with -pc_type none and let us know whether the problem
>>>     with VecSetValues remains. (Using no preconditioner is certainly
>>>     not great, but at least it may give you a first result.)
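>>>
>>>     For reference, the same choice can be made in code instead of on
>>>     the command line; a minimal sketch:
>>>
>>>        PC pc;
>>>        KSPGetPC(ksp, &pc);     /* get the preconditioner context */
>>>        PCSetType(pc, PCNONE);  /* equivalent to -pc_type none */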
>>>
>>>     Best regards,
>>>     Karli
>>>
>>>
>>>
>>>     On 03/26/2014 06:38 PM, Projet_TRIOU wrote:
>>>>     Hello all,
>>>>
>>>>     My parallel code is really close to running on GPU devices
>>>>     thanks to PETSc, but I am struggling with the best way
>>>>     to fill the PETSc vectors so that it runs on both CPU & GPU.
>>>>
>>>>     In the past I was using VecCreateMPIWithArray to
>>>>     create the vectors, but I had trouble with it on GPU.
>>>>
>>>>     So I followed the example in
>>>>     src/ksp/ksp/examples/tutorials/ex2.c
>>>>     and now create the vectors like this:
>>>>
>>>>        // Build b (the right-hand side, "SecondMembre")
>>>>        ierr = VecCreate(PETSC_COMM_WORLD,&SecondMembrePetsc_); check(ierr);
>>>>        ierr = VecSetSizes(SecondMembrePetsc_, nb_rows, nb_rows_tot); check(ierr);
>>>>        // -vec_type cusp selects the GPU vector type at run time
>>>>        ierr = VecSetFromOptions(SecondMembrePetsc_); check(ierr);
>>>>        // Build x (the solution)
>>>>        ierr = VecDuplicate(SecondMembrePetsc_,&SolutionPetsc_); check(ierr);
>>>>
>>>>     I then fill them with VecSetValues. It runs well on CPU and on
>>>>     GPU, but crashes only in parallel on GPU. If I use VecSet
>>>>     instead of VecSetValues, it doesn't crash (but of course VecSet
>>>>     is not enough for me :-)
>>>>
>>>>     I tried to find an example that reproduces the problem for you,
>>>>     and I think src/ksp/ksp/examples/tutorials/ex9 (it uses
>>>>     VecSetValues) is a good one.
>>>>
>>>>     Or did I miss something? (I also tried
>>>>     VecPlaceArray/VecResetArray, but without success on GPU.)
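>>>>
>>>>     What I tried was essentially this, with an array owned by my
>>>>     application (a sketch, not my exact code):
>>>>
>>>>        VecPlaceArray(b, my_array); /* use my array as the Vec storage */
>>>>        /* ... fill my_array, then solve ... */
>>>>        VecResetArray(b);           /* detach it and restore the Vec's own storage */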
>>>>
>>>>     Thanks, and yes you are right,  "WARNING: Using GPUs
>>>>     effectively is
>>>>     difficult!" :-)
>>>>
>>>>     Pierre
>>>>
>>>>     sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials
>>>>
>>>>      > ./ex9 -ksp_monitor
>>>>        0 KSP Residual norm 6.612932697792e+00
>>>>        1 KSP Residual norm 4.261830032389e-01
>>>>        2 KSP Residual norm 2.121746090851e-02
>>>>        3 KSP Residual norm 1.233779841608e-03
>>>>        4 KSP Residual norm 1.265903168531e-05
>>>>        0 KSP Residual norm 1.309416176382e-05
>>>>        0 KSP Residual norm 1.404919664063e-05
>>>>
>>>>     sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials
>>>>
>>>>      > mpiexec -np 2 ./ex9 -ksp_monitor
>>>>        0 KSP Residual norm 2.496821857304e+02
>>>>        1 KSP Residual norm 4.522206074831e+01
>>>>        2 KSP Residual norm 1.959482408314e+01
>>>>        3 KSP Residual norm 7.002013703407e+00
>>>>        4 KSP Residual norm 2.144105201713e+00
>>>>        5 KSP Residual norm 1.780095080270e-01
>>>>        6 KSP Residual norm 5.642702243268e-02
>>>>        7 KSP Residual norm 6.439343992306e-03
>>>>        8 KSP Residual norm 3.012756374415e-04
>>>>     Norm of error 0.000249108, Iterations 8
>>>>     Norm of error 0.000715584, Iterations 6
>>>>        0 KSP Residual norm 3.422287562824e-04
>>>>     Norm of error 0.000249108, Iterations 0
>>>>     Norm of error 0.000192805, Iterations 7
>>>>        0 KSP Residual norm 4.140588954098e-04
>>>>     Norm of error 0.000249108, Iterations 0
>>>>     Norm of error 0.000109507, Iterations 7
>>>>
>>>>     sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials
>>>>
>>>>      > ./ex9 -ksp_monitor -mat_type aijcusp -vec_type cusp
>>>>        0 KSP Residual norm 6.612932697792e+00
>>>>        1 KSP Residual norm 4.261830032389e-01
>>>>        2 KSP Residual norm 2.121746090851e-02
>>>>        3 KSP Residual norm 1.233779841608e-03
>>>>        4 KSP Residual norm 1.265903168531e-05
>>>>        0 KSP Residual norm 1.309416176403e-05
>>>>        0 KSP Residual norm 1.404919664088e-05
>>>>
>>>>     sitre.intra.cea.fr:/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt/src/ksp/ksp/examples/tutorials
>>>>
>>>>      > mpiexec -np 2 ./ex9 -ksp_monitor -mat_type aijcusp -vec_type cusp
>>>>     [0]PETSC ERROR:
>>>>     ------------------------------------------------------------------------
>>>>
>>>>     [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>>>     Violation,
>>>>     probably memory access out of range
>>>>     [0]PETSC ERROR: Try option -start_in_debugger or
>>>>     -on_error_attach_debugger
>>>>     [0]PETSC ERROR: or see
>>>>     http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>     [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and
>>>>     Apple Mac OS X to find memory corruption errors
>>>>     [0]PETSC ERROR: configure using --with-debugging=yes,
>>>>     recompile, link,
>>>>     and run
>>>>     [0]PETSC ERROR: to get more information on the crash.
>>>>     [0]PETSC ERROR: --------------------- Error Message
>>>>     --------------------------------------------------------------
>>>>     [0]PETSC ERROR: Signal received
>>>>     [0]PETSC ERROR: See
>>>>     http://www.mcs.anl.gov/petsc/documentation/faq.html for
>>>>     trouble shooting.
>>>>     [0]PETSC ERROR: Petsc Development GIT revision:
>>>>     v3.4.4-3713-g576f62e
>>>>     GIT Date: 2014-03-23 15:59:15 -0500
>>>>     [0]PETSC ERROR: ./ex9 on a linux_opt named sitre.intra.cea.fr
>>>>     by triou Wed Mar 26 18:32:34 2014
>>>>     [0]PETSC ERROR: Configure options
>>>>     --prefix=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/petsc/linux_opt
>>>>
>>>>     --with-single-library --with-shared-libraries=0 --with-debugging=0
>>>>     --with-errorchecking=1 --COPTFLAGS="   -O3 -fPIC "
>>>>     --CXXOPTFLAGS=" -O3
>>>>     -fPIC " --FOPTFLAGS="  -O3 -fPIC " --with-fortran=yes
>>>>     --with-clean=1
>>>>     --download-scalapack=../scalapack-2.0.2.tgz
>>>>     --download-mumps=../MUMPS_4.10.0-p3.tar.gz
>>>>     --download-superlu_dist=yes
>>>>     --download-parmetis=../parmetis-4.0.2-p4.tar.gz
>>>>     --download-metis=../metis-5.0.2-p3.tar.gz
>>>>     --download-ptscotch=../ptscotch.tar.gz
>>>>     --download-hypre=../hypre-2.9.1a.tar.gz
>>>>     --with-valgrind-include=/work/triou/git/petsc-dev/Trio_U/exec/valgrind/include
>>>>
>>>>     --with-blas-lapack-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBLAPACK
>>>>     --with-cuda=1
>>>>     --with-cuda-dir=/work/triou/git/petsc-dev/Trio_U/exec/cuda5.5
>>>>     --with-cusp=1
>>>>     --with-cusp-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBPETSC/cusplibrary-0.4.0
>>>>
>>>>     --with-thrust=1 --with-cuda-arch=sm_21 --with-ssl=0
>>>>     --with-mpi-dir=/work/triou/git/petsc-dev/Trio_U/lib/src/LIBMPI/mpich
>>>>
>>>>     --with-x=1
>>>>     [0]PETSC ERROR: #1 User provided function() line 0 in  unknown
>>>>     file
>>>>     [1]PETSC ERROR:
>>>>     ------------------------------------------------------------------------
>>>>     [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>>>     Violation, probably memory access out of range
>>>>     [... rank 1 prints the same diagnostics and configure options
>>>>     as rank 0 ...]
>>>>     application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>>>>     application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1
>>>>
>>>>
>>>>
>>>
>>
>>
>
>
>
>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener


-- 
*Trio_U support team*
Marthe ROUX (01 69 08 00 02) Saclay
Pierre LEDAC (04 38 78 91 49) Grenoble