[petsc-dev] VecScatterInitializeForGPU

Dominic Meiser dmeiser at txcorp.com
Wed Jan 22 11:48:42 CST 2014


Attached are the logs for runs with 1 rank and with 2 ranks. As far as
I can tell these are two different errors.

For the log attached to the previous email I chose to run ex7 without
mpirun so that valgrind checks ex7 itself rather than mpirun. Is there
a way to have valgrind check the MPI processes rather than mpirun?
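
(I suspect the usual trick is to put valgrind after mpirun so that each
rank runs under its own valgrind instance, e.g. something along the
lines of

    mpirun -n 2 valgrind --log-file=valgrind.%p.log ./ex7 \
        -mat_type mpiaijcusparse -vec_type cusp

with %p expanding to each process id, but I haven't checked how well
that plays with the CUDA runtime.)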

Cheers,
Dominic


On 01/22/2014 10:37 AM, Paul Mullowney wrote:
> Hmmm. I may not have protected against the case where the
> mpiaijcusp(arse) classes are used without mpirun/mpiexec. I suppose it
> should have occurred to me that someone would do this.
> Try:
> mpirun -n 1 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
> In this scenario, the sequential-to-sequential vecscatters should be
> called.
> Then:
> mpirun -n 2 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
> In this scenario, the MPI_General vecscatters should be called ... and
> should work correctly if you have a system with multiple GPUs.
> -Paul
>
>
> On Wed, Jan 22, 2014 at 10:32 AM, Dominic Meiser
> <dmeiser at txcorp.com> wrote:
>
>     Hey Paul,
>
>     Thanks for providing background on this.
>
>
>     On Wed 22 Jan 2014 10:05:13 AM MST, Paul Mullowney wrote:
>
>
>         Dominic,
>         A few years ago, I was trying to minimize the amount of data
>         transferred to and from the GPU (for multi-GPU MatMult) by
>         inspecting the indices of the data that needed to be messaged
>         to and from the device. I would then call gather kernels on
>         the GPU that pulled the scattered data into contiguous
>         buffers, which were then transferred to the host
>         asynchronously (while the MatMult was occurring).
>         VecScatterInitializeForGPU was added in order to build the
>         necessary buffers as needed; that was the motivation for its
>         existence.
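>         Purely as an illustration of the idea (not the actual
>         txpetscgpu code; the names are made up), the gather step
>         looked roughly like this:
>
>             __global__ void GatherEntries(const double *x, const int *idx,
>                                           double *buf, int n)
>             {
>               /* pack x[idx[i]] contiguously into buf */
>               int i = blockIdx.x*blockDim.x + threadIdx.x;
>               if (i < n) buf[i] = x[idx[i]];
>             }
>
>             /* launch on a stream, then start copying the packed buffer
>                back while the local part of the MatMult runs */
>             GatherEntries<<<(n+255)/256,256,0,stream>>>(d_x, d_idx, d_buf, n);
>             cudaMemcpyAsync(h_buf, d_buf, n*sizeof(double),
>                             cudaMemcpyDeviceToHost, stream);
>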
>         An alternative approach is to message the smallest contiguous
>         buffer containing all the data with a single cudaMemcpyAsync.
>         This is the method currently implemented.
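>         Again as an illustration only (with made-up names): if lo and
>         hi-1 are the smallest and largest indices needed, the
>         single-copy variant is just
>
>             cudaMemcpyAsync(h_buf, d_x + lo, (hi - lo)*sizeof(double),
>                             cudaMemcpyDeviceToHost, stream);
>
>         which may move some entries that are never used, but needs no
>         gather kernel and only one transfer.
>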
>         I never found a case where the former implementation (with a
>         GPU gather kernel) performed better than the alternative
>         approach that messaged the smallest contiguous buffer. I
>         looked at many, many matrices.
>         Now, as far as I understand the VecScatter kernels, this
>         method should only get called if the transfer is MPI_General
>         (i.e. PtoP, parallel to parallel). Other VecScatter methods
>         are called in circumstances where the scatter is not
>         MPI_General. That assumption could be wrong, though.
>
>
>
>     I see. I figured there was some logic in place to make sure that
>     this function only gets called in cases where the transfer type is
>     MPI_General. I'm getting segfaults in this function when the
>     todata and fromdata are of different types. This could easily be
>     user error, but I'm not sure. Here is an example valgrind error:
>
>     ==27781== Invalid read of size 8
>     ==27781== at 0x1188080: VecScatterInitializeForGPU (vscatcusp.c:46)
>     ==27781== by 0xEEAE5D: MatMult_MPIAIJCUSPARSE(_p_Mat*, _p_Vec*,
>     _p_Vec*) (mpiaijcusparse.cu:108)
>     ==27781== by 0xA20CC3: MatMult (matrix.c:2242)
>     ==27781== by 0x4645E4: main (ex7.c:93)
>     ==27781== Address 0x286305e0 is 1,616 bytes inside a block of size
>     1,620 alloc'd
>     ==27781== at 0x4C26548: memalign (vg_replace_malloc.c:727)
>     ==27781== by 0x4654F9: PetscMallocAlign(unsigned long, int, char
>     const*, char const*, void**) (mal.c:27)
>     ==27781== by 0xCAEECC: PetscTrMallocDefault(unsigned long, int,
>     char const*, char const*, void**) (mtr.c:186)
>     ==27781== by 0x5A5296: VecScatterCreate (vscat.c:1168)
>     ==27781== by 0x9AF3C5: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
>     ==27781== by 0x96F0F0: MatAssemblyEnd_MPIAIJ(_p_Mat*,
>     MatAssemblyType) (mpiaij.c:706)
>     ==27781== by 0xA45358: MatAssemblyEnd (matrix.c:4959)
>     ==27781== by 0x464301: main (ex7.c:78)
>
>     This was produced by src/ksp/ksp/examples/tutorials/ex7.c. The
>     command line options are
>
>     ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>
>     In this particular case the todata is of type
>     VecScatter_Seq_Stride and the fromdata is of type
>     VecScatter_Seq_General. The complete valgrind log (including the
>     PETSc configure options) is attached.
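>
>     Presumably the function would need a guard along these lines
>     before doing those casts (just a sketch; I'm guessing that both
>     structs store their VecScatterType in a field named "type", and
>     the field and enum names below may not match the actual headers):
>
>         VecScatter_MPI_General *to   = (VecScatter_MPI_General*)inctx->todata;
>         VecScatter_MPI_General *from = (VecScatter_MPI_General*)inctx->fromdata;
>         if (to->type != VEC_SCATTER_MPI_GENERAL ||
>             from->type != VEC_SCATTER_MPI_GENERAL) {
>           /* not a parallel-to-parallel scatter; nothing to set up here */
>           PetscFunctionReturn(0);
>         }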
>
>     Any comments or suggestions are appreciated.
>     Cheers,
>     Dominic
>
>
>         -Paul
>
>
>         On Wed, Jan 22, 2014 at 9:49 AM, Dominic Meiser
>         <dmeiser at txcorp.com> wrote:
>
>         Hi,
>
>         I'm trying to understand VecScatterInitializeForGPU in
>         src/vec/vec/utils/veccusp/vscatcusp.c. I don't understand why
>         this function can get away with casting the fromdata and todata in
>         the inctx to VecScatter_MPI_General. Don't we need to inspect the
>         VecScatterType fields of the todata and fromdata?
>
>         Cheers,
>         Dominic
>
>         -- 
>         Dominic Meiser
>         Tech-X Corporation
>         5621 Arapahoe Avenue
>         Boulder, CO 80303
>         USA
>         Telephone: 303-996-2036
>         Fax: 303-448-7756
>         www.txcorp.com
>
>
>
>
>
>     -- 
>     Dominic Meiser
>     Tech-X Corporation
>     5621 Arapahoe Avenue
>     Boulder, CO 80303
>     USA
>     Telephone: 303-996-2036
>     Fax: 303-448-7756
>     www.txcorp.com
>
>


-- 
Dominic Meiser
Tech-X Corporation
5621 Arapahoe Avenue
Boulder, CO 80303
USA
Telephone: 303-996-2036
Fax: 303-448-7756
www.txcorp.com

-------------- next part --------------
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Null argument, when expecting valid pointer!
[0]PETSC ERROR: Trying to zero at a null pointer!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Development GIT revision: v3.4.3-2332-g54f71ec  GIT Date: 2014-01-20 14:12:11 -0700
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: ./ex7 on a pargpudbg named ivy.txcorp.com by dmeiser Wed Jan 22 10:47:22 2014
[0]PETSC ERROR: Libraries linked from /scr_ivy/dmeiser/petsc-gpu-dev/build/pargpudbg/lib
[0]PETSC ERROR: Configure run at Tue Jan 21 16:53:42 2014
[0]PETSC ERROR: Configure options --with-cmake=/scr_ivy/dmeiser/PTSOLVE/cmake/bin/cmake --prefix=/scr_ivy/dmeiser/petsc-gpu-dev/build/pargpudbg --with-precision=double --with-scalar-type=real --with-fortran-kernels=1 --with-x=no --with-mpi=yes --with-mpi-dir=/scr_ivy/dmeiser/PTSOLVE/openmpi/ --with-openmp=yes --with-valgrind=1 --with-shared-libraries=0 --with-c-support=yes --with-debugging=yes --with-cuda=1 --with-cuda-dir=/usr/local/cuda --with-cuda-arch=sm_35 --download-txpetscgpu --with-thrust=yes --with-thrust-dir=/usr/local/cuda/include --with-umfpack=yes --download-umfpack --with-mumps=yes --with-superlu=yes --download-superlu=yes --download-mumps=yes --download-scalapack --download-parmetis --download-metis --with-cusp=yes --with-cusp-dir=/scr_ivy/dmeiser/PTSOLVE/cusp/include --CUDAFLAGS="-O3 -I/usr/local/cuda/include   --generate-code arch=compute_20,code=sm_20   --generate-code arch=compute_20,code=sm_21   --generate-code arch=compute_30,code=sm_30   --generate-code arch=compute_35,code=sm_35" --with-clanguage=C++ --CFLAGS="-pipe -fPIC" --CXXFLAGS="-pipe -fPIC" --with-c2html=0 --with-gelus=1 --with-gelus-dir=/scr_ivy/dmeiser/software/gelus
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: PetscMemzero() line 1930 in /scr_ivy/dmeiser/petsc/include/petscsys.h
[0]PETSC ERROR: VecSet_Seq() line 729 in /scr_ivy/dmeiser/petsc/src/vec/vec/impls/seq/dvec2.c
[0]PETSC ERROR: VecSet() line 575 in /scr_ivy/dmeiser/petsc/src/vec/vec/interface/rvector.c
[0]PETSC ERROR: KSPSolve() line 417 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: PCApply_BJacobi_Multiblock() line 945 in /scr_ivy/dmeiser/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c
[0]PETSC ERROR: PCApply() line 440 in /scr_ivy/dmeiser/petsc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: KSP_PCApply() line 227 in /scr_ivy/dmeiser/petsc/include/petsc-private/kspimpl.h
[0]PETSC ERROR: KSPInitialResidual() line 64 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itres.c
[0]PETSC ERROR: KSPSolve_GMRES() line 234 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/impls/gmres/gmres.c
[1]PETSC ERROR: --------------------- Error Message ------------------------------------
[1]PETSC ERROR: Null argument, when expecting valid pointer!
[1]PETSC ERROR: Trying to zero at a null pointer!
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Petsc Development GIT revision: v3.4.3-2332-g54f71ec  GIT Date: 2014-01-20 14:12:11 -0700
[1]PETSC ERROR: See docs/changes/index.html for recent updates.
[1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[1]PETSC ERROR: See docs/index.html for manual pages.
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: ./ex7 on a pargpudbg named ivy.txcorp.com by dmeiser Wed Jan 22 10:47:22 2014
[1]PETSC ERROR: Libraries linked from /scr_ivy/dmeiser/petsc-gpu-dev/build/pargpudbg/lib
[1]PETSC ERROR: Configure run at Tue Jan 21 16:53:42 2014
[1]PETSC ERROR: Configure options --with-cmake=/scr_ivy/dmeiser/PTSOLVE/cmake/bin/cmake --prefix=/scr_ivy/dmeiser/petsc-gpu-dev/build/pargpudbg --with-precision=double --with-scalar-type=real --with-fortran-kernels=1 --with-x=no --with-mpi=yes --with-mpi-dir=/scr_ivy/dmeiser/PTSOLVE/openmpi/ --with-openmp=yes --with-valgrind=1 --with-shared-libraries=0 --with-c-support=yes --with-debugging=yes --with-cuda=1 --with-cuda-dir=/usr/local/cuda --with-cuda-arch=sm_35 --download-txpetscgpu --with-thrust=yes --with-thrust-dir=/usr/local/cuda/include --with-umfpack=yes --download-umfpack --with-mumps=yes --with-superlu=yes --download-superlu=yes --download-mumps=yes --download-scalapack --download-parmetis --download-metis --with-cusp=yes --with-cusp-dir=/scr_ivy/dmeiser/PTSOLVE/cusp/include --CUDAFLAGS="-O3 -I/usr/local/cuda/include   --generate-code arch=compute_20,code=sm_20   --generate-code arch=compute_20,code=sm_21   --generate-code arch=compute_30,code=sm_30   --generate-code arch=compute_35,code=sm_35" --with-clanguage=C++ --CFLAGS="-pipe -fPIC" --CXXFLAGS="-pipe -fPIC" --with-c2html=0 --with-gelus=1 --with-gelus-dir=/scr_ivy/dmeiser/software/gelus
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: PetscMemzero() line 1930 in /scr_ivy/dmeiser/petsc/include/petscsys.h
[1]PETSC ERROR: VecSet_Seq() line 729 in /scr_ivy/dmeiser/petsc/src/vec/vec/impls/seq/dvec2.c
[1]PETSC ERROR: VecSet() line 575 in /scr_ivy/dmeiser/petsc/src/vec/vec/interface/rvector.c
[1]PETSC ERROR: KSPSolve() line 417 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itfunc.c
[1]PETSC ERROR: PCApply_BJacobi_Multiblock() line 945 in /scr_ivy/dmeiser/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c
[1]PETSC ERROR: PCApply() line 440 in /scr_ivy/dmeiser/petsc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: KSPSolve() line 432 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: main() line 209 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/examples/tutorials/ex7.c
[1]PETSC ERROR: KSP_PCApply() line 227 in /scr_ivy/dmeiser/petsc/include/petsc-private/kspimpl.h
[1]PETSC ERROR: KSPInitialResidual() line 64 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itres.c
[1]PETSC ERROR: KSPSolve_GMRES() line 234 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/impls/gmres/gmres.c
[1]PETSC ERROR: KSPSolve() line 432 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itfunc.c
[1]PETSC ERROR: main() line 209 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/examples/tutorials/ex7.c
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD 
with errorcode 85.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 27839 on
node ivy.txcorp.com exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[ivy.txcorp.com:27838] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[ivy.txcorp.com:27838] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
-------------- next part --------------
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Error in external library!
[0]PETSC ERROR: CUSP error 61!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Development GIT revision: v3.4.3-2332-g54f71ec  GIT Date: 2014-01-20 14:12:11 -0700
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: ./ex7 on a pargpudbg named ivy.txcorp.com by dmeiser Wed Jan 22 10:47:15 2014
[0]PETSC ERROR: Libraries linked from /scr_ivy/dmeiser/petsc-gpu-dev/build/pargpudbg/lib
[0]PETSC ERROR: Configure run at Tue Jan 21 16:53:42 2014
[0]PETSC ERROR: Configure options --with-cmake=/scr_ivy/dmeiser/PTSOLVE/cmake/bin/cmake --prefix=/scr_ivy/dmeiser/petsc-gpu-dev/build/pargpudbg --with-precision=double --with-scalar-type=real --with-fortran-kernels=1 --with-x=no --with-mpi=yes --with-mpi-dir=/scr_ivy/dmeiser/PTSOLVE/openmpi/ --with-openmp=yes --with-valgrind=1 --with-shared-libraries=0 --with-c-support=yes --with-debugging=yes --with-cuda=1 --with-cuda-dir=/usr/local/cuda --with-cuda-arch=sm_35 --download-txpetscgpu --with-thrust=yes --with-thrust-dir=/usr/local/cuda/include --with-umfpack=yes --download-umfpack --with-mumps=yes --with-superlu=yes --download-superlu=yes --download-mumps=yes --download-scalapack --download-parmetis --download-metis --with-cusp=yes --with-cusp-dir=/scr_ivy/dmeiser/PTSOLVE/cusp/include --CUDAFLAGS="-O3 -I/usr/local/cuda/include   --generate-code arch=compute_20,code=sm_20   --generate-code arch=compute_20,code=sm_21   --generate-code arch=compute_30,code=sm_30   --generate-code arch=compute_35,code=sm_35" --with-clanguage=C++ --CFLAGS="-pipe -fPIC" --CXXFLAGS="-pipe -fPIC" --with-c2html=0 --with-gelus=1 --with-gelus-dir=/scr_ivy/dmeiser/software/gelus
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: VecCUSPAllocateCheck() line 72 in /scr_ivy/dmeiser/petsc/src/vec/vec/impls/seq/seqcusp/veccusp.cu
[0]PETSC ERROR: VecCUSPCopyToGPU() line 96 in /scr_ivy/dmeiser/petsc/src/vec/vec/impls/seq/seqcusp/veccusp.cu
[0]PETSC ERROR: VecCUSPGetArrayReadWrite() line 1946 in /scr_ivy/dmeiser/petsc/src/vec/vec/impls/seq/seqcusp/veccusp.cu
[0]PETSC ERROR: VecAXPBYPCZ_SeqCUSP() line 1507 in /scr_ivy/dmeiser/petsc/src/vec/vec/impls/seq/seqcusp/veccusp.cu
[0]PETSC ERROR: VecAXPBYPCZ() line 726 in /scr_ivy/dmeiser/petsc/src/vec/vec/interface/rvector.c
[0]PETSC ERROR: KSPSolve_BCGS() line 120 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/impls/bcgs/bcgs.c
[0]PETSC ERROR: KSPSolve() line 432 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: PCApply_BJacobi_Multiblock() line 945 in /scr_ivy/dmeiser/petsc/src/ksp/pc/impls/bjacobi/bjacobi.c
[0]PETSC ERROR: PCApply() line 440 in /scr_ivy/dmeiser/petsc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: KSP_PCApply() line 227 in /scr_ivy/dmeiser/petsc/include/petsc-private/kspimpl.h
[0]PETSC ERROR: KSPInitialResidual() line 64 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itres.c
[0]PETSC ERROR: KSPSolve_GMRES() line 234 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/impls/gmres/gmres.c
[0]PETSC ERROR: KSPSolve() line 432 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: main() line 209 in /scr_ivy/dmeiser/petsc/src/ksp/ksp/examples/tutorials/ex7.c
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 76.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 27836 on
node ivy.txcorp.com exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

