[petsc-dev] VecScatterInitializeForGPU

Dominic Meiser dmeiser at txcorp.com
Wed Jan 22 12:52:43 CST 2014


On Wed 22 Jan 2014 10:54:28 AM MST, Paul Mullowney wrote:
> Oh. You're opening a can of worms but maybe that's your intent ;) I
> see the block Jacobi preconditioner in the valgrind logs.

Didn't mean to open a can of worms.

> Do:
> mpirun -n 1 (or 2) ./ex7 -mat_type mpiaijcusparse -vec_type mpicusp
> -pc_type none

This works.

> From here, we can try to sort out the VecScatterInitializeForGPU
> problem when mpirun/mpiexec is not used.
> If you want to implement a block Jacobi preconditioner on multiple GPUs,
> that's a larger problem to solve. I had some code that sort of worked.
> We'd have to sit down and discuss.

I'd be really interested in learning more about this.

Cheers,
Dominic


> -Paul
>
>
> On Wed, Jan 22, 2014 at 10:48 AM, Dominic Meiser <dmeiser at txcorp.com> wrote:
>
>     Attached are the logs with 1 rank and 2 ranks. As far as I can
>     tell these are different errors.
>
>     For the log attached to the previous email I chose to run ex7
>     without mpirun so that valgrind checks ex7 and not mpirun. Is
>     there a way to have valgrind check the MPI processes rather than
>     mpirun?
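For reference, a common way to do this (assuming an MPICH- or Open MPI-style
launcher) is to put valgrind between mpirun and the executable, so that each
rank runs under its own valgrind instance:

    mpirun -n 2 valgrind --log-file=valgrind-%p.log ./ex7 -mat_type mpiaijcusparse -vec_type cusp

valgrind expands %p to each process's PID, so every rank writes a separate log.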
>
>     Cheers,
>     Dominic
>
>
>
>     On 01/22/2014 10:37 AM, Paul Mullowney wrote:
>>     Hmmm. I may not have protected against the case where the
>>     mpiaijcusp(arse) classes are called but without mpirun/mpiexec. I
>>     suppose it should have occurred to me that someone would do this.
>>     Try:
>>     mpirun -n 1 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>>     In this scenario, the sequential-to-sequential vecscatters should
>>     be called.
>>     Then:
>>     mpirun -n 2 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>>     In this scenario, MPI_General vecscatters should be called ...
>>     and work correctly if you have a system with multiple GPUs.
>>     -Paul
>>
>>
>>     On Wed, Jan 22, 2014 at 10:32 AM, Dominic Meiser
>>     <dmeiser at txcorp.com> wrote:
>>
>>         Hey Paul,
>>
>>         Thanks for providing background on this.
>>
>>
>>         On Wed 22 Jan 2014 10:05:13 AM MST, Paul Mullowney wrote:
>>
>>
>>             Dominic,
>>             A few years ago, I was trying to minimize the amount of
>>             data transferred to and from the GPU (for multi-GPU
>>             MatMult) by inspecting the indices of the data that needed
>>             to be messaged to and from the device. I would then call
>>             gather kernels on the GPU that pulled the scattered data
>>             into contiguous buffers, which were then transferred to
>>             the host asynchronously (while the MatMult was occurring).
>>             VecScatterInitializeForGPU was added in order to build the
>>             necessary buffers as needed; that was the motivation for
>>             its existence.
>>             An alternative approach is to message the smallest
>>             contiguous buffer containing all the data with a single
>>             cudaMemcpyAsync. This is the method currently implemented.
>>             I never found a case where the former implementation (with
>>             a GPU gather kernel) performed better than the alternative
>>             approach that messaged the smallest contiguous buffer. I
>>             looked at many, many matrices.
>>             Now, as far as I understand the VecScatter kernels, this
>>             method should only get called if the transfer is
>>             MPI_General (i.e. PtoP, parallel to parallel). Other
>>             VecScatter methods are called in other circumstances where
>>             the scatter is not MPI_General. That assumption could be
>>             wrong, though.
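As a rough sketch of the gather-kernel approach described above (this is not
PETSc's actual code; the names gather_kernel, d_x, d_idx, d_pack, and h_pack
are invented for illustration, and h_pack is assumed to be pinned host memory
allocated with cudaMallocHost so the copy can truly run asynchronously):

    #include <cuda_runtime.h>

    /* Pack the scattered entries d_x[d_idx[i]] into a contiguous device
       buffer d_pack so that only those n values need to cross the bus. */
    __global__ void gather_kernel(const double *d_x, const int *d_idx,
                                  double *d_pack, int n)
    {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) d_pack[i] = d_x[d_idx[i]];
    }

    void pack_and_copy_async(const double *d_x, const int *d_idx,
                             double *d_pack, double *h_pack, int n,
                             cudaStream_t stream)
    {
      int threads = 256, blocks = (n + threads - 1) / threads;
      /* 1) gather the needed entries into a contiguous device buffer */
      gather_kernel<<<blocks, threads, 0, stream>>>(d_x, d_idx, d_pack, n);
      /* 2) copy only those n entries to the host; this can overlap with
            the local part of the MatMult if that work is queued on a
            different stream */
      cudaMemcpyAsync(h_pack, d_pack, n * sizeof(double),
                      cudaMemcpyDeviceToHost, stream);
    }

    /* The alternative described above: skip the gather kernel and copy the
       smallest contiguous span [lo, hi) that contains every needed index
       with a single cudaMemcpyAsync. */
    void copy_span_async(const double *d_x, double *h_buf, int lo, int hi,
                         cudaStream_t stream)
    {
      cudaMemcpyAsync(h_buf, d_x + lo, (size_t)(hi - lo) * sizeof(double),
                      cudaMemcpyDeviceToHost, stream);
    }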
>>
>>
>>
>>         I see. I figured there was some logic in place to make sure
>>         that this function only gets called in cases where the
>>         transfer type is MPI_General. I'm getting segfaults in this
>>         function where the todata and fromdata are of different
>>         types. This could easily be user error but I'm not sure. Here
>>         is an example valgrind error:
>>
>>         ==27781== Invalid read of size 8
>>         ==27781== at 0x1188080: VecScatterInitializeForGPU
>>         (vscatcusp.c:46)
>>         ==27781== by 0xEEAE5D: MatMult_MPIAIJCUSPARSE(_p_Mat*,
>>         _p_Vec*, _p_Vec*) (mpiaijcusparse.cu:108)
>>         ==27781== by 0xA20CC3: MatMult (matrix.c:2242)
>>         ==27781== by 0x4645E4: main (ex7.c:93)
>>         ==27781== Address 0x286305e0 is 1,616 bytes inside a block of
>>         size 1,620 alloc'd
>>         ==27781== at 0x4C26548: memalign (vg_replace_malloc.c:727)
>>         ==27781== by 0x4654F9: PetscMallocAlign(unsigned long, int,
>>         char const*, char const*, void**) (mal.c:27)
>>         ==27781== by 0xCAEECC: PetscTrMallocDefault(unsigned long,
>>         int, char const*, char const*, void**) (mtr.c:186)
>>         ==27781== by 0x5A5296: VecScatterCreate (vscat.c:1168)
>>         ==27781== by 0x9AF3C5: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
>>         ==27781== by 0x96F0F0: MatAssemblyEnd_MPIAIJ(_p_Mat*,
>>         MatAssemblyType) (mpiaij.c:706)
>>         ==27781== by 0xA45358: MatAssemblyEnd (matrix.c:4959)
>>         ==27781== by 0x464301: main (ex7.c:78)
>>
>>         This was produced by src/ksp/ksp/tutorials/ex7.c. The command
>>         line options are
>>
>>         ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>>
>>         In this particular case the todata is of type
>>         VecScatter_Seq_Stride and fromdata is of type
>>         VecScatter_Seq_General. The complete valgrind log (including
>>         configure options for petsc) is attached.
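A self-contained illustration of this failure mode, and of the kind of type
check being asked about, using made-up stand-in structs and names (ScatterKind,
init_for_gpu, etc.) rather than PETSc's real internal scatter types: the
"sequential" contexts are smaller allocations than the MPI-general one, so an
unconditional cast followed by a field access reads past the end of the block,
which is exactly what valgrind flags above.

    #include <stdio.h>

    typedef enum { SCATTER_SEQ_STRIDE, SCATTER_SEQ_GENERAL,
                   SCATTER_MPI_GENERAL } ScatterKind;

    typedef struct { ScatterKind kind; } ScatterBase;
    typedef struct { ScatterKind kind; int first, step, n; } SeqStride;
    typedef struct { ScatterKind kind; int n; int *indices; } SeqGeneral;
    typedef struct { ScatterKind kind; int n; int *starts, *indices; } MPIGeneral;

    /* The guard being asked about: inspect the kind before casting. */
    static int init_for_gpu(ScatterBase *todata, ScatterBase *fromdata)
    {
      if (todata->kind != SCATTER_MPI_GENERAL ||
          fromdata->kind != SCATTER_MPI_GENERAL)
        return 0;                       /* nothing to set up; bail out safely */
      MPIGeneral *to = (MPIGeneral*)todata;
      /* ... only here is it safe to touch to->starts / to->indices ... */
      return to->n;
    }

    int main(void)
    {
      /* Mirrors the case above: a Seq_Stride todata and Seq_General fromdata. */
      SeqStride  to   = { SCATTER_SEQ_STRIDE,  0, 1, 10 };
      SeqGeneral from = { SCATTER_SEQ_GENERAL, 10, NULL };
      printf("entries to set up for GPU: %d\n",
             init_for_gpu((ScatterBase*)&to, (ScatterBase*)&from));
      return 0;
    }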
>>
>>         Any comments or suggestions are appreciated.
>>         Cheers,
>>         Dominic
>>
>>
>>             -Paul
>>
>>
>>             On Wed, Jan 22, 2014 at 9:49 AM, Dominic Meiser
>>             <dmeiser at txcorp.com> wrote:
>>
>>             Hi,
>>
>>             I'm trying to understand VecScatterInitializeForGPU in
>>             src/vec/vec/utils/veccusp/vscatcusp.c. I don't understand
>>             why this function can get away with casting the fromdata
>>             and todata in the inctx to VecScatter_MPI_General. Don't
>>             we need to inspect the VecScatterType fields of the
>>             todata and fromdata?
>>
>>             Cheers,
>>             Dominic
>>
>>             --
>>             Dominic Meiser
>>             Tech-X Corporation
>>             5621 Arapahoe Avenue
>>             Boulder, CO 80303
>>             USA
>>             Telephone: 303-996-2036
>>             Fax: 303-448-7756
>>             www.txcorp.com
>>
>>
>>
>>
>>
>>         --
>>         Dominic Meiser
>>         Tech-X Corporation
>>         5621 Arapahoe Avenue
>>         Boulder, CO 80303
>>         USA
>>         Telephone: 303-996-2036
>>         Fax: 303-448-7756
>>         www.txcorp.com
>>
>>
>
>
>     --
>     Dominic Meiser
>     Tech-X Corporation
>     5621 Arapahoe Avenue
>     Boulder, CO 80303
>     USA
>     Telephone: 303-996-2036
>     Fax: 303-448-7756
>     www.txcorp.com
>
>



--
Dominic Meiser
Tech-X Corporation
5621 Arapahoe Avenue
Boulder, CO 80303
USA
Telephone: 303-996-2036
Fax: 303-448-7756
www.txcorp.com


