[petsc-dev] VecScatterInitializeForGPU
Paul Mullowney
paulmullowney at gmail.com
Wed Jan 22 11:37:54 CST 2014
Hmmm. I may not have protected against the case where the mpiaijcusp(arse)
classes are called without mpirun/mpiexec. I suppose it should have
occurred to me that someone would do this.
Try:
mpirun -n 1 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
In this scenario, the sequential-to-sequential vecscatters should be called.
Then,
mpirun -n 2 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
In this scenario, MPI_General vecscatters should be called ... and work
correctly if you have a system with multiple GPUs.
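If the sequential scatters are reaching VecScatterInitializeForGPU anyway (which the valgrind trace below suggests), then the protection I left out is probably a check on the scatter kind before todata/fromdata are cast to VecScatter_MPI_General. Something along these lines is what I have in mind; this is only a sketch, and the tag field and enum names are stand-ins rather than the actual PETSc struct members:

/* Sketch of the missing guard (not the real PETSc code).  The idea: only
 * treat todata/fromdata as VecScatter_MPI_General when the scatter really
 * is the parallel (MPI_General) kind, and return early otherwise.  The
 * "tag" field and the enum are invented stand-ins for whatever type
 * information the implementation structs actually carry. */
typedef enum {
  SCATTER_SEQ_GENERAL,
  SCATTER_SEQ_STRIDE,
  SCATTER_MPI_GENERAL
} ScatterKind;

typedef struct {
  ScatterKind tag;   /* hypothetical common header shared by all variants */
  /* ... variant-specific fields follow ... */
} ScatterHeader;

static int initialize_for_gpu(void *todata, void *fromdata)
{
  const ScatterHeader *to   = (const ScatterHeader *)todata;
  const ScatterHeader *from = (const ScatterHeader *)fromdata;

  /* Sequential scatters (e.g. a single-rank run of ex7) have nothing to
   * set up here, so bail out before any MPI_General-specific access. */
  if (to->tag != SCATTER_MPI_GENERAL || from->tag != SCATTER_MPI_GENERAL)
    return 0;

  /* ... safe to use the MPI_General layout and build the GPU buffers ... */
  return 0;
}

With a guard like that, a single-rank run would simply fall through to the usual sequential scatter path.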
-Paul
On Wed, Jan 22, 2014 at 10:32 AM, Dominic Meiser <dmeiser at txcorp.com> wrote:
> Hey Paul,
>
> Thanks for providing background on this.
>
>
> On Wed 22 Jan 2014 10:05:13 AM MST, Paul Mullowney wrote:
>
>>
>> Dominic,
>> A few years ago, I was trying to minimize the amount of data transferred
>> to and from the GPU (for multi-GPU MatMult) by inspecting the indices
>> of the data that needed to be messaged to and from the device. Then, I
>> would call gather kernels on the GPU which pulled the scattered data
>> into contiguous buffers that were then transferred to the host
>> asynchronously (while the MatMult was occurring).
>> VecScatterInitializeForGPU was added in order to build the necessary
>> buffers as needed; that was the motivation for its existence.
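To make the gather-kernel idea above concrete, it amounted to something like the following sketch. This is illustration only, not the code that was in PETSc; the kernel, buffer, and stream names are all made up:

#include <cuda_runtime.h>

/* Illustration only: pack the entries of x listed in indices[] into a
 * contiguous device buffer, so that a single cudaMemcpyAsync can ship
 * just those values to the host while the local MatMult runs. */
__global__ void gather_needed_entries(const double *x, const int *indices,
                                      int n, double *packed)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) packed[i] = x[indices[i]];
}

/* Launch the gather and start the asynchronous copy on a side stream. */
static void pack_and_send(const double *d_x, const int *d_indices, int n,
                          double *d_packed, double *h_packed,
                          cudaStream_t stream)
{
  int threads = 256;
  int blocks  = (n + threads - 1) / threads;
  gather_needed_entries<<<blocks, threads, 0, stream>>>(d_x, d_indices, n,
                                                        d_packed);
  cudaMemcpyAsync(h_packed, d_packed, n * sizeof(double),
                  cudaMemcpyDeviceToHost, stream);
}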
>> An alternative approach is to message the smallest contiguous buffer
>> containing all the data with a single cudaMemcpyAsync. This is the
>> method currently implemented.
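Roughly, the currently implemented approach looks like this sketch (again with invented names, not the actual PETSc code):

#include <cuda_runtime.h>

/* Sketch: instead of gathering on the GPU, copy the smallest contiguous
 * range of x that covers every needed index with one asynchronous
 * device-to-host copy, overlapped with the local MatMult on the given
 * stream.  indices[] here is the host-side list of needed entries. */
static cudaError_t copy_covering_range(const double *d_x, double *h_x,
                                       const int *indices, int n,
                                       cudaStream_t stream)
{
  if (n <= 0) return cudaSuccess;
  int lo = indices[0], hi = indices[0];
  for (int i = 1; i < n; i++) {   /* find the covering range [lo, hi] */
    if (indices[i] < lo) lo = indices[i];
    if (indices[i] > hi) hi = indices[i];
  }
  return cudaMemcpyAsync(h_x + lo, d_x + lo,
                         (size_t)(hi - lo + 1) * sizeof(double),
                         cudaMemcpyDeviceToHost, stream);
}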
>> I never found a case where the former implementation (with a GPU
>> gather-kernel) performed better than the alternative approach which
>> messaged the smallest contiguous buffer. I looked at many, many matrices.
>> Now, as far as I understand the VecScatter kernels, this method should
>> only get called if the transfer is MPI_General (i.e. PtoP parallel to
>> parallel). Other VecScatter methods are called in other circumstances
>> where the scatter is not MPI_General. That assumption could be
>> wrong though.
>>
>
>
> I see. I figured there was some logic in place to make sure that this
> function only gets called in cases where the transfer type is MPI_General.
> I'm getting segfaults in this function when the todata and fromdata are of
> a type other than VecScatter_MPI_General. This could easily be user error
> but I'm not sure. Here is
> an example valgrind error:
>
> ==27781== Invalid read of size 8
> ==27781==    at 0x1188080: VecScatterInitializeForGPU (vscatcusp.c:46)
> ==27781==    by 0xEEAE5D: MatMult_MPIAIJCUSPARSE(_p_Mat*, _p_Vec*, _p_Vec*) (mpiaijcusparse.cu:108)
> ==27781==    by 0xA20CC3: MatMult (matrix.c:2242)
> ==27781==    by 0x4645E4: main (ex7.c:93)
> ==27781== Address 0x286305e0 is 1,616 bytes inside a block of size 1,620 alloc'd
> ==27781==    at 0x4C26548: memalign (vg_replace_malloc.c:727)
> ==27781==    by 0x4654F9: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:27)
> ==27781==    by 0xCAEECC: PetscTrMallocDefault(unsigned long, int, char const*, char const*, void**) (mtr.c:186)
> ==27781==    by 0x5A5296: VecScatterCreate (vscat.c:1168)
> ==27781==    by 0x9AF3C5: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
> ==27781==    by 0x96F0F0: MatAssemblyEnd_MPIAIJ(_p_Mat*, MatAssemblyType) (mpiaij.c:706)
> ==27781==    by 0xA45358: MatAssemblyEnd (matrix.c:4959)
> ==27781==    by 0x464301: main (ex7.c:78)
>
> This was produced by src/ksp/ksp/tutorials/ex7.c. The command line options
> are
>
> ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>
> In this particular case the todata is of type VecScatter_Seq_Stride and
> fromdata is of type VecScatter_Seq_General. The complete valgrind log
> (including configure options for petsc) is attached.
>
> Any comments or suggestions are appreciated.
> Cheers,
> Dominic
>
>
>> -Paul
>>
>>
>> On Wed, Jan 22, 2014 at 9:49 AM, Dominic Meiser <dmeiser at txcorp.com> wrote:
>>
>> Hi,
>>
>> I'm trying to understand VecScatterInitializeForGPU in
>> src/vec/vec/utils/veccusp/vscatcusp.c. I don't understand why
>>
>> this function can get away with casting the fromdata and todata in
>> the inctx to VecScatter_MPI_General. Don't we need to inspect the
>> VecScatterType fields of the todata and fromdata?
>>
>> Cheers,
>> Dominic
>>
>> --
>> Dominic Meiser
>> Tech-X Corporation
>> 5621 Arapahoe Avenue
>> Boulder, CO 80303
>> USA
>> Telephone: 303-996-2036
>> Fax: 303-448-7756
>> www.txcorp.com
>>
>>
>>
>
>
> --
> Dominic Meiser
> Tech-X Corporation
> 5621 Arapahoe Avenue
> Boulder, CO 80303
> USA
> Telephone: 303-996-2036
> Fax: 303-448-7756
> www.txcorp.com
>
>