[petsc-dev] VecScatterInitializeForGPU
Dominic Meiser
dmeiser at txcorp.com
Wed Jan 22 12:52:43 CST 2014
On Wed 22 Jan 2014 10:54:28 AM MST, Paul Mullowney wrote:
> Oh. You're opening a can of worms but maybe that's your intent ;) I
> see the block Jacobi preconditioner in the valgrind logs.
Didn't mean to open a can of worms.
> Do:
> mpirun -n 1 (or 2) ./ex7 -mat_type mpiaijcusparse -vec_type mpicusp -pc_type none
This works.
> From here, we can try to sort out the VecScatterInitializeForGPU
> problem when mpirun/mpiexec is not used.
> If you want to implement a block Jacobi preconditioner on multiple
> GPUs, that's a larger problem to solve. I had some code that sort of
> worked. We'd have to sit down and discuss it.
I'd be really interested in learning more about this.
Cheers,
Dominic
> -Paul
>
>
> On Wed, Jan 22, 2014 at 10:48 AM, Dominic Meiser <dmeiser at txcorp.com> wrote:
>
> Attached are the logs with 1 rank and 2 ranks. As far as I can tell,
> these are different errors.
>
> For the log attached to the previous email I chose to run ex7 without
> mpirun so that valgrind checks ex7 and not mpirun. Is there a way to
> have valgrind check the MPI processes rather than mpirun?
>
> Cheers,
> Dominic
>
>
>
> On 01/22/2014 10:37 AM, Paul Mullowney wrote:
>> Hmmm. I may not have protected against the case where the
>> mpiaijcusp(arse) classes are used without mpirun/mpiexec. I suppose
>> it should have occurred to me that someone would do this.
>>
>> Try:
>> mpirun -n 1 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>> In this scenario, the sequential-to-sequential vecscatters should be
>> called.
>>
>> Then:
>> mpirun -n 2 ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>> In this scenario, MPI_General vecscatters should be called ... and
>> should work correctly if you have a system with multiple GPUs.
>>
>> -Paul
>>
>>
>> On Wed, Jan 22, 2014 at 10:32 AM, Dominic Meiser <dmeiser at txcorp.com> wrote:
>>
>> Hey Paul,
>>
>> Thanks for providing background on this.
>>
>>
>> On Wed 22 Jan 2014 10:05:13 AM MST, Paul Mullowney wrote:
>>
>>
>> Dominic,
>> A few years ago, I was trying to minimize the amount of data
>> transferred to and from the GPU (for multi-GPU MatMult) by inspecting
>> the indices of the data that needed to be messaged to and from the
>> device. I would then call gather kernels on the GPU that pulled the
>> scattered data into contiguous buffers, which were then transferred
>> to the host asynchronously (while the MatMult was occurring).
>> VecScatterInitializeForGPU was added in order to build the necessary
>> buffers as needed; that was the motivation for its existence.
>>
>> An alternative approach is to message the smallest contiguous buffer
>> containing all the data with a single cudaMemcpyAsync. This is the
>> method currently implemented.
>>
>> I never found a case where the former implementation (with a GPU
>> gather kernel) performed better than the alternative approach, which
>> messaged the smallest contiguous buffer. I looked at many, many
>> matrices.
>>
>> Now, as far as I understand the VecScatter kernels, this method
>> should only get called if the transfer is MPI_General (i.e. PtoP,
>> parallel to parallel). Other VecScatter methods are called in other
>> circumstances where the scatter is not MPI_General. That assumption
>> could be wrong, though.
>>
>>
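>> To make the two strategies concrete, here is a rough sketch in
>> CUDA-flavored C (hypothetical names such as gather_kernel and
>> send_contiguous; this is not the actual PETSc code) of (a) packing
>> the scattered entries with a gather kernel before a single
>> cudaMemcpyAsync, versus (b) copying the smallest contiguous range
>> that contains every needed index:
>>
>>   #include <cuda_runtime.h>
>>
>>   /* (a) gather kernel: pack x[idx[0..nidx-1]] into a contiguous
>>      device buffer so that only nidx values cross the bus */
>>   __global__ void gather_kernel(const double *x, const int *idx,
>>                                 int nidx, double *packed)
>>   {
>>     int i = blockIdx.x * blockDim.x + threadIdx.x;
>>     if (i < nidx) packed[i] = x[idx[i]];
>>   }
>>
>>   /* h_buf should be pinned (cudaMallocHost) for the copy to be
>>      truly asynchronous and overlap with the local MatMult */
>>   void send_packed(const double *d_x, const int *d_idx, int nidx,
>>                    double *d_packed, double *h_buf, cudaStream_t s)
>>   {
>>     if (nidx <= 0) return;
>>     gather_kernel<<<(nidx + 255) / 256, 256, 0, s>>>(d_x, d_idx,
>>                                                      nidx, d_packed);
>>     cudaMemcpyAsync(h_buf, d_packed, nidx * sizeof(double),
>>                     cudaMemcpyDeviceToHost, s);
>>   }
>>
>>   /* (b) approach described above as currently implemented: one
>>      asynchronous copy of the smallest contiguous range [lo, hi] that
>>      contains all needed indices; the host picks out the entries */
>>   void send_contiguous(const double *d_x, int lo, int hi,
>>                        double *h_buf, cudaStream_t s)
>>   {
>>     cudaMemcpyAsync(h_buf, d_x + lo,
>>                     (size_t)(hi - lo + 1) * sizeof(double),
>>                     cudaMemcpyDeviceToHost, s);
>>   }
>>
>> Variant (b) may move entries that are never used, but it avoids the
>> extra kernel launch, which fits the observation above that it was
>> never slower on the matrices tested.
>>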
>>
>> I see. I figured there was some logic in place to make sure that
>> this function only gets called in cases where the transfer type is
>> MPI_General. I'm getting segfaults in this function where the todata
>> and fromdata are of different types. This could easily be user
>> error, but I'm not sure. Here is an example valgrind error:
>>
>> ==27781== Invalid read of size 8
>> ==27781==    at 0x1188080: VecScatterInitializeForGPU (vscatcusp.c:46)
>> ==27781==    by 0xEEAE5D: MatMult_MPIAIJCUSPARSE(_p_Mat*, _p_Vec*, _p_Vec*) (mpiaijcusparse.cu:108)
>> ==27781==    by 0xA20CC3: MatMult (matrix.c:2242)
>> ==27781==    by 0x4645E4: main (ex7.c:93)
>> ==27781== Address 0x286305e0 is 1,616 bytes inside a block of size 1,620 alloc'd
>> ==27781==    at 0x4C26548: memalign (vg_replace_malloc.c:727)
>> ==27781==    by 0x4654F9: PetscMallocAlign(unsigned long, int, char const*, char const*, void**) (mal.c:27)
>> ==27781==    by 0xCAEECC: PetscTrMallocDefault(unsigned long, int, char const*, char const*, void**) (mtr.c:186)
>> ==27781==    by 0x5A5296: VecScatterCreate (vscat.c:1168)
>> ==27781==    by 0x9AF3C5: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
>> ==27781==    by 0x96F0F0: MatAssemblyEnd_MPIAIJ(_p_Mat*, MatAssemblyType) (mpiaij.c:706)
>> ==27781==    by 0xA45358: MatAssemblyEnd (matrix.c:4959)
>> ==27781==    by 0x464301: main (ex7.c:78)
>>
>> This was produced by src/ksp/ksp/tutorials/ex7.c. The command-line
>> options are:
>>
>> ./ex7 -mat_type mpiaijcusparse -vec_type cusp
>>
>> In this particular case the todata is of type VecScatter_Seq_Stride
>> and the fromdata is of type VecScatter_Seq_General. The complete
>> valgrind log (including the PETSc configure options) is attached.
>>
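>> The kind of guard this suggests might look roughly like the sketch
>> below. The ScatterKind tag and the field names are hypothetical
>> stand-ins, not PETSc's actual internals; the point is only that the
>> downcast happens after the type of both contexts has been checked:
>>
>>   /* hypothetical types; not PETSc's actual structs */
>>   typedef enum { SCATTER_SEQ_GENERAL, SCATTER_SEQ_STRIDE,
>>                  SCATTER_MPI_GENERAL } ScatterKind;
>>
>>   typedef struct { ScatterKind kind; } ScatterBase;
>>   typedef struct { ScatterKind kind; int n; int *starts;
>>                    int *indices; /* send/recv bookkeeping ... */
>>                  } ScatterMPIGeneral;
>>
>>   int InitializeForGPU(ScatterBase *todata, ScatterBase *fromdata)
>>   {
>>     /* Seq_Stride/Seq_General contexts (as in the valgrind log above)
>>        do not have the MPI_General layout, so skip them instead of
>>        casting blindly. */
>>     if (!todata || !fromdata ||
>>         todata->kind != SCATTER_MPI_GENERAL ||
>>         fromdata->kind != SCATTER_MPI_GENERAL) return 0;
>>
>>     ScatterMPIGeneral *to   = (ScatterMPIGeneral *)todata;
>>     ScatterMPIGeneral *from = (ScatterMPIGeneral *)fromdata;
>>     (void)to; (void)from;
>>     /* ... build the GPU staging buffers from to/from here ... */
>>     return 0;
>>   }
>>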
>> Any comments or suggestions are appreciated.
>> Cheers,
>> Dominic
>>
>>
>> -Paul
>>
>>
>> On Wed, Jan 22, 2014 at 9:49 AM, Dominic Meiser <dmeiser at txcorp.com> wrote:
>>
>> Hi,
>>
>> I'm trying to understand VecScatterInitializeForGPU in
>> src/vec/vec/utils/veccusp/vscatcusp.c. I don't understand why this
>> function can get away with casting the fromdata and todata in the
>> inctx to VecScatter_MPI_General. Don't we need to inspect the
>> VecScatterType fields of the todata and fromdata?
>>
>> Cheers,
>> Dominic
>>
>
>
>
--
Dominic Meiser
Tech-X Corporation
5621 Arapahoe Avenue
Boulder, CO 80303
USA
Telephone: 303-996-2036
Fax: 303-448-7756
www.txcorp.com