[petsc-dev] Understanding Vecscatter with Kokkos Vecs

Fri Feb 19 09:18:31 CST 2021

ISCUDA isn't even the right name (perhaps ISGENERALCUDA, ISBLOCKCUDA). I agree that this isn't a priority, but I could see it being needed in the next few years to avoid bottlenecks in adaptive mesh refinement or other adaptive algorithms. It's not a small amount of work, but I think all the index coordination can be done efficiently on a GPU.
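
As a toy illustration of the kind of index coordination in question (this is not PETSc's actual PetscSFSetUp logic), the sketch below buckets a list of global indices by owning rank with a stable counting sort. It assumes a uniform block layout in which rank r owns global indices [r*block, (r+1)*block); `group_by_owner` and all of its parameters are hypothetical names, and no PETSc calls are used.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model, NOT PetscSFSetUp: assumes rank r owns global indices
   [r*block, (r+1)*block). All names here are hypothetical.
   out[] must hold n entries; offsets[] must hold nranks+1 entries.
   Returns 0 on success, -1 if the scratch allocation fails. */
static int group_by_owner(const int *idx, int n, int block, int nranks,
                          int *out, int *offsets)
{
  assert(block > 0 && nranks > 0 && n >= 0);

  /* Histogram: count the indices owned by each rank, shifted by one
     slot so that the prefix sum below yields start offsets. */
  for (int r = 0; r <= nranks; ++r) offsets[r] = 0;
  for (int i = 0; i < n; ++i) {
    assert(idx[i] >= 0);
    int owner = idx[i] / block;
    assert(owner < nranks);
    offsets[owner + 1]++;
  }

  /* Prefix sum: offsets[r] is where rank r's group starts in out[]. */
  for (int r = 0; r < nranks; ++r) offsets[r + 1] += offsets[r];

  /* Stable placement: within a rank, indices keep their input order. */
  int *cursor = calloc((size_t)nranks, sizeof *cursor);
  if (!cursor) return -1;
  for (int i = 0; i < n; ++i) {
    int owner = idx[i] / block;
    out[offsets[owner] + cursor[owner]++] = idx[i];
  }
  free(cursor);
  return 0;
}
```

For example, with block = 4 and nranks = 3, the input {9, 2, 7, 0, 5, 3} comes out as {2, 0, 3, 7, 5, 9} with offsets {0, 3, 5, 6}: the same indices, but no longer in the caller's order. The three passes are a histogram, a scan, and a scatter, all standard data-parallel primitives, though whether the real setup maps onto them this cleanly is not established here.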

Junchao Zhang <junchao.zhang at gmail.com> writes:

> Even if ISCUDA is simple to add, the PetscSFSetUp algorithm and many of the
> functions involved run on the host (and are not simple to parallelize on a GPU).
> The indices passed to VecScatter are analyzed and re-grouped. Even if they are
> eventually copied to the device, they are likely not in their original form.
> So, copying the indices from device to host and building the VecScatter there
> seems the easiest approach.
>
> The Kokkos-related functions are experimental. We need to decide whether
> they are good or not.
>
> --Junchao Zhang
>
>
> On Fri, Feb 19, 2021 at 4:32 AM Patrick Sanan <patrick.sanan at gmail.com>
> wrote:
>
>> Thanks! That helps a lot.
>>
>> I assume "no," but is ISCUDA simple to add?
>>
>> More on what I'm trying to do, in case I'm missing an obvious approach:
>>
>> I'm working on a demo code that uses an external library, based on Kokkos,
>> as a solver - I create a Vec of type KOKKOS and populate it with the
>> solution data from the library, by getting access to the raw Kokkos view
>> with VecKokkosGetDeviceView() * .
>>
>> I then want to reorder that solution data into PETSc-native ordering (for
>> a velocity-pressure DMStag), so I create a pair of ISs and a VecScatter to
>> do that.
>>
>> The issue is that to create this scatter, I need to use information
>> (essentially, an element-to-index map) from the external library's
>> mesh-management object, which lives on the device. This doesn't work (when
>> host != device), because of course the ISs live on the host and to create
>> them I need to provide host arrays of indices.
>>
>> Am I stuck, for now, with sending the index information from
>> the device to the host, using it to create the IS, and then having
>> essentially the same information go back to the device when I use the
>> scatter?
>>
>> * As an aside, it looks like some of these Kokkos-related functions and
>> types are missing man pages - if you have time to add them, even as stubs,
>> that'd be great (if not let me know and I'll just try to formally do it, so
>> that at least the existence of the functions in the API is reflected on the
>> website).
>>
>> On 18.02.2021 at 23:17, Junchao Zhang <junchao.zhang at gmail.com> wrote:
>>
>>
>> On Thu, Feb 18, 2021 at 4:04 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>
>>>
>>>
>>> On Thu, Feb 18, 2021 at 1:55 PM Junchao Zhang <junchao.zhang at gmail.com>
>>> wrote:
>>>
>>>> VecScatter (i.e., SF, the two are the same thing) setup (building
>>>> various index lists, rank lists) is done on the CPU.  is1, is2 must be host
>>>> data.
>>>>
>>>
>>> Just out of curiosity, can is1 and is2 not be created on a GPU device in
>>> the first place? Is that technically impossible, or have we just not
>>> implemented it yet?
>>>
>> Simply because we do not have an ISCUDA class.
>>
>>
>>>
>>> Fande,
>>>
>>>
>>>> When the SF is used to communicate device data, indices are copied to
>>>> the device.
>>>>
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Thu, Feb 18, 2021 at 11:50 AM Patrick Sanan <patrick.sanan at gmail.com>
>>>> wrote:
>>>>
>>>>> I'm trying to understand how VecScatters work with GPU-native Kokkos
>>>>> Vecs.
>>>>>
>>>>> Specifically, I'm interested in what will happen in code like in
>>>>> src/vec/vec/tests/ex22.c,
>>>>>
>>>>> ierr = VecScatterCreate(x,is1,y,is2,&ctx);CHKERRQ(ierr);
>>>>>
>>>>> (from
>>>>> https://gitlab.com/petsc/petsc/-/blob/master/src/vec/vec/tests/ex22.c#L44
>>>>> )
>>>>>
>>>>> Here, x and y can be set to type KOKKOS using -vec_type kokkos at the
>>>>> command line. But is1 and is2 are (I think) always
>>>>> CPU/host data. Assuming that the scatter itself can happen on the GPU,
>>>>> the indices must make it to the device somehow - are they copied there when
>>>>> the scatter is created? Is there a way to create the scatter using indices
>>>>> already on the GPU (Maybe using SF more directly)?
>>>>>
>>>>>
>>
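
For reference, the data movement that the VecScatterCreate(x,is1,y,is2,&ctx) call in the original question sets up can be modeled in plain C: for a forward scatter with insertion, entry is1[i] of x lands in entry is2[i] of y. The sketch below is a sequential, host-only model of that rule. `scatter_forward_insert` is a hypothetical helper, not a PETSc function, and it ignores parallel layout and device memory entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential model of a forward scatter with insertion:
   y[is2[i]] = x[is1[i]] for i = 0..n-1.
   Hypothetical helper for illustration; not part of PETSc. The caller
   must ensure every is1[i] indexes into x and every is2[i] into y. */
static void scatter_forward_insert(const double *x, const int *is1,
                                   double *y, const int *is2, size_t n)
{
  assert(x && y && is1 && is2);
  for (size_t i = 0; i < n; ++i) {
    assert(is1[i] >= 0 && is2[i] >= 0);
    y[is2[i]] = x[is1[i]];
  }
}
```

With x = {10, 20, 30, 40}, is1 = {3, 0, 2} and is2 = {0, 1, 2}, the first three entries of y become 40, 10, 30 and any other entry of y is left untouched. In the reordering use case described above, is1 would come from the external library's element-to-index map, which is why that map has to be available wherever the index sets are built.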