[petsc-dev] Understanding Vecscatter with Kokkos Vecs

Patrick Sanan patrick.sanan at gmail.com
Fri Feb 19 11:55:05 CST 2021


We ended up doing it just as you say - copy the data to the host, use it to build the IS, and build the scatter. It would be fun to optimize further, but as you say that might be premature since there's ongoing work. Happy to get to play with it a bit, though!
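For concreteness, a rough sketch of that workflow (just an illustration, not the actual demo code - dev_indices, rstart, and the stride-based target ordering are placeholder assumptions for whatever the external library and the DMStag layout actually provide):

#include <petscvec.h>
#include <Kokkos_Core.hpp>

/* Sketch: build a VecScatter from indices that live on the device, by
   staging them on the host first (ISs currently require host arrays). */
static PetscErrorCode BuildScatterFromDeviceIndices(Vec x, Vec y, Kokkos::View<PetscInt*> dev_indices, PetscInt rstart, VecScatter *scatter)
{
  PetscErrorCode ierr;
  IS             is_from, is_to;
  PetscInt       n = (PetscInt)dev_indices.extent(0);

  PetscFunctionBeginUser;
  /* Copy the external library's element-to-index map to the host */
  auto host_indices = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), dev_indices);
  ierr = ISCreateGeneral(PetscObjectComm((PetscObject)x), n, host_indices.data(), PETSC_COPY_VALUES, &is_from);CHKERRQ(ierr);
  /* Target ordering: a contiguous stride here, standing in for the PETSc-native (DMStag) ordering */
  ierr = ISCreateStride(PetscObjectComm((PetscObject)y), n, rstart, 1, &is_to);CHKERRQ(ierr);
  ierr = VecScatterCreate(x, is_from, y, is_to, scatter);CHKERRQ(ierr);
  ierr = ISDestroy(&is_from);CHKERRQ(ierr);
  ierr = ISDestroy(&is_to);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The resulting scatter can then be applied to the Kokkos Vecs as usual with VecScatterBegin/VecScatterEnd, and, as Junchao says below, the index information is copied back to the device internally when the scatter runs on device data.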

Aside: it's been on the list of good things to do, docs-wise, to be able to label parts of the API as more or less stable, so I'm hoping we'll get to that (though I think it makes sense to wait until we've finished some of the current migration tasks).

> Am 19.02.2021 um 16:18 schrieb Jed Brown <jed at jedbrown.org>:
> 
> ISCUDA isn't even right (perhaps ISGENERALCUDA, ISBLOCKCUDA). I agree that this isn't a priority, but I could see it being needed in the next few years to avoid bottlenecks in adaptive mesh refinement or other adaptive algorithms. It's not a small amount of work, but I think all the index coordination can be done efficiently on a GPU.
> 
> Junchao Zhang <junchao.zhang at gmail.com> writes:
> 
>> Even if ISCUDA is simple to add, the PetscSFSetUp algorithm and many of the
>> functions involved run on the host (and are not simple to parallelize on the GPU).
>> The indices passed to VecScatter are analyzed and regrouped; even though they
>> are copied to the device eventually, they are likely not in their original form.
>> So, copying the indices from device to host and building a VecScatter there
>> seems the easiest approach.
>> 
>> The Kokkos-related functions are experimental. We need to decide whether
>> they are good or not.
>> 
>> --Junchao Zhang
>> 
>> 
>> On Fri, Feb 19, 2021 at 4:32 AM Patrick Sanan <patrick.sanan at gmail.com>
>> wrote:
>> 
>>> Thanks! That helps a lot.
>>> 
>>> I assume "no," but is ISCUDA simple to add?
>>> 
>>> More on what I'm trying to do, in case I'm missing an obvious approach:
>>> 
>>> I'm working on a demo code that uses an external library, based on Kokkos,
>>> as a solver - I create a Vec of type KOKKOS and populate it with the
>>> solution data from the library, by getting access to the raw Kokkos view
>>> with VecKokkosGetDeviceView() * .
>>> 
>>> I then want to reorder that solution data into PETSc-native ordering (for
>>> a velocity-pressure DMStag), so I create a pair of ISs and a VecScatter to
>>> do that.
>>> 
>>> The issue is that to create this scatter, I need to use information
>>> (essentially, an element-to-index map) from the external library's
>>> mesh-management object, which lives on the device. This doesn't work (when
>>> host != device), because of course the ISs live on the host and to create
>>> them I need to provide host arrays of indices.
>>> 
>>> Am I stuck, for now, with sending the index information from
>>> the device to the host, using it to create the IS, and then having
>>> essentially the same information go back to the device when I use the
>>> scatter?
>>> 
>>> * As an aside, it looks like some of these Kokkos-related functions and
>>> types are missing man pages - if you have time to add them, even as stubs,
>>> that'd be great (if not, let me know and I'll try to do it, so that at
>>> least the existence of the functions in the API is reflected on the
>>> website).
>>> 
>>> Am 18.02.2021 um 23:17 schrieb Junchao Zhang <junchao.zhang at gmail.com>:
>>> 
>>> 
>>> On Thu, Feb 18, 2021 at 4:04 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>> 
>>>> 
>>>> 
>>>> On Thu, Feb 18, 2021 at 1:55 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>> 
>>>>> VecScatter (i.e., SF; the two are the same thing) setup (building the
>>>>> various index lists and rank lists) is done on the CPU. is1 and is2 must
>>>>> be host data.
>>>>> 
>>>>> 
>>>> 
>>>> Just out of curiosity, can is1 and is2 not be created on a GPU device in
>>>> the first place? That is, is it technically impossible, or have we just
>>>> not implemented it yet?
>>>> 
>>> Simply because we do not have an ISCUDA class.
>>> 
>>> 
>>>> 
>>>> Fande,
>>>> 
>>>> 
>>>>> When the SF is used to communicate device data, the indices are copied to
>>>>> the device.
>>>>> 
>>>>> --Junchao Zhang
>>>>> 
>>>>> 
>>>>> On Thu, Feb 18, 2021 at 11:50 AM Patrick Sanan <patrick.sanan at gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I'm trying to understand how VecScatters work with GPU-native Kokkos
>>>>>> Vecs.
>>>>>> 
>>>>>> Specifically, I'm interested in what will happen in code like that in
>>>>>> src/vec/vec/tests/ex22.c:
>>>>>> 
>>>>>> ierr = VecScatterCreate(x,is1,y,is2,&ctx);CHKERRQ(ierr);
>>>>>> 
>>>>>> (from
>>>>>> https://gitlab.com/petsc/petsc/-/blob/master/src/vec/vec/tests/ex22.c#L44
>>>>>> )
>>>>>> 
>>>>>> Here, x and y can be set to type KOKKOS using -vec_type kokkos at the
>>>>>> command line. But is1 and is2 are (I think) always CPU/host data.
>>>>>> Assuming that the scatter itself can happen on the GPU, the indices must
>>>>>> make it to the device somehow - are they copied there when the scatter
>>>>>> is created? Is there a way to create the scatter using indices already
>>>>>> on the GPU (maybe using SF more directly)?
>>>>>> 
>>>>>> 
>>> 


