[petsc-users] [External] Re: MatVec on GPUs

Mon Oct 18 23:28:46 CDT 2021

On Mon, Oct 18, 2021 at 10:56 PM Swarnava Ghosh <swarnava89 at gmail.com>
wrote:

> I am trying to port parts of the following function to GPUs. Essentially,
> the lines of code between the two "TODO..." comments should be executed on
> the device. Here is the function:
>
> PetscScalar CalculateSpectralNodesAndWeights(LSDFT_OBJ *pLsdft, int p, int LIp)
> {
>
>   PetscInt N_qp;
>   N_qp = pLsdft->N_qp;
>
>   int k;
>   PetscScalar *a, *b;
>   k=0;
>
>   PetscMalloc(sizeof(PetscScalar)*(N_qp+1), &a);
>   PetscMalloc(sizeof(PetscScalar)*(N_qp+1), &b);
>
>   /*
>    * TODO: COPY a, b, pLsdft->Vk, pLsdft->Vkm1, pLsdft->Vkp1,
>    * pLsdft->LapPlusVeffOprloc, k,p,N_qp from HOST to DEVICE
>    * DO THE FOLLOWING OPERATIONS ON DEVICE
>    */
>
>   //zero out vectors
>   VecZeroEntries(pLsdft->Vk);
>   VecZeroEntries(pLsdft->Vkm1);
>   VecZeroEntries(pLsdft->Vkp1);
>
>   VecSetValue(pLsdft->Vkm1, p, 1.0, INSERT_VALUES);
>   MatMult(pLsdft->LapPlusVeffOprloc,pLsdft->Vkm1,pLsdft->Vk);
>   VecDot(pLsdft->Vkm1, pLsdft->Vk, &a[0]);
>   VecAXPY(pLsdft->Vk, -a[0], pLsdft->Vkm1);
>   VecNorm(pLsdft->Vk, NORM_2, &b[0]);
>   VecScale(pLsdft->Vk, 1.0 / b[0]);
>
>   for (k = 0; k < N_qp; k++) {
>     MatMult(pLsdft->LapPlusVeffOprloc,pLsdft->Vk,pLsdft->Vkp1);
>     VecDot(pLsdft->Vk, pLsdft->Vkp1, &a[k + 1]);
>     VecAXPY(pLsdft->Vkp1, -a[k + 1], pLsdft->Vk);
>     VecAXPY(pLsdft->Vkp1, -b[k], pLsdft->Vkm1);
>     VecCopy(pLsdft->Vk, pLsdft->Vkm1);
>     VecNorm(pLsdft->Vkp1, NORM_2, &b[k + 1]);
>     VecCopy(pLsdft->Vkp1, pLsdft->Vk);
>     VecScale(pLsdft->Vk, 1.0 / b[k + 1]);
>   }
>
>   /*
>    * TODO: Copy back a, b, pLsdft->Vk, pLsdft->Vkm1, pLsdft->Vkp1,
>    * pLsdft->LapPlusVeffOprloc, k,p,N_qp from DEVICE to HOST
>    */
>
>   /*
>    * Some operation with a, and b on HOST
>    *
>    */
>   TridiagEigenVecSolve_NodesAndWeights(pLsdft, a, b, N_qp, LIp);  // operation on the host
>
>   // free a,b
>   PetscFree(a);
>   PetscFree(b);
>
>   return 0;
> }
>
> If I just use the command line options to set vectors Vk,Vkp1 and Vkm1 as
> cuda vectors and the matrix  LapPlusVeffOprloc as aijcusparse, will the
> lines of code between the two "TODO" comments be entirely executed on the
> device?
>
Yes, except VecSetValue(pLsdft->Vkm1, p, 1.0, INSERT_VALUES), which is done
on the CPU: the vector data is pulled down from the GPU to the CPU and the
value is set there. Subsequent vector operations will push the updated
vector data to the GPU again.
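
A toy model of the behaviour described in this reply, for illustration only: plain C with a host buffer, a mock "device" buffer and validity flags. It is not PETSc's implementation; the type ToyVec and the functions toy_device_zero, toy_host_set and toy_device_scale are hypothetical names, and the sketch only counts the copies implied by the description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_N 4

/* Toy vector: a host copy, a mock "device" copy, and validity flags. */
typedef struct {
  double host[TOY_N];
  double dev[TOY_N];
  int    host_valid;
  int    dev_valid;
  int    d2h;   /* device-to-host copies performed */
  int    h2d;   /* host-to-device copies performed */
} ToyVec;

/* Device-side zeroing (cf. VecZeroEntries on a GPU vector): no transfer. */
void toy_device_zero(ToyVec *v)
{
  memset(v->dev, 0, sizeof v->dev);
  v->dev_valid  = 1;
  v->host_valid = 0;
}

/* Host-side single-entry write (cf. VecSetValue): pull data down if stale. */
void toy_host_set(ToyVec *v, size_t i, double value)
{
  assert(i < TOY_N);
  if (!v->host_valid) {
    memcpy(v->host, v->dev, sizeof v->host);
    v->d2h++;
    v->host_valid = 1;
  }
  v->host[i]   = value;
  v->dev_valid = 0;   /* device copy is now out of date */
}

/* Device-side scaling (cf. VecScale): push data up if stale. */
void toy_device_scale(ToyVec *v, double alpha)
{
  if (!v->dev_valid) {
    memcpy(v->dev, v->host, sizeof v->dev);
    v->h2d++;
    v->dev_valid = 1;
  }
  for (size_t i = 0; i < TOY_N; i++) v->dev[i] *= alpha;
  v->host_valid = 0;  /* host copy is now out of date */
}
```

In this model, zeroing on the device, setting one entry on the host, and then running any number of device operations costs one download and one upload in total; later device operations trigger no further copies.
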

>
> Sincerely,
> Swarnava
>
>
> On Mon, Oct 18, 2021 at 10:13 PM Swarnava Ghosh <swarnava89 at gmail.com>
> wrote:
>
>> Thanks for the clarification, Junchao.
>>
>> Sincerely,
>> Swarnava
>>
>> On Mon, Oct 18, 2021 at 10:08 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>>
>>>
>>>
>>> On Mon, Oct 18, 2021 at 8:47 PM Swarnava Ghosh <swarnava89 at gmail.com>
>>> wrote:
>>>
>>>> Hi Junchao,
>>>>
>>>> If I want to pass command line options as  -mymat_mat_type aijcusparse,
>>>> should it be MatSetOptionsPrefix(A,"mymat"); or
>>>> MatSetOptionsPrefix(A,"mymat_"); ? Could you please clarify?
>>>>
>>>  my fault, it should be MatSetOptionsPrefix(A,"mymat_"), as seen in
>>> mat/tests/ex62.c
>>>  Thanks
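
The trailing underscore matters because the prefix is prepended verbatim to the option name. A minimal sketch of that concatenation in plain C, for illustration only (the helper prefixed_option is hypothetical and is not a PETSc function):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "-" + prefix + name into out.
 * Returns 0 on success, -1 if out is too small. */
int prefixed_option(const char *prefix, const char *name,
                    char *out, size_t outlen)
{
  assert(prefix && name && out);
  size_t need = 1 + strlen(prefix) + strlen(name) + 1; /* '-' and '\0' */
  if (need > outlen) return -1;
  snprintf(out, outlen, "-%s%s", prefix, name);
  return 0;
}
```

With the prefix "mymat_" and the name "mat_type" this yields -mymat_mat_type; with "mymat" it yields -mymatmat_type, which is not the option being passed on the command line.
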
>>>
>>>
>>>>
>>>> Sincerely,
>>>> Swarnava
>>>>
>>>> On Mon, Oct 18, 2021 at 9:23 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>> MatSetOptionsPrefix(A,"mymat")
>>>>> VecSetOptionsPrefix(v,"myvec")
>>>>>
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Mon, Oct 18, 2021 at 8:04 PM Chang Liu <cliu at pppl.gov> wrote:
>>>>>
>>>>>> Hi Junchao,
>>>>>>
>>>>>> Thank you for your answer. I tried MatConvert and it works. It did not
>>>>>> work before because I had forgotten to convert a vector from mpi to
>>>>>> mpicuda.
>>>>>>
>>>>>> For vectors, there is no VecConvert to use, so I have to do VecDuplicate,
>>>>>> VecSetType and VecCopy. Is there an easier option?
>>>>>>
>>>>>  As Matt suggested, you could single out the matrix and vector with an
>>>>> options prefix and set their types on the command line
>>>>>
>>>>> MatSetOptionsPrefix(A,"mymat");
>>>>> VecSetOptionsPrefix(v,"myvec");
>>>>>
>>>>> Then, -mymat_mat_type aijcusparse -myvec_vec_type cuda
>>>>>
>>>>> A simpler approach is to have the vector type set automatically by
>>>>> MatCreateVecs(A,&v,NULL)
>>>>>
>>>>>
>>>>>> Chang
>>>>>>
>>>>>> On 10/18/21 5:23 PM, Junchao Zhang wrote:
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Oct 18, 2021 at 3:42 PM Chang Liu via petsc-users
>>>>>> > <petsc-users at mcs.anl.gov> wrote:
>>>>>> >
>>>>>> >     Hi Matt,
>>>>>> >
>>>>>> >     I have a related question. In my code I have many matrices and I
>>>>>> >     only want one of them to live on the GPU, with the others staying
>>>>>> >     in CPU memory.
>>>>>> >
>>>>>> >     I wonder if there is an easier way to copy an mpiaij matrix to
>>>>>> >     mpiaijcusparse (in other words, copy the data to GPUs). I can think
>>>>>> >     of creating a new mpiaijcusparse matrix and copying the data line
>>>>>> >     by line, but I wonder if there is a better option.
>>>>>> >
>>>>>> >     I have tried MatCopy and MatConvert but neither works.
>>>>>> >
>>>>>> > Did you use MatConvert(mat,matype,MAT_INPLACE_MATRIX,&mat)?
>>>>>> >
>>>>>> >
>>>>>> >     Chang
>>>>>> >
>>>>>> >     On 10/17/21 7:50 PM, Matthew Knepley wrote:
>>>>>> >      > On Sun, Oct 17, 2021 at 7:12 PM Swarnava Ghosh
>>>>>> >      > <swarnava89 at gmail.com> wrote:
>>>>>> >      >
>>>>>> >      >     Do I need to convert the MATSEQBAIJ to a cuda matrix in code?
>>>>>> >      >
>>>>>> >      >
>>>>>> >      > You would need a call to MatSetFromOptions() to take that type
>>>>>> >      > from the command line, and not have the type hard-coded in your
>>>>>> >      > application. It is generally a bad idea to hard code the
>>>>>> >      > implementation type.
>>>>>> >      >
>>>>>> >      >     If I do it from the command line, are the other MatVec calls
>>>>>> >      >     also ported onto CUDA? I have many MatVec calls in my code, but
>>>>>> >      >     I specifically want to port just one call.
>>>>>> >      >
>>>>>> >      >
>>>>>> >      > You can give that one matrix an options prefix to isolate it.
>>>>>> >      >
>>>>>> >      >    Thanks,
>>>>>> >      >
>>>>>> >      >       Matt
>>>>>> >      >
>>>>>> >      >     Sincerely,
>>>>>> >      >     Swarnava
>>>>>> >      >
>>>>>> >      >     On Sun, Oct 17, 2021 at 7:07 PM Junchao Zhang
>>>>>> >      >     <junchao.zhang at gmail.com> wrote:
>>>>>> >      >
>>>>>> >      >         You can do that with command line options -mat_type
>>>>>> >      >         aijcusparse -vec_type cuda
>>>>>> >      >
>>>>>> >      >         On Sun, Oct 17, 2021, 5:32 PM Swarnava Ghosh
>>>>>> >      >         <swarnava89 at gmail.com> wrote:
>>>>>> >      >
>>>>>> >      >             Dear Petsc team,
>>>>>> >      >
>>>>>> >      >             I had a query regarding using CUDA to accelerate a
>>>>>> >      >             matrix vector product.
>>>>>> >      >             I have a sequential sparse matrix (MATSEQBAIJ type). I
>>>>>> >      >             want to port a MatVec call onto GPUs. Is there any
>>>>>> >      >             code/example I can look at?
>>>>>> >      >
>>>>>> >      >             Sincerely,
>>>>>> >      >             SG
>>>>>> >      >
>>>>>> >      >
>>>>>> >      >
>>>>>> >      > --
>>>>>> >      > What most experimenters take for granted before they begin their
>>>>>> >      > experiments is infinitely more interesting than any results to
>>>>>> >      > which their experiments lead.
>>>>>> >      > -- Norbert Wiener
>>>>>> >      >
>>>>>> >      > https://www.cse.buffalo.edu/~knepley/
>>>>>> >
>>>>>> >     --
>>>>>> >     Chang Liu
>>>>>> >     Staff Research Physicist
>>>>>> >     +1 609 243 3438
>>>>>> >     cliu at pppl.gov
>>>>>> >     Princeton Plasma Physics Laboratory
>>>>>> >     100 Stellarator Rd, Princeton NJ 08540, USA
>>>>>> >
>>>>>>
>>>>>