[petsc-users] Using CUDA Graphs with MatComputeOperator

Sreeram R Venkat srvenkat at utexas.edu
Wed May 15 19:03:41 CDT 2024


I don't have a working example that I can easily send you since the
MatShell I have involves a lot of other code.

Essentially, I am trying to do something like this:

bool graphCreated = false;
cudaGraph_t graph;
cudaGraphExec_t instance;
if (!graphCreated)
    {
        gpuErrchk(cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal));
        PetscCall(MatComputeOperator(ShellMat, MATDENSECUDA, &DenseMat));
        gpuErrchk(cudaStreamEndCapture(s, &graph));
        gpuErrchk(cudaGraphInstantiate(&instance, graph, NULL, NULL, 0));
        graphCreated = true;
    }
gpuErrchk(cudaGraphLaunch(instance, s));
gpuErrchk(cudaStreamSynchronize(s));
gpuErrchk(cudaGraphExecDestroy(instance));
gpuErrchk(cudaGraphDestroy(graph));

The ShellMat context holds the CUDA stream "s", and all of the shell
computations run on that stream. If I replace the MatComputeOperator() call with

for (int i = 0; i < N; i++)
    MatMult(ShellMat, x, y);

the capture succeeds and the code runs correctly inside the CUDA graph. I also
tried modifying the MatConvert_Shell() code to capture only the main for loop
that computes each column, but that still doesn't work.

A MatShell that does some trivial computation (for example, y = x) should be
a good enough example to start with.

For now, I ended up writing my own version of MatComputeOperator that
doesn't involve Mats or Vecs in the main loop (it works directly with
arrays), so that I control which stream every operation runs on.

Thanks,
Sreeram

On Tue, May 14, 2024 at 10:15 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

>
> I haven't used CUDA graphs with PETSc. Do you happen to have a working
> example so we can debug?
>
> --Junchao Zhang
>
>
> On Tue, May 14, 2024 at 6:08 PM Sreeram R Venkat <srvenkat at utexas.edu>
> wrote:
>
>> I have a MatShell object that I want to convert to a MATDENSECUDA.
>> Normally, I use MatComputeOperator for this. However, I would now also like
>> to use a CUDA graph so that all the calls to MatMult are captured. I can
>> wrap code like
>>
>> for (int i = 0; i < N; i++)
>>     MatMult(A, x, y);
>>
>> in a CUDA graph, and it runs fine. If I try to wrap MatComputeOperator in
>> a graph, I get runtime errors like
>>  cuda error 906 (cudaErrorStreamCaptureImplicit) : operation would make the legacy stream depend on a capturing blocking stream
>>
>> I tried modifying the MatConvert_Shell routine to capture only the main
>> for loop, but that still gives the same errors. Is there a way to use CUDA
>> graphs here (either through a modified MatConvert_Shell or otherwise)?
>>
>> Thanks,
>> Sreeram
>>
>

