<div dir="ltr"><div dir="ltr">On Thu, Feb 12, 2026 at 6:14 PM feng wang <<a href="mailto:snailsoar@hotmail.com">snailsoar@hotmail.com</a>> wrote:</div><div class="gmail_quote gmail_quote_container"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg8542811183821037509">
<div dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Mat,</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks for your reply.</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Regarding "VecCreateGhostBlock": the CPU version runs in parallel and solves Ax=b, so it also stores the halos in x and b for each partition. This is how my old implementation was done. If the current GPU implementation does not support halos, I can stick
to one GPU for the moment. or is there a way around this?</div></div></div></blockquote><div><br></div><div>There is a way around it. We have an open Issue. Someone needs to allow the vectors to be created with another type. It is not hard, it just takes time. I can do it starting the middle of March if you need it quickly.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg8542811183821037509"><div dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Regarding "Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc.": my current CFD code also takes arguments from the command line, so I would prefer to
set the types from the source code directly, so it does not mess around with arguments of the CFD code. Is there a way I can do this?</div></div></div></blockquote><div><br></div><div>1) You can do that using</div><div><br></div><div> MatCreate()</div><div> MatSetSizes()</div><div> MatSetBlockSize()</div><div> MatSetType()</div><div><br></div><div>but, I still don't think you should do that.</div><div><br></div><div>2) You can provide PETSc options from any source you want using PetscOptionsSetValue() and PetscOptionsInsertString(), so you can manage them however you want.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg8542811183821037509"><div dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
With respect to "MatSetValuesCOO()", I am new to this and was using the old way to set the values. MatSetValuesCOO requires an argument "coo_v". How does it work if I want to set the values on the GPU directly? Say coo_v has type PetscScalar:
do I need to create coo_v and assign its values directly in the GPU and then give it to MatSetValuesCOO?</div></div></div></blockquote><div><br></div><div>Yes. COO is much more efficient on the GPU than calling SetValues() individually. GPUs have horrible latency and hate branching. This is about the only way to make them competitive with CPUs for building operators.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg8542811183821037509"><div dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks for your help in advance.</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Best regards,</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Feng</div>
<div style="font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<hr style="display:inline-block;width:98%">
<div id="m_8542811183821037509divRplyFwdMsg">
<div style="direction:ltr;font-family:Calibri,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<b>From:</b> Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>><br>
<b>Sent:</b> 11 February 2026 16:32<br>
<b>To:</b> feng wang <<a href="mailto:snailsoar@hotmail.com" target="_blank">snailsoar@hotmail.com</a>><br>
<b>Cc:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>>; <a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU</div>
<div style="direction:ltr"> </div>
</div>
<div style="direction:ltr">On Wed, Feb 11, 2026 at 10:58 AM feng wang <<a id="m_8542811183821037509OWAbb2b2197-b7be-a72b-704a-fd90289027d7" href="mailto:snailsoar@hotmail.com" target="_blank">snailsoar@hotmail.com</a>> wrote:</div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Mat,</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks for your reply. Maybe I am overthinking it.</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
ksp/ex15 works fine with GPUs. </div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
To port my existing GMRES+ILU(0) to the GPU, what I am not clear about is how PETSc handles memory on the host and the device.</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Below is a snippet of my current petsc implementation. Suppose I have:</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr);</div>
</blockquote>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">This is the problem. Right now VecGhost hardcodes the use of VECSEQ and VECMPI. This is not necessary, and the local and global representations could indeed be device types. Is ghost necessary right now?</div>
<div style="direction:ltr"> </div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr);</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
//duplicate </div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr);</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
//create preconditioning matrix</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE,</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr);</div>
</blockquote>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">I would not create the specific type. Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc.</div>
<div style="direction:ltr"> </div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<u>If I use "-mat_type aijcusparse -vec_type cuda", are these matrices and vectors created directly on the device?</u></div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Below is how I assign values for the matrix:</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
nnz=0;</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
for(jv=0; jv<nv; jv++)</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
{</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
for(iv=0; iv<nv; iv++)</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
{</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the left hand side is [I/dt + (-J)]</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
nnz++;</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
}</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
}</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
idxm[0] = ig_mat[iql];</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
idxn[0] = ig_mat[iqr];</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values, ADD_VALUES);</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
CHKERRQ(ierr);</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
}<br>
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<u>Does PETSc first set the value on the host and copy it to the device, or is the value assigned directly on the device? In the second case, I would need to change my code a bit, since I need to make sure the data is on the device in the first place.</u></div>
</blockquote>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">Yes, you would need to set the values on device for maximum efficiency (although I would try it out with CPU construction first). You can do this best on the GPU using MatSetValuesCOO().</div>
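<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">As a rough illustration of the COO route (a sketch in plain C, not PETSc code): the (row, column) pattern is built once, for example to hand to MatSetPreallocationCOO(), and every later MatSetValuesCOO() call only needs a values array in the same order. The helper names block_coo_indices and block_coo_slot are hypothetical, plain int stands in for PetscInt, and the row-major order inside each block is an assumption chosen to match the values[] loop quoted above.</div>

<pre>
```c
#include <assert.h>
#include <stddef.h>

/* Expand nblk dense bs x bs blocks into scalar COO (row, column) indices.
 * Block k sits at block row brow[k] and block column bcol[k].
 * Entry (r, c) of block k is written to slot k*bs*bs + r*bs + c, so the
 * values array filled later must use the same row-major order.
 * coo_i and coo_j must each have room for nblk*bs*bs entries.
 * Returns the number of COO entries written. */
size_t block_coo_indices(size_t nblk, int bs, const int *brow, const int *bcol,
                         int *coo_i, int *coo_j)
{
  size_t n = 0;

  assert(bs > 0);
  for (size_t k = 0; k < nblk; k++) {
    for (int r = 0; r < bs; r++) {
      for (int c = 0; c < bs; c++) {
        coo_i[n] = brow[k] * bs + r; /* scalar row index */
        coo_j[n] = bcol[k] * bs + c; /* scalar column index */
        n++;
      }
    }
  }
  return n;
}

/* Position in the values array of entry (r, c) of block k. */
size_t block_coo_slot(size_t k, int bs, int r, int c)
{
  assert(bs > 0 && r >= 0 && r < bs && c >= 0 && c < bs);
  return k * (size_t)bs * (size_t)bs + (size_t)r * (size_t)bs + (size_t)c;
}
```
</pre>

<div style="direction:ltr">With this layout, the assembly kernel on the GPU would write -1*sign*blk.jac[jv][iv] into slot block_coo_slot(k, nv, jv, iv) of a device array, and that array is what gets passed as coo_v. As far as I know, repeated (i, j) pairs in the COO list are summed, which takes the place of ADD_VALUES.</div>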
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr"> Thanks,</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr"> Matt</div>
<div style="direction:ltr"> </div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks,</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Feng</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<hr style="direction:ltr;display:inline-block;width:98%">
<div id="m_8542811183821037509x_m_453449198775360914divRplyFwdMsg">
<div style="direction:ltr;font-family:Calibri,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<b>From:</b> Matthew Knepley <<a id="m_8542811183821037509OWAaa9b2302-4be7-5323-0366-257670405961" href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>><br>
<b>Sent:</b> 11 February 2026 13:42<br>
<b>To:</b> feng wang <<a id="m_8542811183821037509OWA9c18afbc-c9ea-6add-0b3e-668b3fbc3a7d" href="mailto:snailsoar@hotmail.com" target="_blank">snailsoar@hotmail.com</a>><br>
<b>Cc:</b> Junchao Zhang <<a id="m_8542811183821037509OWA90467f55-cff8-c884-f5a1-093653d3928c" href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>>;
<a id="m_8542811183821037509OWA9198fcb6-1265-0b82-5954-2402b6be991f" href="mailto:petsc-users@mcs.anl.gov" target="_blank">
petsc-users@mcs.anl.gov</a> <<a id="m_8542811183821037509OWA18e6a2c6-7658-cb20-91c7-b6a5d30b296a" href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU</div>
<div style="direction:ltr"> </div>
</div>
<div style="direction:ltr">On Wed, Feb 11, 2026 at 5:55 AM feng wang <<a id="m_8542811183821037509OWA189ebdd8-b003-3413-9b87-c92245b25da1" href="mailto:snailsoar@hotmail.com" target="_blank">snailsoar@hotmail.com</a>> wrote:</div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Junchao,</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks for your reply. Probably I did not phrase it in a clear way.</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I am using OpenACC to port the CFD code to the GPU, so the CPU and the GPU versions essentially share the same source code. The original CPU version uses Jacobi (hand-coded) or GMRES+ILU(0) (with PETSc) to solve the sparse linear system.</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
The current GPU version of the code only ports the Jacobi solver to the GPU. Now I want to port GMRES+ILU(0) to the GPU. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal? </div>
</blockquote>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">I think what Junchao is saying is that if you use the GPU vec and mat types, this should be running on the GPU already. Does that not work?</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr"> Thanks,</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr"> Matt</div>
<div style="direction:ltr"> </div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
BTW: For performance, the GPU version of the CFD code has minimal communication between the CPU and GPU, so for Ax=b, A, x and b are created on the GPU directly.</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks,</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Feng</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<hr style="direction:ltr;display:inline-block;width:98%">
<div id="m_8542811183821037509x_m_453449198775360914x_m_-7196078059610615373divRplyFwdMsg">
<div style="direction:ltr;font-family:Calibri,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<b>From:</b> Junchao Zhang <<a id="m_8542811183821037509OWAab877c5d-f3d4-787c-1f63-12f745b9fcef" href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> 11 February 2026 3:00<br>
<b>To:</b> feng wang <<a id="m_8542811183821037509OWAe5e9c695-2aea-953f-b9e0-1b64f564423b" href="mailto:snailsoar@hotmail.com" target="_blank">snailsoar@hotmail.com</a>><br>
<b>Cc:</b> <a id="m_8542811183821037509OWA03a2f8aa-bfc6-c03b-bcbc-b865b15d6c60" href="mailto:petsc-users@mcs.anl.gov" target="_blank">
petsc-users@mcs.anl.gov</a> <<a id="m_8542811183821037509OWA341564fd-eb18-b0af-7b36-288e0292af10" href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Barry Smith <<a id="m_8542811183821037509OWAa66924b6-004d-9fc7-c404-54dae9ec8ac5" href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>><br>
<b>Subject:</b> Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU</div>
<div style="direction:ltr"> </div>
</div>
<div style="direction:ltr">Sorry, I don't understand your question. What blocks you from running your GMRES+ILU(0)<span style="font-family:Calibri,Helvetica,sans-serif;font-size:16px;color:rgb(0,0,0)"> on GPUs?<b> </b></span> I Cc'ed Barry, who
knows better about the algorithms.</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">--Junchao Zhang</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">On Tue, Feb 10, 2026 at 3:57 PM feng wang <<a id="m_8542811183821037509OWA20a409d7-ec4f-c09e-57a4-ad3a1cbd1e89" href="mailto:snailsoar@hotmail.com" target="_blank">snailsoar@hotmail.com</a>> wrote:</div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Junchao,</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I have managed to configure PETSc for GPU and to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda. It seems to run much faster than without "-mat_type aijcusparse -vec_type cuda", so I believe it runs fine on GPUs.</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I have an existing CFD code that runs natively on GPUs, so all the data is offloaded to the GPU at the beginning and some data are copied back to the CPU at the very end. It has a hand-coded Newton-Jacobi that runs on GPUs for the implicit solver.
<b>My question is: my code also has a GMRES+ILU(0) implemented with PETSc (which I implemented a few years ago), but it only runs on CPUs. How can I replace the existing Newton-Jacobi (which runs on GPUs) with a GMRES+ILU(0) that also runs on GPUs? Could you
please give some advice?</b></div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks,</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Feng</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<hr style="direction:ltr;display:inline-block;width:98%">
<div id="m_8542811183821037509x_m_453449198775360914x_m_-7196078059610615373x_m_-6901202929570229040divRplyFwdMsg">
<div style="direction:ltr;font-family:Calibri,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<b>From:</b> Junchao Zhang <<a id="m_8542811183821037509OWAa695b88b-5d66-725b-ea3e-050fb3bd479e" href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> 09 February 2026 23:18<br>
<b>To:</b> feng wang <<a id="m_8542811183821037509OWA69b1147a-8537-4acf-3228-b43e4e28d1e0" href="mailto:snailsoar@hotmail.com" target="_blank">snailsoar@hotmail.com</a>><br>
<b>Cc:</b> <a id="m_8542811183821037509OWA4c11c992-29da-188a-af62-e6db1d90c6fc" href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a id="m_8542811183821037509OWA601a8b21-6b99-6dcf-b509-558682b9b2fa" href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU</div>
<div style="direction:ltr"> </div>
</div>
<div style="direction:ltr">Hi Feng,<br>
As a first step, you don't need to change your CPU implementation. Then profile to see where it is worth putting your effort. Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage. <br>
</div>
<div style="direction:ltr"> Thanks!</div>
<div style="direction:ltr">--Junchao Zhang</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">On Mon, Feb 9, 2026 at 4:31 PM feng wang <<a id="m_8542811183821037509OWA863a212d-e02f-1d29-e50a-2812fc723feb" href="mailto:snailsoar@hotmail.com" target="_blank">snailsoar@hotmail.com</a>> wrote:</div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Junchao,</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Many thanks for your reply.</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
This is great! Do I need to change anything in my current CPU implementation? Or do I just link to a version of PETSc that is configured with CUDA and make sure the necessary data are copied to the "device", and then PETSc will do the rest of the magic for me?</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks,</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Feng</div>
<hr style="direction:ltr;display:inline-block;width:98%">
<div id="m_8542811183821037509x_m_453449198775360914x_m_-7196078059610615373x_m_-6901202929570229040x_m_-6845830625879680973divRplyFwdMsg">
<div style="direction:ltr;font-family:Calibri,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<b>From:</b> Junchao Zhang <<a id="m_8542811183821037509OWAae999ec8-a3b6-8e80-00da-f31ef6b4d1f4" href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> 09 February 2026 1:55<br>
<b>To:</b> feng wang <<a id="m_8542811183821037509OWAce199207-c26a-cc03-6ccd-b44d40a8ccc3" href="mailto:snailsoar@hotmail.com" target="_blank">snailsoar@hotmail.com</a>><br>
<b>Cc:</b> <a id="m_8542811183821037509OWA835ba051-2103-02e6-1d0d-e90fe0b1b4b5" href="mailto:petsc-users@mcs.anl.gov" target="_blank">
petsc-users@mcs.anl.gov</a> <<a id="m_8542811183821037509OWAd42c9fdf-5d09-4cc2-78f0-469fa465f73c" href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU</div>
<div style="direction:ltr"> </div>
</div>
<div style="direction:ltr">Hello Feng,<br>
It is possible to run GMRES with ILU(0) on GPUs. You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos --download-kokkos-kernels). Then run with -mat_type {aijcusparse or aijkokkos} -vec_type
{cuda or kokkos}.<br>
However, triangular solves are not GPU-friendly, so the performance might be poor. I think you should still try it. <br>
<br>
Thanks!<br>
--Junchao Zhang</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">On Sun, Feb 8, 2026 at 5:46 PM feng wang <<a id="m_8542811183821037509OWA2ac9b78a-dd50-1edf-5711-af60a669f197" href="mailto:snailsoar@hotmail.com" target="_blank">snailsoar@hotmail.com</a>> wrote:</div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Dear All,</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I have an existing implementation of GMRES with ILU(0) that works well on CPUs. I went through the PETSc documentation, and it seems PETSc has some support for GPUs. Is it possible for me to run GMRES with ILU(0) on GPUs?</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Many thanks for your help in advance,</div>
<div style="direction:ltr;font-family:Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Feng</div>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">--</div>
<div style="direction:ltr">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr"><a id="m_8542811183821037509OWA550085ca-e4e5-8bc0-97b8-69aa35e9fb0c" href="https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cXpmVvZxc-TxBksnlk_2BJ9ShOVFrvTVXFQ4MoNkrD3Ah0fPbqnx9Qw4ZAwScITqFUlFyXEwxtSUiveMOA1n$" target="_blank">https://www.cse.buffalo.edu/~knepley/</a></div>
</blockquote>
</div>
</div></blockquote></div><div><br clear="all"></div><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cXpmVvZxc-TxBksnlk_2BJ9ShOVFrvTVXFQ4MoNkrD3Ah0fPbqnx9Qw4ZAwScITqFUlFyXEwxtSUiveMOA1n$" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>