<div dir="ltr"><div>Hi, Chang,</div><div> For the mumps solver, we usually transfers matrix and vector data within a compute node. For the idea you propose, it looks like we need to gather data within MPI_COMM_WORLD, right?</div><div><br></div> Mark, I remember you said cusparse solve is slow and you would rather do it on CPU. Is it right? <div><br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">--Junchao Zhang</div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
>
> Currently, it is possible to use the mumps solver in PETSc with the
> -mat_mumps_use_omp_threads option, so that multiple MPI processes
> transfer the matrix and rhs data to the master rank, and the master
> rank then calls mumps with OpenMP to solve the matrix.
>
> I wonder if someone could develop a similar option for the cusparse
> solver. Right now, this solver does not work with mpiaijcusparse. I
> think a possible workaround is to transfer all the matrix data to one
> MPI process and then upload the data to the GPU to solve. In this way,
> one can use the cusparse solver in an MPI program.
>
> Chang
> --
> Chang Liu
> Staff Research Physicist
> +1 609 243 3438
> cliu@pppl.gov
> Princeton Plasma Physics Laboratory
> 100 Stellarator Rd, Princeton NJ 08540, USA
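
For reference, the MUMPS path described above is driven entirely by runtime options. A typical invocation might look like the following, where the executable name and the rank and thread counts are only placeholders:

  mpirun -n 16 ./my_app -ksp_type preonly -pc_type lu \
      -pc_factor_mat_solver_type mumps -mat_mumps_use_omp_threads 4

With -mat_mumps_use_omp_threads, the MPI ranks sharing a compute node hand their pieces of the matrix and rhs to one rank on that node, which then calls MUMPS with the requested number of OpenMP threads.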
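
The workaround Chang proposes could be prototyped in user code roughly as follows. This is only a sketch, not an existing PETSc feature: the function name SolveGatheredOnGPU is made up, it assumes MatCreateRedundantMatrix hands back a sequential AIJ copy when every subcommunicator has a single rank, and for simplicity it gathers and factors the full matrix redundantly on every rank rather than on one rank only.

/* Sketch: gather the distributed matrix and rhs into sequential copies,
 * move them onto the GPU, and solve there with the cusparse solver package.
 * Every rank does the full solve redundantly in this simplified version. */
#include <petscksp.h>

PetscErrorCode SolveGatheredOnGPU(Mat A, Vec b)
{
  Mat         Aseq;
  Vec         bhost, bdev, xdev;
  VecScatter  scat;
  KSP         ksp;
  PC          pc;
  PetscMPIInt size;

  PetscFunctionBeginUser;
  PetscCallMPI(MPI_Comm_size(PetscObjectComm((PetscObject)A), &size));

  /* One sequential copy of A per rank (nsubcomm == communicator size); a
     real implementation would gather to one rank (or one rank per node). */
  PetscCall(MatCreateRedundantMatrix(A, size, MPI_COMM_NULL, MAT_INITIAL_MATRIX, &Aseq));

  /* Push the gathered matrix onto the GPU. */
  PetscCall(MatConvert(Aseq, MATSEQAIJCUSPARSE, MAT_INPLACE_MATRIX, &Aseq));

  /* Gather the rhs onto every rank, then copy it into a GPU vector whose
     layout matches the gathered matrix. */
  PetscCall(VecScatterCreateToAll(b, &scat, &bhost));
  PetscCall(VecScatterBegin(scat, b, bhost, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(VecScatterEnd(scat, b, bhost, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(MatCreateVecs(Aseq, &xdev, &bdev));
  PetscCall(VecCopy(bhost, bdev));

  /* Direct solve: LU factorization through the cusparse solver package. */
  PetscCall(KSPCreate(PETSC_COMM_SELF, &ksp));
  PetscCall(KSPSetOperators(ksp, Aseq, Aseq));
  PetscCall(KSPSetType(ksp, KSPPREONLY));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCLU));
  PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERCUSPARSE));
  PetscCall(KSPSolve(ksp, bdev, xdev));

  /* Scattering xdev back into a distributed solution vector is omitted. */
  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecScatterDestroy(&scat));
  PetscCall(VecDestroy(&bhost));
  PetscCall(VecDestroy(&bdev));
  PetscCall(VecDestroy(&xdev));
  PetscCall(MatDestroy(&Aseq));
  PetscFunctionReturn(0);
}

A built-in option would presumably gather only once per node or per subcommunicator, the way -mat_mumps_use_omp_threads does, and scatter the solution back to the distributed vector afterwards.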