<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <<a href="mailto:cliu@pppl.gov">cliu@pppl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Junchao,<br>

<br>

No, I only need it to be transferred within a node. I use block-Jacobi <br>

method and GMRES to solve the sparse matrix, so each direct solver will <br>

take care of a sub-block of the whole matrix. In this way, I can use one <br>

GPU to solve one sub-block, which is stored within one node.<br>

<br>

It was stated in the documentation that the cusparse solver is slow. <br>

However, in my test using ex72.c, the cusparse solver is faster than <br>

mumps or superlu_dist on CPUs.<br></blockquote><div><br></div><div>Are we talking about the factorization, the solve, or both?</div><div><br></div><div>We do not have an interface to cuSPARSE's LU factorization (I learned only a few weeks ago that it exists).</div><div>Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type aijcusparse'? In that case the factorization runs on the CPU, and the factorization is the dominant cost.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Chang<br>

<br>

On 10/12/21 10:24 AM, Junchao Zhang wrote:<br>

> Hi, Chang,<br>

>     For the mumps solver, we usually transfer matrix and vector data <br>

> within a compute node.  For the idea you propose, it looks like we need <br>

> to gather data within MPI_COMM_WORLD, right?<br>

> <br>

>     Mark, I remember you said cusparse solve is slow and you would <br>

> rather do it on the CPU. Is that right?<br>

> <br>

> --Junchao Zhang<br>

> <br>

> <br>

> On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users <br>

> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br>

> <br>

>     Hi,<br>

> <br>

>     Currently, it is possible to use the mumps solver in PETSc with the<br>

>     -mat_mumps_use_omp_threads option, so that multiple MPI processes will<br>

>     transfer the matrix and rhs data to the master rank, and then master<br>

>     rank will call mumps with OpenMP to solve the matrix.<br>

> <br>

>     I wonder if someone can develop a similar option for the cusparse solver.<br>

>     Right now, this solver does not work with mpiaijcusparse. I think a<br>

>     possible workaround is to transfer all the matrix data to one MPI<br>

>     process, and then upload the data to GPU to solve. In this way, one can<br>

>     use the cusparse solver for an MPI program.<br>

> <br>

>     Chang<br>

>     -- <br>

>     Chang Liu<br>

>     Staff Research Physicist<br>

>     +1 609 243 3438<br>

>     <a href="mailto:cliu@pppl.gov" target="_blank">cliu@pppl.gov</a><br>

>     Princeton Plasma Physics Laboratory<br>

>     100 Stellarator Rd, Princeton NJ 08540, USA<br>

> <br>

<br>

-- <br>

Chang Liu<br>

Staff Research Physicist<br>

+1 609 243 3438<br>

<a href="mailto:cliu@pppl.gov" target="_blank">cliu@pppl.gov</a><br>

Princeton Plasma Physics Laboratory<br>

100 Stellarator Rd, Princeton NJ 08540, USA<br>

</blockquote></div></div>
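<div><br></div><div>For reference, a minimal sketch of the kind of configuration discussed in this thread: GMRES with block-Jacobi, an LU sub-solve on each block, and the matrix stored as aijcusparse. The option names are standard PETSc runtime options, but selecting cusparse as the sub-block solver package is an assumption about the setup being described, not something confirmed in the messages above. Adding -log_view should report the factorization and solve phases as separate events (for example MatLUFactorNum and MatSolve), which may help settle which phase is actually fast.</div>
<pre>
# Hypothetical PETSc options file (assumed setup, not taken from the original messages)
-ksp_type gmres
-pc_type bjacobi
-sub_pc_type lu
# Assumption: with this package the LU factorization runs on the CPU
# and only the triangular solves run on the GPU.
-sub_pc_factor_mat_solver_type cusparse
-mat_type aijcusparse
-vec_type cuda
# Print per-event timings so factorization and solve can be compared.
-log_view
</pre>
<div>Such a file would typically be passed with -options_file, or the same options given directly on the command line of the test program.</div>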