<div dir="ltr"><span style="color:rgb(32,33,36);font-family:Roboto,RobotoDraft,Helvetica,Arial,sans-serif;font-size:14px;letter-spacing:0.2px;white-space:nowrap">Hi, Claudio,</span> <div><br></div><div>You might be aware that petsc internally splits a MATMPIAIJ matrix A in two MATSEQAIJ matrices Ad and Ao, for the <i>diagonal</i> portion and the <i>off-diagonal </i>portion respectively, see <a href="https://petsc.org/release/docs/manualpages/Mat/MatCreateAIJ.html#MatCreateAIJ">https://petsc.org/release/docs/manualpages/Mat/MatCreateAIJ.html#MatCreateAIJ</a></div><div><br></div><div>There are tricks in MatMult(). Ao is a 'reduced' matrix, i.e., using local column indices. Ao's column size (as returned by<span style="color:rgb(0,0,0)"> </span>MatGetLocalSize) is the number of nonzero columns in the off-diagonal portion. </div><div><br></div><div>petsc kind of implements MatMult(A,x, y) as y = Ad*x + Ao*lvec, where lvec is a sequential vector with entries gathered/communicated from other processes. petsc only communicates entries corresponding to nonzero columns. </div><div><br></div><div>For your experimental purpose, you can un-split the matrix with MatMPIAIJGetLocalMat(A,scall,&Aloc), and do y2 = Aloc * x2. Note y2 is a sequential vector, since Aloc and x2 are sequential.</div><div><br></div><div>You can alias y2 with y, by letting them share the data array. Basically, VecGetArray(y,&a) and VecCreateSeqWithArray(..,a,&y2). That is easy and you can search petsc doc to find info.</div><div> </div><div>For benchmarking, you need to consider the MatMPIAIJGetLocalMat() cost. It does memory copying to merge Ad and Ao into Aloc.</div><div><div><font color="#202124" face="Roboto, RobotoDraft, Helvetica, Arial, sans-serif"><span style="font-size:14px;letter-spacing:0.2px;white-space:nowrap"><b><br></b></span></font></div>Hope that helps.<div><font color="#202124" face="Roboto, RobotoDraft, Helvetica, Arial, sans-serif"><span style="font-size:14px;letter-spacing:0.2px;white-space:nowrap"><b><br clear="all"></b></span></font><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">--Junchao Zhang</div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Oct 8, 2021 at 5:29 AM Claudio Kozický <<a href="mailto:koziccla@fit.cvut.cz">koziccla@fit.cvut.cz</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello,<br>
<br>
I am using PETSc in a performance comparison that evaluates the<br>
performance of parallel sparse matrix-vector multiplication (SpMV). For<br>
this purpose I have implemented a simple SpMV operation using PETSc,<br>
which multiplies parallel matrix A (type MatAIJ) with parallel vector x1<br>
and stores the result in parallel vector y1. Thus I perform SpMV using<br>
PETSc as MatMult(A, x1, y1). This part works without any problems.<br>
<br>
I would also like to implement SpMV operation y2 = A * x2, where x2 is a<br>
sequential vector (i.e. created using VecCreateSeq) but where A and y2<br>
are still parallel. The resulting implementation would be something<br>
like:<br>
<br>
MatCreateAIJ(..., &A); // a parallel matrix<br>
VecCreateSeq(..., &x2); // a per-process _sequential_ vector<br>
VecCreateMPI(..., &y2); // a parallel vector<br>
MatMult(A, x2, y2);<br>
<br>
The motivation of storing all of x2 in each process to is remove the<br>
need of broadcasting any elements of x2 (this approach makes sense in<br>
the context of what I am benchmarking). However I cannot seem to get<br>
this approach to work in PETSc. For example when I try this approach<br>
with a 4-by-4 matrix, with two order-4 vectors and using two MPI<br>
processes, then PETSc prints:<br>
<br>
Nonconforming object sizes<br>
Mat mat,Vec x: local dim 2 4<br>
<br>
I have attached a minimal working example that demonstrates what I am<br>
attempting to perform. Could it be that PETSc does not support<br>
combining a parallel and sequential vector in a single MatMult call?<br>
<br>
I have found functions for scattering and gathering vectors in the<br>
documentation of PETSc, but these do not seem to be a good match for<br>
what I am trying to benchmark. My intention is for each process to keep<br>
an identical copy of vector x2 and therefore the necessity to scatter or<br>
gather values in x2 should never arise.<br>
<br>
I would appreciate if somebody could help point me in the right<br>
direction regarding my failing MatMult call. Thanks!<br>
<br>
-- <br>
Claudio Kozický<br>
</blockquote></div>