[petsc-dev] changes to mat_mult.c to enable fully parallel tuning checked in to the sparse_mutant repository (PETSc developers cc'd)

Sun Oct 10 05:56:50 CDT 2010

Hi John,

I had to go digging into the PETSc source a little to figure out what
was going wrong.  Here's a brief summary:

Serial matrices in PETSc are represented as a single matrix, with data
stored in mat->data

A parallel matrix A in PETSc is represented as two matrices, a matrix
a comprised of locally owned rows and columns of A corresponding to
the owned entries of the input vector x, and a matrix b comprised of
locally owned rows and the remaining columns of B corresponding to
non-local entries of x.

A = a+b

y = A*x
   = a*x_l + b*x_u

x_l = local values of x
x_u = unowned values of x

This is enabled by a VecScatter that occurs concurrent to the a*x_l
operation, which sends the required unowned values of x into a
(potentially large!) buffer A->lvec.

If you check out the function MatMultCDCR_MPIAIJ (MatMult_MPIAIJ in
src/mat/impls/aij/mpi/mpiaij.c if you're following in a standard PETSc
distro) you'll get a view of how this works.  I'm using dynamic PETSc
queries for now to pick these up, but we could also just hardcode the
calls to our code here.  Notice that we need a sequential MatMultAdd
routine, even if we're just doing parallel MatMults.  I have patched
in the standard PETSc sequential aij MatMultAdd routine for now, but
since this is potentially a large portion of the parallel matmult, we
will need to tune this.

Barry sat down and explained this to me on my visit to Argonne, so I'm
an idiot for not getting this right the first time :)

Cheers,
Aron