efficiency of transposing a Matrix?

Hong Zhang hzhang at mcs.anl.gov
Wed Feb 21 15:03:12 CST 2007


Shi,

Checking MatTranspose_MPIAIJ(), I find that
preallocation is not implemented there.
This is likely the reason for the slowdown.
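Something like the following, a small helper of my own (the name
CheckTransposeMallocs is just illustrative), would show whether LT
incurred many mallocs while it was being filled; a large count is the
signature of missing preallocation:

  #include <petscmat.h>

  /* Print how many mallocs were needed while the matrix was assembled;
     with proper preallocation this number should be (near) zero. */
  PetscErrorCode CheckTransposeMallocs(Mat LT)
  {
    MatInfo        info;
    PetscErrorCode ierr;

    ierr = MatGetInfo(LT,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD,
                       "LT: %g mallocs during assembly, %g nonzeros used\n",
                       info.mallocs,info.nz_used);CHKERRQ(ierr);
    return 0;
  }
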
>
> I have a code that keeps using the same matrix L
> and its transpose in every time update.
> I can improve the performance of the code by replacing
> MatMultTranspose() with MatMult() and computing
> the transposed matrix only once at the beginning of
> the code. The cost is of course the extra storage for
> the transposed matrix.
>
> However, I have a question regarding the efficiency of
> transposing the matrix. I created the matrix L with
> MPIAIJ and preallocated the proper memory for it.
> Then I call MatTranspose(L,&LT) to compute LT, the
> transpose of L. But I noticed that this process is
> extremely slow, six times slower than the creation of
> the matrix L itself.
>
> The first question is: do I need to preallocate the
> memory for LT as well? I didn't, since I assumed
> PETSc would be smart enough to figure out the
> necessary storage.

Preallocation of LT is non-trivial, requiring
all-to-all communication. I'll add it to MatTranspose_MPIAIJ().
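In the meantime, the workaround you describe (transposing once, then
using MatMult() in the time loop) is reasonable. A minimal sketch,
using the same calling sequences as in your code; the function name
and the x, y, nsteps arguments are placeholders:

  #include <petscmat.h>

  /* Form LT = transpose(L) once, then apply it with MatMult() instead
     of calling MatMultTranspose(L,...) every time step. */
  PetscErrorCode TimeLoopWithExplicitTranspose(Mat L,Vec x,Vec y,PetscInt nsteps)
  {
    Mat            LT;
    PetscInt       step;
    PetscErrorCode ierr;

    ierr = MatTranspose(L,&LT);CHKERRQ(ierr);  /* pay the transpose cost once */
    for (step = 0; step < nsteps; step++) {
      /* ... update x here ... */
      ierr = MatMult(LT,x,y);CHKERRQ(ierr);    /* y = L^T x, same result as MatMultTranspose(L,x,y) */
      /* ... use y in the rest of the time step ... */
    }
    ierr = MatDestroy(LT);CHKERRQ(ierr);       /* release the extra storage */
    return 0;
  }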

> Secondly, I am not sure why MatTranspose() is so slow. I
> understand that in order to transpose a matrix, one may
> need to call MPI_Alltoall, which is extremely
> expensive. But it seems I could go through a process
> similar to creating the matrix L and be much
> faster. I am not sure how MatTranspose() is implemented
> and whether I should actually construct LT directly
> instead of transposing L.

If you know the non-zero structure of LT without communication,
creating it directly would outperform PETSc's MatTranspose().
See MatTranspose_MPIAIJ() in
petsc/src/mat/impls/aij/mpi/mpiaij.c for details.
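A rough sketch of building LT directly, assuming you can work out the
nonzero counts d_nnz/o_nnz for the local rows of LT yourself (the
function name and argument list are placeholders):

  #include <petscmat.h>

  /* Build LT = transpose(L) directly.  mloc/nloc are the local row and
     column sizes of LT (normally L's local column and row sizes), and
     d_nnz/o_nnz are the diagonal/off-diagonal nonzero counts of LT's
     local rows, assumed known to the caller. */
  PetscErrorCode BuildTransposeDirectly(Mat L,PetscInt mloc,PetscInt nloc,
                                        const PetscInt d_nnz[],const PetscInt o_nnz[],
                                        Mat *LT)
  {
    PetscInt          rstart,rend,row,j,ncols;
    const PetscInt    *cols;
    const PetscScalar *vals;
    PetscErrorCode    ierr;

    ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD,mloc,nloc,PETSC_DETERMINE,PETSC_DETERMINE,
                           0,d_nnz,0,o_nnz,LT);CHKERRQ(ierr);

    ierr = MatGetOwnershipRange(L,&rstart,&rend);CHKERRQ(ierr);
    for (row = rstart; row < rend; row++) {
      ierr = MatGetRow(L,row,&ncols,&cols,&vals);CHKERRQ(ierr);
      for (j = 0; j < ncols; j++) {
        /* entry L(row,cols[j]) becomes LT(cols[j],row); inserted one at
           a time here for clarity */
        ierr = MatSetValues(*LT,1,&cols[j],1,&row,&vals[j],INSERT_VALUES);CHKERRQ(ierr);
      }
      ierr = MatRestoreRow(L,row,&ncols,&cols,&vals);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(*LT,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(*LT,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    return 0;
  }

Entries that belong to rows owned by other processes are stashed and
exchanged during MatAssemblyBegin/End, so the communication happens
there rather than up front.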

Hong



