[petsc-dev] MatTransposeMatMult and MatTranspose

Hong Zhang hzhang at mcs.anl.gov
Sun Oct 14 19:25:28 CDT 2012


Jed:
Impressive!
Do you ever sleep?
Hong

> I added proper preallocation for MatTranspose_MPIAIJ(), which speeds it up
> greatly.
>
>
> https://bitbucket.org/petsc/petsc-dev/changeset/486d000050ec62fbd732c0049cb5f09b2b5709b8
>
> https://bitbucket.org/petsc/petsc-dev/changeset/75fca7ed1efa754ca010596a8ba69319501baf52 (oops)
>
> Testing on cg
> $ mpirun -n 64 ./ex56 -pc_type gamg -ksp_monitor -ksp_rtol 1e-1 -log_summary -mattransposematmult_viamatmatmult 1
>
> *Before:*
> -ne 99
> MatTranspose           3 1.0 *1.3230e+00* 1.0 0.00e+00 0.0 1.0e+04 2.7e+03 5.1e+01 17  0  3  2  4  33  0  6  7  4     0
> MatTrnMatMult          3 1.0 1.8360e+00 1.0 2.26e+07 1.1 2.3e+04 6.0e+03 1.2e+02 24  2  6 12  9  46 10 13 35 10   765
> -ne 119
> MatTranspose           3 1.0 *2.3402e+00* 1.0 0.00e+00 0.0 1.3e+04 3.1e+03 5.1e+01 16  0  3  2  4  34  0  6  7  4     0
> MatTrnMatMult          3 1.0 3.2240e+00 1.0 3.91e+07 1.1 2.8e+04 6.9e+03 1.2e+02 23  2  6 12  9  46 10 13 35 10   759
>
> *After:*
> -ne 99
> MatTranspose           3 1.0 *9.5813e-02* 1.0 0.00e+00 0.0 1.0e+04 2.7e+03 4.8e+01  1  0  3  2  4   3  0  6  7  4     0
> MatTrnMatMult          3 1.0 6.0673e-01 1.0 2.26e+07 1.1 2.3e+04 6.0e+03 1.2e+02  8  2  6 12  9  21 10 13 35 10  2316
> -ne 119
> MatTranspose           3 1.0 *1.8572e-01* 1.0 0.00e+00 0.0 1.3e+04 3.1e+03 4.8e+01  2  0  3  2  4   4  0  6  7  4     0
> MatTrnMatMult          3 1.0 1.0656e+00 1.0 3.91e+07 1.1 2.8e+04 6.9e+03 1.2e+02 10  2  6 12  9  23 10 13 35 10  2297
>
> *Reference* (-mattransposematmult_viamatmatmult 0):
> -ne 99
> MatTrnMatMult          3 1.0 8.0196e-01 1.0 1.02e+08 1.1 1.3e+04 1.3e+04 8.7e+01 13 10  4 15  7  28 33  8 40  7  7831
> -ne 119
> MatTrnMatMult          3 1.0 1.3759e+00 1.0 1.78e+08 1.1 1.6e+04 1.6e+04 8.7e+01 12 10  4 15  7  27 33  8 40  8  7999
>
> I don't know why the reference implementation claims to have done so many
> more flops.
>
> This indicates that perhaps it makes sense for MatPtAP to do an explicit
> transpose and then RAP, unless we can find a fast data structure for
> A^T * B.
>
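
For readers unfamiliar with what "proper preallocation" means for a sparse transpose, here is a minimal sequential sketch of the idea in Python. It is not the MatTranspose_MPIAIJ() code from the changesets above and it ignores the parallel (diagonal/off-diagonal) layout entirely. The function name csr_transpose and the plain-list CSR layout are assumptions made only for illustration. The point is the counting pass: the row lengths of A^T are known before any entry is written, so the output is allocated once at its final size instead of growing during insertion.

```python
def csr_transpose(n_rows, n_cols, indptr, indices, data):
    """Transpose a CSR matrix, sizing the output exactly before filling it.

    Hypothetical helper for illustration only; not a PETSc routine.
    """
    # Pass 1 (preallocation): count the nonzeros in each column of A.
    # Column j of A becomes row j of A^T, so these are the row lengths of A^T.
    counts = [0] * n_cols
    for j in indices:
        counts[j] += 1

    # A prefix sum turns the counts into the row pointers of A^T.
    t_indptr = [0] * (n_cols + 1)
    for j in range(n_cols):
        t_indptr[j + 1] = t_indptr[j] + counts[j]

    # Allocate the output arrays once, at their final size.
    nnz = t_indptr[n_cols]
    t_indices = [0] * nnz
    t_data = [0.0] * nnz

    # Pass 2: scatter each entry (i, j, v) of A into row j of A^T.
    next_slot = t_indptr[:n_cols]  # copy: next free position in each row of A^T
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            dest = next_slot[j]
            t_indices[dest] = i
            t_data[dest] = data[k]
            next_slot[j] += 1

    return t_indptr, t_indices, t_data


# A = [[1, 0, 2],
#      [0, 3, 0]]  stored in CSR form
t_indptr, t_indices, t_data = csr_transpose(
    2, 3, [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
)
```

Because rows of A are visited in increasing order, the column indices within each row of A^T come out sorted without a separate sort step.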