I added proper preallocation for MatTranspose_MPIAIJ(), which makes it roughly 12 to 14 times faster in the runs below.<div><br></div><div><a href="https://bitbucket.org/petsc/petsc-dev/changeset/486d000050ec62fbd732c0049cb5f09b2b5709b8">https://bitbucket.org/petsc/petsc-dev/changeset/486d000050ec62fbd732c0049cb5f09b2b5709b8</a></div>
<div><a href="https://bitbucket.org/petsc/petsc-dev/changeset/75fca7ed1efa754ca010596a8ba69319501baf52">https://bitbucket.org/petsc/petsc-dev/changeset/75fca7ed1efa754ca010596a8ba69319501baf52</a> (oops)</div><div><br></div>
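<div>For readers unfamiliar with why preallocation matters here, the following is a minimal serial sketch, not the code from the changesets above. It assumes a plain CSR layout with hypothetical names (<font face="courier new, monospace">rowptr</font>, <font face="courier new, monospace">colidx</font>, <font face="courier new, monospace">transpose_row_counts</font>) and shows that the exact row lengths of A^T can be obtained in one pass over the column indices of A, so the transpose can be sized exactly before any values are inserted.</div>

```c
#include <assert.h>
#include <stdlib.h>

/* Count the nonzeros in each row of A^T for an m x n CSR matrix A.
 * Row j of A^T holds one entry per stored entry of A in column j,
 * so one pass over the column indices yields exact row lengths.
 * Hypothetical helper for illustration; the caller frees the result. */
int *transpose_row_counts(int m, int n, const int *rowptr, const int *colidx)
{
    int *nnz = calloc((size_t)(n > 0 ? n : 1), sizeof *nnz);
    if (!nnz) return NULL;
    for (int k = 0; k < rowptr[m]; k++) {
        assert(colidx[k] >= 0 && colidx[k] < n); /* column index in range */
        nnz[colidx[k]]++;
    }
    return nnz;
}
```

<div>In PETSc, counts of this kind are what the nnz array of MatSeqAIJSetPreallocation() expects. The MPIAIJ case additionally splits the counts into diagonal-block and off-diagonal-block parts (d_nnz and o_nnz), which this sketch does not attempt.</div>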
<div>Testing on cg</div><div><div><font face="courier new, monospace">$ mpirun -n 64 ./ex56 -pc_type gamg -ksp_monitor -ksp_rtol 1e-1 -log_summary -mattransposematmult_viamatmatmult 1</font></div></div><div><br></div><div>
<b>Before:</b></div><div><span style="font-family:'courier new',monospace">-ne 99<br>MatTranspose 3 1.0 </span><b style="font-family:'courier new',monospace">1.3230e+00</b><span style="font-family:'courier new',monospace"> 1.0 0.00e+00 0.0 1.0e+04 2.7e+03 5.1e+01 17 0 3 2 4 33 0 6 7 4 0</span></div>
<div><div><font face="courier new, monospace">MatTrnMatMult 3 1.0 1.8360e+00 1.0 2.26e+07 1.1 2.3e+04 6.0e+03 1.2e+02 24 2 6 12 9 46 10 13 35 10 765</font></div></div><div><div><font face="courier new, monospace">-ne 119</font></div>
<div><font face="courier new, monospace">MatTranspose 3 1.0 <b>2.3402e+00</b> 1.0 0.00e+00 0.0 1.3e+04 3.1e+03 5.1e+01 16 0 3 2 4 34 0 6 7 4 0</font></div><div><font face="courier new, monospace">MatTrnMatMult 3 1.0 3.2240e+00 1.0 3.91e+07 1.1 2.8e+04 6.9e+03 1.2e+02 23 2 6 12 9 46 10 13 35 10 759</font></div>
</div><div><br></div><div><b>After:</b><br><div><font face="courier new, monospace">-ne 99</font></div><div><font face="courier new, monospace">MatTranspose 3 1.0 <b>9.5813e-02</b> 1.0 0.00e+00 0.0 1.0e+04 2.7e+03 4.8e+01 1 0 3 2 4 3 0 6 7 4 0</font></div>
<div><font face="courier new, monospace">MatTrnMatMult 3 1.0 6.0673e-01 1.0 2.26e+07 1.1 2.3e+04 6.0e+03 1.2e+02 8 2 6 12 9 21 10 13 35 10 2316</font></div></div><div><font face="courier new, monospace">-ne 119</font></div><div><span style="font-family:'courier new',monospace">MatTranspose 3 1.0 <b>1.8572e-01</b> 1.0 0.00e+00 0.0 1.3e+04 3.1e+03 4.8e+01 2 0 3 2 4 4 0 6 7 4 0</span></div>
<div><div><font face="courier new, monospace">MatTrnMatMult 3 1.0 1.0656e+00 1.0 3.91e+07 1.1 2.8e+04 6.9e+03 1.2e+02 10 2 6 12 9 23 10 13 35 10 2297</font></div></div><div><br><b>Reference</b> (-mattransposematmult_viamatmatmult 0):</div>
<div><font face="courier new, monospace">-ne 99<br>MatTrnMatMult 3 1.0 8.0196e-01 1.0 1.02e+08 1.1 1.3e+04 1.3e+04 8.7e+01 13 10 4 15 7 28 33 8 40 7 7831</font></div><div><font face="courier new, monospace">-ne 119<br>
MatTrnMatMult 3 1.0 1.3759e+00 1.0 1.78e+08 1.1 1.6e+04 1.6e+04 8.7e+01 12 10 4 15 7 27 33 8 40 8 7999</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="arial, helvetica, sans-serif">I don't know why the reference implementation reports about 4.5 times as many flops (1.02e+08 vs. 2.26e+07 at -ne 99, 1.78e+08 vs. 3.91e+07 at -ne 119).</font></div>
<div><font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif">This suggests that it may make sense for MatPtAP to do an explicit transpose and then RAP, unless we can find a fast data structure for A^T * B.</font></div>
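<div>To illustrate why an explicit transpose can be cheap once it is sized exactly, here is a serial two-pass CSR transpose sketch. It is an assumption-laden illustration, not PETSc's MatTranspose_MPIAIJ(): the function name <font face="courier new, monospace">csr_transpose</font> and the array names are hypothetical, the output arrays are allocated by the caller, and no parallel communication is modeled.</div>

```c
#include <assert.h>
#include <stdlib.h>

/* Transpose an m x n CSR matrix (rowptr, colidx, val) into caller-allocated
 * CSR arrays for the n x m result: t_rowptr has n+1 entries, t_colidx and
 * t_val have rowptr[m] entries. Returns 0 on success, -1 if the temporary
 * cursor array cannot be allocated. Hypothetical helper for illustration. */
int csr_transpose(int m, int n, const int *rowptr, const int *colidx,
                  const double *val, int *t_rowptr, int *t_colidx, double *t_val)
{
    /* Pass 1: exact row lengths of A^T, i.e. column counts of A. */
    for (int j = 0; j <= n; j++) t_rowptr[j] = 0;
    for (int k = 0; k < rowptr[m]; k++) {
        assert(colidx[k] >= 0 && colidx[k] < n); /* column index in range */
        t_rowptr[colidx[k] + 1]++;
    }
    /* Prefix sum turns the counts into row start offsets. */
    for (int j = 0; j < n; j++) t_rowptr[j + 1] += t_rowptr[j];

    /* Pass 2: scatter each entry into its final slot; nothing is reallocated. */
    int *next = malloc((size_t)(n > 0 ? n : 1) * sizeof *next);
    if (!next) return -1;
    for (int j = 0; j < n; j++) next[j] = t_rowptr[j];
    for (int i = 0; i < m; i++) {
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++) {
            int dst = next[colidx[k]]++;
            t_colidx[dst] = i;      /* row i of A becomes column i of A^T */
            t_val[dst] = val[k];
        }
    }
    free(next);
    return 0;
}
```

<div>Both passes are linear in the number of stored entries, and the rows of the result come out with sorted column indices because the rows of A are visited in order.</div>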