Actually, please excuse my misreading. The quoted 2-3x speedup was versus itself, due to hardware prefetching, so it does appear to be asymptotically better than SGEMM (they quote the crossover at n ~= 1000).
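
To spell out the scaling argument (the cost models below are only for illustration, not numbers taken from the paper): suppose SpAMM costs roughly a n^p with p < 3, while SGEMM costs b n^3. Then

\[
  \frac{T_{\mathrm{SGEMM}}(n)}{T_{\mathrm{SpAMM}}(n)}
  \approx \frac{b\,n^{3}}{a\,n^{p}}
  = \Bigl(\frac{n}{n_{0}}\Bigr)^{3-p},
  \qquad \text{where } a\,n_{0}^{p} = b\,n_{0}^{3}.
\]

A flat 2-3x ratio would have meant p = 3 (the same asymptotics as SGEMM); a crossover at n_0 ~= 1000 with a ratio that keeps growing past it means p < 3.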
<div class="gmail_quote">On Wed, Mar 28, 2012 at 10:23 PM, Jack Poulson <span dir="ltr"><<a href="mailto:jack.poulson@gmail.com">jack.poulson@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> I just took a look at the paper and, after reading a bit, I am baffled that there are no citations of the H-matrix literature (Ctrl+F "Hackbusch" turns up nothing...). The paper seems to be in a similar spirit to the well-known H-matrix/H-matrix multiplication algorithms, which require O(N log^2 N) work for O(1) ranks, except that it uses truncation rather than low-rank approximation within clusters. Since only factors of 2-3x are reported versus SGEMM, which is O(N^3), that tells you that this algorithm (or, at least, its implementation) is O(N^3).
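
[For concreteness, a minimal sketch of the norm-truncated recursive multiply idea described above, assuming a plain quadtree recursion over dense storage. It only illustrates the truncation criterion; the paper's implementation is blocked and optimized, and a real code would precompute the block norms in a tree rather than recomputing them at every level. The matrices and tolerance in main() are made up for the example.]

#include <math.h>
#include <stdio.h>

/* Frobenius norm of the n-by-n block of A with upper-left corner (ai, aj);
 * lda is the leading dimension.  (A real implementation would precompute
 * these norms in a quadtree instead of recomputing them here.) */
static double frob(const double *A, int lda, int ai, int aj, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double v = A[(ai + i) * lda + (aj + j)];
            s += v * v;
        }
    return sqrt(s);
}

/* C(ci.., cj..) += A(ai.., aj..) * B(bi.., bj..) on n-by-n blocks (n a power
 * of two), skipping any sub-product with ||A_blk||_F * ||B_blk||_F <= tol. */
static void spamm(const double *A, const double *B, double *C, int lda,
                  int ai, int aj, int bi, int bj, int ci, int cj,
                  int n, double tol)
{
    if (frob(A, lda, ai, aj, n) * frob(B, lda, bi, bj, n) <= tol)
        return;                         /* contribution truncated */
    if (n == 1) {
        C[ci * lda + cj] += A[ai * lda + aj] * B[bi * lda + bj];
        return;
    }
    int h = n / 2;
    for (int i = 0; i < 2; ++i)         /* block row of C (and A)    */
        for (int j = 0; j < 2; ++j)     /* block column of C (and B) */
            for (int k = 0; k < 2; ++k) /* inner block index         */
                spamm(A, B, C, lda,
                      ai + i * h, aj + k * h,
                      bi + k * h, bj + j * h,
                      ci + i * h, cj + j * h, h, tol);
}

int main(void)
{
    enum { N = 8 };
    double A[N * N], B[N * N], C[N * N] = { 0.0 };
    /* Toy matrices with exponential decay away from the diagonal. */
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            A[i * N + j] = B[i * N + j] = exp(-fabs((double)(i - j)));
    spamm(A, B, C, N, 0, 0, 0, 0, 0, 0, N, 1e-3);
    printf("C[0][0] approx %g\n", C[0]);
    return 0;
}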
> I do not think a comparison can be drawn to sparse matrix-matrix multiply algorithms. I believe Gilbert and Buluc have the most recent publication on parallel sparse mat-mat. I have done some work on parallel H-matrix/H-matrix multiplication (the code is here: http://bitbucket.org/poulson/dmhm), but I haven't pushed the paper out the door yet. See http://www.ices.utexas.edu/~poulson/slides/DMHM.pdf for the talk I gave on it. I suspect that the ideas from parallel H-matrix/H-matrix multiplication could be used to build a communication-efficient sparse mat-mat, provided it is acceptable to return the result in a more general data-sparse form (i.e., with low-rank blocks) rather than as an explicitly, classically sparse matrix.
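
[To make "a more general data-sparse form" concrete, here is one hypothetical node layout for such a result; the names and fields are invented for illustration and are not the actual data structures used in dmhm.]

/* Hypothetical node of a data-sparse (H-matrix-like) product C = A*B:
 * each block is either subdivided further, stored densely, or stored as a
 * rank-r outer product U*V^T. */
typedef struct HNode {
    int m, n;                /* block dimensions                         */
    int rank;                /* r >= 0 for a low-rank leaf, -1 otherwise */
    double *U, *V;           /* m x r and n x r factors (low-rank leaf)  */
    double *D;               /* m x n entries (dense leaf), else NULL    */
    struct HNode *child[4];  /* quadtree children, NULL for leaves       */
} HNode;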
>
> Jack
>
> On Wed, Mar 28, 2012 at 10:07 PM, Hong Zhang <hzhang@mcs.anl.gov> wrote:
>> The entire paper discusses SpAMM and compares it with SGEMM.
>> Sorry, I do not see anything about SpMV.
>>
>> Hong
<div class="gmail_quote">On Wed, Mar 28, 2012 at 9:52 PM, Matthew Knepley <span dir="ltr"><<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>On Wed, Mar 28, 2012 at 9:39 PM, Hong Zhang <span dir="ltr"><<a href="mailto:hzhang@mcs.anl.gov" target="_blank">hzhang@mcs.anl.gov</a>></span> wrote:<br>
</div><div class="gmail_quote"><div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
>>>> Matt:
>>>>> to the new scalable MatMatMult() that is done in PETSc? Is it possible to even compare the flop rates sensibly?
>>>>>
>>>>>   http://arxiv.org/abs/1203.1692
>>>>
>>>> "An Optimized Sparse Approximate Matrix Multiply" computes an *approximate* matrix product for matrices with *decay*. It is for a special application in chemistry(?), not applicable to general sparse matrix products. No comparison can be drawn to PETSc's C = A*B.
>>>
>>> This is the wrong conclusion. What he does is sparsify the result, so he is computing SOME sort of sparse matrix-vector product. The question is:
>>>
>>>    Does his SpMV use the same algorithm as PETSc's?
>>>
>>> That is why I asked.
>>>
>>>    Matt
>>>
>>>> Hong
>>>>
>>>>> This is what Jeff is planning on using in his SciDAC.
>>>>>
>>>>>    Matt
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>> -- Norbert Wiener
>>>
>>> --
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener