<div dir="ltr">FWIW, the real benefits of TACO come from generating code that contracts higher-order sparse tensors. Such kernels are difficult to code by hand and unlikely to exist in a hand-tuned library. The "novel compiler techniques" mentioned on the website enable the compiler to reason about co-iteration over multiple sparse tensors at once.<div><br></div><div>Rohan</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Dec 12, 2021 at 9:50 AM Mark Adams &lt;<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div><br></div>   It may be different with the optimization turned on. I am surprised that it is 40; usually it is lower.<br></div><br></div></blockquote><div><br></div><div>Sure, he was underperforming by 4x, so he was using ~10 cores of a P9. Way below saturation (20-30).</div><div>(But he only got a 20x speedup. Not sure about that: load balance? The matrix looks huge, so probably not communication, but maybe.)</div></div></div>

</blockquote></div>
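<div><br></div><div>For readers unfamiliar with the term, here is a rough hand-written sketch of what "co-iteration" over two sparse operands means. It is not TACO output or TACO's API; the function names and the sorted coordinate/value list layout are assumptions made for illustration only. A multiplication only needs coordinates present in both operands (an intersection merge), while an addition needs coordinates present in either (a union merge).</div>

```python
def sparse_dot(a_idx, a_val, b_idx, b_val):
    """Dot product of two sparse vectors (hypothetical layout:
    sorted coordinate list plus matching value list)."""
    i = j = 0
    total = 0.0
    # Intersection merge: stop as soon as either operand is exhausted.
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            total += a_val[i] * b_val[j]
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            i += 1
        else:
            j += 1
    return total


def sparse_add(a_idx, a_val, b_idx, b_val):
    """Sum of two sparse vectors in the same hypothetical layout."""
    i = j = 0
    out_idx, out_val = [], []
    # Union merge: walk both operands while both still have entries.
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            out_idx.append(a_idx[i])
            out_val.append(a_val[i] + b_val[j])
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            out_idx.append(a_idx[i])
            out_val.append(a_val[i])
            i += 1
        else:
            out_idx.append(b_idx[j])
            out_val.append(b_val[j])
            j += 1
    # Copy whatever remains in the operand that was not exhausted.
    out_idx.extend(a_idx[i:])
    out_val.extend(a_val[i:])
    out_idx.extend(b_idx[j:])
    out_val.extend(b_val[j:])
    return out_idx, out_val
```

<div>Even for two vectors the loop structure depends on the operator. With three or more operands, mixed storage formats, and higher-order tensors, the case analysis grows quickly, which is why writing these kernels by hand gets tedious and error-prone.</div>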