I just pushed prefetching for the (S)BAIJ kernels. The biggest win is for MatSolve_SeqSBAIJ_*_NaturalOrdering_inplace, where this patch shows 30 to 50% speedups on Core 2 and Opteron. The other kernels tend to improve by 20 to 30% on Opteron; the improvements on Core 2 are less consistent but usually near 20%.<div>
<br></div><div>I have not found any case that this patch slows down, provided the matrix does not fit in cache. If the matrix does fit in cache, the non-temporal hint is bad: it causes matrix entries to be fetched from memory unnecessarily. I think scenarios in which end-to-end performance is limited by matrix kernels and the entire matrix fits in cache are much rarer than those in which the matrix does not fit in cache, so I consider the non-temporal hint to be an unambiguous win. If anyone sees a negative performance impact, please report it.<div>
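For anyone who wants to see the idea in isolation, here is a minimal sketch of prefetching with a non-temporal hint in a triangular solve. It is not the code from the patch: it uses a scalar CSR unit-lower-triangular forward solve rather than the blocked (S)BAIJ kernels, and the names `PREFETCH_NTA`, `prefetch_range_nta`, `forward_solve_unit_lower`, and the 64-byte `CACHE_LINE` are all assumptions made for illustration. It relies on the GCC/Clang builtin `__builtin_prefetch`, where a locality argument of 0 says the data need not stay in cache after use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: read prefetch with no temporal locality (GCC/Clang only). */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_NTA(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_NTA(p) ((void)0)
#endif

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* Issue one prefetch per assumed cache line over [p, p + nbytes). */
static void prefetch_range_nta(const void *p, size_t nbytes)
{
  const char *c   = (const char *)p;
  const char *end = c + nbytes;
  for (; c < end; c += CACHE_LINE) PREFETCH_NTA(c);
}

/* Solve L x = b where L has a unit diagonal and its strictly lower part
   is stored in CSR form (ai = row offsets, aj = column indices, aa = values). */
static void forward_solve_unit_lower(int n, const int *ai, const int *aj,
                                     const double *aa, const double *b, double *x)
{
  assert(n >= 0);
  for (int i = 0; i < n; i++) {
    /* Matrix entries are streamed once per solve, so request the next row
       early and hint that it need not be kept in cache afterwards. */
    if (i + 1 < n) {
      int    nstart = ai[i + 1];
      size_t nnz    = (size_t)(ai[i + 2] - ai[i + 1]);
      prefetch_range_nta(aj + nstart, nnz * sizeof(int));
      prefetch_range_nta(aa + nstart, nnz * sizeof(double));
    }
    double sum = b[i];
    for (int k = ai[i]; k < ai[i + 1]; k++) sum -= aa[k] * x[aj[k]];
    x[i] = sum;
  }
}
```

The vector `x` is deliberately not prefetched with this hint, since it is reused across rows; only the matrix arrays, which are read once per solve, get the non-temporal treatment. That is also why the hint would hurt if the whole matrix were small enough to stay resident in cache.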
<br></div><div>Jed</div></div>