[petsc-dev] More prefetch

Barry Smith bsmith at mcs.anl.gov
Tue Nov 4 15:40:32 CST 2014


> On Nov 3, 2014, at 9:17 AM, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
> 
> Hi,
> 
> it looks like we have a bug with prefetch when using PGI 14.7 plus CPU-specific compilation options.  We get a "signal 4 : illegal instruction (not reset when caught)" within MatMult_SeqAIJ_Inode.  The only thing we see there that could go wrong is the prefetch code...
> 
> We were asking ourselves two things:
> 
> #1- Since we don't know how to handle these kinds of functions (_mm_*), how can we test them?  Is the "example" in Configure.py:606 sufficient to validate a good/bad result?

  Unfortunately the test only checks whether the link works; it does not test whether the prefetch actually does anything or might generate an error at runtime.  Writing a proper (runtime) test for this, especially in a batch environment, seems very difficult.
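
  For reference, the configure check amounts to verifying that something roughly like the following compiles and links (this is only a sketch, not the exact code at Configure.py:606):

      #include <xmmintrin.h>
      int main(void)
      {
        double array[8] = {0};
        /* configure only needs this to compile and link; whether the
           generated prefetch instruction is actually legal on the target
           CPU is never exercised, so a runtime SIGILL can slip through */
        _mm_prefetch((const char *)array, _MM_HINT_NTA);
        return 0;
      }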

> 
> #2- How can we disable PETSC_Prefetch when doing the petsc configuration?

  I have added the option --with-prefetch=0 (or --disable-prefetch, etc.).  It is contained in the attached patch and will be in the next release, as well as in the PETSc repository (maint, master, and next).  Note that those are always two dashes; some mail programs display them as a single long dash, so do not cut and paste them from an email.
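
  For example (the compiler settings below are only placeholders for whatever you normally pass):

      ./configure --with-prefetch=0 --with-cc=pgcc --with-fc=pgfortran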


  Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: turn-off-prefetch.patch
Type: application/octet-stream
Size: 1583 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20141104/8434796e/attachment.obj>

> 
> Thanks,
> 
> Eric
> 
> On 12/03/2010 02:35 PM, Jed Brown wrote:
>> I just pushed some prefetch for (S)BAIJ kernels.  The biggest win is for
>> MatSolve_SeqSBAIJ_*_NaturalOrdering_inplace where this patch is showing
>> 30 to 50% speedups on Core 2 and Opteron.  The other kernels tend to
>> improve by 20 to 30% on Opteron with less consistent improvements on
>> Core 2 but usually near 20%.
>> 
>> I have not found any cases to be slowed down by this patch, provided the
>> matrix does not fit in cache.  If the matrix does fit in cache, then the
>> non-temporal hint is bad and will cause matrix entries to be
>> unnecessarily fetched from memory.  I think the scenarios in which
>> end-to-end performance is limited by matrix kernels where the entire
>> matrix fits in cache are much rarer than those where the matrix does
>> not fit in cache; thus I consider the non-temporal hint to be an
>> unambiguous win.  If anyone sees a negative performance impact, please
>> report it.
>> 
>> Jed
> 
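
  For anyone finding this thread later: as I understand it, the kernels Jed describes use PETSc's PetscPrefetchBlock() macro with the PETSC_PREFETCH_HINT_NTA hint from petscsys.h.  The following is only a rough sketch of that pattern, not code taken from any actual kernel:

      #include <petscsys.h>

      /* Sketch: while computing row i of a CSR (AIJ-style) matrix-vector
         product, prefetch row i+1's column indices and values with the
         non-temporal hint so they are streamed in once rather than kept
         in cache.  Names (ai, aj, aa) follow the usual AIJ convention. */
      static void MatMultSketch(const PetscInt *ai, const PetscInt *aj,
                                const PetscScalar *aa, const PetscScalar *x,
                                PetscScalar *y, PetscInt m)
      {
        PetscInt i, j;
        for (i = 0; i < m; i++) {
          PetscScalar sum = 0.0;
          if (i + 1 < m) {
            PetscPrefetchBlock(aj + ai[i+1], ai[i+2] - ai[i+1], 0, PETSC_PREFETCH_HINT_NTA);
            PetscPrefetchBlock(aa + ai[i+1], ai[i+2] - ai[i+1], 0, PETSC_PREFETCH_HINT_NTA);
          }
          for (j = ai[i]; j < ai[i+1]; j++) sum += aa[j] * x[aj[j]];
          y[i] = sum;
        }
      }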


