[petsc-dev] error with flags PETSc uses for determining AVX

Jed Brown jed at jedbrown.org
Sun Feb 14 11:50:01 CST 2021


Pierre Jolivet <pierre at joliv.et> writes:

>>>>   Expecting PETSc users to automatically add -march= is not realistic.  I will try to rig something up in configure where if the user does not provide march something reasonable is selected. 
>>> A softer (yet trivial to implement) option might also be to just alert the user that these flags exist in the usual message about using default optimization flags. Something like this would encourage users to do what Jed is doing:
>>> 
>>>       ***** WARNING: Using default optimization C flags -g -O3
>>> You might consider manually setting optimal optimization flags for your system with
>>> COPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for examples. 
>>> In particular, you may want to supply specific flags (e.g. -march=native) 
>>> to take advantage of higher-performance instructions.
>> 
>> I think this is a reasonable thing to do.
>
> This is a reasonable message to print on the screen, but I don’t think this is a reasonable flag to impose by default.
> You are basically asking all package managers to add a new flag (-march=generic) which was previously not needed.
>
> I’m crossing my fingers Jed has a clever way of "making portable binaries that run-time detected when to use newer instructions where it matters", because -march=native by default is just not practical when deploying software.

immintrin.h provides

if (_may_i_use_cpu_feature(_FEATURE_FMA | _FEATURE_AVX2)) {
  fancy_version_that_needs_fma_and_avx2();
} else {
  fallback_version();
}

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_may_i_use&expand=3677,3677

I believe this function is slightly expensive because it probably calls the CPUID instruction each time. BLIS has code to cache the result and query features with simple bitwise math.
 
https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.h
https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.c

Of course this bit of dispatch should typically be done at object creation time, not every iteration.
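
As a rough sketch of what "cache the result, dispatch at creation time" could look like: the snippet below is not PETSc or BLIS code. It assumes GCC or Clang on x86 and uses their __builtin_cpu_supports() builtin instead of _may_i_use_cpu_feature(). The names FEAT_AVX2, FEAT_FMA, cpu_features(), Axpy, axpy_create() and the two kernels are all made up for illustration. The target attribute lets one function be compiled for AVX2+FMA while the rest of the file stays at the baseline -march, so the binary remains portable.

```c
#include <stddef.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#endif

/* Hypothetical feature bits, named for this sketch only. */
enum { FEAT_AVX2 = 1, FEAT_FMA = 2 };

/* Query the CPU once and cache the answer as a bitmask.
   Not thread-safe as written; a real version would need a once-guard. */
static unsigned cpu_features(void)
{
  static int      initialized = 0;
  static unsigned mask        = 0;
  if (!initialized) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) mask |= FEAT_AVX2;
    if (__builtin_cpu_supports("fma"))  mask |= FEAT_FMA;
#endif
    initialized = 1;
  }
  return mask;
}

typedef void (*axpy_fn)(size_t n, double a, const double *x, double *y);

/* Baseline kernel: compiled with whatever flags the file is built with. */
static void axpy_fallback(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

#ifdef HAVE_X86_DISPATCH
/* Same loop, but the compiler may use AVX2 and FMA inside this function only. */
__attribute__((target("avx2,fma")))
static void axpy_avx2_fma(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}
#endif

/* Hypothetical "object" that remembers which kernel was selected. */
typedef struct {
  axpy_fn kernel;
} Axpy;

/* Dispatch happens here, once per object, not once per call or iteration. */
static void axpy_create(Axpy *op)
{
  op->kernel = axpy_fallback;
#ifdef HAVE_X86_DISPATCH
  {
    const unsigned need = FEAT_AVX2 | FEAT_FMA;
    if ((cpu_features() & need) == need) op->kernel = axpy_avx2_fma;
  }
#endif
}
```

A caller would do axpy_create(&op) once and then op.kernel(n, a, x, y) in the hot loop, so the feature check costs a single mask comparison at creation time.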

