[petsc-dev] error with flags PETSc uses for determining AVX

Barry Smith bsmith at petsc.dev
Sun Feb 14 14:14:36 CST 2021



> On Feb 14, 2021, at 1:25 PM, Zhang, Hong <hongzhang at anl.gov> wrote:
> 
> 
> 
>> On Feb 14, 2021, at 12:04 PM, Barry Smith <bsmith at petsc.dev> wrote:
>> 
>> 
>>  For our handcoded AVX functions this is fine, we can handle the dispatching ourselves. 
> 
> Cool. _may_i_use_cpu_feature() would be very useful to determine the optimal AVX code path at runtime. Theoretically we just need to query for the needed features once and cache the results.
> 
>> 
>> But what about all the tons of regular code in PETSc, somehow we need to have the same function compiled twice and dispatched properly. Do we use what Hong suggested with fat binaries? So fat-binaries PLUS _may_i_use_cpu_feature together are the way to portable transportable libraries?
>> 
>> 
>> And we do this always --with-debugging=0 so everyone, packages and users get portable but also the best performance possible.
> 
> IMHO, only package managers should consider using -ax options. On our side, if we want to satisfy the needs of different parties (developers, users, package managers), better be conservative than aggressive. -march=native brings huge performance improvement

  But this means most of our users are, year after year, throwing lots of performance on the floor and don't even know it. I think we pander to portability too much.

> but it has never been the default for many compilers with a good reason. Even -O3 does not enable the advanced vector instructions. I just did a quick check on petsc-02: 
> 
> hongzhang at petsc-02:/nfs/gce/projects/TSAdjoint$ icc -O3 -E -dM - < /dev/null | grep SSE
> #define __SSE__ 1
> #define __SSE_MATH__ 1
> #define __SSE2__ 1
> #define __SSE2_MATH__ 1
> hongzhang at petsc-02:/nfs/gce/projects/TSAdjoint$ icc -O3 -E -dM - < /dev/null | grep avx
> hongzhang at petsc-02:/nfs/gce/projects/TSAdjoint$ 
> 
> What Jed usually does (--with-debugging=0 COPTFLAGS='-O2 -march=native') can be suggested to anyone who does not need to care about portability. If you do not want users to specify the magic options, perhaps we can provide a configure option like --with-portability. If it is set to false, we add aggressive flags automatically.

   My feeling is 90+% of users don't care about portability, they want to get fast performance on the machine they are compiling with (or a collection of machines they have around).  

   Can we build aggressively for their system (except for package managers and people who provide -march themselves) and have PetscInitialize() produce a very useful error message if they then run the code on a system where it will not work? Are there any system calls that provide that type of information?
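
   As a rough illustration of the startup check described above, here is a minimal sketch. It assumes GCC or Clang on x86, where the __builtin_cpu_supports() builtin and the predefined macros __AVX__, __AVX2__, __FMA__ and __AVX512F__ are available. The helper name check_cpu_matches_build() is hypothetical and is not part of PETSc. It compares what the compiler was allowed to emit against what the running CPU reports.

```c
#include <stdio.h>

/* Hypothetical helper, not part of PETSc.
   Returns the number of instruction-set extensions that this file was
   compiled to use but that the running CPU does not report, printing
   one line per missing extension to stderr. Returns 0 if all are present
   or if the compiler/architecture is not one this sketch knows about. */
static int check_cpu_matches_build(void)
{
  int missing = 0;
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init(); /* harmless if the runtime already initialized it */
#if defined(__AVX__)
  if (!__builtin_cpu_supports("avx")) {
    fprintf(stderr, "Built with AVX, but this CPU does not support it\n");
    missing++;
  }
#endif
#if defined(__AVX2__)
  if (!__builtin_cpu_supports("avx2")) {
    fprintf(stderr, "Built with AVX2, but this CPU does not support it\n");
    missing++;
  }
#endif
#if defined(__FMA__)
  if (!__builtin_cpu_supports("fma")) {
    fprintf(stderr, "Built with FMA, but this CPU does not support it\n");
    missing++;
  }
#endif
#if defined(__AVX512F__)
  if (!__builtin_cpu_supports("avx512f")) {
    fprintf(stderr, "Built with AVX-512F, but this CPU does not support it\n");
    missing++;
  }
#endif
#endif
  return missing;
}
```

   An initialization routine could call this once and abort with a message suggesting a rebuild with less aggressive flags when the result is nonzero. One caveat: the check itself has to run before any instruction the CPU lacks is executed, so it would need to live in code compiled with conservative flags.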

  Barry



> 
> Hong
> 
>> 
>> Barry
>> 
>> 
>>> On Feb 14, 2021, at 11:50 AM, Jed Brown <jed at jedbrown.org> wrote:
>>> 
>>>> 
>>> 
>>> immintrin.h provides
>>> 
>>> if (_may_i_use_cpu_feature(_FEATURE_FMA | _FEATURE_AVX2)) {
>>>   fancy_version_that_needs_fma_and_avx2();
>>> } else {
>>>   fallback_version();
>>> }
>>> 
>>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_may_i_use&expand=3677,3677
>>> 
>>> I believe this function is slightly expensive because it probably calls the CPUID instruction each time. BLIS has code to cache the result and query features with simple bitwise math.
>>> 
>>> https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.h
>>> https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.c
>>> 
>>> Of course this bit of dispatch should typically be done at object creation time, not every iteration.
>> 
> 


