[petsc-dev] error with flags PETSc uses for determining AVX

Zhang, Hong hongzhang at anl.gov
Sun Feb 14 13:40:34 CST 2021


Oops, a typo in the command line: the grep pattern should be AVX, not avx. The conclusion is the same: SSE3 and above, and AVX, are not enabled by -O3.

hongzhang at petsc-02:/nfs/gce/projects/TSAdjoint$ icc -O3 -E -dM - < /dev/null | grep SSE
#define __SSE__ 1
#define __SSE_MATH__ 1
#define __SSE2__ 1
#define __SSE2_MATH__ 1
hongzhang at petsc-02:/nfs/gce/projects/TSAdjoint$ icc -O3 -E -dM - < /dev/null | grep AVX
hongzhang at petsc-02:/nfs/gce/projects/TSAdjoint$
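
For reference, the same check can be made from inside a C source file by testing the predefined macros that the grep above looks for. This is only a minimal sketch: the enum names and the two helper functions are made up for illustration and are not part of PETSc, and it assumes the compiler follows the usual GCC/Clang/icc convention of defining __SSE2__, __AVX__, and so on when the corresponding instruction set is enabled.

```c
#include <stdio.h>

/* Illustrative bit flags (hypothetical names, not PETSc's). */
enum { HAS_SSE2 = 1, HAS_SSE3 = 2, HAS_AVX = 4, HAS_AVX2 = 8, HAS_AVX512F = 16 };

/* Returns a mask of the SIMD feature macros that the compiler predefined
   for the flags this translation unit was built with. */
int compiled_simd_mask(void)
{
  int mask = 0;
#if defined(__SSE2__)
  mask |= HAS_SSE2;
#endif
#if defined(__SSE3__)
  mask |= HAS_SSE3;
#endif
#if defined(__AVX__)
  mask |= HAS_AVX;
#endif
#if defined(__AVX2__)
  mask |= HAS_AVX2;
#endif
#if defined(__AVX512F__)
  mask |= HAS_AVX512F;
#endif
  return mask;
}

/* Prints one line per feature so builds with different flags can be compared. */
void print_compiled_simd(void)
{
  int mask = compiled_simd_mask();
  printf("SSE2:    %s\n", (mask & HAS_SSE2) ? "yes" : "no");
  printf("SSE3:    %s\n", (mask & HAS_SSE3) ? "yes" : "no");
  printf("AVX:     %s\n", (mask & HAS_AVX) ? "yes" : "no");
  printf("AVX2:    %s\n", (mask & HAS_AVX2) ? "yes" : "no");
  printf("AVX512F: %s\n", (mask & HAS_AVX512F) ? "yes" : "no");
}
```

Calling print_compiled_simd() from a small main() and building once with -O3 and once with -O3 -march=native should show the difference on a machine that supports AVX. Note that this reports what the compiler was allowed to emit, not what the CPU running the binary supports.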

> On Feb 14, 2021, at 1:25 PM, Zhang, Hong via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
> 
> 
> 
>> On Feb 14, 2021, at 12:04 PM, Barry Smith <bsmith at petsc.dev> wrote:
>> 
>> 
>>  For our handcoded AVX functions this is fine; we can handle the dispatching ourselves.
> 
> Cool. _may_i_use_cpu_feature() would be very useful to determine the optimal AVX code path at runtime. Theoretically we just need to query for the needed features once and cache the results.
> 
>> 
>> But what about the tons of regular code in PETSc? Somehow we need to have the same function compiled twice and dispatched properly. Do we use what Hong suggested with fat binaries? So fat binaries PLUS _may_i_use_cpu_feature together are the way to portable, transportable libraries?
>> 
>> 
>> And we always do this with --with-debugging=0, so everyone (packages and users) gets portability but also the best performance possible.
> 
> IMHO, only package managers should consider using the -ax options. On our side, if we want to satisfy the needs of different parties (developers, users, package managers), it is better to be conservative than aggressive. -march=native brings a huge performance improvement, but many compilers have never made it the default, and for good reason. Even -O3 does not enable the advanced vector instructions. I just did a quick check on petsc-02:
> 
> hongzhang at petsc-02:/nfs/gce/projects/TSAdjoint$ icc -O3 -E -dM - < /dev/null | grep SSE
> #define __SSE__ 1
> #define __SSE_MATH__ 1
> #define __SSE2__ 1
> #define __SSE2_MATH__ 1
> hongzhang at petsc-02:/nfs/gce/projects/TSAdjoint$ icc -O3 -E -dM - < /dev/null | grep avx
> hongzhang at petsc-02:/nfs/gce/projects/TSAdjoint$ 
> 
> What Jed usually does (--with-debugging=0 COPTFLAGS='-O2 -march=native') can be suggested to anyone who does not need to care about portability. If you do not want users to specify the magic options, perhaps we can provide a configure option like --with-portability. If it is set to false, we add aggressive flags automatically.
> 
> Hong
> 
>> 
>> Barry
>> 
>> 
>>> On Feb 14, 2021, at 11:50 AM, Jed Brown <jed at jedbrown.org> wrote:
>>> 
>>>> 
>>> 
>>> immintrin.h provides
>>> 
>>> if (_may_i_use_cpu_feature(_FEATURE_FMA | _FEATURE_AVX2)) {
>>>   fancy_version_that_needs_fma_and_avx2();
>>> } else {
>>>   fallback_version();
>>> }
>>> 
>>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_may_i_use&expand=3677,3677
>>> 
>>> I believe this function is slightly expensive because it probably calls the CPUID instruction each time. BLIS has code to cache the result and query features with simple bitwise math.
>>> 
>>> https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.h
>>> https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.c
>>> 
>>> Of course this bit of dispatch should typically be done at object creation time, not every iteration.
>> 
> 
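
The ideas in the thread above (query the CPU features once, cache the result, and choose the code path at object creation time) can be sketched roughly as follows. This is not PETSc code: kernel_ops, kernel_ops_create, cpu_has_avx2_fma, and the two axpy functions are hypothetical names. It also swaps in the GCC/Clang builtin __builtin_cpu_supports for Intel's _may_i_use_cpu_feature so that it builds without the Intel compiler, and the "AVX2" kernel is only a placeholder that reuses the scalar loop.

```c
#include <stddef.h>

typedef void (*axpy_fn)(size_t n, double a, const double *x, double *y);

/* Plain C fallback, playing the role of fallback_version() in the thread. */
static void axpy_fallback(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* Placeholder for a hand-written AVX2/FMA kernel. Real code would use
   intrinsics in a file compiled with the matching flags; here it just
   calls the scalar loop so the sketch runs anywhere. */
static void axpy_avx2_fma(size_t n, double a, const double *x, double *y)
{
  axpy_fallback(n, a, x, y);
}

/* Queries the CPU on the first call and caches the answer (0 or 1).
   Not thread-safe as written; assumes single-threaded initialization. */
int cpu_has_avx2_fma(void)
{
  static int cached = -1;
  if (cached < 0) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    cached = __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
#else
    cached = 0; /* unknown compiler or architecture: take the safe path */
#endif
  }
  return cached;
}

/* Hypothetical "object" that holds the selected kernel. */
typedef struct {
  axpy_fn axpy;
} kernel_ops;

/* Dispatch happens once, at creation time, not on every iteration. */
void kernel_ops_create(kernel_ops *ops)
{
  ops->axpy = cpu_has_avx2_fma() ? axpy_avx2_fma : axpy_fallback;
}
```

After kernel_ops_create(&ops), every call goes through ops.axpy(n, a, x, y) with no further feature checks, so the cost of the CPU query is paid once per process rather than once per call.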


