[petsc-users] Configuring PETSc for KNL

Richard Mills richardtmills at gmail.com
Mon Apr 3 13:36:34 CDT 2017


Hi Justin,

How is the MCDRAM (on-package "high-bandwidth memory") configured for your
KNL runs?  And if it is in "flat" mode, what are you doing to ensure that
you use the MCDRAM?  Getting this wrong seems to be one of the most common
reasons for unexpectedly poor performance on KNL.
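
As a rough sketch of what "ensuring you use the MCDRAM" can look like in
flat mode: the snippet below builds (and only prints) a launch line that
wraps the application in numactl. It assumes that MCDRAM shows up as NUMA
node 1, which is typical for quadrant cluster mode but should be checked
with `numactl -H` on a compute node. The rank count comes from the quoted
message; the application path is a hypothetical placeholder.

```shell
# Sketch only: prints the launch line instead of running it.
# Assumption: flat mode, with MCDRAM exposed as NUMA node 1
# (verify on a compute node with `numactl -H`).
mcdram_node=1
nranks=1024
app="./ex48"   # hypothetical path to the built SNES example

# --preferred tries MCDRAM first and falls back to DDR once it is full;
# --membind would instead restrict allocations strictly to that node.
launch_cmd="srun -n ${nranks} numactl --preferred=${mcdram_node} ${app}"
echo "${launch_cmd}"
```

If the working set is known to fit in MCDRAM, swapping `--preferred` for
`--membind` makes it obvious when an allocation spills out of it.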

I'm not that familiar with the environment on Cori, but I think that if you
are building for KNL, you should add "-xMIC-AVX512" to your compiler flags
to explicitly instruct the compiler to use the AVX512 instruction set.  I
usually use something along the lines of

  'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512'

(The "-g" just adds symbols, which make the output from performance
profiling tools much more useful.)
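
For illustration, here is how those flags might be folded into a configure
line like the one in the quoted message. This is a sketch under several
assumptions: the flags are specific to the Intel compilers, so it presumes
the Intel programming environment rather than the Cray one; reusing the same
flags for CXXOPTFLAGS and FOPTFLAGS is an extrapolation from the COPTFLAGS
example above; and the PETSC_ARCH name is made up. The snippet only
assembles and prints the command so it can be inspected first.

```shell
# Sketch only: assembles the configure line and prints it; nothing is run.
# Assumption: Intel compilers behind the cc/CC/ftn wrappers (Intel-only flags).
knl_flags="-g -O3 -fp-model fast -xMIC-AVX512"

# PETSC_ARCH name below is hypothetical; pick any name you like.
# Applying knl_flags to C++ and Fortran as well is an assumption.
configure_cmd="./configure --with-cc=cc --with-cxx=CC --with-fc=ftn \
--with-debugging=0 \
'COPTFLAGS=${knl_flags}' 'CXXOPTFLAGS=${knl_flags}' 'FOPTFLAGS=${knl_flags}' \
PETSC_ARCH=arch-cori-knl-opt"
echo "${configure_cmd}"
```

The remaining options from the original configure line (BLAS/LAPACK
download, 64-bit indices, and so on) can be appended unchanged.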

That said, if you are comparing 1024 Haswell cores vs. 1024 KNL cores (so
double the number of Haswell nodes), I'm not surprised that the simulations
are almost twice as fast on the Haswell nodes.  Keep in mind that an
individual KNL core is much less powerful than an individual Haswell core.
You are also using roughly twice the power footprint on the Haswell side (a
dual-socket Haswell node should be roughly equivalent to a KNL node, I
believe).  How do things look when you compare equal numbers of nodes?
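
To make the equal-node comparison concrete: the figures in the quoted
message (1024 cores on 32 Haswell nodes or on 16 KNL nodes) imply 32 cores
used per Haswell node and 64 per KNL node. A small sketch of the rank counts
for a 16-node run on each architecture, assuming one MPI rank per core and
the same per-node occupancy as the original runs:

```shell
# Figures taken from the quoted message.
total_ranks=1024
haswell_nodes=32
knl_nodes=16

# Ranks per node implied by those figures (assumes one MPI rank per core).
haswell_per_node=$(( total_ranks / haswell_nodes ))
knl_per_node=$(( total_ranks / knl_nodes ))

# Equal-node comparison: run both architectures on the same number of nodes.
nodes=16
haswell_ranks=$(( nodes * haswell_per_node ))
knl_ranks=$(( nodes * knl_per_node ))
echo "Haswell: ${nodes} nodes x ${haswell_per_node} ranks/node = ${haswell_ranks} ranks"
echo "KNL:     ${nodes} nodes x ${knl_per_node} ranks/node = ${knl_ranks} ranks"
```

So an equal-node test would pit 512 Haswell ranks against 1024 KNL ranks
on the same problem.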

Cheers,
Richard

On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang <jychang48 at gmail.com> wrote:

> Hi all,
>
> On NERSC's Cori I have the following configure options for PETSc:
>
> ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0
> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn
> --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1
> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt
>
> Where I swapped out the default Intel programming environment with that of
> Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I want
> to document the performance difference between Cori's Haswell and KNL
> processors.
>
> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and 16
> KNL nodes), the simulations are almost twice as fast on Haswell nodes.
> Which leads me to suspect that I am not doing something right for KNL. Does
> anyone know what are some "optimal" configure options for running PETSc on
> KNL?
>
> Thanks,
> Justin
>