[petsc-users] Configuring PETSc for KNL

Mon Apr 3 13:44:42 CDT 2017

Richard,

This is what my job script looks like:

#!/bin/bash
#SBATCH -N 16
#SBATCH -C knl,quad,flat
#SBATCH -p regular
#SBATCH -J knlflat1024
#SBATCH -L SCRATCH
#SBATCH -o knlflat1024.o%j
#SBATCH --mail-type=ALL
#SBATCH --mail-user=jychang48 at gmail.com
#SBATCH -t 00:20:00

#run the application:
cd $SCRATCH/Icesheet
sbcast --compress=lz4 ./ex48cori /tmp/ex48cori
srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N 128 \
    -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine 1
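
As a rough sanity check of the layout requested in the script above, the rank and CPU counts can be worked out in a few lines of shell. This is only a sketch: the figure of 272 logical CPUs per node (68 cores with 4 hardware threads each) is an assumption about Cori's KNL nodes and is not stated anywhere in this thread.

```shell
#!/bin/bash
# Sanity-check the srun layout from the job script (-N 16, -n 1024, -c 4).
# Assumption (not stated in this thread): each Cori KNL node exposes
# 68 cores with 4 hardware threads per core, i.e. 272 logical CPUs.
nodes=16
ranks=1024
cpus_per_rank=4            # the value passed to srun -c
logical_cpus_per_node=272  # assumed, see above

ranks_per_node=$(( ranks / nodes ))
cpus_used_per_node=$(( ranks_per_node * cpus_per_rank ))

echo "ranks per node:    ${ranks_per_node}"
echo "logical CPUs used: ${cpus_used_per_node} of ${logical_cpus_per_node}"
```

Under that assumption, "-c 4" together with "--cpu_bind=cores" gives each MPI rank one physical core, and 64 of the 68 cores on each node are used.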

The NERSC documentation says to add "numactl" when running in flat mode.
I previously tried cache mode, but the performance seemed to be
unaffected.
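
To make the flat vs. cache distinction concrete, here is a minimal sketch of how the launch prefix might be chosen. The helper name mcdram_prefix is hypothetical (it is not part of NERSC's tooling or PETSc), and the sketch assumes that in quad,flat mode the MCDRAM shows up as NUMA node 1, which is what the "numactl -p 1" in the job script relies on.

```shell
#!/bin/bash
# Hypothetical helper (not from NERSC or PETSc): choose the command prefix
# that steers allocations toward MCDRAM for a given KNL memory mode.
# Assumption: in quad,flat mode the MCDRAM is exposed as NUMA node 1.
mcdram_prefix() {
  local mode="$1" policy="${2:-preferred}"
  case "$mode" in
    flat)
      if [ "$policy" = "bind" ]; then
        echo "numactl -m 1"   # strict: allocations must fit in MCDRAM
      else
        echo "numactl -p 1"   # preferred: fall back to DDR when MCDRAM is full
      fi
      ;;
    cache)
      echo ""                 # MCDRAM acts as a cache; no numactl needed
      ;;
    *)
      echo "unknown memory mode: $mode" >&2
      return 1
      ;;
  esac
}

# Dry run: print the command line instead of launching it.
echo "srun -n 1024 -c 4 --cpu_bind=cores $(mcdram_prefix flat) /tmp/ex48cori"
```

Running "numactl -H" on a compute node lists the NUMA nodes and their sizes, which is one way to confirm that node 1 really is the MCDRAM before relying on this.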

I also compared 256 Haswell nodes against 256 KNL nodes, and Haswell is
roughly 4-5x faster, though I suspect this large gap has a lot to do with
the initial coarse grid now being extremely small.

I'll give the COPTFLAGS a try and see what happens.
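
For reference, a sketch of what the revised configure line could look like, combining the original options with the suggested COPTFLAGS. It only prints the command (a dry run). Two assumptions are baked in: the Intel programming environment is loaded, since "-fp-model fast" and "-xMIC-AVX512" are Intel compiler options and the original build used PrgEnv-cray; and "arch-cori-knl-opt" is a hypothetical PETSC_ARCH name picked here to keep the KNL build separate. CXXOPTFLAGS and FOPTFLAGS are left as they were.

```shell
#!/bin/bash
# Dry run: assemble the configure command and print it rather than run it.
# Assumptions: the Intel programming environment is loaded (the flags below
# are Intel compiler options), and "arch-cori-knl-opt" is a hypothetical
# PETSC_ARCH name chosen here to keep the KNL build separate.
knl_coptflags="-g -O3 -fp-model fast -xMIC-AVX512"

configure_args=(
  --download-fblaslapack
  --with-cc=cc --with-clib-autodetect=0
  --with-cxx=CC --with-cxxlib-autodetect=0
  --with-fc=ftn --with-fortranlib-autodetect=0
  --with-debugging=0
  --with-mpiexec=srun
  --with-64-bit-indices=1
  "COPTFLAGS=${knl_coptflags}"
  CXXOPTFLAGS=-O3
  FOPTFLAGS=-O3
  PETSC_ARCH=arch-cori-knl-opt
)

# printf %q quotes each argument so the printed line can be pasted into a shell.
printf './configure'
printf ' %q' "${configure_args[@]}"
printf '\n'
```

Keeping the options in an array means the multi-word COPTFLAGS value stays a single argument, which is easy to get wrong when the flags are typed inline.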

Thanks,
Justin

On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills <richardtmills at gmail.com>
wrote:

> Hi Justin,
>
> How is the MCDRAM (on-package "high-bandwidth memory") configured for your
> KNL runs?  And if it is in "flat" mode, what are you doing to ensure that
> you use the MCDRAM?  Doing this wrong seems to be one of the most common
> reasons for unexpected poor performance on KNL.
>
> I'm not that familiar with the environment on Cori, but I think that if
> you are building for KNL, you should add "-xMIC-AVX512" to your compiler
> flags to explicitly instruct the compiler to use the AVX512 instruction
> set.  I usually use something along the lines of
>
>   'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512'
>
> (The "-g" just adds symbols, which make the output from performance
> profiling tools much more useful.)
>
> That said, if you are comparing 1024 Haswell cores vs. 1024 KNL cores
> (so double the number of Haswell nodes), I'm not surprised that the
> simulations are almost twice as fast using the Haswell nodes.  Keep in
> mind that an individual KNL core is much less powerful than an individual
> Haswell core.  You are also using roughly twice the power footprint (a
> dual-socket Haswell node should be roughly equivalent to a KNL node, I
> believe).  How do things look when you compare equal numbers of nodes?
>
> Cheers,
> Richard
>
> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang <jychang48 at gmail.com> wrote:
>
>> Hi all,
>>
>> On NERSC's Cori I have the following configure options for PETSc:
>>
>> ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0
>> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn
>> --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1
>> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt
>>
>> Here I swapped out the default Intel programming environment for Cray's
>> (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I want to
>> document the performance difference between Cori's Haswell and KNL
>> processors.
>>
>> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell or
>> 16 KNL nodes), the simulations are almost twice as fast on the Haswell
>> nodes, which leads me to suspect that I am not doing something right for
>> KNL. Does anyone know of any "optimal" configure options for running
>> PETSc on KNL?
>>
>> Thanks,
>> Justin
>>
>
>