[petsc-users] Fwd: strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs
Damian Kaliszan
damian at man.poznan.pl
Mon Jun 19 08:56:32 CDT 2017
Hi,
Please find attached two output files from the 64 MPI/1 OMP vs. 64 MPI/2 OMP runs,
SLURM task IDs 23321 and 23325.
Best,
Damian
In a message dated 19 June 2017 (15:39:53), the following was written:
On Mon, Jun 19, 2017 at 7:32 AM, Damian Kaliszan <damian at man.poznan.pl> wrote:
Hi,
Thank you for the answer and the article.
I use SLURM (srun) for job submission by running the
'srun script.py script_parameters' command inside a batch script, so this is the SPMD model.
What I noticed is that the problems I'm having now didn't happen
before on CPU E5-2697 v3 nodes (28 cores; the best performance I had
was with 14 MPI ranks/2 OMP threads per node). The problems started to appear when I moved to KNLs.
The odd thing is that switching OMP off (by setting
OMP_NUM_THREADS to 1) doesn't help for all #NODES/#MPI/#OMP
combinations. For example, for 2 nodes and 16 MPI ranks, the
timings are huge for OMP=1 and OMP=2, but fine for OMP=4.
Let's narrow this down to MPI_Barrier(). What memory mode is the KNL in? Did you require
the KNL to use only MCDRAM? Please show the MPI_Barrier()/MPI_Send() numbers for the different configurations (see the sketch below).
These measure just latency. We could also look at VecScale() to see the memory bandwidth achieved.
Thanks,
Matt
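A minimal sketch of the kind of latency/bandwidth check suggested above, assuming the application can use mpi4py alongside petsc4py; the repetition count, vector size, and names are illustrative, not taken from the original runs:

    # Sketch: average MPI_Barrier() and zero-size MPI_Send() latency, plus a
    # VecScale() timing as a rough memory-bandwidth probe.
    import sys
    from mpi4py import MPI
    import petsc4py
    petsc4py.init(sys.argv)
    from petsc4py import PETSc
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    reps = 100

    # Barrier latency
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        comm.Barrier()
    t_barrier = (MPI.Wtime() - t0) / reps

    # Zero-size send latency between ranks 0 and 1
    t_send = 0.0
    if size > 1:
        buf = np.zeros(0, dtype='b')
        comm.Barrier()
        t0 = MPI.Wtime()
        for _ in range(reps):
            if rank == 0:
                comm.Send(buf, dest=1, tag=0)
            elif rank == 1:
                comm.Recv(buf, source=0, tag=0)
        t_send = (MPI.Wtime() - t0) / reps

    # VecScale() on a large vector: dominated by memory bandwidth
    n = 10**7
    v = PETSc.Vec().createMPI(n, comm=PETSc.COMM_WORLD)
    v.set(1.0)
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        v.scale(1.000001)
    t_scale = (MPI.Wtime() - t0) / reps

    if rank == 0:
        print("Barrier %.3e s, zero-size Send %.3e s, VecScale %.3e s"
              % (t_barrier, t_send, t_scale))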
Playing with affinity hasn't helped so far.
In other words, at first glance the results look completely random (I can
provide more such examples).
Best,
Damian
In a message dated 19 June 2017 (14:50:25), the following was written:
On Mon, Jun 19, 2017 at 6:42 AM, Damian Kaliszan <damian at man.poznan.pl> wrote:
Hi,
Regarding my previous post: I looked into both logs, 64 MPI/1 OMP vs. 64 MPI/2 OMP.
What attracted my attention is the huge difference in MPI timings in the following places:
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 3.9196e-05
Average time for zero size MPI_Send(): 5.45382e-06
vs.
Average time to get PetscTime(): 4.05312e-07
Average time for MPI_Barrier(): 0.348399
Average time for zero size MPI_Send(): 0.029937
Isn't something wrong with the PETSc library itself?
I don't think so. This is a bad interaction between MPI and your threading mechanism. MPI_Barrier() and MPI_Send() are lower
level than PETSc. What threading mode did you choose for MPI? This can have a performance impact (see the sketch below).
Also, the justifications for threading in this context are weak (or non-existent): http://www.orau.gov/hpcor2015/whitepapers/Exascale_Computing_without_Threads-Barry_Smith.pdf
Thanks,
Matt
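A quick, illustrative way to check which MPI thread support level the processes actually received at run time, assuming mpi4py is available:

    # Sketch: report the MPI thread support level provided to this run.
    # The level (e.g. MPI_THREAD_MULTIPLE vs. MPI_THREAD_FUNNELED) can affect
    # MPI_Barrier()/MPI_Send() performance.
    from mpi4py import MPI

    levels = {
        MPI.THREAD_SINGLE: "MPI_THREAD_SINGLE",
        MPI.THREAD_FUNNELED: "MPI_THREAD_FUNNELED",
        MPI.THREAD_SERIALIZED: "MPI_THREAD_SERIALIZED",
        MPI.THREAD_MULTIPLE: "MPI_THREAD_MULTIPLE",
    }

    if MPI.COMM_WORLD.Get_rank() == 0:
        provided = MPI.Query_thread()
        print("MPI thread level provided:", levels.get(provided, provided))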
Best,
Damian
Forwarded message
From: Damian Kaliszan <damian at man.poznan.pl>
To: PETSc users list <petsc-users at mcs.anl.gov>
Date: 16 June 2017, 14:57:10
Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs
===8<===============Original message content===============
Hi,
For several days I've been trying to figure out what is going wrong
with the timings of my Python app, which solves Ax=b with the KSP (GMRES) solver, when running on Intel's KNL 7210/7230.
I downsized the problem to a 1000x1000 A matrix and a single node and
observed the following:
I'm attaching 2 extreme timings whose configurations differ only by 1 OMP thread (64 MPI/1 OMP vs. 64 MPI/2 OMP),
SLURM task IDs 23321 and 23325.
Any help will be appreciated.
Best,
Damian
===8<===========End of original message content===========
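For reference, a minimal petsc4py sketch of this kind of solve (a 1000x1000 Ax=b with KSP set to GMRES); the tridiagonal test matrix, right-hand side, and all names are illustrative, not the author's actual application:

    # Sketch: assemble a 1000x1000 sparse matrix and solve Ax = b with GMRES.
    import sys
    import petsc4py
    petsc4py.init(sys.argv)
    from petsc4py import PETSc

    n = 1000
    # (d_nnz, o_nnz): 3 diagonal-block and 1 off-diagonal-block entries per row
    A = PETSc.Mat().createAIJ([n, n], nnz=(3, 1), comm=PETSc.COMM_WORLD)
    rstart, rend = A.getOwnershipRange()
    for i in range(rstart, rend):
        if i > 0:
            A.setValue(i, i - 1, -1.0)
        A.setValue(i, i, 2.0)
        if i < n - 1:
            A.setValue(i, i + 1, -1.0)
    A.assemblyBegin()
    A.assemblyEnd()

    b = A.createVecLeft()
    b.set(1.0)
    x = A.createVecRight()

    ksp = PETSc.KSP().create(comm=PETSc.COMM_WORLD)
    ksp.setOperators(A)
    ksp.setType(PETSc.KSP.Type.GMRES)
    ksp.setFromOptions()   # pick up -ksp_monitor, -log_view, etc.
    ksp.solve(b, x)

    if PETSc.COMM_WORLD.getRank() == 0:
        print("iterations:", ksp.getIterationNumber(),
              "residual norm:", ksp.getResidualNorm())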
-------------------------------------------------------
Damian Kaliszan
Poznan Supercomputing and Networking Center
HPC and Data Centres Technologies
ul. Jana Pawła II 10
61-139 Poznan
POLAND
phone (+48 61) 858 5109
e-mail damian at man.poznan.pl
www - http://www.man.poznan.pl/
-------------------------------------------------------
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
http://www.caam.rice.edu/~mk51/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: slurm-23321.out
Type: application/octet-stream
Size: 39057 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170619/2bde830a/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: slurm-23325.out
Type: application/octet-stream
Size: 39054 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170619/2bde830a/attachment-0003.obj>