[petsc-users] Fwd: strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs
Damian Kaliszan
damian at man.poznan.pl
Mon Jun 19 08:56:32 CDT 2017
Hi,
Please find attached two output files from the 64 MPI/1 OMP vs. 64 MPI/2 OMP runs,
SLURM task IDs 23321 and 23325.
Best,
Damian
In a message dated 19 June 2017 (15:39:53), the following was written:
On Mon, Jun 19, 2017 at 7:32 AM, Damian Kaliszan <damian at man.poznan.pl> wrote:
Hi,
Thank you for the answer and the article.
I use SLURM (srun) for job submission by running the
'srun script.py script_parameters' command inside a batch script, so this is the SPMD model.
What I noticed is that the problems I'm having now didn't happen
before on CPU E5-2697 v3 nodes (28 cores; the best performance I had
was with 14 MPI ranks/2 OMP threads per node). The problems started to appear when I moved to KNLs.
The odd thing is that switching OMP off (by setting
OMP_NUM_THREADS to 1) doesn't help for all #NODES/#MPI/#OMP
combinations. For example, for 2 nodes and 16 MPI ranks, the
timings are huge for OMP=1 and OMP=2, but fine for OMP=4.
Let's narrow this down to MPI_Barrier(). What memory mode is the KNL in? Did you require
the KNL to use only MCDRAM? Please show the MPI_Barrier()/MPI_Send() numbers for the different configurations (see the sketch below).
These measure just latency. We could also look at VecScale() to see the memory bandwidth achieved.
Thanks,
Matt
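A minimal sketch of the kind of latency/bandwidth check suggested above, assuming the application can use mpi4py alongside petsc4py; the repetition count, vector size, and names are illustrative, not taken from the original runs:

    # Sketch: average MPI_Barrier() and zero-size MPI_Send() latency, plus a
    # VecScale() timing as a rough memory-bandwidth probe.
    import sys
    from mpi4py import MPI
    import petsc4py
    petsc4py.init(sys.argv)
    from petsc4py import PETSc
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    reps = 100

    # Barrier latency
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        comm.Barrier()
    t_barrier = (MPI.Wtime() - t0) / reps

    # Zero-size send latency between ranks 0 and 1
    t_send = 0.0
    if size > 1:
        buf = np.zeros(0, dtype='b')
        comm.Barrier()
        t0 = MPI.Wtime()
        for _ in range(reps):
            if rank == 0:
                comm.Send(buf, dest=1, tag=0)
            elif rank == 1:
                comm.Recv(buf, source=0, tag=0)
        t_send = (MPI.Wtime() - t0) / reps

    # VecScale() on a large vector: dominated by memory bandwidth
    n = 10**7
    v = PETSc.Vec().createMPI(n, comm=PETSc.COMM_WORLD)
    v.set(1.0)
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        v.scale(1.000001)
    t_scale = (MPI.Wtime() - t0) / reps

    if rank == 0:
        print("Barrier %.3e s, zero-size Send %.3e s, VecScale %.3e s"
              % (t_barrier, t_send, t_scale))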
Playing with affinity hasn't helped so far.
In other words, at first glance the results look completely random (I can
provide more such examples).
Best,
Damian
In a message dated 19 June 2017 (14:50:25), the following was written:
On Mon, Jun 19, 2017 at 6:42 AM, Damian Kaliszan <damian at man.poznan.pl> wrote:
Hi,
Regarding my previous post: I looked into both logs, 64 MPI/1 OMP vs. 64 MPI/2 OMP.
What attracted my attention is the huge difference in MPI timings in the following places:
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 3.9196e-05
Average time for zero size MPI_Send(): 5.45382e-06
vs.
Average time to get PetscTime(): 4.05312e-07
Average time for MPI_Barrier(): 0.348399
Average time for zero size MPI_Send(): 0.029937
Isn't something wrong with the PETSc library itself?
I don't think so. This is a bad interaction between MPI and your threading mechanism. MPI_Barrier() and MPI_Send() are lower
level than PETSc. What threading mode did you choose for MPI? This can have a performance impact (see the sketch below).
Also, the justifications for threading in this context are weak (or non-existent): http://www.orau.gov/hpcor2015/whitepapers/Exascale_Computing_without_Threads-Barry_Smith.pdf
Thanks,
Matt
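A quick, illustrative way to check which MPI thread support level the processes actually received at run time, assuming mpi4py is available:

    # Sketch: report the MPI thread support level provided to this run.
    # The level (e.g. MPI_THREAD_MULTIPLE vs. MPI_THREAD_FUNNELED) can affect
    # MPI_Barrier()/MPI_Send() performance.
    from mpi4py import MPI

    levels = {
        MPI.THREAD_SINGLE: "MPI_THREAD_SINGLE",
        MPI.THREAD_FUNNELED: "MPI_THREAD_FUNNELED",
        MPI.THREAD_SERIALIZED: "MPI_THREAD_SERIALIZED",
        MPI.THREAD_MULTIPLE: "MPI_THREAD_MULTIPLE",
    }

    if MPI.COMM_WORLD.Get_rank() == 0:
        provided = MPI.Query_thread()
        print("MPI thread level provided:", levels.get(provided, provided))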
Best,
Damian
Forwarded message
From: Damian Kaliszan <damian at man.poznan.pl>
To: PETSc users list <petsc-users at mcs.anl.gov>
Date: 16 June 2017, 14:57:10
Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs
===8<===============Original message content===============
Hi,
For several days I've been trying to figure out what is going wrong
with the timings of my Python app, which solves Ax=b with the KSP (GMRES) solver, when running on Intel's KNL 7210/7230.
I downsized the problem to a 1000x1000 A matrix and a single node and
observed the following:
I'm attaching 2 extreme timings whose configurations differ only by 1 OMP thread (64 MPI/1 OMP vs. 64 MPI/2 OMP),
SLURM task IDs 23321 and 23325.
Any help will be appreciated.
Best,
Damian
===8<===========End of original message content===========
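For reference, a minimal petsc4py sketch of this kind of solve (a 1000x1000 Ax=b with KSP set to GMRES); the tridiagonal test matrix, right-hand side, and all names are illustrative, not the author's actual application:

    # Sketch: assemble a 1000x1000 sparse matrix and solve Ax = b with GMRES.
    import sys
    import petsc4py
    petsc4py.init(sys.argv)
    from petsc4py import PETSc

    n = 1000
    # (d_nnz, o_nnz): 3 diagonal-block and 1 off-diagonal-block entries per row
    A = PETSc.Mat().createAIJ([n, n], nnz=(3, 1), comm=PETSc.COMM_WORLD)
    rstart, rend = A.getOwnershipRange()
    for i in range(rstart, rend):
        if i > 0:
            A.setValue(i, i - 1, -1.0)
        A.setValue(i, i, 2.0)
        if i < n - 1:
            A.setValue(i, i + 1, -1.0)
    A.assemblyBegin()
    A.assemblyEnd()

    b = A.createVecLeft()
    b.set(1.0)
    x = A.createVecRight()

    ksp = PETSc.KSP().create(comm=PETSc.COMM_WORLD)
    ksp.setOperators(A)
    ksp.setType(PETSc.KSP.Type.GMRES)
    ksp.setFromOptions()   # pick up -ksp_monitor, -log_view, etc.
    ksp.solve(b, x)

    if PETSc.COMM_WORLD.getRank() == 0:
        print("iterations:", ksp.getIterationNumber(),
              "residual norm:", ksp.getResidualNorm())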
-------------------------------------------------------
Damian Kaliszan
Poznan Supercomputing and Networking Center
HPC and Data Centres Technologies
ul. Jana Pawła II 10
61-139 Poznan
POLAND
phone (+48 61) 858 5109
e-mail damian at man.poznan.pl
www - http://www.man.poznan.pl/
-------------------------------------------------------
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
http://www.caam.rice.edu/~mk51/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: slurm-23321.out
Type: application/octet-stream
Size: 39057 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170619/2bde830a/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: slurm-23325.out
Type: application/octet-stream
Size: 39054 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170619/2bde830a/attachment-0003.obj>