[petsc-users] Fwd: strange PETSc/KSP GMRES timings for MPI+OMP configuration on KNLs

Matthew Knepley knepley at gmail.com
Mon Jun 19 08:39:53 CDT 2017


On Mon, Jun 19, 2017 at 7:32 AM, Damian Kaliszan <damian at man.poznan.pl>
wrote:

> Hi,
> Thank you for the answer and the article.
> I  use  SLURM  (srun)  for  job  submission by running
> 'srun script.py script_parameters' command inside batch script so this is
> SPMD model.
> What I noticed is that the problems I'm having now didn't happen
> before on CPU E5-2697 v3 nodes (28 cores - the best performance I had
> was using 14 MPIs/2 OMP per node). The problems started to appear when I
> moved to KNLs.
> The funny thing is that switching OMP on/off (by setting
> OMP_NUM_THREADS to 1) doesn't help for all #NODES/#MPI/#OMP
> combinations. For example, for 2 nodes and 16 MPIs, the timings are huge
> for OMP=1 and 2, but OK for OMP=4.
>

Let's narrow this down to MPI_Barrier(). What memory mode is the KNL in? Did
you require KNL to use only MCDRAM? Please show the MPI_Barrier()/MPI_Send()
numbers for the different configurations.
This measures just latency. We could also look at VecScale() to look at
memory bandwidth achieved.
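
If it is easier than rerunning the full solve, a stand-alone microbenchmark
gives roughly the same numbers that -log_view prints. A minimal sketch,
assuming mpi4py is available next to petsc4py (the 1-byte ping-pong stands
in for the zero-size MPI_Send measurement):

    # barrier_latency.py: rough stand-in for the MPI_Barrier()/MPI_Send()
    # latencies reported in PETSc's -log_view summary (assumes mpi4py).
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    if comm.Get_size() < 2:
        raise SystemExit("run with at least 2 MPI ranks")
    reps = 1000

    # average MPI_Barrier() time
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        comm.Barrier()
    t_barrier = (MPI.Wtime() - t0) / reps

    # small-message ping-pong between ranks 0 and 1 (latency only)
    buf = np.zeros(1, dtype='b')
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=1)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=1)
    t_send = (MPI.Wtime() - t0) / (2 * reps)

    if rank == 0:
        print("Average MPI_Barrier(): %g s" % t_barrier)
        print("Average small MPI_Send(): %g s" % t_send)

Running numactl -H on a compute node also tells you whether MCDRAM is exposed
as a separate NUMA node (flat mode) or is being used as a cache.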

  Thanks,

    Matt


> Playing with affinity didn't help so far.
> In other words, at first glance the results look completely random (I can
> provide more such examples).
>
>
>
> Best,
> Damian
>
> In a message dated 19 June 2017 (14:50:25), the following was written:
>
>
> On Mon, Jun 19, 2017 at 6:42 AM, Damian Kaliszan <damian at man.poznan.pl>
> wrote:
> Hi,
>
> Regarding my previous post: I looked into both logs, 64 MPI/1 OMP vs.
> 64 MPI/2 OMP.
>
>
> What attracted my attention is the huge difference in MPI timings in the
> following places:
>
> Average time to get PetscTime(): 2.14577e-07
> Average time for MPI_Barrier(): 3.9196e-05
> Average time for zero size MPI_Send(): 5.45382e-06
>
> vs.
>
> Average time to get PetscTime(): 4.05312e-07
> Average time for MPI_Barrier(): 0.348399
> Average time for zero size MPI_Send(): 0.029937
>
> Isn't something wrong with the PETSc library itself?...
>
>  I don't think so. This is a bad interaction between MPI and your threading
> mechanism. MPI_Barrier() and MPI_Send() are lower level than PETSc. What
> threading mode did you choose for MPI? This can have a performance impact.
>
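> A quick way to check the thread level MPI actually provided, assuming
> mpi4py is in the picture next to petsc4py:
>
>     from mpi4py import MPI
>
>     # map the MPI thread-support constants to readable names
>     levels = {MPI.THREAD_SINGLE: "single",
>               MPI.THREAD_FUNNELED: "funneled",
>               MPI.THREAD_SERIALIZED: "serialized",
>               MPI.THREAD_MULTIPLE: "multiple"}
>     if MPI.COMM_WORLD.Get_rank() == 0:
>         print("MPI thread support: %s" % levels[MPI.Query_thread()])
>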
> Also, the justifications for threading in this context are weak (or
> non-existent):
> http://www.orau.gov/hpcor2015/whitepapers/Exascale_Computing_without_Threads-Barry_Smith.pdf
>
>   Thanks,
>
>     Matt
>
>
> Best,
> Damian
>
> Forwarded message
> From: Damian Kaliszan <damian at man.poznan.pl>
> To: PETSc users list <petsc-users at mcs.anl.gov>
> Date: 16 June 2017, 14:57:10
> Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP
> configuration on KNLs
>
> ===8<=============== Original message content ===============
> Hi,
>
> For several days I've been trying to figure out what is going wrong
> with the timings of my Python app, which solves Ax=b with the KSP (GMRES)
> solver, when running on Intel's KNL 7210/7230.
>
> I downsized the problem to a 1000x1000 A matrix and a single node and
> observed the following:
>
>
> I'm attaching 2 extreme timings where the configurations differ only by
> 1 OMP thread (64 MPI/1 OMP vs 64 MPI/2 OMP),
> SLURM task ids 23321 vs 23325.
>
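> For reference, the solve itself boils down to something like the
> following petsc4py sketch (the tridiagonal matrix here is only a
> stand-in; the real A comes from my application):
>
>     import sys
>     import petsc4py
>     petsc4py.init(sys.argv)   # forward command-line options to PETSc
>     from petsc4py import PETSc
>
>     n = 1000
>     # stand-in sparse matrix, assembled in parallel
>     A = PETSc.Mat().createAIJ([n, n])
>     A.setUp()
>     rstart, rend = A.getOwnershipRange()
>     for i in range(rstart, rend):
>         A.setValue(i, i, 2.0)
>         if i > 0:
>             A.setValue(i, i - 1, -1.0)
>         if i < n - 1:
>             A.setValue(i, i + 1, -1.0)
>     A.assemblyBegin()
>     A.assemblyEnd()
>
>     b = A.createVecLeft()
>     b.set(1.0)
>     x = A.createVecRight()
>
>     ksp = PETSc.KSP().create()
>     ksp.setOperators(A)
>     ksp.setType('gmres')
>     ksp.setFromOptions()   # picks up -ksp_*, -log_view, etc.
>     ksp.solve(b, x)
>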
> Any help will be appreciated....
>
> Best,
> Damian
>
> ===8<=========== End of original message content ===========
>
>
>
> -------------------------------------------------------
> Damian Kaliszan
>
> Poznan Supercomputing and Networking Center
> HPC and Data Centres Technologies
> ul. Jana Pawła II 10
> 61-139 Poznan
> POLAND
>
> phone (+48 61) 858 5109
> e-mail damian at man.poznan.pl
> www - http://www.man.poznan.pl/
> -------------------------------------------------------
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> http://www.caam.rice.edu/~mk51/
>
>
>
> -------------------------------------------------------
> Damian Kaliszan
>
> Poznan Supercomputing and Networking Center
> HPC and Data Centres Technologies
> ul. Jana Pawła II 10
> 61-139 Poznan
> POLAND
>
> phone (+48 61) 858 5109
> e-mail damian at man.poznan.pl
> www - http://www.man.poznan.pl/
> -------------------------------------------------------
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

http://www.caam.rice.edu/~mk51/