[petsc-users] Is OpenMP still available for PETSc?

Franck Houssen franck.houssen at inria.fr
Wed Jul 5 03:26:46 CDT 2017


The man page of slurm/sbatch is cumbersome.

But you may think of:
1. tasks as "MPI processes"
2. cpus as "threads"

You should always specify resources as precisely as possible; that is, never use a bare --ntasks, but prefer to (a sketch sbatch script is given below):
1. use --nodes=n.
2. use --ntasks-per-node=t.
3. use --cpus-per-task=c.
4. for a start, make sure that t*c = the number of cores you have per node.
5. use --exclusive, otherwise you may get VERY different timings if you run the same job twice.
6. make sure MPI is configured correctly (run the same single-threaded application twice [or more]: do you get the same timing?).
7. if using OpenMP or other multithreaded applications, make sure you have set thread affinity properly (GOMP_CPU_AFFINITY with GNU, KMP_AFFINITY with Intel).
8. make sure you have enough memory (--mem), otherwise performance may be degraded (swapping).

Rule of thumb 4 does NOT have to be respected, but if you break it you need to know WHY you want to do that (for KNL, it may [or may not] make sense, depending on the cache/memory modes).
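
For illustration, here is a minimal sbatch sketch following rules 1 to 8 above; the 32 cores per node, the --mem value and the binary name my_petsc_app are assumptions you must adapt to your machine (and --cpu-bind may be spelled --cpu_bind on older slurm):

#!/bin/bash
#SBATCH --nodes=2                  # n nodes
#SBATCH --ntasks-per-node=16       # t MPI processes per node
#SBATCH --cpus-per-task=2          # c threads per MPI process
#SBATCH --exclusive                # do not share the nodes with other jobs
#SBATCH --mem=100G                 # assumed value: enough to stay out of swap
# rule 4: t*c = 16*2 = 32 = assumed number of cores per node

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK    # one OpenMP thread per allocated cpu
# thread affinity: GOMP_CPU_AFFINITY (GNU) or KMP_AFFINITY (Intel),
# or simply let slurm do the binding:
srun --cpu-bind=cores ./my_petsc_app           # hypothetical binary name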

Remember that any multi-threaded (OpenMP or not) application may be a victim of false sharing (https://en.wikipedia.org/wiki/False_sharing): in that case, profiling (using cache metrics) may help you understand whether this is the problem, and track it down if so (you may use perf record for that).
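
For example, a possible perf session (again, my_petsc_app is a placeholder, and the available events and the perf c2c subcommand depend on your CPU and perf version):

# coarse cache metrics first: a high miss rate hints at a memory problem
perf stat -e cache-references,cache-misses ./my_petsc_app

# then sample where the misses happen
perf record -e cache-misses ./my_petsc_app
perf report

# recent perf versions also ship "perf c2c", designed to spot false sharing
perf c2c record ./my_petsc_app
perf c2c report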

Understanding the hardware is not easy: you really need to go step by step, otherwise you have no chance of understanding anything in the end.

Hope this helps!

Franck

Note: activating/deactivating hyper-threading (if available - generally in the BIOS, when possible) may also change performance.
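
To check whether hyper-threading is currently active on a node, lscpu is usually enough:

lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket)'
# "Thread(s) per core: 2" means hyper-threading is on, 1 means it is off.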

----- Original Message -----
> From: "Barry Smith" <bsmith at mcs.anl.gov>
> To: "Damian Kaliszan" <damian at man.poznan.pl>
> Cc: "PETSc" <petsc-users at mcs.anl.gov>
> Sent: Tuesday, July 4, 2017 7:04:36 PM
> Subject: Re: [petsc-users] Is OpenMP still available for PETSc?
> 
> 
>    You may need to ask a slurm expert. I have no idea what cpus-per-task
>    means
> 
> 
> > On Jul 4, 2017, at 4:16 AM, Damian Kaliszan <damian at man.poznan.pl> wrote:
> > 
> > Hi,
> > 
> > Yes, this is exactly what I meant.
> > Please find attached the output for 2 input datasets, with 2 different slurm
> > configs each:
> > 
> > A/ Matrix size=8000000x8000000
> > 
> > 1/ slurm-14432809.out,   930 ksp steps, ~90 secs
> > 
> > 
> > #SBATCH --nodes=2
> > #SBATCH --ntasks=32
> > #SBATCH --ntasks-per-node=16
> > #SBATCH --cpus-per-task=4
> > 
> > 2/ slurm-14432810.out, 100,000 ksp steps, ~9700 secs
> > 
> > #SBATCH --nodes=2
> > #SBATCH --ntasks=32
> > #SBATCH --ntasks-per-node=16
> > #SBATCH --cpus-per-task=2
> > 
> > 
> > 
> > B/ Matrix size=1000x1000
> > 
> > 1/  slurm-23716.out, 511 ksp steps, ~ 28 secs
> > #SBATCH --nodes=1
> > #SBATCH --ntasks=64
> > #SBATCH --ntasks-per-node=64
> > #SBATCH --cpus-per-task=4
> > 
> > 
> > 2/ slurm-23718.out, 94 ksp steps, ~ 4 secs
> > 
> > #SBATCH --nodes=1
> > #SBATCH --ntasks=4
> > #SBATCH --ntasks-per-node=4
> > #SBATCH --cpus-per-task=4
> > 
> > 
> > I would really appreciate any help...:)
> > 
> > Best,
> > Damian
> > 
> > 
> > 
> > In a message dated July 3, 2017 (16:29:15), the following was written:
> > 
> > 
> > On Mon, Jul 3, 2017 at 9:23 AM, Damian Kaliszan <damian at man.poznan.pl>
> > wrote:
> > Hi,
> > 
> > 
> > >> 1) You can call Bcast on PETSC_COMM_WORLD
> > 
> > To be honest I can't find a Bcast method in petsc4py.PETSc.Comm (I'm
> > using petsc4py)
> > 
> > >> 2) If you are using WORLD, the number of iterates will be the same on
> > >> each process since iteration is collective.
> > 
> > Yes, this is how it should be. But what I noticed is that for
> > different --cpus-per-task numbers in the slurm script I get a different
> > number of solver iterations, which in turn affects the timings. The
> > disparity is huge. For example, for some configurations with
> > --cpus-per-task=1 I get 900
> > iterations, and with --cpus-per-task=2 I hit 100,000,
> > which is the maximum
> > iteration count set when setting the solver tolerances.
> > 
> > I am trying to understand what you are saying. You mean that you make 2
> > different runs and get a different
> > number of iterates with a KSP? In order to answer questions about
> > convergence, we need to see the output
> > of
> > 
> > -ksp_view -ksp_monitor_true_residual -ksp_converged_reason
> > 
> > for all cases.
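> > 
> > For instance, assuming the solver is driven by a petsc4py script that forwards
> > sys.argv to petsc4py.init() (solver.py below is just a placeholder name), these
> > options can simply be appended to the run command:
> > 
> >   srun python solver.py -ksp_view -ksp_monitor_true_residual -ksp_converged_reason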
> > 
> > Thanks,
> > 
> >    Matt
> > 
> > Best,
> > Damian
> > 
> > 
> > 
> > 
> > --
> > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to which their
> > experiments lead.
> > -- Norbert Wiener
> > 
> > http://www.caam.rice.edu/~mk51/
> > 
> > 
> > 
> > 
> > 
> > -------------------------------------------------------
> > Damian Kaliszan
> > 
> > Poznan Supercomputing and Networking Center
> > HPC and Data Centres Technologies
> > ul. Jana Pawła II 10
> > 61-139 Poznan
> > POLAND
> > 
> > phone (+48 61) 858 5109
> > e-mail damian at man.poznan.pl
> > www - http://www.man.poznan.pl/
> > -------------------------------------------------------
> > <slum_output.zip>
> 
> 

