[petsc-users] Is OpenMP still available for PETSc?
Damian Kaliszan
damian at man.poznan.pl
Wed Jul 5 03:49:23 CDT 2017
Thank you :)
A few notes on what you wrote:
1. I always try to keep t*c = number of cores. However, for a 64-core KNL
with hyperthreading switched on (cpuinfo shows 256 logical CPUs), should t*c
be 64 or 256? In other words, is t=64 and c=4 correct? (See the sbatch sketch after these notes.)
2. I noticed that for the same input data I may get different
timings in two cases:
a) A different number of KSP iterations is observed (why do they differ?).
Please see the screenshot Julia_N_10_4_vs_64.JPG for the following configs (this may be
related to the 64*4 issue above; which one is correct at first glance?):
Matrix size=1000x1000
1/ slurm-23716.out, 511 steps, ~ 28 secs
#SBATCH --nodes=1
#SBATCH --ntasks=64
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=4
2/ slurm-23718.out, 94 steps, ~ 4 secs
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4
b) An equal number of KSP iterations is observed, but with different timings (might this be
due to false sharing or oversubscription?).
Please see the screenshot Julia_N_10_64_vs_64.JPG.
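For reference, minimal sbatch sketches of the two layouts I have in mind for a
single 64-core KNL node (solver.py stands in for my petsc4py script; the
--cpu-bind syntax may differ between Slurm versions):

Layout A - one MPI rank per physical core, hyper-threads left idle (t*c = 64*1 = 64):

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=1
srun --cpu-bind=cores python solver.py

Layout B - one MPI rank per physical core, all 4 hardware threads reserved per rank (t*c = 64*4 = 256):

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=4
export OMP_NUM_THREADS=4          # only matters if the solver is actually multithreaded
srun --cpu-bind=cores python solver.py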
Best,
Damian
In a message dated 5 July 2017 (10:26:46), the following was written:
> The man page of slurm/sbatch is cumbersome.
> But you may think of:
> 1. tasks as "MPI processes"
> 2. cpus as "threads"
> You should always request resources as precisely as possible,
> that is, never use a bare --ntasks but prefer to:
> 1. use --nodes=n.
> 2. use --ntasks-per-node=t.
> 3. use --cpus-per-task=c.
> 4. for a start, make sure that t*c = number of cores you have per node.
> 5. use --exclusive; otherwise you may get VERY different timings if you run the same job twice.
> 6. make sure MPI is configured correctly (run the same single-threaded
> application twice [or more]: do you get the same timing?).
> 7. if using OpenMP or other multithreaded applications, make sure you have
> set thread affinity properly (GOMP_CPU_AFFINITY with GNU, KMP_AFFINITY with Intel); an example follows just after this list.
> 8. make sure you have enough memory (--mem), otherwise performance may be degraded (swapping).
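> For example, in the job script one might set something along these lines
> (adjust the thread count and the CPU list to your own node):
>
> export OMP_NUM_THREADS=4
> export GOMP_CPU_AFFINITY=0-255        # GNU OpenMP: pin threads to these CPUs
> # or, with the Intel runtime:
> # export KMP_AFFINITY=granularity=fine,compact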
> Rule of thumb 4 does NOT have to be respected, but if you break it you need to be
> aware of WHY you want to do that (for KNL, it may [or may not] make sense, depending on the cache mode).
> Remember that any multi-threaded (OpenMP or not) application may be a
> victim of false sharing
> (https://en.wikipedia.org/wiki/False_sharing): in that case, profiling
> with cache metrics may help you understand whether this is the problem,
> and track it down if so (you may use perf record for that).
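> For example, on a single node (my_app stands for your binary; the exact event
> names depend on the CPU and on the perf version installed):
>
> perf record -e cache-misses ./my_app    # sample cache misses while the app runs
> perf report                             # inspect where the misses occur
>
> Recent perf versions also provide "perf c2c record" / "perf c2c report", which
> are aimed specifically at spotting cache lines bounced between cores.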
> Understanding HW is not an easy thing: you really need to go step
> by step, otherwise you have no chance of understanding anything in the end.
> Hope this may help!
> Franck
> Note: activating/deactivating hyper-threading (if available -
> generally in the BIOS) may also change performance.
> ----- Original Message -----
>> From: "Barry Smith" <bsmith at mcs.anl.gov>
>> To: "Damian Kaliszan" <damian at man.poznan.pl>
>> Cc: "PETSc" <petsc-users at mcs.anl.gov>
>> Sent: Tuesday, 4 July 2017 19:04:36
>> Subject: Re: [petsc-users] Is OpenMP still available for PETSc?
>>
>>
>> You may need to ask a slurm expert. I have no idea what cpus-per-task
>> means.
>>
>>
>> > On Jul 4, 2017, at 4:16 AM, Damian Kaliszan <damian at man.poznan.pl> wrote:
>> >
>> > Hi,
>> >
>> > Yes, this is exactly what I meant.
>> > Please find attached the output for two input datasets, with two different slurm
>> > configs each:
>> >
>> > A/ Matrix size=8000000x8000000
>> >
>> > 1/ slurm-14432809.out, 930 ksp steps, ~90 secs
>> >
>> >
>> > #SBATCH --nodes=2
>> > #SBATCH --ntasks=32
>> > #SBATCH --ntasks-per-node=16
>> > #SBATCH --cpus-per-task=4
>> >
>> > 2/ slurm-14432810.out, 100,000 ksp steps, ~9700 secs
>> >
>> > #SBATCH --nodes=2
>> > #SBATCH --ntasks=32
>> > #SBATCH --ntasks-per-node=16
>> > #SBATCH --cpus-per-task=2
>> >
>> >
>> >
>> > B/ Matrix size=1000x1000
>> >
>> > 1/ slurm-23716.out, 511 ksp steps, ~ 28 secs
>> > #SBATCH --nodes=1
>> > #SBATCH --ntasks=64
>> > #SBATCH --ntasks-per-node=64
>> > #SBATCH --cpus-per-task=4
>> >
>> >
>> > 2/ slurm-23718.out, 94 ksp steps, ~ 4 secs
>> >
>> > #SBATCH --nodes=1
>> > #SBATCH --ntasks=4
>> > #SBATCH --ntasks-per-node=4
>> > #SBATCH --cpus-per-task=4
>> >
>> >
>> > I would really appreciate any help...:)
>> >
>> > Best,
>> > Damian
>> >
>> >
>> >
>> > In a message dated 3 July 2017 (16:29:15), the following was written:
>> >
>> >
>> > On Mon, Jul 3, 2017 at 9:23 AM, Damian Kaliszan <damian at man.poznan.pl>
>> > wrote:
>> > Hi,
>> >
>> >
>> > >> 1) You can call Bcast on PETSC_COMM_WORLD
>> >
>> > To be honest, I can't find a Bcast method in petsc4py.PETSc.Comm (I'm
>> > using petsc4py).
>> >
>> > >> 2) If you are using WORLD, the number of iterates will be the same on
>> > >> each process since iteration is collective.
>> >
>> > Yes, this is how it should be. But what I noticed is that for
>> > different --cpus-per-task values in the slurm script I get a different
>> > number of solver iterations, which in turn affects the timings. The
>> > disparity is huge. For example, for some configurations with
>> > --cpus-per-task=1 I get 900
>> > iterations, while for --cpus-per-task=2 I get 100,000,
>> > which is the maximum
>> > iteration count set when configuring the solver tolerances.
>> >
>> > I am trying to understand what you are saying. You mean that you make 2
>> > different runs and get a different
>> > number of iterates with a KSP? In order to answer questions about
>> > convergence, we need to see the output
>> > of
>> >
>> > -ksp_view -ksp_monitor_true_residual -ksp_converged_reason
>> >
>> > for all cases.
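>> > These are run-time options; for a petsc4py script they can be appended to
>> > the launch line (assuming the script passes sys.argv to petsc4py.init),
>> > for example:
>> >
>> > srun python your_solver.py -ksp_view -ksp_monitor_true_residual -ksp_converged_reason
>> >
>> > where your_solver.py stands for your script.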
>> >
>> > Thanks,
>> >
>> > Matt
>> >
>> > Best,
>> > Damian
>> >
>> >
>> >
>> >
>> > --
>> > What most experimenters take for granted before they begin their
>> > experiments is infinitely more interesting than any results to which their
>> > experiments lead.
>> > -- Norbert Wiener
>> >
>> > http://www.caam.rice.edu/~mk51/
>> >
>> >
>> >
>> >
>> >