[petsc-users] Is OpenMP still available for PETSc?
Franck Houssen
franck.houssen at inria.fr
Thu Jul 6 07:23:59 CDT 2017
I do NOT have an answer!
What I was trying to say is that each MPI process likely gets some kind of "sub-part" (a block, possibly overlapping?) of the initial problem, and that each MPI process then likely uses several threads to process its sub-part. These assumptions may be "not so true" (depending on the PC, the case, the method, the options, ...) or simply wrong: I do NOT know how PETSc has been written!
Suppose the picture I assume is "not so far" from reality; then:
1. Increasing t (= slurm tasks = number of MPI processes) may change the problem (*), so you MAY (?) no longer really be comparing like with like (or you need to make SURE that the overlap and the other pieces still allow a comparison, which is not so obvious).
2. Increasing c (= slurm cpus = number of threads per MPI process) SHOULD improve the timings and possibly the iterations (but you may also see NO improvement if the threads end up oversubscribed, contending with one another, victims of false sharing, or hit by some other effect...).
To make it short: to allow comparisons, I WOULD first have fixed t and then changed only c... and I would NOT have been so surprised if the speed-up turned out to be disappointing.
I may be wrong.
Franck
(*): changing t changes the "sub-parts" (blocks, borders of the local sub-parts, overlap, ...). My understanding is that, whatever the method you use, the overlap between blocks (however you name them according to the method, options, ...) sizes the information that comes from the neighbor processes: my understanding is that this information is needed for good local convergence and may impact it (but I may be wrong).
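A minimal sketch of what I mean (assuming a single 64-core node and a hypothetical petsc4py script solve.py; adapt the names and counts to your machine): keep t fixed and vary ONLY c between the runs you compare, e.g.

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16    # t fixed: the blocks/overlap stay the same in every run
#SBATCH --cpus-per-task=1       # c: rerun with 2, then 4 (keeping t*c <= cores per node)
#SBATCH --exclusive
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK    # only matters if the application is threaded at all
srun python solve.py

Only the --cpus-per-task line (and the matching thread count) changes between the runs; changing --ntasks-per-node changes the decomposition itself.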
----- Original Message -----
> From: "Damian Kaliszan" <damian at man.poznan.pl>
> To: "Franck Houssen" <franck.houssen at inria.fr>
> Cc: petsc-users at mcs.anl.gov
> Sent: Thursday, July 6, 2017 09:56:58
> Subject: Re: [petsc-users] Is OpenMP still available for PETSc?
>
> Dear Franck,
>
> Thank you for your comment!
> Did I get you correctly: according to you, the t<->c combination may
> influence the convergence -> KSP iterations -> timings (the result
> array/vector should be identical, though)?
>
> Best,
> Damian
>
> In a message dated 5 July 2017 (15:09:59), the following was written:
>
> > For a given use case, you may want to try all possible t and c such
> > that t*c = n: stick to the best one.
>
> > Now, if you modify the resources (t/c) and you get different
> > timings/iterations, this seems logical to me: the blocks, overlap, ...
> > (and finally the convergence) will differ, so the comparison no longer
> > really makes sense, as you are doing something different (unless you
> > fix t and let c vary: even then you may not get what you expect -
> > anyway, it seems that is not what you do).
>
> > Franck
>
> > ----- Original Message -----
> >> From: "Damian Kaliszan" <damian at man.poznan.pl>
> >> To: "Franck Houssen" <franck.houssen at inria.fr>, "Barry Smith"
> >> <bsmith at mcs.anl.gov>
> >> Cc: petsc-users at mcs.anl.gov
> >> Sent: Wednesday, July 5, 2017 10:50:39
> >> Subject: Re: [petsc-users] Is OpenMP still available for PETSc?
> >>
> >> Thank you:)
> >>
> >> A few notes on what you wrote:
> >> 1. I always try to keep t*c = number of cores; however, for a 64-core KNL
> >> with hyperthreading switched on (cpuinfo shows 256 cores), should t*c
> >> be 64 or 256 (in other words: is t=64 and c=4 correct)?
> >> 2. I noticed that for the same input data I may get different
> >> timings in 2 cases:
> >> a) a different number of KSP iterations is observed (why do they differ?)
> >> -> please see the screenshot Julia_N_10_4_vs_64.JPG for the following
> >> configs (this may be related to the 64*4 issue; which one is correct
> >> at first glance?):
> >>
> >> Matrix size=1000x1000
> >>
> >> 1/ slurm-23716.out, 511 steps, ~ 28 secs
> >> #SBATCH --nodes=1
> >> #SBATCH --ntasks=64
> >> #SBATCH --ntasks-per-node=64
> >> #SBATCH --cpus-per-task=4
> >>
> >>
> >> 2/ slurm-23718.out, 94 steps, ~ 4 secs
> >>
> >> #SBATCH --nodes=1
> >> #SBATCH --ntasks=4
> >> #SBATCH --ntasks-per-node=4
> >> #SBATCH --cpus-per-task=4
> >>
> >> b) an equal number of KSP iterations is observed but different timings
> >> (this might be due to false sharing or oversubscription?)
> >> -> please see the Julia_N_10_64_vs_64.JPG screenshot
> >>
> >>
> >>
> >> Best,
> >> Damian
> >>
> >> In a message dated 5 July 2017 (10:26:46), the following was written:
> >>
> >> > The man page of slurm/sbatch is cumbersome.
> >>
> >> > But you may think of:
> >> > 1. tasks as "MPI processes"
> >> > 2. cpus as "threads"
> >>
> >> > You should always request resources in the most precise way possible,
> >> > that is (never use --ntasks alone, but prefer) to:
> >> > 1. use --nodes=n.
> >> > 2. use --ntasks-per-node=t.
> >> > 3. use --cpus-per-task=c.
> >> > 4. for a start, make sure that t*c = the number of cores you have per node.
> >> > 5. use --exclusive, otherwise you may get VERY different timings if you run
> >> > the same job twice.
> >> > 6. make sure MPI is configured correctly (run the same single-threaded
> >> > application twice [or more]: do you get the same timing?).
> >> > 7. if using OpenMP or multithreaded applications, make sure you have
> >> > set the affinity properly (GOMP_CPU_AFFINITY with GNU, KMP_AFFINITY with
> >> > Intel).
> >> > 8. make sure you have enough memory (--mem), otherwise performance may be
> >> > degraded (swap).
> >>
> >> > Rule of thumb 4 does NOT have to be respected, but if you break it, you
> >> > need to be aware of WHY you want to do that (for KNL, it may [or may not]
> >> > make sense [depending on the cache modes]).
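A minimal sketch putting points 1-8 together (assuming a single 64-core node, a GNU OpenMP runtime, and a hypothetical executable ./my_app; the counts and the --mem value are placeholders to adapt to your site):

#SBATCH --nodes=1               # 1.
#SBATCH --ntasks-per-node=16    # 2. t
#SBATCH --cpus-per-task=4       # 3. c, so that t*c = 64 cores (4.)
#SBATCH --exclusive             # 5.
#SBATCH --mem=90G               # 8. adapt to the node's memory
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PROC_BIND=true       # 7. or set GOMP_CPU_AFFINITY (GNU) / KMP_AFFINITY (Intel)
srun ./my_app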
> >>
> >> > Remember that any multi-threaded (OpenMP or not) application may be a
> >> > victim of false sharing
> >> > (https://en.wikipedia.org/wiki/False_sharing): in this case, a profile
> >> > (using cache metrics) may help you understand whether this is the problem,
> >> > and track it down if so (you may use perf record for that).
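For instance (a rough sketch for a single-node, non-MPI test run of a hypothetical executable ./my_app; the exact event names depend on the CPU):

perf record -e cache-misses,cache-references ./my_app
perf report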
> >>
> >> > Understanding the HW is not an easy thing: you really need to go step
> >> > by step, otherwise you have no chance of understanding anything in the end.
> >>
> >> > Hope this helps!...
> >>
> >> > Franck
> >>
> >> > Note: activating/deactivating hyper-threading (if available -
> >> > generally in the BIOS, when possible) may also change the performance.
> >>
> >> > ----- Original Message -----
> >> >> From: "Barry Smith" <bsmith at mcs.anl.gov>
> >> >> To: "Damian Kaliszan" <damian at man.poznan.pl>
> >> >> Cc: "PETSc" <petsc-users at mcs.anl.gov>
> >> >> Sent: Tuesday, July 4, 2017 19:04:36
> >> >> Subject: Re: [petsc-users] Is OpenMP still available for PETSc?
> >> >>
> >> >>
> >> >> You may need to ask a slurm expert. I have no idea what
> >> >> cpus-per-task
> >> >> means
> >> >>
> >> >>
> >> >> > On Jul 4, 2017, at 4:16 AM, Damian Kaliszan <damian at man.poznan.pl>
> >> >> > wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Yes, this is exactly what I meant.
> >> >> > Please find attached output for 2 input datasets and for 2 various
> >> >> > slurm
> >> >> > configs each:
> >> >> >
> >> >> > A/ Matrix size=8000000x8000000
> >> >> >
> >> >> > 1/ slurm-14432809.out, 930 ksp steps, ~90 secs
> >> >> >
> >> >> >
> >> >> > #SBATCH --nodes=2
> >> >> > #SBATCH --ntasks=32
> >> >> > #SBATCH --ntasks-per-node=16
> >> >> > #SBATCH --cpus-per-task=4
> >> >> >
> >> >> > 2/ slurm-14432810.out, 100,000 ksp steps, ~9700 secs
> >> >> >
> >> >> > #SBATCH --nodes=2
> >> >> > #SBATCH --ntasks=32
> >> >> > #SBATCH --ntasks-per-node=16
> >> >> > #SBATCH --cpus-per-task=2
> >> >> >
> >> >> >
> >> >> >
> >> >> > B/ Matrix size=1000x1000
> >> >> >
> >> >> > 1/ slurm-23716.out, 511 ksp steps, ~ 28 secs
> >> >> > #SBATCH --nodes=1
> >> >> > #SBATCH --ntasks=64
> >> >> > #SBATCH --ntasks-per-node=64
> >> >> > #SBATCH --cpus-per-task=4
> >> >> >
> >> >> >
> >> >> > 2/ slurm-23718.out, 94 ksp steps, ~ 4 secs
> >> >> >
> >> >> > #SBATCH --nodes=1
> >> >> > #SBATCH --ntasks=4
> >> >> > #SBATCH --ntasks-per-node=4
> >> >> > #SBATCH --cpus-per-task=4
> >> >> >
> >> >> >
> >> >> > I would really appreciate any help...:)
> >> >> >
> >> >> > Best,
> >> >> > Damian
> >> >> >
> >> >> >
> >> >> >
> >> >> > In a message dated 3 July 2017 (16:29:15), the following was written:
> >> >> >
> >> >> >
> >> >> > On Mon, Jul 3, 2017 at 9:23 AM, Damian Kaliszan
> >> >> > <damian at man.poznan.pl>
> >> >> > wrote:
> >> >> > Hi,
> >> >> >
> >> >> >
> >> >> > >> 1) You can call Bcast on PETSC_COMM_WORLD
> >> >> >
> >> >> > To be honest, I can't find a Bcast method in petsc4py.PETSc.Comm
> >> >> > (I'm using petsc4py).
> >> >> >
> >> >> > >> 2) If you are using WORLD, the number of iterates will be the same
> >> >> > >> on
> >> >> > >> each process since iteration is collective.
> >> >> >
> >> >> > Yes, this is how it should be. But what I noticed is that for
> >> >> > different --cpus-per-task numbers in the slurm script I get a
> >> >> > different number of solver iterations, which is in turn related
> >> >> > to the timings. The disparity is huge. For example, for some
> >> >> > configurations where --cpus-per-task=1 I receive 900 iterations,
> >> >> > and for --cpus-per-task=2 I receive 100,000, which is the maximum
> >> >> > iteration number set when setting the solver tolerances.
> >> >> >
> >> >> > I am trying to understand what you are saying. You mean that you make
> >> >> > 2
> >> >> > different runs and get a different
> >> >> > number of iterates with a KSP? In order to answer questions about
> >> >> > convergence, we need to see the output
> >> >> > of
> >> >> >
> >> >> > -ksp_view -ksp_monitor_true_residual -ksp_converged_reason
> >> >> >
> >> >> > for all cases.
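For example (a hypothetical invocation, assuming a petsc4py script solver.py that initializes PETSc with sys.argv so that the options are picked up from the command line):

srun python solver.py -ksp_view -ksp_monitor_true_residual -ksp_converged_reason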
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Matt
> >> >> >
> >> >> > Best,
> >> >> > Damian
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > What most experimenters take for granted before they begin their
> >> >> > experiments is infinitely more interesting than any results to which
> >> >> > their
> >> >> > experiments lead.
> >> >> > -- Norbert Wiener
> >> >> >
> >> >> > http://www.caam.rice.edu/~mk51/
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > -------------------------------------------------------
> >> >> > Damian Kaliszan
> >> >> >
> >> >> > Poznan Supercomputing and Networking Center
> >> >> > HPC and Data Centres Technologies
> >> >> > ul. Jana Pawła II 10
> >> >> > 61-139 Poznan
> >> >> > POLAND
> >> >> >
> >> >> > phone (+48 61) 858 5109
> >> >> > e-mail damian at man.poznan.pl
> >> >> > www - http://www.man.poznan.pl/
> >> >> > -------------------------------------------------------
> >> >> > <slum_output.zip>
> >> >>
> >> >>
>
>
>