[petsc-users] GAMG Parallel Performance
Karin&NiKo
niko.karin at gmail.com
Fri Nov 16 04:33:50 CST 2018
Dear PETSc team,
I have run the same test on the same numbers of processes as before (1000,
1500 and 2000), but spread over a larger number of nodes. The results are
much better!
If I focus on the KSPSolve event, I have the following timings:
1000 procs => 1.2681e+02 s
1500 procs => 8.7030e+01 s
2000 procs => 7.8904e+01 s
The parallel efficiency between 1000 and 1500 processes is around 96%, but it
decreases drastically when going to 2000 processes. I think my problem is too
small and communication costs are becoming significant.
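As a quick check, assuming the usual strong-scaling definition of efficiency,
E = (T_ref * N_ref) / (T_N * N), applied to the KSPSolve timings above:

  E(1000 -> 1500) = (1.2681e+02 * 1000) / (8.7030e+01 * 1500) ~ 0.97
  E(1000 -> 2000) = (1.2681e+02 * 1000) / (7.8904e+01 * 2000) ~ 0.80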
I have an extra question: in the profiling section, what exactly is measured
in "Time (sec):"? Is it the time between PetscInitialize and PetscFinalize?
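In case it helps to isolate the solve itself, below is a minimal sketch (not
my actual application; the tridiagonal matrix is only a stand-in for the real
finite-element operator) that wraps KSPSolve in its own logging stage via
PetscLogStageRegister/Push/Pop, so that -log_view reports the solve
separately from the rest of the run:

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PetscLogStage  solveStage;
  PetscInt       i, rstart, rend, n = 1000;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* Stand-in operator: 1D Laplacian, assembled in parallel */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 3, NULL, 1, NULL, &A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    if (i > 0)   {ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n-1) {ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* picks up -ksp_type fgmres -pc_type gamg */

  /* Everything between Push and Pop is reported under the "Solve" stage in -log_view */
  ierr = PetscLogStageRegister("Solve", &solveStage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(solveStage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Run, for example, with
  mpiexec -n 1000 ./solve_stage -ksp_type fgmres -pc_type gamg -log_view
(the executable name is just illustrative) and the stage summary then shows
the time spent in the solve separately from setup and the rest of the run.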
Thanks again for your help,
Nicolas
On Fri, Nov 16, 2018 at 00:24, Karin&NiKo <niko.karin at gmail.com> wrote:
> Ok. I will do that soon and I will let you know.
> Thanks again,
> Nicolas
>
> On Thu, Nov 15, 2018 at 20:50, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
>>
>>
>> > On Nov 15, 2018, at 1:02 PM, Mark Adams <mfadams at lbl.gov> wrote:
>> >
>> > There is a lot of load imbalance in VecMAXPY also. The partitioning
>> > could be bad and if not it's the machine.
>>
>>
>> >
>> > On Thu, Nov 15, 2018 at 1:56 PM Smith, Barry F. via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> >
>> > Something is odd about your configuration. Just consider the time
>> > for VecMAXPY which is an embarrassingly parallel operation. On 1000 MPI
>> > processes it produces
>> >
>> >                                Time                                                                              flop rate
>> > VecMAXPY             575 1.0 8.4132e-01 1.5 1.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1,600,021
>> >
>> > on 1500 processes it produces
>> >
>> > VecMAXPY             583 1.0 1.0786e+00 3.4 9.38e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1,289,187
>> >
>> > that is, it actually takes longer (the time goes from .84 seconds to
>> > 1.08 seconds and the flop rate from 1,600,021 down to 1,289,187). You would
>> > never expect this kind of behavior.
>> >
>> > and on 2000 processes it produces
>> >
>> > VecMAXPY             583 1.0 7.1103e-01 2.7 7.03e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1,955,563
>> >
>> > so it speeds up again but not by very much. This is very mysterious and
>> > not what you would expect.
>> >
>> > I'm inclined to believe something is out of whack on your computer.
>> > Are you sure all nodes on the computer are equivalent? Same processors,
>> > same clock speeds? What happens if you run the 1000 process case several
>> > times, do you get very similar numbers for VecMAXPY()? You should, but I am
>> > guessing you may not.
>> >
>> > Barry
>> >
>> > Note that this performance issue doesn't really have anything to do
>> > with the preconditioner you are using.
>> >
>> >
>> >
>> >
>> >
>> > > On Nov 15, 2018, at 10:50 AM, Karin&NiKo via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> > >
>> > > Dear PETSc team,
>> > >
>> > > I am solving a linear transient dynamic problem, based on a finite
>> > > element discretization. To do that, I am using FGMRES with GAMG as a
>> > > preconditioner. I consider here 10 time steps.
>> > > The problem has around 118e6 dof and I am running on 1000, 1500 and
>> > > 2000 procs, so I have something like 118e3, 79e3 and 59e3 dof/proc.
>> > > I notice that the performance deteriorates when I increase the number
>> > > of processes.
>> > > You can find attached the log_view output of the execution and the
>> > > detailed definition of the KSP.
>> > >
>> > > Is the problem too small to run on that number of processes or is
>> > > there something wrong with my use of GAMG?
>> > >
>> > > I thank you in advance for your help,
>> > > Nicolas
>> > >
>> <FGMRES_GAMG_1000procs.txt><FGMRES_GAMG_2000procs.txt><FGMRES_GAMG_1500procs.txt>
>> >
>>
>>