[petsc-users] GAMG Parallel Performance

Karin&NiKo niko.karin at gmail.com
Thu Nov 15 17:24:42 CST 2018


Ok. I will do that soon and I will let you know.
Thanks again,
Nicolas

On Thu, Nov 15, 2018 at 20:50, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>
> > On Nov 15, 2018, at 1:02 PM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > There is a lot of load imbalance in VecMAXPY also. The partitioning
> could be bad, and if not, it's the machine.
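
A quick way to tell the two apart is to compare the smallest and largest
local vector sizes across the ranks. A minimal sketch in C, where x is a
placeholder for any Vec with the problem's parallel layout:

    PetscInt       nlocal, nmin, nmax;
    PetscErrorCode ierr;

    /* number of rows owned by this rank */
    ierr = VecGetLocalSize(x, &nlocal);CHKERRQ(ierr);
    /* smallest and largest local size over all ranks */
    ierr = MPI_Allreduce(&nlocal, &nmin, 1, MPIU_INT, MPI_MIN, PETSC_COMM_WORLD);CHKERRQ(ierr);
    ierr = MPI_Allreduce(&nlocal, &nmax, 1, MPIU_INT, MPI_MAX, PETSC_COMM_WORLD);CHKERRQ(ierr);
    /* a max/min ratio well above 1 points at the partitioning;
       a ratio near 1 points at the machine */
    ierr = PetscPrintf(PETSC_COMM_WORLD, "local size: min %D  max %D\n", nmin, nmax);CHKERRQ(ierr);
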
>
>
> >
> > On Thu, Nov 15, 2018 at 1:56 PM Smith, Barry F. via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >
> >     Something is odd about your configuration. Just consider the time
> for VecMAXPY, which is an embarrassingly parallel operation. On 1000 MPI
> processes it produces
> >
> >                                Time                                                                                    flop rate
> >  VecMAXPY             575 1.0 8.4132e-01 1.5 1.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,600,021
> >
> > on 1500 processes it produces
> >
> >  VecMAXPY             583 1.0 1.0786e+00 3.4 9.38e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,289,187
> >
> > that is, it actually takes longer (the time goes from 0.84 seconds to 1.08
> seconds and the flop rate drops from 1,600,021 to 1,289,187). You would never
> expect this kind of behavior.
> >
> > On 2000 processes it produces
> >
> > VecMAXPY             583 1.0 7.1103e-01 2.7 7.03e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,955,563
> >
> > so it speeds up again but not by very much. This is very mysterious and
> not what you would expect.
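
For context on why this event should scale almost perfectly: VecMAXPY is a
purely local multi-vector update, y <- y + sum_i alpha[i]*x[i], with no
communication, and the number printed after the time (1.5, 3.4 and 2.7 above)
is the ratio of the slowest to the fastest rank for that event, i.e. a direct
measure of imbalance. A minimal sketch of an equivalent call, assuming
already-created vectors y and x[0..2]:

    PetscScalar    a[3] = {1.0, 2.0, 3.0};   /* example coefficients */
    PetscErrorCode ierr;

    /* y <- y + a[0]*x[0] + a[1]*x[1] + a[2]*x[2]: each rank updates only the
       entries it owns, so the operation exchanges no messages at all */
    ierr = VecMAXPY(y, 3, a, x);CHKERRQ(ierr);
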
> >
> >    I'm inclined to believe something is out of whack on your computer.
> Are you sure all nodes on the machine are equivalent? Same processors,
> same clock speeds? What happens if you run the 1000-process case several
> times: do you get very similar numbers for VecMAXPY()? You should, but I
> am guessing you may not.
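
To make such repeated runs easy to compare line by line, the solve can be
wrapped in its own logging stage so that -log_view reports it separately in
every run. A minimal sketch; the stage name and the ksp/b/x variables are
placeholders:

    PetscLogStage  stage;
    PetscErrorCode ierr;

    ierr = PetscLogStageRegister("Solve", &stage);CHKERRQ(ierr);
    ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* timed in its own stage */
    ierr = PetscLogStagePop();CHKERRQ(ierr);
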
> >
> >     Barry
> >
> >   Note that this performance issue doesn't really have anything to do
> with the preconditioner you are using.
> >
> >
> >
> >
> >
> > > On Nov 15, 2018, at 10:50 AM, Karin&NiKo via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> > >
> > > Dear PETSc team,
> > >
> > > I am solving a linear transient dynamics problem discretized with
> finite elements. To do that, I am using FGMRES with GAMG as the
> preconditioner, and I consider 10 time steps here.
> > > The problem has around 118e6 dof and I am running on 1000, 1500 and
> 2000 procs, so I have roughly 118e3, 79e3 and 59e3 dof/proc.
> > > I notice that the performance deteriorates when I increase the number
> of processes.
> > > Attached you will find the log_view output of the runs and the
> detailed definition of the KSP.
> > >
> > > Is the problem too small to run on that number of processes or is
> there something wrong with my use of GAMG?
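
For reference, the solver setup being described boils down to something like
the following. This is only a minimal sketch, assuming an assembled matrix A
and vectors b, x; the options actually used are the ones in the attached KSP
definition:

    KSP            ksp;
    PC             pc;
    PetscErrorCode ierr;

    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetType(ksp, KSPFGMRES);CHKERRQ(ierr);   /* flexible GMRES */
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);        /* algebraic multigrid */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);       /* honors -ksp_type, -pc_type, -log_view, ... */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
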
> > >
> > > I thank you in advance for your help,
> > > Nicolas
> > >
> <FGMRES_GAMG_1000procs.txt><FGMRES_GAMG_2000procs.txt><FGMRES_GAMG_1500procs.txt>
> >
>
>