[petsc-users] GAMG Parallel Performance

Mark Adams mfadams at lbl.gov
Thu Nov 15 13:02:40 CST 2018


There is a lot of load imbalance in VecMAXPY as well; the max/min time ratios
in the log lines below (1.5, 3.4, 2.7) show some ranks taking several times
longer than others. The partitioning could be bad, and if not, it's the machine.
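
A minimal sketch of one way to check whether the Vec partition itself is
balanced (the function name is hypothetical and "x" is assumed to be the
assembled solution vector):

#include <petscvec.h>

/* Sketch only: "x" is assumed to be the assembled solution Vec.
   Report the smallest and largest local Vec sizes over all ranks. A large
   spread means the partitioning itself is unbalanced; if the sizes are
   nearly equal, the imbalance is more likely coming from the machine. */
PetscErrorCode CheckVecPartitionBalance(Vec x)
{
  PetscErrorCode ierr;
  PetscInt       nlocal, nmin, nmax;
  MPI_Comm       comm;

  ierr = PetscObjectGetComm((PetscObject)x, &comm);CHKERRQ(ierr);
  ierr = VecGetLocalSize(x, &nlocal);CHKERRQ(ierr);
  ierr = MPI_Allreduce(&nlocal, &nmin, 1, MPIU_INT, MPI_MIN, comm);CHKERRQ(ierr);
  ierr = MPI_Allreduce(&nlocal, &nmax, 1, MPIU_INT, MPI_MAX, comm);CHKERRQ(ierr);
  ierr = PetscPrintf(comm, "local Vec size: min %D  max %D\n", nmin, nmax);CHKERRQ(ierr);
  return 0;
}

Nearly equal local sizes do not rule out other sources of imbalance, but a big
spread here would point at the partitioning.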

On Thu, Nov 15, 2018 at 1:56 PM Smith, Barry F. via petsc-users <
petsc-users at mcs.anl.gov> wrote:

>
>     Something is odd about your configuration. Just consider the time for
> VecMAXPY, which is an embarrassingly parallel operation. In the log lines
> below, the time is in seconds and the flop rate (the last column) is in
> Mflop/s. On 1000 MPI processes it produces
>
>  VecMAXPY             575 1.0 8.4132e-01 1.5 1.36e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,600,021
>
> on 1500 processes it produces
>
>  VecMAXPY             583 1.0 1.0786e+00 3.4 9.38e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,289,187
>
> that is, it actually takes longer: the time goes from 0.84 seconds to 1.08
> seconds and the flop rate drops from 1,600,021 down to 1,289,187. You would
> never expect this kind of behavior.
>
> On 2000 processes it produces
>
> VecMAXPY             583 1.0 7.1103e-01 2.7 7.03e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,955,563
>
> so it speeds up again, but not by very much. This is very mysterious and
> not what you would expect.
>
>    I'm inclined to believe something is out of whack on your computer. Are
> you sure all nodes on the machine are equivalent? Same processors, same
> clock speeds? What happens if you run the 1000 process case several times?
> Do you get very similar numbers for VecMAXPY()? You should, but I am
> guessing you may not.
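>
>    One way to compare such runs (just a sketch; "your_app" and <options> are
> placeholders) is to send each run's -log_view output to its own file and then
> look at the VecMAXPY lines side by side:
>
>      # sketch only: "your_app" and <options> are placeholders
>      mpiexec -n 1000 ./your_app <options> -log_view :log_run1.txt
>      mpiexec -n 1000 ./your_app <options> -log_view :log_run2.txt
>      grep VecMAXPY log_run*.txt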
>
>     Barry
>
>   Note that this performance issue doesn't really have anything to do with
> the preconditioner you are using.
>
>
>
>
>
> > On Nov 15, 2018, at 10:50 AM, Karin&NiKo via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >
> > Dear PETSc team,
> >
> > I am solving a linear transient dynamic problem, based on a finite element
> > discretization. To do that, I am using FGMRES with GAMG as a
> > preconditioner. I consider here 10 time steps.
> > The problem has around 118e6 dof and I am running on 1000, 1500 and 2000
> > procs, so I have roughly 118e3, 79e3 and 59e3 dof/proc.
> > I notice that the performance deteriorates when I increase the number of
> > processes.
> > You can find attached the log_view output of the executions and the
> > detailed definition of the KSP.
> >
> > Is the problem too small to run on that number of processes or is there
> something wrong with my use of GAMG?
> >
> > I thank you in advance for your help,
> > Nicolas
> >
> <FGMRES_GAMG_1000procs.txt><FGMRES_GAMG_2000procs.txt><FGMRES_GAMG_1500procs.txt>
>
>