[petsc-users] GAMG Parallel Performance

Smith, Barry F. bsmith at mcs.anl.gov
Thu Nov 15 12:56:05 CST 2018


    Something is odd about your configuration. Just consider the time for VecMAXPY, which is an embarrassingly parallel operation. On 1000 MPI processes it produces

                     count    time (sec)    flop                                                          total flop rate (Mflop/s)
 VecMAXPY             575 1.0 8.4132e-01 1.5 1.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,600,021

on 1500 processes it produces

 VecMAXPY             583 1.0 1.0786e+00 3.4 9.38e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,289,187

That is, it actually takes longer: the time goes from 0.84 seconds to 1.08 seconds and the flop rate drops from 1,600,021 to 1,289,187. You would never expect this kind of behavior.

On 2000 processes it produces

VecMAXPY             583 1.0 7.1103e-01 2.7 7.03e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,955,563

So it speeds up again, but not by very much. This is very mysterious and not what you would expect.
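To put rough numbers on it (reading the last column as the aggregate flop rate over all processes, in Mflop/s):

    1000 processes:  ~1.36e9 flop/process x 1000 / 0.84 s  ~=  1.6e6 Mflop/s
    2000 processes:  ~7.03e8 flop/process x 2000 / 0.71 s  ~=  2.0e6 Mflop/s

Since the total work is essentially the same, a purely local kernel on twice as many processes should finish in roughly half the time, i.e. deliver something over 3e6 Mflop/s rather than 2e6.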

   I'm inclined to believe something is out of whack on your computer. Are you sure all nodes on the computer are equivalent? Same processors, same clock speeds? What happens if you run the 1000-process case several times; do you get very similar numbers for VecMAXPY()? You should, but I am guessing you may not.
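If it helps, a minimal standalone program like the following (illustrative only, not taken from your runs) exercises just VecMAXPY, so the -log_view line for it can be compared across repeated submissions and process counts without the rest of the solve. The local size and the number of vectors combined per call are made-up values and should be adjusted to match your problem.

    #include <petscvec.h>

    int main(int argc, char **argv)
    {
      Vec            y, *x;
      PetscInt       n = 118000, nv = 30, i, it;   /* illustrative: ~dof per process, vectors per MAXPY */
      PetscScalar    alpha[30];
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      ierr = VecCreateMPI(PETSC_COMM_WORLD, n, PETSC_DETERMINE, &y); CHKERRQ(ierr);
      ierr = VecSet(y, 1.0); CHKERRQ(ierr);
      ierr = VecDuplicateVecs(y, nv, &x); CHKERRQ(ierr);
      for (i = 0; i < nv; i++) {
        alpha[i] = 1.0/(i + 1);
        ierr = VecSet(x[i], (PetscScalar)(i + 1)); CHKERRQ(ierr);
      }
      /* repeat the purely local kernel; run with -log_view and compare the VecMAXPY line */
      for (it = 0; it < 500; it++) { ierr = VecMAXPY(y, nv, alpha, x); CHKERRQ(ierr); }
      ierr = VecDestroyVecs(nv, &x); CHKERRQ(ierr);
      ierr = VecDestroy(&y); CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

Run several times with, e.g., mpiexec -n 1000 ./maxpy_bench -log_view; large run-to-run variation in the VecMAXPY time would point at some nodes being consistently slower than others.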

    Barry

  Note that this performance issue doesn't really have anything to do with the preconditioner you are using.





> On Nov 15, 2018, at 10:50 AM, Karin&NiKo via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> Dear PETSc team,
> 
> I am solving a linear transient dynamic problem, based on a finite element discretization. To do that, I am using FGMRES with GAMG as a preconditioner. I consider here 10 time steps.
> The problem has around 118e6 dof and I am running on 1000, 1500 and 2000 procs, so I have something like 118e3, 79e3 and 59e3 dof/proc.
> I notice that the performance deteriorates when I increase the number of processes. 
> You can find attached the log_view output of the executions and the detailed definition of the KSP.
> 
> Is the problem too small to run on that number of processes, or is there something wrong with my use of GAMG?
> 
> I thank you in advance for your help,
> Nicolas
> <FGMRES_GAMG_1000procs.txt><FGMRES_GAMG_2000procs.txt><FGMRES_GAMG_1500procs.txt>
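For reference, a solver setup along the lines described above might look like the following minimal sketch (the function name is hypothetical, matrix and right-hand-side assembly are omitted, and the exact configuration used here is the one in the attached KSP definition):

    #include <petscksp.h>

    /* Minimal sketch: solve A x = b with FGMRES preconditioned by GAMG.
       A and b are assumed to be already assembled. */
    PetscErrorCode SolveWithFGMRESGAMG(Mat A, Vec b, Vec x)
    {
      KSP            ksp;
      PC             pc;
      PetscErrorCode ierr;

      ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPFGMRES); CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
      ierr = PCSetType(pc, PCGAMG); CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);   /* pick up -ksp_* and -pc_gamg_* options */
      ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
      return 0;
    }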


