[petsc-users] I find slow performance of SNES

Tue Sep 17 19:15:11 CDT 2019

> On Sep 17, 2019, at 11:57 AM, Pedro Gonzalez <pedro.gonzalez at u-bordeaux.fr> wrote:
> 
> Dear Barry,
> 
> 
> Thanks a lot for your quick reply.
> 
> 
> 
>> Do you simply mean that -snes_type ngs is not converging to the solution? Does it seem to converge to something else or nothing at all?
>> The SNES GS code just calls the user provided routine set with SNESSetNGS(). This means that it is expect to behave the same way as if the user simply called the user provided routine themselves.  I assume you have a routine that implements the GS on the DMDA since you write " I managed to parallelize it by using DMDA with very good results. " Thus you should get the same iterations for both calling your Gauss-Seidel code yourself and calling SNESSetNGS() and then calling SNESSolve() with -snes_type ngs.  You can check if they are behaving in the same (for simplicity) by using the debugger or by putting VecView() into code to see if they are generating the same values. Just have your GS code call VecView() on the input vectors at the top and the output vectors at the bottom.
> 
> Before adding SNES to my code, I had programmed a nonlinear Gauss-Seidel algorithm which I parallelized via DMDA. I use it as reference solution. The SNES solver converges towards this solution. The problem is that it is being considerably slower than my reference algorithm. For instance, "-snes_type ngs" takes about 200 times more time, "-snes_type nrichardson" employs approximately 50 times more time, while "-snes_type ngmres" is about 20 times slower.
> 
> I expect that SNES be as fast as my reference algorithm.

  Why? It is using different algorithms that are not going to necessarily converge any faster in iterations than your GS nor take less time per iteration.

> This slow-performance problem may be due to a not optimal configuration of SNES (I am not an expert user of PETSc) or, as you suggest (see below), to the fact that SNES is calculating more than the necessary at each iteration (e.g., norms).
> 
> Moreover, I have checked that the number iterations needed by "ngs" to reach convergence is almost the same as my reference GS algorithm. The small differences are due to the way the grid points are sweept during the iterations.
> 
> 
> 
>> Do you mean both   -snes_ngs_secant and  -snes_ngs_secant_h  ?  The second option by itself will do nothing.  Cut and paste the exact options you use.
> 
> In order to make SNES converge, I use the following options:
> - Nonlinear Gauss-Seidel: -snes_type ngs  -snes_ngs_secant_h 1.E-1
> - Nonlinear Richardson: -snes_type nrichardson
> - Nolinear Generalized Minimal RESidual: -snes_type ngmres  -snes_linesearch_type cp
> 
> 
> 
>> This would happen if you did not use -snes_ngs_secant  but do use -snes_ngs_secant_h but this doesn't change the algorithm

  I see. You don't provide your GS solver to SNES so it uses the secant method by default.

> 
> With respect to "ngs" solver, it seems that there is a problem with the residual. Physically, the solution of the nonlinear system is proportional to an input amplitude and the convergence criterium which I set is normalized to this input amplitude. Hence, if I just vary the amplitude in a given problem, the solver should make the same number of iterations because the residual scales to that input amplitude. For physically small input amplitudes from 1.E0 to 1.E8 I observed that the option "-snes_type ngs" makes SNES converge (within the same time and the same number of iterations) and the absolute residual were proportional to the input amplitude, as expected. However, my imput amplitude is 1.E10 and I observed that "-snes_type ngs" could not make the absolute residual decrease. That is the reason why I added the option "-snes_ngs_secant_h 1.E-1" to "ngs" in order to converge when the input amplitude is 1.E10. However, I realize that I can vary this parameter from 1.E-1 to 1.E50 and I obtain the same solution within the same computation time. For very small values (I tested -snes_ngs_secant_h 1.E-8), it does not converge.

I cannot explain why using huge values would work for h.  I am not sure what you mean by "For physically small input amplitudes from 1.E0 to 1.E8 I", do you mean your solution is in that range? And does "my imput amplitude is 1.E10" mean the solution is about 10^10. If so then that makes sense, because h is the perturbation to x and if the perturbation is too small then it doesn't change the solution so you get no secant computed. Thus you need a bigger h

> What is happening here? Nonlinear Richardson does not have this problem and converges at any input amplitude.

Nonlinear Richardson does not use any differencing scheme. It has no h. 

> 
> 
> 
>> By default SNES may be doing some norm computations at each iteration which are expensive and will slow things down. You can use SNESSetNormSchedule() or the command line form -snes_norm_schedule <none, always, initialonly, finalonly, initalfinalonly>  to turn these norms off. 
> 
> Thanks for this pertinent suggestion. Actually I have my own convergence criterium. In the subroutine MySNESConverged(snes,nc,xnorm,snorm,fnorm,reason,ctx,ierr) called by SNESSetConvergenceTest(snes,MySNESConverged,0,PETSC_NULL_FUNCTION,ierr), I do not need the norms "xnorm", "snorm" and "fnorm" which SNES calculated by default, as you say. However, I have done a print of these values and I see that "xnorm" and "snorm" are always 0 (so I deduce that they are not calculated) and that "fnorm" is calculated even when I add the option "-snes_norm_schedule none".

   Some algorithms such as NGMRES require the norms.  Basic nonlinear GS should not. Here is the code: Note the norm is only called for all iterations depending on norm schedule. If you are sure it is calling the norm every iteration you could run in the debugger with a break point in VecNorm to see where it is being called.

  for (i = 0; i < snes->max_its; i++) {
    ierr = SNESComputeNGS(snes, B, X);CHKERRQ(ierr);
    /* only compute norms if requested or about to exit due to maximum iterations */
    if (normschedule == SNES_NORM_ALWAYS || ((i == snes->max_its - 1) && (normschedule == SNES_NORM_INITIAL_FINAL_ONLY || normschedule == SNES_NORM_FINAL_ONLY))) {
      ierr = SNESComputeFunction(snes,X,F);CHKERRQ(ierr);
      ierr = VecNorm(F, NORM_2, &fnorm);CHKERRQ(ierr); /* fnorm <- ||F||  */
      SNESCheckFunctionNorm(snes,fnorm);
      /* Monitor convergence */
      ierr       = PetscObjectSAWsTakeAccess((PetscObject)snes);CHKERRQ(ierr);
      snes->iter = i+1;
      snes->norm = fnorm;
      ierr       = PetscObjectSAWsGrantAccess((PetscObject)snes);CHKERRQ(ierr);
      ierr       = SNESLogConvergenceHistory(snes,snes->norm,0);CHKERRQ(ierr);
      ierr       = SNESMonitor(snes,snes->iter,snes->norm);CHKERRQ(ierr);
    }
    /* Test for convergence */
    if (normschedule == SNES_NORM_ALWAYS) ierr = (*snes->ops->converged)(snes,snes->iter,0.0,0.0,fnorm,&snes->reason,snes->cnvP);CHKERRQ(ierr);
    if (snes->reason) PetscFunctionReturn(0);
    /* Call general purpose update function */
    if (snes->ops->update) {
      ierr = (*snes->ops->update)(snes, snes->iter);CHKERRQ(ierr);
    }
  }

> If "-snes_norm_schedule none" is not working, how can I switch off the calculation of "fnorm"? Below you can see how I integrated SNES into my code.
> 
> Moreover, I have several degrees of freedom at each point of DMDA grid (dof > 1). The norm which I need to calculate for my convergence criterium only concerns a subset of degrees of freedom: start = [start1, start2, ..., startn], with len(start) = n < dof. I am using VecStrideNorm() or VecStrideMax() with a do-loop:
> 
>      call VecStrideNorm(solution,start(1),NORM_INFINITY,mynorm,ierr) 
>      do i = 2, n
>         call VecStrideNorm(solution,start(i),NORM_INFINITY,mynorm_i,ierr)
>         mynorm = MAX(mynorm, mynorm_i)
>      end do
> 
> Does there exist a more efficient way of calculating the norm I need?

  Yes, the above will requires a n global reductions and n loops of the data; so pretty inefficient. Better you write the loops yourself and use MPI_Allreduce() directly to accumulate the 
parallel maximum. 

Below you are not providing a Gauss-Seidel method so it if you pick ngs the only thing it can do is the default coloring. But you can pass in your GS with SNESSetNGS() and SNES will use it, I thought that was what you were doing.

This is what I think you should do.

1) use SNESSetNGS() to provide PETSc with your GS. Make sure it converges at the same rate as your standalone solver and roughly the same time.

2) use NGMRES to accelerate the convergence of the GS. This can also be phrased as use your Gauss-Seidel as a nonlinear preconditioner for nonlinear GMRES.

  Basically you create the SNES and then call SNESGetNPC() on it then provide to this "inner" SNES your GS smoother

> 
> 
> 
> The PETSc part which I integrated in my code is the following (pseudo-code):
> 
> !-----------------------------------------------------------------------------!
> DM                 :: dmda
> Vec                :: lv, gv   
> SNES               :: snes   
> PetscReal, pointer :: p
> !-----------------------------------------------------------------------------!
> external FormFunctionLocal
> external MySNESConverged
> !-----------------------------------------------------------------------------!
> call PetscInitialize(...)
> call MPI_Comm_rank(...)
> call MPI_Comm_size(...)
> 
> call DMDACreate3d(... DM_BOUNDARY_GHOSTED ... DMDA_STENCIL_STAR ... dof ...)
> call DMSetUp(dmda,ierr)
> call DMDAGetCorners(...) ===> I obtain xs, xe, ys, ye, zs, ze in Fortan 1-based indexing
> call DMDAGetGhostCorners(...) ===> I obtain xsg, xeg, ysg, yeg, zsg, zeg in Fortan 1-based indexing
> call DMCreateLocalVector(dmda,lv,ierr)
> call DMCreateGlobalVector(dmda,gv,ierr)
> 
> call SNESCreate(comm,snes,ierr)
> call SNESSetConvergenceTest(snes,MySNESConverged,0,PETSC_NULL_FUNCTION,ierr)
> call SNESSetDM(snes,dmda,ierr)
> call DMDASNESSetFunctionLocal(dmda,INSERT_VALUES,FormFunctionLocal,0,ierr)
> call SNESSetFromOptions(snes,ierr)
> 
> nullify(p)
> call DMDAVecGetArrayF90(dmda,lv,p,ierr)
> do n = zsg, zeg
>   do m = ysg, yeg
>      do l = xsg, xeg
>         do i = 1, dof
>            p(i-1,l-1,m-1,n-1) = INITIAL VALUE
>         end do
>      end do
>   end do
> end do
> call DMDAVecRestoreArrayF90(dmda,lv,p,ierr)
> nullify(p)
> call DMLocalToGlobalBegin(dmda,lv,INSERT_VALUES,gv,ierr)
> call DMLocalToGlobalEnd(dmda,lv,INSERT_VALUES,gv,ierr)
> 
> call SNESSolve(snes,PETSC_NULL_VEC,gv,ierr)
> 
> call VecDestroy(lv,ierr)
> call VecDestroy(gv,ierr)
> call SNESDestroy(snes,ierr)
> call DMDestroy(dmda,ierr)
> call PetscFinalize(ierr)
> !-----------------------------------------------------------------------------!
> subroutine FormFunctionLocal(info,x,G,da,ierr)
>   DMDALocalInfo,  intent(in)    :: info(DMDA_LOCAL_INFO_SIZE)
>   PetscScalar,    intent(in)    :: x(0:petscdof-1,xs-stw:xe+stw,ys-stw:ye+stw,zs-stw:ze+stw)
>   PetscScalar,    intent(inout) :: G(0:petscdof-1,xs:xe,ys:ye,zs:ze)
>   DM,             intent(in)    :: da
>   PetscErrorCode, intent(inout) :: ierr
>   do n = zs, ze
>      do m = ys, ye
>         do l = xs, xe
>            do i = 1, dof
>               G(i,l,m,n) = VALUE OF THE FUNCTION G(x)
>            end do 
>         end do
>      end do
>   end do 
> end subroutine FormFunctionLocal
> !-----------------------------------------------------------------------------!
> subroutine MySNESConverged(snes,nc,xnorm,snorm,fnorm,reason,ctx,ierr)
>      SNES,                intent(in)    :: snes
>      PetscInt,            intent(in)    :: nc
>      PetscReal,           intent(in)    :: xnorm, snorm, fnorm
>      SNESConvergedReason, intent(inout) :: reason
>      PetscInt,            intent(in)    :: ctx
>      PetscErrorCode,      intent(inout) :: ierr
>      (...)
> end subroutine MySNESConverged
> !-----------------------------------------------------------------------------!
> 
> 
> 
> 
> 
> 
> Thanks a lot in advance for your reply.
> 
> 
> Best regards,
> Pedro
> 
> 
> 
> 
> 
> 
> 
> ----- Mail original -----
> De: "Smith, Barry F." <bsmith at mcs.anl.gov>
> À: "Pedro Gonzalez" <pedro.gonzalez at u-bordeaux.fr>
> Cc: "petsc-users" <petsc-users at mcs.anl.gov>
> Envoyé: Lundi 16 Septembre 2019 01:28:15
> Objet: Re: [petsc-users] I find slow performance of SNES
> 
>> On Sep 15, 2019, at 5:35 PM, Pedro Gonzalez via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> 
>> Dear all,
>> 
>> I am working on a code that solves a nonlinear system of equations G(x)=0 with Gauss-Seidel method. I managed to parallelize it by using DMDA with very good results. The previous week I changed my Gauss-Seidel solver by SNES. The code using SNES gives the same result as before, but I do not obtain the performance that I expected:
>> 1) When using the Gauss-Seidel method (-snes_type ngs) the residual G(x) seems not be scallable to the amplitude of x
> 
>   Do you simply mean that -snes_type ngs is not converging to the solution? Does it seem to converge to something else or nothing at all? 
> 
>   The SNES GS code just calls the user provided routine set with SNESSetNGS(). This means that it is expect to behave the same way as if the user simply called the user provided routine themselves.  I assume you have a routine that implements the GS on the DMDA since you write " I managed to parallelize it by using DMDA with very good results. " Thus you should get the same iterations for both calling your Gauss-Seidel code yourself and calling SNESSetNGS() and then calling SNESSolve() with -snes_type ngs.  You can check if they are behaving in the same (for simplicity) by using the debugger or by putting VecView() into code to see if they are generating the same values. Just have your GS code call VecView() on the input vectors at the top and the output vectors at the bottom.
> 
> 
>> and I have to add the option -snes_secant_h in order to make SNES converge.
> 
>  Do you mean both   -snes_ngs_secant and  -snes_ngs_secant_h  ?  The second option by itself will do nothing.  Cut and paste the exact options you use.
> 
>> However, I varied the step from 1.E-1 to 1.E50 and obtained the same result within the same computation time.
> 
>   This would happen if you did not use -snes_ngs_secant  but do use -snes_ngs_secant_h but this doesn't change the algorithm
> 
>> Is it normal that snes_secant_h can vary so many orders of magnitude?
> 
>   That certainly does seem odd. Looking at the code SNESComputeNGSDefaultSecant() we see it is perturbing the input vector (by color) with h
> 
> -    for (j=0;j<size;j++) {
> -      wa[idx[j]-s] += h;
> -    }
> 
> It then does the secant step with 
> 
>        if (PetscAbsScalar(g-f) > atol) {
>          /* This is equivalent to d = x - (h*f) / PetscRealPart(g-f) */
>          d = (x*g-w*f) / PetscRealPart(g-f);
>        } else {
>          d = x;
>        }
> 
> In PetscErrorCode SNESSetFromOptions_NGS(PetscOptionItems *PetscOptionsObject,SNES snes) one can see that the user provided h is accepted
> 
> ierr = PetscOptionsReal("-snes_ngs_secant_h","Differencing parameter for secant search","",gs->h,&gs->h,NULL);CHKERRQ(ierr);
> 
> You could run in the debugger with a break point in SNESComputeNGSDefaultSecant() to see if it is truly using the h you provided.
> 
> 
> 
> 
>> 2) Compared to my Gauss-Seidel algorithm, SNES does (approximately) the same number of iterations (with the same convergence criterium) but it is about 100 times slower.
> 
>   I don't fully understand what you are running so cannot completely answer this.
> 
> -snes_ngs_secant   will be lots slower than using a user provided GS
> 
> By default SNES may be doing some norm computations at each iteration which are expensive and will slow things down. You can use SNESSetNormSchedule() or the command line form -snes_norm_schedule <none, always, initialonly, finalonly, initalfinalonly>  to turn these norms off. 
> 
> 
>> What can be the reason(s) of this slow performance of SNES solver? I do not use preconditioner with my algorithm so I did not add one to SNES.
>> 
>> The main PETSc subroutines that I have included (in this order) are the following:
>> call DMDACreate3D
>> call DMSetUp
>> call DMCreateLocalVector
>> call DMCreateGlobalVector
>> call SNESCreate
>> call SNESSetConvergenceTest
>> call SNESSetDM
>> call DMDASNESSetFunctionLocal
>> call SNESSetFromOptions
>> call SNESSolve
>> 
>> Thanks in advance for you help.
>> 
>> Best regards,
>> Pedro
>> 
>>