[petsc-users] I find slow performance of SNES

Pedro Gonzalez pedro.gonzalez at u-bordeaux.fr
Tue Sep 17 11:57:11 CDT 2019


Dear Barry,


Thanks a lot for your quick reply.



> Do you simply mean that -snes_type ngs is not converging to the solution? Does it seem to converge to something else or nothing at all?
> The SNES GS code just calls the user provided routine set with SNESSetNGS(). This means that it is expected to behave the same way as if the user simply called the user provided routine themselves.  I assume you have a routine that implements the GS on the DMDA since you write " I managed to parallelize it by using DMDA with very good results. " Thus you should get the same iterations for both calling your Gauss-Seidel code yourself and calling SNESSetNGS() and then calling SNESSolve() with -snes_type ngs.  You can check if they are behaving in the same way (for simplicity) by using the debugger or by putting VecView() into the code to see if they are generating the same values. Just have your GS code call VecView() on the input vectors at the top and the output vectors at the bottom.

Before adding SNES to my code, I had programmed a nonlinear Gauss-Seidel algorithm which I parallelized via DMDA, and I use it as the reference solution. The SNES solver converges towards this solution. The problem is that it is considerably slower than my reference algorithm: "-snes_type ngs" takes about 200 times longer, "-snes_type nrichardson" about 50 times longer, and "-snes_type ngmres" about 20 times longer.

I expected SNES to be roughly as fast as my reference algorithm. This slow performance may be due to a suboptimal configuration of SNES on my side (I am not an expert user of PETSc) or, as you suggest below, to SNES computing more than necessary at each iteration (e.g., norms).

Moreover, I have checked that the number of iterations needed by "ngs" to reach convergence is almost the same as with my reference GS algorithm. The small differences are due to the order in which the grid points are swept during the iterations.
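
To be clear about my setup: I have not registered my own sweep with SNESSetNGS(), so with "-snes_type ngs" I am relying on PETSc's default routine (which is why I need the secant options below). If it helps for the comparison you describe, I could try to register my existing DMDA sweep along the lines of the following sketch (I am not sure whether SNESSetNGS() is callable from Fortran in my PETSc build, and MyNGS is just a placeholder name):

      external MyNGS
      ! hypothetical: reuse my existing Gauss-Seidel sweep as the NGS routine, so
      ! that -snes_type ngs calls it instead of the default coloring/secant code
      call SNESSetNGS(snes,MyNGS,0,ierr)

      subroutine MyNGS(snes,x,b,ctx,ierr)
         SNES,           intent(in)    :: snes
         Vec,            intent(inout) :: x     ! current iterate, updated in place
         Vec,            intent(in)    :: b     ! right-hand side (may be PETSC_NULL_VEC)
         PetscInt,       intent(in)    :: ctx
         PetscErrorCode, intent(inout) :: ierr
         ! one sweep of my existing DMDA Gauss-Seidel over the local grid points
      end subroutine MyNGS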



> Do you mean both   -snes_ngs_secant and  -snes_ngs_secant_h  ?  The second option by itself will do nothing.  Cut and paste the exact options you use.

In order to make SNES converge, I use the following options:
- Nonlinear Gauss-Seidel: -snes_type ngs  -snes_ngs_secant_h 1.E-1
- Nonlinear Richardson: -snes_type nrichardson
- Nonlinear Generalized Minimal RESidual: -snes_type ngmres  -snes_linesearch_type cp



> This would happen if you did not use -snes_ngs_secant but did use -snes_ngs_secant_h, but this doesn't change the algorithm

With respect to the "ngs" solver, there seems to be a problem with the residual. Physically, the solution of the nonlinear system is proportional to an input amplitude, and the convergence criterion I set is normalized by this amplitude. Hence, if I only vary the amplitude in a given problem, the solver should need the same number of iterations, because the residual scales with that input amplitude. For input amplitudes from 1.E0 to 1.E8 I observed that "-snes_type ngs" makes SNES converge (in the same time and with the same number of iterations) and that the absolute residual is proportional to the input amplitude, as expected. However, my actual input amplitude is 1.E10, and at that amplitude "-snes_type ngs" could not make the absolute residual decrease. That is why I added the option "-snes_ngs_secant_h 1.E-1" to "ngs" in order to converge when the input amplitude is 1.E10. However, I find that I can vary this parameter from 1.E-1 to 1.E50 and obtain the same solution within the same computation time. For very small values (I tested -snes_ngs_secant_h 1.E-8), it does not converge. What is happening here? Nonlinear Richardson does not have this problem and converges for any input amplitude.
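
Just to illustrate what I mean by "the residual scales with the input amplitude": a workaround I have considered (but not adopted) would be to nondimensionalize the residual inside FormFunctionLocal, e.g.

            ! hypothetical rescaling so that the residual is O(1) for any input
            ! amplitude; "amplitude" here is my input amplitude, not a PETSc quantity
            G(i-1,l,m,n) = ( VALUE OF THE FUNCTION G(x) ) / amplitude

With such a scaling I would expect an absolute differencing parameter like -snes_ngs_secant_h to behave the same for all amplitudes, but I have not verified this.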



> By default SNES may be doing some norm computations at each iteration which are expensive and will slow things down. You can use SNESSetNormSchedule() or the command line form -snes_norm_schedule <none, always, initialonly, finalonly, initialfinalonly>  to turn these norms off. 

Thanks for this pertinent suggestion. I do indeed have my own convergence criterion. In the subroutine MySNESConverged(snes,nc,xnorm,snorm,fnorm,reason,ctx,ierr), registered with SNESSetConvergenceTest(snes,MySNESConverged,0,PETSC_NULL_FUNCTION,ierr), I do not need the norms "xnorm", "snorm" and "fnorm" that SNES computes by default, as you say. However, when I print these values I see that "xnorm" and "snorm" are always 0 (so I deduce that they are not computed), whereas "fnorm" is still computed even when I add the option "-snes_norm_schedule none". If "-snes_norm_schedule none" is not working, how can I switch off the calculation of "fnorm"? Below you can see how I integrated SNES into my code.
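
If it matters, I can also try to set the schedule from the code instead of the command line; my understanding of the interface (a sketch, assuming SNESSetNormSchedule() and SNES_NORM_NONE are usable from Fortran) is simply

      ! switch off the default norm computations; placed after SNESSetFromOptions
      ! so that it is not overridden by the options database
      call SNESSetNormSchedule(snes,SNES_NORM_NONE,ierr)

but so far I have only used the command-line option.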

Moreover, I have several degrees of freedom at each point of the DMDA grid (dof > 1). The norm that I need for my convergence criterion only involves a subset of the degrees of freedom: start = [start1, start2, ..., startn], with len(start) = n < dof. I am using VecStrideNorm() (with NORM_INFINITY; VecStrideMax() would be similar) inside a do-loop:
      
      ! infinity norm of the first component of interest ...
      call VecStrideNorm(solution,start(1),NORM_INFINITY,mynorm,ierr)
      ! ... then the maximum over the remaining components of interest
      do i = 2, n
         call VecStrideNorm(solution,start(i),NORM_INFINITY,mynorm_i,ierr)
         mynorm = MAX(mynorm, mynorm_i)
      end do

Does there exist a more efficient way of calculating the norm I need?
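
One alternative I have considered (assuming VecStrideNormAll() has a Fortran interface; the result array must have length dof) is to obtain all component norms in a single call and then keep only the components I need:

      PetscReal :: allnorms(dof)   ! pseudo-code: sized to the number of components
      ! one call gives the infinity norm of every component (one reduction instead
      ! of n); start(:) holds 0-based component indices, hence the +1 below
      call VecStrideNormAll(solution,NORM_INFINITY,allnorms,ierr)
      mynorm = 0.0
      do i = 1, n
         mynorm = MAX(mynorm, allnorms(start(i)+1))
      end do

However, this still computes norms of components I do not need, so I am not sure it is actually cheaper.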



The PETSc part which I integrated in my code is the following (pseudo-code):

!-----------------------------------------------------------------------------!
DM                 :: dmda
Vec                :: lv, gv   
SNES               :: snes   
PetscReal, pointer :: p
!-----------------------------------------------------------------------------!
external FormFunctionLocal
external MySNESConverged
!-----------------------------------------------------------------------------!
call PetscInitialize(...)
call MPI_Comm_rank(...)
call MPI_Comm_size(...)

call DMDACreate3d(... DM_BOUNDARY_GHOSTED ... DMDA_STENCIL_STAR ... dof ...)
call DMSetUp(dmda,ierr)
call DMDAGetCorners(...) ===> I obtain xs, xe, ys, ye, zs, ze in Fortran 1-based indexing
call DMDAGetGhostCorners(...) ===> I obtain xsg, xeg, ysg, yeg, zsg, zeg in Fortran 1-based indexing
call DMCreateLocalVector(dmda,lv,ierr)
call DMCreateGlobalVector(dmda,gv,ierr)

call SNESCreate(comm,snes,ierr)
call SNESSetConvergenceTest(snes,MySNESConverged,0,PETSC_NULL_FUNCTION,ierr)
call SNESSetDM(snes,dmda,ierr)
call DMDASNESSetFunctionLocal(dmda,INSERT_VALUES,FormFunctionLocal,0,ierr)
call SNESSetFromOptions(snes,ierr)

nullify(p)
call DMDAVecGetArrayF90(dmda,lv,p,ierr)
do n = zsg, zeg
   do m = ysg, yeg
      do l = xsg, xeg
         do i = 1, dof
            p(i-1,l-1,m-1,n-1) = INITIAL VALUE
         end do
      end do
   end do
end do
call DMDAVecRestoreArrayF90(dmda,lv,p,ierr)
nullify(p)
call DMLocalToGlobalBegin(dmda,lv,INSERT_VALUES,gv,ierr)
call DMLocalToGlobalEnd(dmda,lv,INSERT_VALUES,gv,ierr)

call SNESSolve(snes,PETSC_NULL_VEC,gv,ierr)

call VecDestroy(lv,ierr)
call VecDestroy(gv,ierr)
call SNESDestroy(snes,ierr)
call DMDestroy(dmda,ierr)
call PetscFinalize(ierr)
!-----------------------------------------------------------------------------!
subroutine FormFunctionLocal(info,x,G,da,ierr)
   DMDALocalInfo,  intent(in)    :: info(DMDA_LOCAL_INFO_SIZE)
   PetscScalar,    intent(in)    :: x(0:dof-1,xs-stw:xe+stw,ys-stw:ye+stw,zs-stw:ze+stw)
   PetscScalar,    intent(inout) :: G(0:dof-1,xs:xe,ys:ye,zs:ze)
   DM,             intent(in)    :: da
   PetscErrorCode, intent(inout) :: ierr
   do n = zs, ze
      do m = ys, ye
         do l = xs, xe
            do i = 1, dof
               G(i-1,l,m,n) = VALUE OF THE FUNCTION G(x)
            end do 
         end do
      end do
   end do 
end subroutine FormFunctionLocal
!-----------------------------------------------------------------------------!
subroutine MySNESConverged(snes,nc,xnorm,snorm,fnorm,reason,ctx,ierr)
      SNES,                intent(in)    :: snes
      PetscInt,            intent(in)    :: nc
      PetscReal,           intent(in)    :: xnorm, snorm, fnorm
      SNESConvergedReason, intent(inout) :: reason
      PetscInt,            intent(in)    :: ctx
      PetscErrorCode,      intent(inout) :: ierr
      (...)
end subroutine MySNESConverged
!-----------------------------------------------------------------------------!
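
For completeness, inside MySNESConverged I set "reason" myself based on my own norm. A minimal sketch of that part (hypothetical names mynorm and mytol; my real test uses the stride norms shown above) is:

      if (mynorm <= mytol) then
         reason = SNES_CONVERGED_FNORM_ABS     ! any positive value stops SNESSolve
      else
         reason = SNES_CONVERGED_ITERATING     ! zero means: keep iterating
      end if
      ierr = 0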






Thanks a lot in advance for your reply.


Best regards,
Pedro







----- Original Message -----
From: "Smith, Barry F." <bsmith at mcs.anl.gov>
To: "Pedro Gonzalez" <pedro.gonzalez at u-bordeaux.fr>
Cc: "petsc-users" <petsc-users at mcs.anl.gov>
Sent: Monday, September 16, 2019 01:28:15
Subject: Re: [petsc-users] I find slow performance of SNES

> On Sep 15, 2019, at 5:35 PM, Pedro Gonzalez via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> Dear all,
> 
> I am working on a code that solves a nonlinear system of equations G(x)=0 with the Gauss-Seidel method. I managed to parallelize it by using DMDA with very good results. Last week I replaced my Gauss-Seidel solver with SNES. The code using SNES gives the same result as before, but I do not obtain the performance that I expected:
> 1) When using the Gauss-Seidel method (-snes_type ngs) the residual G(x) seems not to scale with the amplitude of x

   Do you simply mean that -snes_type ngs is not converging to the solution? Does it seem to converge to something else or nothing at all? 

   The SNES GS code just calls the user provided routine set with SNESSetNGS(). This means that it is expected to behave the same way as if the user simply called the user provided routine themselves.  I assume you have a routine that implements the GS on the DMDA since you write " I managed to parallelize it by using DMDA with very good results. " Thus you should get the same iterations for both calling your Gauss-Seidel code yourself and calling SNESSetNGS() and then calling SNESSolve() with -snes_type ngs.  You can check if they are behaving in the same way (for simplicity) by using the debugger or by putting VecView() into the code to see if they are generating the same values. Just have your GS code call VecView() on the input vectors at the top and the output vectors at the bottom.


> and I have to add the option -snes_secant_h in order to make SNES converge.

  Do you mean both   -snes_ngs_secant and  -snes_ngs_secant_h  ?  The second option by itself will do nothing.  Cut and paste the exact options you use.

> However, I varied the step from 1.E-1 to 1.E50 and obtained the same result within the same computation time.

   This would happen if you did not use -snes_ngs_secant but did use -snes_ngs_secant_h, but this doesn't change the algorithm

> Is it normal that snes_secant_h can vary so many orders of magnitude?

   That certainly does seem odd. Looking at the code SNESComputeNGSDefaultSecant() we see it is perturbing the input vector (by color) with h

    for (j=0;j<size;j++) {
      wa[idx[j]-s] += h;
    }

It then does the secant step with 

        if (PetscAbsScalar(g-f) > atol) {
          /* This is equivalent to d = x - (h*f) / PetscRealPart(g-f) */
          d = (x*g-w*f) / PetscRealPart(g-f);
        } else {
          d = x;
        }

In PetscErrorCode SNESSetFromOptions_NGS(PetscOptionItems *PetscOptionsObject,SNES snes) one can see that the user provided h is accepted

ierr = PetscOptionsReal("-snes_ngs_secant_h","Differencing parameter for secant search","",gs->h,&gs->h,NULL);CHKERRQ(ierr);

You could run in the debugger with a break point in SNESComputeNGSDefaultSecant() to see if it is truly using the h you provided.




> 2) Compared to my Gauss-Seidel algorithm, SNES does (approximately) the same number of iterations (with the same convergence criterion) but it is about 100 times slower.

   I don't fully understand what you are running so cannot completely answer this.

-snes_ngs_secant will be a lot slower than using a user-provided GS

By default SNES may be doing some norm computations at each iteration which are expensive and will slow things down. You can use SNESSetNormSchedule() or the command line form -snes_norm_schedule <none, always, initialonly, finalonly, initialfinalonly>  to turn these norms off. 

   
>  What can be the reason(s) of this slow performance of SNES solver? I do not use preconditioner with my algorithm so I did not add one to SNES.
> 
> The main PETSc subroutines that I have included (in this order) are the following:
> call DMDACreate3D
> call DMSetUp
> call DMCreateLocalVector
> call DMCreateGlobalVector
> call SNESCreate
> call SNESSetConvergenceTest
> call SNESSetDM
> call DMDASNESSetFunctionLocal
> call SNESSetFromOptions
> call SNESSolve
> 
> Thanks in advance for you help.
> 
> Best regards,
> Pedro
> 
>

