[petsc-users] Scaling of PETSc example ex2f.F90
Jed Brown
jed at jedbrown.org
Tue Aug 16 12:06:22 CDT 2022
Your observed performance is explained by the number of iterations, which varies based on a number of factors, plus imperfect memory bandwidth scaling (https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup).
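You can see this in your runs below: the iteration counts differ between the 1- and 2-process cases (2571 vs. 1821 without -random_exact_sol, 1003 vs. 1191 with it), so comparing raw KSPSolve times mixes algorithmic behavior with parallel efficiency. One way to separate the two is to rerun with the standard options -ksp_converged_reason and -log_view, which report the iteration count and a per-event timing breakdown (MatMult, VecMDot, PCApply, ...), e.g.

  $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2 ex2f.exe -random_exact_sol -ksp_converged_reason -log_view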
Problems like this (and harder) can be solved much faster using a method like -pc_type gamg or -pc_type hypre (if configured with --download-hypre). Also note that this example does an intentionally poor job of partitioning (so it's easy to understand what PETSc is doing) -- you should use a better partition and preallocation for a real problem.
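As an illustration only (this is not in ex2f.F90, and 5 nonzeros per row is simply an upper bound for this particular 5-point stencil), a minimal preallocation sketch, assuming the default AIJ matrix format, could be inserted after MatSetFromOptions():

      PetscInt five
      five = 5
  !   At most 5 nonzeros per row; for MPIAIJ give bounds for the diagonal and
  !   off-diagonal blocks (exact counts would require the ownership ranges).
      call MatMPIAIJSetPreallocation(A,five,PETSC_NULL_INTEGER,five,PETSC_NULL_INTEGER,ierr)
  !   Same bound for the SeqAIJ case (single-process runs); the preallocation
  !   call that does not match the actual matrix type is ignored.
      call MatSeqAIJSetPreallocation(A,five,PETSC_NULL_INTEGER,ierr)

The preconditioner suggestions need no code changes; they can be tried directly on your existing runs, e.g.

  $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2 ex2f.exe -random_exact_sol -pc_type gamg -ksp_converged_reason
  $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2 ex2f.exe -random_exact_sol -pc_type hypre -ksp_converged_reason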
Sidarth Narayanan <snarayanan1 at altair.com> writes:
> Hello PETSc team,
>
> I am currently using PETSc to improve the performance of a CFD solver by using its parallel linear solver. While doing so I ran across some scaling issues: the speedup in the linear algebra (measured as KSPSolve time using the MPI_Wtime() function) was only about 2x when using 4 processes. So I tried to check the scaling of the example problem ex2f.F90 provided as part of the PETSc package and ran into a similar issue there. If I run the problem on a 1000 x 1000 2-D, five-point stencil with the exact solution set to 1.0, the scaling is perfect. But as soon as I change the exact solution to random values (generally between 0 and 1) using the flag -random_exact_sol, the scaling takes a major hit. I have not changed the example code apart from increasing the problem size and adding time stamps before and after KSPSolve. Could you please let me know what I am missing or doing wrong here? I was also wondering whether PETSc has an option to print out the preconditioned matrix, since changing the exact solution changes the RHS and hence might change the preconditioned matrix. I have provided below the exact code and the results from the 4 runs, including the compilation step.
>
> Source Code: (ex2f.F90 with time stamps and increased problem size)
> !
> ! Description: Solves a linear system in parallel with KSP (Fortran code).
> ! Also shows how to set a user-defined monitoring routine.
> !
> !
> !
> ! -----------------------------------------------------------------------
>
> program main
> #include <petsc/finclude/petscksp.h>
> use petscksp
> implicit none
> !
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Variable declarations
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> !
> ! Variables:
> ! ksp - Krylov subspace method (linear solver) context
> ! pc - preconditioner context
> ! x, b, u - approx solution, right-hand-side, exact solution vectors
> ! A - matrix that defines linear system
> ! its - iterations for convergence
> ! norm - norm of error in solution
> ! rctx - random number generator context
> !
> ! Note that vectors are declared as PETSc "Vec" objects. These vectors
> ! are mathematical objects that contain more than just an array of
> ! double precision numbers. I.e., vectors in PETSc are not just
> ! double precision x(*).
> ! However, local vector data can be easily accessed via VecGetArray().
> ! See the Fortran section of the PETSc users manual for details.
> !
> PetscReal norm,t1,t2
> PetscInt i,j,II,JJ,m,n,its
> PetscInt Istart,Iend,ione
> PetscErrorCode ierr
> PetscMPIInt rank,size
> PetscBool flg
> PetscScalar v,one,neg_one
> Vec x,b,u
> Mat A
> KSP ksp
> PetscRandom rctx
> PetscViewerAndFormat vf,vzero
>
> ! These variables are not currently used.
> ! PC pc
> ! PCType ptype
> ! PetscReal tol
>
> ! Note: Any user-defined Fortran routines (such as MyKSPMonitor)
> ! MUST be declared as external.
>
> external MyKSPMonitor,MyKSPConverged
>
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Beginning of program
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
> if (ierr .ne. 0) then
> print*,'Unable to initialize PETSc'
> stop
> endif
> m = 1000
> n = 1000
> one = 1.0
> neg_one = -1.0
> ione = 1
> call PetscOptionsGetInt(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-m',m,flg,ierr)
> call PetscOptionsGetInt(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-n',n,flg,ierr)
> call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
> call MPI_Comm_size(PETSC_COMM_WORLD,size,ierr)
>
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Compute the matrix and right-hand-side vector that define
> ! the linear system, Ax = b.
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> ! Create parallel matrix, specifying only its global dimensions.
> ! When using MatCreate(), the matrix format can be specified at
> ! runtime. Also, the parallel partitioning of the matrix is
> ! determined by PETSc at runtime.
>
> call MatCreate(PETSC_COMM_WORLD,A,ierr)
> call MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,m*n,m*n,ierr)
> call MatSetFromOptions(A,ierr)
> call MatSetUp(A,ierr)
>
> ! Currently, all PETSc parallel matrix formats are partitioned by
> ! contiguous chunks of rows across the processors. Determine which
> ! rows of the matrix are locally owned.
>
> call MatGetOwnershipRange(A,Istart,Iend,ierr)
>
> ! Set matrix elements for the 2-D, five-point stencil in parallel.
> ! - Each processor needs to insert only elements that it owns
> ! locally (but any non-local elements will be sent to the
> ! appropriate processor during matrix assembly).
> ! - Always specify global row and columns of matrix entries.
> ! - Note that MatSetValues() uses 0-based row and column numbers
> ! in Fortran as well as in C.
>
> ! Note: this uses the less common natural ordering that orders first
> ! all the unknowns for x = h then for x = 2h etc; Hence you see JJ = II +- n
> ! instead of JJ = II +- m as you might expect. The more standard ordering
> ! would first do all variables for y = h, then y = 2h etc.
>
> do 10, II=Istart,Iend-1
> v = -1.0
> i = II/n
> j = II - i*n
> if (i.gt.0) then
> JJ = II - n
> call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
> endif
> if (i.lt.m-1) then
> JJ = II + n
> call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
> endif
> if (j.gt.0) then
> JJ = II - 1
> call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
> endif
> if (j.lt.n-1) then
> JJ = II + 1
> call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
> endif
> v = 4.0
> call MatSetValues(A,ione,II,ione,II,v,INSERT_VALUES,ierr)
> 10 continue
>
> ! Assemble matrix, using the 2-step process:
> ! MatAssemblyBegin(), MatAssemblyEnd()
> ! Computations can be done while messages are in transition,
> ! by placing code between these two statements.
>
> call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)
> call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)
>
> ! Create parallel vectors.
> ! - Here, the parallel partitioning of the vector is determined by
> ! PETSc at runtime. We could also specify the local dimensions
> ! if desired -- or use the more general routine VecCreate().
> ! - When solving a linear system, the vectors and matrices MUST
> ! be partitioned accordingly. PETSc automatically generates
> ! appropriately partitioned matrices and vectors when MatCreate()
> ! and VecCreate() are used with the same communicator.
> ! - Note: We form 1 vector from scratch and then duplicate as needed.
>
> call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,m*n,u,ierr)
> call VecSetFromOptions(u,ierr)
> call VecDuplicate(u,b,ierr)
> call VecDuplicate(b,x,ierr)
>
> ! Set exact solution; then compute right-hand-side vector.
> ! By default we use an exact solution of a vector with all
> ! elements of 1.0; Alternatively, using the runtime option
> ! -random_exact_sol forms a solution vector with random components.
>
> call PetscOptionsHasName(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-random_exact_sol',flg,ierr)
> if (flg) then
> call PetscRandomCreate(PETSC_COMM_WORLD,rctx,ierr)
> call PetscRandomSetFromOptions(rctx,ierr)
> call VecSetRandom(u,rctx,ierr)
> call PetscRandomDestroy(rctx,ierr)
> else
> call VecSet(u,one,ierr)
> endif
> call MatMult(A,u,b,ierr)
>
> ! View the exact solution vector if desired
>
> call PetscOptionsHasName(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-view_exact_sol',flg,ierr)
> if (flg) then
> call VecView(u,PETSC_VIEWER_STDOUT_WORLD,ierr)
> endif
>
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Create the linear solver and set various options
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> ! Create linear solver context
>
> call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>
> ! Set operators. Here the matrix that defines the linear system
> ! also serves as the preconditioning matrix.
>
> call KSPSetOperators(ksp,A,A,ierr)
>
> ! Set linear solver defaults for this problem (optional).
> ! - By extracting the KSP and PC contexts from the KSP context,
> ! we can then directly call any KSP and PC routines
> ! to set various options.
> ! - The following four statements are optional; all of these
> ! parameters could alternatively be specified at runtime via
> ! KSPSetFromOptions(). All of these defaults can be
> ! overridden at runtime, as indicated below.
>
> ! We comment out this section of code since the Jacobi
> ! preconditioner is not a good general default.
>
> ! call KSPGetPC(ksp,pc,ierr)
> ! ptype = PCJACOBI
> ! call PCSetType(pc,ptype,ierr)
> ! tol = 1.e-7
> ! call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
>
> ! Set user-defined monitoring routine if desired
>
> call PetscOptionsHasName(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-my_ksp_monitor',flg,ierr)
> if (flg) then
> vzero = 0
> call KSPMonitorSet(ksp,MyKSPMonitor,vzero,PETSC_NULL_FUNCTION,ierr)
> !
> ! Also use the default KSP monitor routine showing how it may be used from Fortran
> !
> call PetscViewerAndFormatCreate(PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_DEFAULT,vf,ierr)
> call KSPMonitorSet(ksp,KSPMonitorResidual,vf,PetscViewerAndFormatDestroy,ierr)
> endif
>
> ! Set runtime options, e.g.,
> ! -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
> ! These options will override those specified above as long as
> ! KSPSetFromOptions() is called _after_ any other customization
> ! routines.
>
> call KSPSetFromOptions(ksp,ierr)
>
> ! Set convergence test routine if desired
>
> call PetscOptionsHasName(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-my_ksp_convergence',flg,ierr)
> if (flg) then
> call KSPSetConvergenceTest(ksp,MyKSPConverged,0,PETSC_NULL_FUNCTION,ierr)
> endif
> !
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Solve the linear system
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> t1 = MPI_Wtime()
> call KSPSolve(ksp,b,x,ierr)
> t2 = MPI_Wtime()
> ! x = A^{-1} b
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Check solution and clean up
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> ! Check the error
> call VecAXPY(x,neg_one,u,ierr)
> call VecNorm(x,NORM_2,norm,ierr)
> call KSPGetIterationNumber(ksp,its,ierr)
> if (rank .eq. 0) then
> if (norm .gt. 1.e-12) then
> write(6,100) norm,its
> else
> write(6,110) its
> endif
> write(6,111) t2 - t1
> endif
> 100 format('Norm of error ',e11.4,' iterations ',i5)
> 110 format('Norm of error < 1.e-12 iterations ',i5)
> 111 format('KSPSolve time: ',f10.3)
>
> ! Free work space. All PETSc objects should be destroyed when they
> ! are no longer needed.
>
> call KSPDestroy(ksp,ierr)
> call VecDestroy(u,ierr)
> call VecDestroy(x,ierr)
> call VecDestroy(b,ierr)
> call MatDestroy(A,ierr)
>
> ! Always call PetscFinalize() before exiting a program. This routine
> ! - finalizes the PETSc libraries as well as MPI
> ! - provides summary and diagnostic information if certain runtime
> ! options are chosen (e.g., -log_view). See PetscFinalize()
> ! manpage for more information.
>
> call PetscFinalize(ierr)
> end
>
> ! --------------------------------------------------------------
> !
> ! MyKSPMonitor - This is a user-defined routine for monitoring
> ! the KSP iterative solvers.
> !
> ! Input Parameters:
> ! ksp - iterative context
> ! n - iteration number
> ! rnorm - 2-norm (preconditioned) residual value (may be estimated)
> ! dummy - optional user-defined monitor context (unused here)
> !
> subroutine MyKSPMonitor(ksp,n,rnorm,dummy,ierr)
> use petscksp
> implicit none
>
> KSP ksp
> Vec x
> PetscErrorCode ierr
> PetscInt n,dummy
> PetscMPIInt rank
> PetscReal rnorm
>
> ! Build the solution vector
> call KSPBuildSolution(ksp,PETSC_NULL_VEC,x,ierr)
>
> ! Write the solution vector and residual norm to stdout
> ! - Note that the parallel viewer PETSC_VIEWER_STDOUT_WORLD
> ! handles data from multiple processors so that the
> ! output is not jumbled.
>
> call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
> if (rank .eq. 0) write(6,100) n
> call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
> if (rank .eq. 0) write(6,200) n,rnorm
>
> 100 format('iteration ',i5,' solution vector:')
> 200 format('iteration ',i5,' residual norm ',e11.4)
> ierr = 0
> end
>
> ! --------------------------------------------------------------
> !
> ! MyKSPConverged - This is a user-defined routine for testing
> ! convergence of the KSP iterative solvers.
> !
> ! Input Parameters:
> ! ksp - iterative context
> ! n - iteration number
> ! rnorm - 2-norm (preconditioned) residual value (may be estimated)
> ! dummy - optional user-defined monitor context (unused here)
> !
> subroutine MyKSPConverged(ksp,n,rnorm,flag,dummy,ierr)
> use petscksp
> implicit none
>
> KSP ksp
> PetscErrorCode ierr
> PetscInt n,dummy
> KSPConvergedReason flag
> PetscReal rnorm
>
> if (rnorm .le. .05) then
> flag = 1
> else
> flag = 0
> endif
> ierr = 0
>
> end
>
> !/*TEST
> !
> ! test:
> ! nsize: 2
> ! args: -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
> !
> ! test:
> ! suffix: 2
> ! nsize: 2
> ! args: -pc_type jacobi -my_ksp_monitor -ksp_gmres_cgs_refinement_type refine_always
> !
> !TEST*/
>
> Output:
> snarayanan1 at USLN38 /cygdrive/c/sidarth/petsc-test/scaling_test
> $ make ex2f
> /home/snarayanan1/petsc-release/lib/petsc/bin/win32fe/win32fe ifort -MT -O3 -fpp -MT -O3 -fpp -I/home/snarayanan1/petsc-release/include -I/home/snarayanan1/petsc-release/arch-ci-mswin-opt-impi/include -I/cygdrive/c/PROGRA~2/Intel/oneAPI/mpi/2021.5.0/include ex2f.F90 -L/cygdrive/c/cygwin64/home/snarayanan1/petsc-release/arch-ci-mswin-opt-impi/lib -L/cygdrive/c/PROGRA~2/Intel/oneAPI/mkl/2022.0.0/lib/intel64 -lpetsc mkl_intel_lp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib /cygdrive/c/PROGRA~2/Intel/oneAPI/mpi/2021.5.0/lib/release/impi.lib Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib -o ex2f
>
> snarayanan1 at USLN38 /cygdrive/c/sidarth/petsc-test/scaling_test
> $ $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 1 ex2f.exe
> Norm of error 0.1214E+02 iterations 2571
> KSPSolve time: 104.949
>
> snarayanan1 at USLN38 /cygdrive/c/sidarth/petsc-test/scaling_test
> $ $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2 ex2f.exe
> Norm of error 0.1221E+02 iterations 1821
> KSPSolve time: 47.139
>
> snarayanan1 at USLN38 /cygdrive/c/sidarth/petsc-test/scaling_test
> $ $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 1 ex2f.exe -random_exact_sol
> Norm of error 0.8541E+02 iterations 1003
> KSPSolve time: 40.853
>
> snarayanan1 at USLN38 /cygdrive/c/sidarth/petsc-test/scaling_test
> $ $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2 ex2f.exe -random_exact_sol
> Norm of error 0.8562E+02 iterations 1191
> KSPSolve time: 31.745
>
>
> Thank You,
> Sidarth Narayanan
> Electronics Thermal Management Intern
> Altair Engineering Inc.
> Troy, MI