[petsc-users] Scaling of PETSc example ex2f.F90
Jed Brown
jed at jedbrown.org
Tue Aug 16 12:06:22 CDT 2022
Your observed performance is explained by the number of iterations, which varies based on a number of factors, plus imperfect memory bandwidth scaling (https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup).
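You can see this in your runs below: the iteration counts differ between the 1- and 2-process cases (2571 vs. 1821 without -random_exact_sol, 1003 vs. 1191 with it), so comparing raw KSPSolve times mixes algorithmic behavior with parallel efficiency. One way to separate the two is to rerun with the standard options -ksp_converged_reason and -log_view, which report the iteration count and a per-event timing breakdown (MatMult, VecMDot, PCApply, ...), e.g.

  $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2 ex2f.exe -random_exact_sol -ksp_converged_reason -log_view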
Problems like this (and harder) can be solved much faster using a method like -pc_type gamg or -pc_type hypre (if configured with --download-hypre). Also note that this example does an intentionally poor job of partitioning (so it's easy to understand what PETSc is doing) -- you should use a better partition and preallocation for a real problem.
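As an illustration only (this is not in ex2f.F90, and 5 nonzeros per row is simply an upper bound for this particular 5-point stencil), a minimal preallocation sketch, assuming the default AIJ matrix format, could be inserted after MatSetFromOptions():

      PetscInt five
      five = 5
  !   At most 5 nonzeros per row; for MPIAIJ give bounds for the diagonal and
  !   off-diagonal blocks (exact counts would require the ownership ranges).
      call MatMPIAIJSetPreallocation(A,five,PETSC_NULL_INTEGER,five,PETSC_NULL_INTEGER,ierr)
  !   Same bound for the SeqAIJ case (single-process runs); the preallocation
  !   call that does not match the actual matrix type is ignored.
      call MatSeqAIJSetPreallocation(A,five,PETSC_NULL_INTEGER,ierr)

The preconditioner suggestions need no code changes; they can be tried directly on your existing runs, e.g.

  $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2 ex2f.exe -random_exact_sol -pc_type gamg -ksp_converged_reason
  $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2 ex2f.exe -random_exact_sol -pc_type hypre -ksp_converged_reason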
Sidarth Narayanan <snarayanan1 at altair.com> writes:
> Hello PETSc team,
>
> I am currently using PETSc to improve the performance of a CFD solver by using its parallel linear solver. While doing so I ran across some scaling issues: the speedup in the linear algebra (measured as KSPSolve time using the MPI_Wtime() function) was only about 2x when using 4 processes. So I tried to check the scaling of the example problem ex2f.F90 provided as part of the PETSc package and ran into a similar issue there. If I run the problem on a 1000 x 1000 2-D, five-point stencil with the exact solution set to 1.0, the scaling is perfect. But as soon as I change the exact solution to random values (generally between 0 and 1) using the flag -random_exact_sol, the scaling takes a major hit. I have not changed the example code apart from increasing the problem size and adding time stamps before and after KSPSolve. Could you please let me know what I am missing or doing wrong here? I was also wondering whether PETSc has an option to print out the preconditioned matrix, since changing the exact solution changes the RHS and hence might change the preconditioned matrix. I have provided below the exact code and the results from the 4 runs, including the compilation step.
>
> Source Code: (ex2f.F90 with time stamps and increased problem size)
> !
> ! Description: Solves a linear system in parallel with KSP (Fortran code).
> ! Also shows how to set a user-defined monitoring routine.
> !
> !
> !
> ! -----------------------------------------------------------------------
>
> program main
> #include <petsc/finclude/petscksp.h>
> use petscksp
> implicit none
> !
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Variable declarations
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> !
> ! Variables:
> ! ksp - Krylov subspace method (linear solver) context
> ! pc - preconditioner context
> ! x, b, u - approx solution, right-hand-side, exact solution vectors
> ! A - matrix that defines linear system
> ! its - iterations for convergence
> ! norm - norm of error in solution
> ! rctx - random number generator context
> !
> ! Note that vectors are declared as PETSc "Vec" objects. These vectors
> ! are mathematical objects that contain more than just an array of
> ! double precision numbers. I.e., vectors in PETSc are not just
> ! double precision x(*).
> ! However, local vector data can be easily accessed via VecGetArray().
> ! See the Fortran section of the PETSc users manual for details.
> !
> PetscReal norm,t1,t2
> PetscInt i,j,II,JJ,m,n,its
> PetscInt Istart,Iend,ione
> PetscErrorCode ierr
> PetscMPIInt rank,size
> PetscBool flg
> PetscScalar v,one,neg_one
> Vec x,b,u
> Mat A
> KSP ksp
> PetscRandom rctx
> PetscViewerAndFormat vf,vzero
>
> ! These variables are not currently used.
> ! PC pc
> ! PCType ptype
> ! PetscReal tol
>
> ! Note: Any user-defined Fortran routines (such as MyKSPMonitor)
> ! MUST be declared as external.
>
> external MyKSPMonitor,MyKSPConverged
>
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Beginning of program
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
> if (ierr .ne. 0) then
> print*,'Unable to initialize PETSc'
> stop
> endif
> m = 1000
> n = 1000
> one = 1.0
> neg_one = -1.0
> ione = 1
> call PetscOptionsGetInt(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-m',m,flg,ierr)
> call PetscOptionsGetInt(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-n',n,flg,ierr)
> call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
> call MPI_Comm_size(PETSC_COMM_WORLD,size,ierr)
>
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Compute the matrix and right-hand-side vector that define
> ! the linear system, Ax = b.
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> ! Create parallel matrix, specifying only its global dimensions.
> ! When using MatCreate(), the matrix format can be specified at
> ! runtime. Also, the parallel partitioning of the matrix is
> ! determined by PETSc at runtime.
>
> call MatCreate(PETSC_COMM_WORLD,A,ierr)
> call MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,m*n,m*n,ierr)
> call MatSetFromOptions(A,ierr)
> call MatSetUp(A,ierr)
>
> ! Currently, all PETSc parallel matrix formats are partitioned by
> ! contiguous chunks of rows across the processors. Determine which
> ! rows of the matrix are locally owned.
>
> call MatGetOwnershipRange(A,Istart,Iend,ierr)
>
> ! Set matrix elements for the 2-D, five-point stencil in parallel.
> ! - Each processor needs to insert only elements that it owns
> ! locally (but any non-local elements will be sent to the
> ! appropriate processor during matrix assembly).
> ! - Always specify global row and columns of matrix entries.
> ! - Note that MatSetValues() uses 0-based row and column numbers
> ! in Fortran as well as in C.
>
> ! Note: this uses the less common natural ordering that orders first
> ! all the unknowns for x = h then for x = 2h etc; Hence you see JJ = II +- n
> ! instead of JJ = II +- m as you might expect. The more standard ordering
> ! would first do all variables for y = h, then y = 2h etc.
>
> do 10, II=Istart,Iend-1
> v = -1.0
> i = II/n
> j = II - i*n
> if (i.gt.0) then
> JJ = II - n
> call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
> endif
> if (i.lt.m-1) then
> JJ = II + n
> call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
> endif
> if (j.gt.0) then
> JJ = II - 1
> call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
> endif
> if (j.lt.n-1) then
> JJ = II + 1
> call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
> endif
> v = 4.0
> call MatSetValues(A,ione,II,ione,II,v,INSERT_VALUES,ierr)
> 10 continue
>
> ! Assemble matrix, using the 2-step process:
> ! MatAssemblyBegin(), MatAssemblyEnd()
> ! Computations can be done while messages are in transition,
> ! by placing code between these two statements.
>
> call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)
> call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)
>
> ! Create parallel vectors.
> ! - Here, the parallel partitioning of the vector is determined by
> ! PETSc at runtime. We could also specify the local dimensions
> ! if desired -- or use the more general routine VecCreate().
> ! - When solving a linear system, the vectors and matrices MUST
> ! be partitioned accordingly. PETSc automatically generates
> ! appropriately partitioned matrices and vectors when MatCreate()
> ! and VecCreate() are used with the same communicator.
> ! - Note: We form 1 vector from scratch and then duplicate as needed.
>
> call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,m*n,u,ierr)
> call VecSetFromOptions(u,ierr)
> call VecDuplicate(u,b,ierr)
> call VecDuplicate(b,x,ierr)
>
> ! Set exact solution; then compute right-hand-side vector.
> ! By default we use an exact solution of a vector with all
> ! elements of 1.0; Alternatively, using the runtime option
> ! -random_exact_sol forms a solution vector with random components.
>
> call PetscOptionsHasName(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-random_exact_sol',flg,ierr)
> if (flg) then
> call PetscRandomCreate(PETSC_COMM_WORLD,rctx,ierr)
> call PetscRandomSetFromOptions(rctx,ierr)
> call VecSetRandom(u,rctx,ierr)
> call PetscRandomDestroy(rctx,ierr)
> else
> call VecSet(u,one,ierr)
> endif
> call MatMult(A,u,b,ierr)
>
> ! View the exact solution vector if desired
>
> call PetscOptionsHasName(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-view_exact_sol',flg,ierr)
> if (flg) then
> call VecView(u,PETSC_VIEWER_STDOUT_WORLD,ierr)
> endif
>
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Create the linear solver and set various options
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> ! Create linear solver context
>
> call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>
> ! Set operators. Here the matrix that defines the linear system
> ! also serves as the preconditioning matrix.
>
> call KSPSetOperators(ksp,A,A,ierr)
>
> ! Set linear solver defaults for this problem (optional).
> ! - By extracting the KSP and PC contexts from the KSP context,
> ! we can then directly call any KSP and PC routines
> ! to set various options.
> ! - The following four statements are optional; all of these
> ! parameters could alternatively be specified at runtime via
> ! KSPSetFromOptions(). All of these defaults can be
> ! overridden at runtime, as indicated below.
>
> ! We comment out this section of code since the Jacobi
> ! preconditioner is not a good general default.
>
> ! call KSPGetPC(ksp,pc,ierr)
> ! ptype = PCJACOBI
> ! call PCSetType(pc,ptype,ierr)
> ! tol = 1.e-7
> ! call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
>
> ! Set user-defined monitoring routine if desired
>
> call PetscOptionsHasName(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-my_ksp_monitor',flg,ierr)
> if (flg) then
> vzero = 0
> call KSPMonitorSet(ksp,MyKSPMonitor,vzero,PETSC_NULL_FUNCTION,ierr)
> !
> ! Also use the default KSP monitor routine showing how it may be used from Fortran
> !
> call PetscViewerAndFormatCreate(PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_DEFAULT,vf,ierr)
> call KSPMonitorSet(ksp,KSPMonitorResidual,vf,PetscViewerAndFormatDestroy,ierr)
> endif
>
> ! Set runtime options, e.g.,
> ! -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
> ! These options will override those specified above as long as
> ! KSPSetFromOptions() is called _after_ any other customization
> ! routines.
>
> call KSPSetFromOptions(ksp,ierr)
>
> ! Set convergence test routine if desired
>
> call PetscOptionsHasName(PETSC_NULL_OPTIONS,PETSC_NULL_CHARACTER,'-my_ksp_convergence',flg,ierr)
> if (flg) then
> call KSPSetConvergenceTest(ksp,MyKSPConverged,0,PETSC_NULL_FUNCTION,ierr)
> endif
> !
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Solve the linear system
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> t1 = MPI_Wtime()
> call KSPSolve(ksp,b,x,ierr)
> t2 = MPI_Wtime()
> ! x = A^{-1} b
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Check solution and clean up
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> ! Check the error
> call VecAXPY(x,neg_one,u,ierr)
> call VecNorm(x,NORM_2,norm,ierr)
> call KSPGetIterationNumber(ksp,its,ierr)
> if (rank .eq. 0) then
> if (norm .gt. 1.e-12) then
> write(6,100) norm,its
> else
> write(6,110) its
> endif
> write(6,111) t2 - t1
> endif
> 100 format('Norm of error ',e11.4,' iterations ',i5)
> 110 format('Norm of error < 1.e-12 iterations ',i5)
> 111 format('KSPSolve time: ',f10.3)
>
> ! Free work space. All PETSc objects should be destroyed when they
> ! are no longer needed.
>
> call KSPDestroy(ksp,ierr)
> call VecDestroy(u,ierr)
> call VecDestroy(x,ierr)
> call VecDestroy(b,ierr)
> call MatDestroy(A,ierr)
>
> ! Always call PetscFinalize() before exiting a program. This routine
> ! - finalizes the PETSc libraries as well as MPI
> ! - provides summary and diagnostic information if certain runtime
> ! options are chosen (e.g., -log_view). See PetscFinalize()
> ! manpage for more information.
>
> call PetscFinalize(ierr)
> end
>
> ! --------------------------------------------------------------
> !
> ! MyKSPMonitor - This is a user-defined routine for monitoring
> ! the KSP iterative solvers.
> !
> ! Input Parameters:
> ! ksp - iterative context
> ! n - iteration number
> ! rnorm - 2-norm (preconditioned) residual value (may be estimated)
> ! dummy - optional user-defined monitor context (unused here)
> !
> subroutine MyKSPMonitor(ksp,n,rnorm,dummy,ierr)
> use petscksp
> implicit none
>
> KSP ksp
> Vec x
> PetscErrorCode ierr
> PetscInt n,dummy
> PetscMPIInt rank
> PetscReal rnorm
>
> ! Build the solution vector
> call KSPBuildSolution(ksp,PETSC_NULL_VEC,x,ierr)
>
> ! Write the solution vector and residual norm to stdout
> ! - Note that the parallel viewer PETSC_VIEWER_STDOUT_WORLD
> ! handles data from multiple processors so that the
> ! output is not jumbled.
>
> call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
> if (rank .eq. 0) write(6,100) n
> call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
> if (rank .eq. 0) write(6,200) n,rnorm
>
> 100 format('iteration ',i5,' solution vector:')
> 200 format('iteration ',i5,' residual norm ',e11.4)
> ierr = 0
> end
>
> ! --------------------------------------------------------------
> !
> ! MyKSPConverged - This is a user-defined routine for testing
> ! convergence of the KSP iterative solvers.
> !
> ! Input Parameters:
> ! ksp - iterative context
> ! n - iteration number
> ! rnorm - 2-norm (preconditioned) residual value (may be estimated)
> ! dummy - optional user-defined monitor context (unused here)
> !
> subroutine MyKSPConverged(ksp,n,rnorm,flag,dummy,ierr)
> use petscksp
> implicit none
>
> KSP ksp
> PetscErrorCode ierr
> PetscInt n,dummy
> KSPConvergedReason flag
> PetscReal rnorm
>
> if (rnorm .le. .05) then
> flag = 1
> else
> flag = 0
> endif
> ierr = 0
>
> end
>
> !/*TEST
> !
> ! test:
> ! nsize: 2
> ! args: -pc_type jacobi -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always
> !
> ! test:
> ! suffix: 2
> ! nsize: 2
> ! args: -pc_type jacobi -my_ksp_monitor -ksp_gmres_cgs_refinement_type refine_always
> !
> !TEST*/
>
> Output:
> snarayanan1 at USLN38 /cygdrive/c/sidarth/petsc-test/scaling_test
> $ make ex2f
> /home/snarayanan1/petsc-release/lib/petsc/bin/win32fe/win32fe ifort -MT -O3 -fpp -MT -O3 -fpp -I/home/snarayanan1/petsc-release/include -I/home/snarayanan1/petsc-release/arch-ci-mswin-opt-impi/include -I/cygdrive/c/PROGRA~2/Intel/oneAPI/mpi/2021.5.0/include ex2f.F90 -L/cygdrive/c/cygwin64/home/snarayanan1/petsc-release/arch-ci-mswin-opt-impi/lib -L/cygdrive/c/PROGRA~2/Intel/oneAPI/mkl/2022.0.0/lib/intel64 -lpetsc mkl_intel_lp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib /cygdrive/c/PROGRA~2/Intel/oneAPI/mpi/2021.5.0/lib/release/impi.lib Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib -o ex2f
>
> snarayanan1 at USLN38 /cygdrive/c/sidarth/petsc-test/scaling_test
> $ $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 1 ex2f.exe
> Norm of error 0.1214E+02 iterations 2571
> KSPSolve time: 104.949
>
> snarayanan1 at USLN38 /cygdrive/c/sidarth/petsc-test/scaling_test
> $ $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2 ex2f.exe
> Norm of error 0.1221E+02 iterations 1821
> KSPSolve time: 47.139
>
> snarayanan1 at USLN38 /cygdrive/c/sidarth/petsc-test/scaling_test
> $ $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 1 ex2f.exe -random_exact_sol
> Norm of error 0.8541E+02 iterations 1003
> KSPSolve time: 40.853
>
> snarayanan1 at USLN38 /cygdrive/c/sidarth/petsc-test/scaling_test
> $ $PETSC_DIR/lib/petsc/bin/petscmpiexec -n 2 ex2f.exe -random_exact_sol
> Norm of error 0.8562E+02 iterations 1191
> KSPSolve time: 31.745
>
>
> Thank You,
> Sidarth Narayanan
> Electronics Thermal Management Intern
> Altair Engineering Inc.
> Troy, MI