[petsc-users] What is the right way to implement a (block) Diagonal ILU as PC?
hzhang at mcs.anl.gov
Tue Feb 4 13:16:04 CST 2020
Hao:
I would suggest using a parallel sparse direct solver, e.g., SuperLU_DIST
or MUMPS. These solvers can take advantage of your sparse data structure.
Once it works, then you may play with other preconditioners, such as
bjacobi + lu/ilu. See
https://www.mcs.anl.gov/petsc/miscellaneous/external.html
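For example (assuming PETSc was configured with these packages, and using
the standard runtime option names), you can switch solvers without
recompiling:

```shell
# direct solve via SuperLU_DIST
-ksp_type preonly -pc_type lu -pc_factor_mat_solver_type superlu_dist
# or via MUMPS
-ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps
```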
Hong
Dear all,
>
>
> I have a few questions about the implementation of diagonal ILU PC in
> PETSc. I want to solve a very simple system with KSP (in parallel), the
> nature of the system (finite difference time-harmonic Maxwell) is probably
> not important to the question itself. Long story short, I just need to
> solve a set of equations of Ax = b with a block diagonal system matrix,
> like (not sure if the mono font works):
>
>         | X     |
>     A = |   Y   |
>         |     Z |
>
> Note that A is not really block-diagonal; it's a multi-diagonal matrix
> determined by the FD mesh, with most elements close to the diagonal. So
> instead of a full ILU decomposition, a D-ILU (block-diagonal ILU, with an
> incomplete factorization of each diagonal block) is a good approximation
> as a preconditioner, and the number of blocks should not matter too much:
>
>         | Lx      |               | Ux      |
>     L = |    Ly   |   and     U = |    Uy   |
>         |      Lz |               |      Uz |
>
> Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N
> blocks) is quite efficient with Krylov subspace methods like BiCGstab or
> QMR in my sequential Fortran/Matlab code.
>
> So like most users, I am looking for a parallel implementation of this
> preconditioner in PETSc. After looking through the manual, it seems that
> the most straightforward way to do it is through PCBJACOBI. If I
> understand it correctly, I just set up a 3-block PCBJACOBI and give each
> block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU
> preconditioner? My little block of Fortran code looks like:
> ...
> call PCBJacobiSetTotalBlocks(pc_local, Ntotal, isubs, ierr)
> call PCBJacobiSetLocalBlocks(pc_local, Nsub, isubs(istart:iend), ierr)
> ! set up the block Jacobi structure
> call KSPSetUp(ksp_local, ierr)
> ! allocate sub-KSPs
> allocate(ksp_sub(Nsub))
> call PCBJacobiGetSubKSP(pc_local, Nsub, istart, ksp_sub, ierr)
> do i = 1, Nsub
>   call KSPGetPC(ksp_sub(i), pc_sub, ierr)
>   ! ILU preconditioner on each block
>   call PCSetType(pc_sub, ptype, ierr)        ! ptype = PCILU
>   call PCFactorSetLevels(pc_sub, 1, ierr)    ! use ILU(1) here
>   call KSPSetType(ksp_sub(i), KSPPREONLY, ierr)
> end do
> call KSPSetTolerances(ksp_local, KSSiter%tol, PETSC_DEFAULT_REAL, &
>      PETSC_DEFAULT_REAL, KSSiter%maxit, ierr)
> ...
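> For reference, I believe the same setup can also be expressed entirely
> through runtime options, which would make it easy to vary the block count
> without recompiling (a sketch, using what I understand to be the standard
> PETSc option names):
>
> ```shell
> # BiCGstab with 3-block block-Jacobi, ILU(1) on each block
> -ksp_type bcgs -pc_type bjacobi -pc_bjacobi_blocks 3 \
>   -sub_ksp_type preonly -sub_pc_type ilu -sub_pc_factor_levels 1
> ```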
>
> I understand that the parallel performance may not be comparable, so I
> first set up a single-process test (with MPIAIJ, but all rows are local
> since there is only one process). The system is solved without any
> problem (identical results within error). But the performance is actually
> a lot worse (code built without debugging flags in performance tests) than
> my own home-brew Fortran implementation (I wrote my own ILU(0) in CSR
> sparse matrix format), which is hard to believe. I suspect the difference
> comes from the PC, as the PETSc version takes many more BiCGstab
> iterations to converge to the same relative tolerance (100-ish vs. the
> 60-ish my own code needs).
>
> This is further confirmed when I change the setup of the D-ILU (using 6
> or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal
> performance difference (<5%) when I play with the D-ILU setup, increasing
> the number of D-ILU blocks from 3 to 9 causes the KSP solve with
> PCBJACOBI to suffer a performance decrease of more than 50% in my
> sequential test. So my PETSc setup IS somehow different from my own
> implementation. Did I miss something in the PCBJACOBI setup? Or do I have
> some fundamental misunderstanding of how PCBJACOBI works in PETSc?
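> To confirm what the solver is actually doing, I have been checking the
> runs with monitoring options along these lines (assuming the usual PETSc
> option names):
>
> ```shell
> # print the configured solver, the true residual history, and a timing log
> -ksp_view -ksp_monitor_true_residual -log_view
> ```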
>
> If this is not the right way to implement a block-diagonal ILU as a
> (parallel) PC, please kindly point me in the right direction. I searched
> through the mailing list for answers, only to find a couple of similar
> questions... An example would be nice.
>
> On the other hand, does PETSc offer a simple way to use an explicit L/U
> factorization as a preconditioner? I could import the D-ILU factors
> constructed in my Fortran code (I have already converted my A matrix into
> a Mat) to make a better comparison. Or do I have to build my own PC with
> PCSHELL? If so, is there a good tutorial/example for using PCSHELL (in a
> more advanced sense, e.g., how to set up the PC side and the transpose
> apply)?
>
> Thanks in advance,
>
> Hao
>
>