[petsc-users] Use block Jacobi preconditioner with SNES
Ali Reza Khaz'ali
arkhazali at cc.iut.ac.ir
Tue Aug 28 04:34:08 CDT 2018
> Actually you do not need my new branch to achieve what you desired. All you need in your main program is something like
>
> ierr = SNESCreate(PETSC_COMM_WORLD,&snes);CHKERRQ(ierr);
> ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
> ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
> ierr = PCSetType(pc,PCBJACOBI);CHKERRQ(ierr);
> ierr = PCBJacobiSetTotalBlocks(pc,3,lens);CHKERRQ(ierr); /* here you set your block sizes to whatever you need */
>
> Then simply do not call PCBJacobiGetSubKSP() but use the options database to set the inner solver with -sub_pc_type lu -sub_ksp_type preonly
>
> I have updated the branch to move the PCBJacobiSetTotalBlocks() to the main program but left the callback in there for setting the inner solver types (though as I just said you don't need to use the callback since you can control the solver from the options database). The callback is needed, if, for example, you wished to use a different solver on different blocks (which is not your case).
>
> Barry
>
> PETSc developers - do you think we should put the callback functionality into PETSc? It allows doing things that are otherwise not doable but is rather ugly (perhaps too specialized)?
>
>
>
It works! Thanks a lot. I would like to have variable-sized block preconditioners and solvers in PETSc; their applications are broader than they may at first appear. If possible, I would like to contribute to the PETSc code and build a variable-sized block Jacobi and a block ILU(k) as a first step (if I can, of course). Where can I start?
Here is the log of a 30x30x10 system (18,000 blocks, with a GMRES outer solver); a short sketch of the setup precedes it.
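For anyone reading this in the archive, a minimal sketch of the setup described above (not my production code): the residual, the problem size n, and the block sizes in lens[] are placeholders chosen only for illustration, and the finite-difference Jacobian merely keeps the example self-contained. The inner block solver is picked up from the options database, e.g. -sub_ksp_type preonly -sub_pc_type lu -sub_pc_factor_mat_solver_type mkl_pardiso:

#include <petscsnes.h>

/* Placeholder residual, F(x) = x - 1, so the Jacobian is the identity.
   A real application evaluates its own equations here. */
static PetscErrorCode FormFunction(SNES snes,Vec x,Vec f,void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = VecCopy(x,f);CHKERRQ(ierr);
  ierr = VecShift(f,-1.0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc,char **argv)
{
  SNES           snes;
  KSP            ksp;
  PC             pc;
  Vec            x,r;
  Mat            J;
  PetscInt       n = 18;                        /* total number of unknowns (made up) */
  PetscInt       nblocks = 3,lens[3] = {4,6,8}; /* variable block sizes, must sum to n */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;

  ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
  ierr = VecSetSizes(x,PETSC_DECIDE,n);CHKERRQ(ierr);
  ierr = VecSetFromOptions(x);CHKERRQ(ierr);
  ierr = VecDuplicate(x,&r);CHKERRQ(ierr);

  ierr = MatCreate(PETSC_COMM_WORLD,&J);CHKERRQ(ierr);
  ierr = MatSetSizes(J,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(J);CHKERRQ(ierr);
  ierr = MatSetUp(J);CHKERRQ(ierr);

  ierr = SNESCreate(PETSC_COMM_WORLD,&snes);CHKERRQ(ierr);
  ierr = SNESSetFunction(snes,r,FormFunction,NULL);CHKERRQ(ierr);
  /* Finite-difference Jacobian only to keep the sketch self-contained;
     the real code supplies an analytic FormJacobian. */
  ierr = SNESSetJacobian(snes,J,J,SNESComputeJacobianDefault,NULL);CHKERRQ(ierr);

  /* Block Jacobi with caller-defined (variable) block sizes */
  ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCBJACOBI);CHKERRQ(ierr);
  ierr = PCBJacobiSetTotalBlocks(pc,nblocks,lens);CHKERRQ(ierr);

  /* Inner solver is chosen from the options database:
       -sub_ksp_type preonly -sub_pc_type lu [-sub_pc_factor_mat_solver_type mkl_pardiso]
     PCBJacobiGetSubKSP() is only needed to treat individual blocks differently,
     and can only be called after the preconditioner has been set up. */
  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);

  ierr = VecSet(x,0.0);CHKERRQ(ierr);
  ierr = SNESSolve(snes,NULL,x);CHKERRQ(ierr);

  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&r);CHKERRQ(ierr);
  ierr = MatDestroy(&J);CHKERRQ(ierr);
  ierr = SNESDestroy(&snes);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Keeping the inner solver in the options database means nothing block-specific is hard-wired: switching the block factorization (say to -sub_pc_type ilu) needs no recompilation.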
SNES Object: 1 MPI processes
type: newtonls
maximum iterations=2000, maximum function evaluations=2000
tolerances: relative=0.0001, absolute=1e-05, solution=1e-05
total number of linear solver iterations=3
total number of function evaluations=2
norm schedule ALWAYS
SNESLineSearch Object: 1 MPI processes
type: bt
interpolation: cubic
alpha=1.000000e-04
maxstep=1.000000e+08, minlambda=1.000000e-12
tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
maximum iterations=40
KSP Object: 1 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=5000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-06, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: bjacobi
number of blocks = 18000
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (sub_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (sub_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 0., needed 0.
Factored matrix follows:
Mat Object: 1 MPI processes
type: mkl_pardiso
rows=6, cols=6
package used to perform factorization: mkl_pardiso
total: nonzeros=26, allocated nonzeros=26
total number of mallocs used during MatSetValues calls =0
MKL_PARDISO run parameters:
MKL_PARDISO phase: 33
MKL_PARDISO iparm[1]: 1
MKL_PARDISO iparm[2]: 2
MKL_PARDISO iparm[3]: 1
MKL_PARDISO iparm[4]: 0
MKL_PARDISO iparm[5]: 0
MKL_PARDISO iparm[6]: 0
MKL_PARDISO iparm[7]: 0
MKL_PARDISO iparm[8]: 0
MKL_PARDISO iparm[9]: 0
MKL_PARDISO iparm[10]: 13
MKL_PARDISO iparm[11]: 1
MKL_PARDISO iparm[12]: 0
MKL_PARDISO iparm[13]: 1
MKL_PARDISO iparm[14]: 0
MKL_PARDISO iparm[15]: 144
MKL_PARDISO iparm[16]: 144
MKL_PARDISO iparm[17]: 0
MKL_PARDISO iparm[18]: 37
MKL_PARDISO iparm[19]: 0
MKL_PARDISO iparm[20]: 0
MKL_PARDISO iparm[21]: 0
MKL_PARDISO iparm[22]: 0
MKL_PARDISO iparm[23]: 0
MKL_PARDISO iparm[24]: 0
MKL_PARDISO iparm[25]: 0
MKL_PARDISO iparm[26]: 0
MKL_PARDISO iparm[27]: 0
MKL_PARDISO iparm[28]: 0
MKL_PARDISO iparm[29]: 0
MKL_PARDISO iparm[30]: 0
MKL_PARDISO iparm[31]: 0
MKL_PARDISO iparm[32]: 0
MKL_PARDISO iparm[33]: 0
MKL_PARDISO iparm[34]: -1
MKL_PARDISO iparm[35]: 1
MKL_PARDISO iparm[36]: 0
MKL_PARDISO iparm[37]: 0
MKL_PARDISO iparm[38]: 0
MKL_PARDISO iparm[39]: 0
MKL_PARDISO iparm[40]: 0
MKL_PARDISO iparm[41]: 0
MKL_PARDISO iparm[42]: 0
MKL_PARDISO iparm[43]: 0
MKL_PARDISO iparm[44]: 0
MKL_PARDISO iparm[45]: 0
MKL_PARDISO iparm[46]: 0
MKL_PARDISO iparm[47]: 0
MKL_PARDISO iparm[48]: 0
MKL_PARDISO iparm[49]: 0
MKL_PARDISO iparm[50]: 0
MKL_PARDISO iparm[51]: 0
MKL_PARDISO iparm[52]: 0
MKL_PARDISO iparm[53]: 0
MKL_PARDISO iparm[54]: 0
MKL_PARDISO iparm[55]: 0
MKL_PARDISO iparm[56]: 0
MKL_PARDISO iparm[57]: -1
MKL_PARDISO iparm[58]: 0
MKL_PARDISO iparm[59]: 0
MKL_PARDISO iparm[60]: 0
MKL_PARDISO iparm[61]: 144
MKL_PARDISO iparm[62]: 145
MKL_PARDISO iparm[63]: 21
MKL_PARDISO iparm[64]: 0
MKL_PARDISO maxfct: 1
MKL_PARDISO mnum: 1
MKL_PARDISO mtype: 11
MKL_PARDISO n: 6
MKL_PARDISO nrhs: 1
MKL_PARDISO msglvl: 0
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=6, cols=6
total: nonzeros=26, allocated nonzeros=26
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 4 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=108000, cols=108000
total: nonzeros=2868000, allocated nonzeros=8640000
total number of mallocs used during MatSetValues calls =0
not using I-node routines
************************************************************************************************************************
***    WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document    ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe on a named ALIREZA-PC with 1 processor, by AliReza Tue Aug 28 13:57:09 2018
Using Petsc Development GIT revision: v3.9.3-1238-gce82fdcfd6  GIT Date: 2018-08-27 15:47:19 -0500
Max Max/Min Avg Total
Time (sec): 1.353e+02 1.000 1.353e+02
Objects: 1.980e+05 1.000 1.980e+05
Flop: 2.867e+07 1.000 2.867e+07 2.867e+07
Flop/sec: 2.119e+05 1.000 2.119e+05 2.119e+05
MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flop
                          and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total    Count   %Total
 0:      Main Stage: 1.3529e+02 100.0%  2.8668e+07 100.0%  0.000e+00   0.0%  0.000e+00      0.0%    0.000e+00   0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
  Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
     %T - percent time in this phase         %F - percent flop in this phase
     %M - percent messages in this phase     %L - percent message lengths in this phase
     %R - percent reductions in this phase
  Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec)
Flop --- Global --- --- Stage ---- Total
Max Ratio Max Ratio Max Ratio Mess AvgLen
Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSidedF 2 1.0 1.2701e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESSolve 1 1.0 1.1583e+02 1.0 2.87e+07 1.0 0.0e+00 0.0e+00
0.0e+00 86 100 0 0 0 86 100 0 0 0 0
SNESFunctionEval 2 1.0 5.4101e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 4 0 0 0 0 4 0 0 0 0 0
SNESJacobianEval 1 1.0 9.3770e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 69 0 0 0 0 69 0 0 0 0 0
SNESLineSearch 1 1.0 3.1033e+00 1.0 6.82e+06 1.0 0.0e+00 0.0e+00
0.0e+00 2 24 0 0 0 2 24 0 0 0 2
VecDot 1 1.0 1.8688e-04 1.0 2.16e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 1156
VecMDot 3 1.0 9.9299e-04 1.0 1.30e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 5 0 0 0 0 5 0 0 0 1305
VecNorm 7 1.0 6.0845e-03 1.0 1.51e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 5 0 0 0 0 5 0 0 0 248
VecScale 4 1.0 1.4437e+00 1.0 4.32e+05 1.0 0.0e+00 0.0e+00
0.0e+00 1 2 0 0 0 1 2 0 0 0 0
VecCopy 3 1.0 1.6059e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecSet 90002 1.0 1.3843e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1 1.0 3.1733e-01 1.0 2.16e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 1
VecWAXPY 1 1.0 2.2665e-04 1.0 1.08e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 477
VecMAXPY 4 1.0 8.6085e-04 1.0 1.94e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 7 0 0 0 0 7 0 0 0 2258
VecAssemblyBegin 2 1.0 1.6379e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 1.4112e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 2 1.0 3.1304e-04 1.0 4.32e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 2 0 0 0 0 2 0 0 0 1380
VecReduceComm 1 1.0 2.1382e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 4 1.0 1.4441e+00 1.0 1.30e+06 1.0 0.0e+00 0.0e+00
0.0e+00 1 5 0 0 0 1 5 0 0 0 1
MatMult 4 1.0 2.0402e-02 1.0 2.25e+07 1.0 0.0e+00 0.0e+00
0.0e+00 0 79 0 0 0 0 79 0 0 0 1103
MatSolve 72000 1.0 5.3514e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 4 0 0 0 0 4 0 0 0 0 0
MatLUFactorSym 18000 1.0 1.9405e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatLUFactorNum 18000 1.0 1.8373e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 18002 1.0 1.0409e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 18002 1.0 3.3879e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 18000 1.0 3.1819e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 1 1.0 3.7015e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 18000 1.0 3.0787e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 1 1.0 2.7952e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 1.0 2.9153e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 18001 1.0 7.5898e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 1.6244e+01 1.0 2.16e+07 1.0 0.0e+00 0.0e+00
0.0e+00 12 75 0 0 0 12 75 0 0 0 1
KSPGMRESOrthog 3 1.0 8.4669e-02 1.0 2.59e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 9 0 0 0 0 9 0 0 0 31
PCSetUp 18001 1.0 3.3536e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 2 0 0 0 0 2 0 0 0 0 0
PCSetUpOnBlocks 1 1.0 2.5973e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 2 0 0 0 0 2 0 0 0 0 0
PCApply 4 1.0 6.2752e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 5 0 0 0 0 5 0 0 0 0 0
PCApplyOnBlocks 72000 1.0 5.9278e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 4 0 0 0 0 4 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
SNES 1 0 0 0.
DMSNES 1 0 0 0.
SNESLineSearch 1 0 0 0.
Vector 36020 0 0 0.
Matrix 36001 0 0 0.
Distributed Mesh 2 0 0 0.
Index Set 90000 36000 29088000 0.
Star Forest Graph 4 0 0 0.
Discrete System 2 0 0 0.
Krylov Solver 18001 0 0 0.
DMKSP interface 1 0 0 0.
Preconditioner 18001 0 0 0.
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 1.28294e-07
#PETSc Option Table entries:
-ksp_atol 1e-6
-ksp_rtol 1e-5
-snes_rtol 1e-4
-sub_ksp_type preonly
-sub_pc_factor_mat_solver_type mkl_pardiso
-sub_pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/home/alireza/PetscGit
--with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl
--with-hypre-include=/cygdrive/E/hypre-2.11.2/Builds/Bins/include
--with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
--with-ml-include=/cygdrive/E/Trilinos-master/Bins/include
--with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib
--with-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort"
--with-mpi-include=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
--with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib
--with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe
--with-debugging=0
--with-blas-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
--with-lapack-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
-CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp"
-FFLAGS="-MT -O2 -Qopenmp"
-----------------------------------------
Libraries compiled on 2018-08-27 22:42:15 on AliReza-PC
Machine characteristics: CYGWIN_NT-6.1-2.10.0-0.325-5-3-x86_64-64bit
Using PETSc directory: /home/alireza/PetscGit
Using PETSc arch:
-----------------------------------------
Using C compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl -O2 -MT -wd4996 -Qopenmp
Using Fortran compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort -MT -O2 -Qopenmp -fpp
-----------------------------------------
Using include paths: -I/home/alireza/PetscGit/include
-I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/include
-I/cygdrive/E/hypre-2.11.2/Builds/Bins/include
-I/cygdrive/E/Trilinos-master/Bins/include
-I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
-----------------------------------------
Using C linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
Using Fortran linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort
Using libraries: -L/home/alireza/PetscGit/lib -L/home/alireza/PetscGit/lib -lpetsc
/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
/cygdrive/E/Trilinos-master/Bins/lib/ml.lib
/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib
Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib
-----------------------------------------