[petsc-users] Use block Jacobi preconditioner with SNES

Wed Aug 29 15:18:15 CDT 2018

> On Aug 29, 2018, at 2:08 PM, Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> wrote:
> 
> Thanks again,
> 
> 
>> You might want to use a memory profiler to see if you have important
>> leaks.  See comment below about destroying objects and PETSc memory
>> logging.
> That's a very good suggestion. I'll do it.
> 
> 
> 
>> This matrix takes just a little over 100 MB (only about 35 MB of which
>> is needed).  If this code is using a lot more memory than that, you
>> should use a memory profiler to find out where (e.g., some data
>> structures in your code).
> The memory usage was high for the previous (and wrong) one with bjacobi/ilu.  My code with vpbjacobi preconditioner

   Can you confirm if your code ran successfully with vpbjacobi and if the convergence history was very similar to that achieved using 
bjacobi ?

     Thanks

       Barry

> only requires 150K, which as you stated, most of it is not required by PETSc.
> 
> 
> 
>> All of your time is in function and Jacobian evaluation.
>> The entire linear solve is less than 50 milliseconds.
> Thanks for your comment. You are right. The Jacobian construction involves about 40K lines of code, and a lot of phase equilibrium calculations, which are very time intensive.
> 
> 
> 
>> It looks like you have forgotten to destroy all of your objects.  If you
>> clean up after yourself, you'll also see the totaly memory used by each
>> of the PETSc classes that you use.
> That is because of PetscLogView was placed before the destroy functions. I moved it after them, and here is the new log: (additionally, this is the log of one time step only, so all of the objects won't be destroyed before all time steps have been finished)
> 
> 
> SNES Object: 1 MPI processes
>   type: newtonls
>   maximum iterations=2000, maximum function evaluations=2000
>   tolerances: relative=0.0001, absolute=1e-05, solution=1e-05
>   total number of linear solver iterations=3
>   total number of function evaluations=2
>   norm schedule ALWAYS
>   SNESLineSearch Object: 1 MPI processes
>     type: bt
>       interpolation: cubic
>       alpha=1.000000e-04
>     maxstep=1.000000e+08, minlambda=1.000000e-12
>     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>     maximum iterations=40
>   KSP Object: 1 MPI processes
>     type: gmres
>       restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>       happy breakdown tolerance 1e-30
>     maximum iterations=5000, initial guess is zero
>     tolerances:  relative=1e-05, absolute=1e-06, divergence=10000.
>     left preconditioning
>     using PRECONDITIONED norm type for convergence test
>   PC Object: 1 MPI processes
>     type: vpbjacobi
>     linear system matrix = precond matrix:
>     Mat Object: 1 MPI processes
>       type: seqaij
>       rows=108000, cols=108000
>       total: nonzeros=2868000, allocated nonzeros=8640000
>       total number of mallocs used during MatSetValues calls =0
>         not using I-node routines
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
> ************************************************************************************************************************
> 
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
> 
> E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe on a  named ALIREZA-PC with 1 processor, by AliReza Wed Aug 29 23:33:39 2018
> Using Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5  GIT Date: 2018-08-28 19:19:26 -0500
> 
>                          Max       Max/Min     Avg       Total
> Time (sec):           1.190e+02     1.000   1.190e+02
> Objects:              3.600e+01     1.000   3.600e+01
> Flop:                 2.867e+07     1.000   2.867e+07  2.867e+07
> Flop/sec:             2.408e+05     1.000   2.408e+05  2.408e+05
> MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
> MPI Reductions:       0.000e+00     0.000
> 
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flop
>                             and VecAXPY() for complex vectors of length N --> 8N flop
> 
> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total    Count %Total     Avg         %Total    Count   %Total
>  0:      Main Stage: 1.1905e+02 100.0%  2.8668e+07 100.0% 0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
> 
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flop: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    AvgLen: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flop in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec) Flop                              --- Global ---  --- Stage ---- Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> 
> --- Event Stage 0: Main Stage
> 
> PetscBarrier           1 1.0 2.1382e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> BuildTwoSidedF         2 1.0 4.2337e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> SNESSolve              1 1.0 1.0010e+02 1.0 2.87e+07 1.0 0.0e+00 0.0e+00 0.0e+00 84100  0  0  0  84100  0  0  0     0
> SNESFunctionEval       2 1.0 5.2605e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
> SNESJacobianEval       1 1.0 9.4788e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 80  0  0  0  0  80  0  0  0  0     0
> SNESLineSearch         1 1.0 2.6283e+00 1.0 6.82e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2 24  0  0  0   2 24  0  0  0     3
> VecDot                 1 1.0 1.8261e-04 1.0 2.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1183
> VecMDot                3 1.0 8.1381e-04 1.0 1.30e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1592
> VecNorm                7 1.0 6.5849e-03 1.0 1.51e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0   230
> VecScale               4 1.0 1.6208e-04 1.0 4.32e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  2665
> VecCopy                3 1.0 4.4561e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                 2 1.0 1.4369e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY                1 1.0 1.5823e-04 1.0 2.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1365
> VecWAXPY               1 1.0 1.9501e-04 1.0 1.08e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   554
> VecMAXPY               4 1.0 7.7190e-04 1.0 1.94e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0  2518
> VecAssemblyBegin       2 1.0 7.9970e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyEnd         2 1.0 1.2402e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecReduceArith         2 1.0 2.8567e-04 1.0 4.32e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1512
> VecReduceComm          1 1.0 5.1318e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecNormalize           4 1.0 4.6357e-04 1.0 1.30e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  2796
> MatMult                4 1.0 1.6927e-02 1.0 2.25e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 79  0  0  0   0 79  0  0  0  1330
> MatAssemblyBegin       2 1.0 8.5529e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         2 1.0 2.4032e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries         1 1.0 2.8857e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatView                1 1.0 3.3998e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetUp               1 1.0 1.2325e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 3.7389e-02 1.0 2.16e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 75  0  0  0   0 75  0  0  0   579
> KSPGMRESOrthog         3 1.0 1.3552e-03 1.0 2.59e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  9  0  0  0   0  9  0  0  0  1913
> PCSetUp                1 1.0 1.4842e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> PCApply                4 1.0 3.1607e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ------------------------------------------------------------------------------------------------------------------------
> 
> Memory usage is given in bytes:
> 
> Object Type          Creations   Destructions     Memory Descendants' Mem.
> Reports information only for process 0.
> 
> --- Event Stage 0: Main Stage
> 
>                 SNES     1              1         1372     0.
>               DMSNES     1              1          672     0.
>       SNESLineSearch     1              1         1000     0.
>               Vector    20             20     17313120     0.
>               Matrix     1              1    105843212     0.
>     Distributed Mesh     2              2         9504     0.
>    Star Forest Graph     4              4         3200     0.
>      Discrete System     2              2         1856     0.
>        Krylov Solver     1              1        18416     0.
>      DMKSP interface     1              1          656     0.
>       Preconditioner     1              1          832     0.
>               Viewer     1              0            0     0.
> ========================================================================================================================
> Average time to get PetscTime(): 0.
> #PETSc Option Table entries:
> -ksp_atol 1e-6
> -ksp_rtol 1e-5
> -snes_rtol 1e-4
> -sub_ksp_type preonly
> -sub_pc_factor_mat_solver_type mkl_pardiso
> -sub_pc_type lu
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --prefix=/home/alireza/PetscGit --with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl --with-hypre-incl
> ude=/cygdrive/E/hypre-2.11.2/Builds/Bins/include --with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib --with-ml-include=/cygdrive/E/Trilinos-master/Bins/in
> clude --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib ظ€ôwith-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort" --with-mpi-include=/cygdrive/E/Program_Fi
> les_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include --with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/int
> el64/lib/impi.lib --with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe --with-debugging=0 --with-blas
> -lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib --with-lapack-lib=/cygdrive/E/Program_Files_x86/IntelSWTool
> s/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib -CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp" -FFLAGS="-MT -O2 -Qopenmp"
> -----------------------------------------
> Libraries compiled on 2018-08-29 08:35:21 on AliReza-PC
> Machine characteristics: CYGWIN_NT-6.1-2.10.0-0.325-5-3-x86_64-64bit
> Using PETSc directory: /home/alireza/PetscGit
> Using PETSc arch:
> -----------------------------------------
> 
> Using C compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl -O2 -MT -wd4996 -Qopenmp
> Using Fortran compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort -MT -O2 -Qopenmp -fpp
> -----------------------------------------
> 
> Using include paths: -I/home/alireza/PetscGit/include -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/include -I/cygdrive/E/hypre-2.11.2/
> Builds/Bins/include -I/cygdrive/E/Trilinos-master/Bins/include -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
> -----------------------------------------
> 
> Using C linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
> Using Fortran linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort
> Using libraries: -L/home/alireza/PetscGit/lib -L/home/alireza/PetscGit/lib -lpetsc /cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib /cygdrive/E/Trilinos-master/Bins/lib
> /ml.lib /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and
> _libraries/windows/mkl/lib/intel64_win/mkl_rt.lib /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib Gdi32.lib User32.lib
>  Advapi32.lib Kernel32.lib Ws2_32.lib
> -----------------------------------------
> 
> 
> 
> Many thanks
> Ali
> 
> 
> 
> 
>> 
>>> ========================================================================================================================
>>> Average time to get PetscTime(): 8.55292e-08
>>> #PETSc Option Table entries:
>>> -ksp_atol 1e-6
>>> -ksp_rtol 1e-5
>>> -snes_rtol 1e-4
>>> -sub_ksp_type preonly
>>> -sub_pc_factor_mat_solver_type mkl_pardiso
>>> -sub_pc_type lu
>>> #End of PETSc Option Table entries
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8
>>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>> Configure options: --prefix=/home/alireza/PetscGit
>>> --with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl
>>> --with-hypre-incl
>>> ude=/cygdrive/E/hypre-2.11.2/Builds/Bins/include
>>> --with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
>>> --with-ml-include=/cygdrive/E/Trilinos-master/Bins/in
>>> clude --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib
>>> ظ€ôwith-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort"
>>> --with-mpi-include=/cygdrive/E/Program_Fi
>>> les_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
>>> --with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/int
>>> el64/lib/impi.lib
>>> --with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe
>>> --with-debugging=0 --with-blas
>>> -lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>> --with-lapack-lib=/cygdrive/E/Program_Files_x86/IntelSWTool
>>> s/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>> -CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp"
>>> -FFLAGS="-MT -O2 -Qopenmp"
>>> -----------------------------------------
>>> Libraries compiled on 2018-08-29 08:35:21 on AliReza-PC
>>> Machine characteristics: CYGWIN_NT-6.1-2.10.0-0.325-5-3-x86_64-64bit
>>> Using PETSc directory: /home/alireza/PetscGit
>>> Using PETSc arch:
>>> -----------------------------------------
>>> 
>>> Using C compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
>>> -O2 -MT -wd4996 -Qopenmp
>>> Using Fortran compiler:
>>> /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort -MT -O2 -Qopenmp
>>> -fpp
>>> -----------------------------------------
>>> 
>>> Using include paths: -I/home/alireza/PetscGit/include
>>> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/include
>>> -I/cygdrive/E/hypre-2.11.2/
>>> Builds/Bins/include -I/cygdrive/E/Trilinos-master/Bins/include
>>> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
>>> -----------------------------------------
>>> 
>>> Using C linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
>>> Using Fortran linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe
>>> ifort
>>> Using libraries: -L/home/alireza/PetscGit/lib
>>> -L/home/alireza/PetscGit/lib -lpetsc
>>> /cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
>>> /cygdrive/E/Trilinos-master/Bins/lib
>>> /ml.lib
>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and
>>> _libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib
>>> Gdi32.lib User32.lib
>>>   Advapi32.lib Kernel32.lib Ws2_32.lib
>>> -----------------------------------------
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 8/29/2018 10:29 PM, Jed Brown wrote:
>>>> This is bjacobi/lu, not vpbjacobi.
>>>> 
>>>> How are you measuring memory usage?  See a couple inline comments below.
>>>> 
>>>> Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> writes:
>>>> 
>>>>> I noticed that I forgot to include the log for 10x10x5 system. Here it is:
>>>>> 
>>>>> 
>>>>> 1 MPI processes
>>>>>     type: newtonls
>>>>>     maximum iterations=2000, maximum function evaluations=2000
>>>>>     tolerances: relative=0.0001, absolute=1e-05, solution=1e-05
>>>>>     total number of linear solver iterations=1
>>>>>     total number of function evaluations=2
>>>>>     norm schedule ALWAYS
>>>>>     SNESLineSearch Object: 1 MPI processes
>>>>>       type: bt
>>>>>         interpolation: cubic
>>>>>         alpha=1.000000e-04
>>>>>       maxstep=1.000000e+08, minlambda=1.000000e-12
>>>>>       tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>>>>> lambda=1.000000e-08
>>>>>       maximum iterations=40
>>>>>     KSP Object: 1 MPI processes
>>>>>       type: gmres
>>>>>         restart=30, using Classical (unmodified) Gram-Schmidt
>>>>> Orthogonalization with no iterative refinement
>>>>>         happy breakdown tolerance 1e-30
>>>>>       maximum iterations=5000, initial guess is zero
>>>>>       tolerances:  relative=1e-05, absolute=1e-06, divergence=10000.
>>>>>       left preconditioning
>>>>>       using PRECONDITIONED norm type for convergence test
>>>>>     PC Object: 1 MPI processes
>>>>>       type: bjacobi
>>>>>         number of blocks = 1
>>>>>         Local solve is same for all blocks, in the following KSP and PC
>>>>> objects:
>>>>>         KSP Object: (sub_) 1 MPI processes
>>>>>           type: preonly
>>>>>           maximum iterations=10000, initial guess is zero
>>>>>           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>           left preconditioning
>>>>>           using NONE norm type for convergence test
>>>>>         PC Object: (sub_) 1 MPI processes
>>>>>           type: lu
>>>>>             out-of-place factorization
>>>>>             tolerance for zero pivot 2.22045e-14
>>>>>             matrix ordering: nd
>>>>>             factor fill ratio given 0., needed 0.
>>>>>               Factored matrix follows:
>>>>>                 Mat Object: 1 MPI processes
>>>>>                   type: mkl_pardiso
>>>>>                   rows=6000, cols=6000
>>>>>                   package used to perform factorization: mkl_pardiso
>>>>>                   total: nonzeros=150000, allocated nonzeros=150000
>>>>>                   total number of mallocs used during MatSetValues calls =0
>>>>>                     MKL_PARDISO run parameters:
>>>>>                     MKL_PARDISO phase:             33
>>>>>                     MKL_PARDISO iparm[1]:     1
>>>>>                     MKL_PARDISO iparm[2]:     2
>>>>>                     MKL_PARDISO iparm[3]:     2
>>>>>                     MKL_PARDISO iparm[4]:     0
>>>>>                     MKL_PARDISO iparm[5]:     0
>>>>>                     MKL_PARDISO iparm[6]:     0
>>>>>                     MKL_PARDISO iparm[7]:     0
>>>>>                     MKL_PARDISO iparm[8]:     0
>>>>>                     MKL_PARDISO iparm[9]:     0
>>>>>                     MKL_PARDISO iparm[10]:     13
>>>>>                     MKL_PARDISO iparm[11]:     1
>>>>>                     MKL_PARDISO iparm[12]:     0
>>>>>                     MKL_PARDISO iparm[13]:     1
>>>>>                     MKL_PARDISO iparm[14]:     0
>>>>>                     MKL_PARDISO iparm[15]:     11228
>>>>>                     MKL_PARDISO iparm[16]:     7563
>>>>>                     MKL_PARDISO iparm[17]:     37035
>>>>>                     MKL_PARDISO iparm[18]:     4204238
>>>>>                     MKL_PARDISO iparm[19]:     2659
>>>>>                     MKL_PARDISO iparm[20]:     0
>>>>>                     MKL_PARDISO iparm[21]:     0
>>>>>                     MKL_PARDISO iparm[22]:     0
>>>>>                     MKL_PARDISO iparm[23]:     0
>>>>>                     MKL_PARDISO iparm[24]:     0
>>>>>                     MKL_PARDISO iparm[25]:     0
>>>>>                     MKL_PARDISO iparm[26]:     0
>>>>>                     MKL_PARDISO iparm[27]:     0
>>>>>                     MKL_PARDISO iparm[28]:     0
>>>>>                     MKL_PARDISO iparm[29]:     0
>>>>>                     MKL_PARDISO iparm[30]:     0
>>>>>                     MKL_PARDISO iparm[31]:     0
>>>>>                     MKL_PARDISO iparm[32]:     0
>>>>>                     MKL_PARDISO iparm[33]:     0
>>>>>                     MKL_PARDISO iparm[34]:     -1
>>>>>                     MKL_PARDISO iparm[35]:     1
>>>>>                     MKL_PARDISO iparm[36]:     0
>>>>>                     MKL_PARDISO iparm[37]:     0
>>>>>                     MKL_PARDISO iparm[38]:     0
>>>>>                     MKL_PARDISO iparm[39]:     0
>>>>>                     MKL_PARDISO iparm[40]:     0
>>>>>                     MKL_PARDISO iparm[41]:     0
>>>>>                     MKL_PARDISO iparm[42]:     0
>>>>>                     MKL_PARDISO iparm[43]:     0
>>>>>                     MKL_PARDISO iparm[44]:     0
>>>>>                     MKL_PARDISO iparm[45]:     0
>>>>>                     MKL_PARDISO iparm[46]:     0
>>>>>                     MKL_PARDISO iparm[47]:     0
>>>>>                     MKL_PARDISO iparm[48]:     0
>>>>>                     MKL_PARDISO iparm[49]:     0
>>>>>                     MKL_PARDISO iparm[50]:     0
>>>>>                     MKL_PARDISO iparm[51]:     0
>>>>>                     MKL_PARDISO iparm[52]:     0
>>>>>                     MKL_PARDISO iparm[53]:     0
>>>>>                     MKL_PARDISO iparm[54]:     0
>>>>>                     MKL_PARDISO iparm[55]:     0
>>>>>                     MKL_PARDISO iparm[56]:     0
>>>>>                     MKL_PARDISO iparm[57]:     -1
>>>>>                     MKL_PARDISO iparm[58]:     0
>>>>>                     MKL_PARDISO iparm[59]:     0
>>>>>                     MKL_PARDISO iparm[60]:     0
>>>>>                     MKL_PARDISO iparm[61]:     11228
>>>>>                     MKL_PARDISO iparm[62]:     7868
>>>>>                     MKL_PARDISO iparm[63]:     3629
>>>>>                     MKL_PARDISO iparm[64]:     0
>>>>>                     MKL_PARDISO maxfct:     1
>>>>>                     MKL_PARDISO mnum:     1
>>>>>                     MKL_PARDISO mtype:     11
>>>>>                     MKL_PARDISO n:     6000
>>>>>                     MKL_PARDISO nrhs:     1
>>>>>                     MKL_PARDISO msglvl:     0
>>>>>           linear system matrix = precond matrix:
>>>>>           Mat Object: 1 MPI processes
>>>>>             type: seqaij
>>>>>             rows=6000, cols=6000
>>>>>             total: nonzeros=150000, allocated nonzeros=480000
>>>>>             total number of mallocs used during MatSetValues calls =0
>>>>>               not using I-node routines
>>>>>       linear system matrix = precond matrix:
>>>>>       Mat Object: 1 MPI processes
>>>>>         type: seqaij
>>>>>         rows=6000, cols=6000
>>>>>         total: nonzeros=150000, allocated nonzeros=480000
>>>> Note that you preallocated more than necessary here.
>>>> 
>>>>>         total number of mallocs used during MatSetValues calls =0
>>>>>           not using I-node routines
>>>>> ************************************************************************************************************************
>>>>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
>>>>> -fCourier9' to print this document            ***
>>>>> ************************************************************************************************************************
>>>>> 
>>>>> ---------------------------------------------- PETSc Performance
>>>>> Summary: ----------------------------------------------
>>>>> 
>>>>> E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe
>>>>> on a  named ALIREZA-PC with 1 processor, by AliReza Wed Aug 29 22:12:30 2018
>>>>> Using Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5  GIT Date:
>>>>> 2018-08-28 19:19:26 -0500
>>>>> 
>>>>>                            Max       Max/Min     Avg       Total
>>>>> Time (sec):           3.373e+01     1.000   3.373e+01
>>>>> Objects:              3.300e+01     1.000   3.300e+01
>>>>> Flop:                 7.500e+05     1.000   7.500e+05  7.500e+05
>>>>> Flop/sec:             2.224e+04     1.000   2.224e+04  2.224e+04
>>>>> MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
>>>>> MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
>>>>> MPI Reductions:       0.000e+00     0.000
>>>>> 
>>>>> Flop counting convention: 1 flop = 1 real number operation of type
>>>>> (multiply/divide/add/subtract)
>>>>>                               e.g., VecAXPY() for real vectors of length
>>>>> N --> 2N flop
>>>>>                               and VecAXPY() for complex vectors of length
>>>>> N --> 8N flop
>>>>> 
>>>>> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages
>>>>> ---  -- Message Lengths --  -- Reductions --
>>>>>                           Avg     %Total     Avg     %Total Count
>>>>> %Total     Avg         %Total    Count   %Total
>>>>>    0:      Main Stage: 3.3730e+01 100.0%  7.5000e+05 100.0% 0.000e+00
>>>>> 0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>>>> 
>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>> See the 'Profiling' chapter of the users' manual for details on
>>>>> interpreting output.
>>>>> Phase summary info:
>>>>>      Count: number of times phase was executed
>>>>>      Time and Flop: Max - maximum over all processors
>>>>>                     Ratio - ratio of maximum to minimum over all processors
>>>>>      Mess: number of messages sent
>>>>>      AvgLen: average message length (bytes)
>>>>>      Reduct: number of global reductions
>>>>>      Global: entire computation
>>>>>      Stage: stages of a computation. Set stages with PetscLogStagePush()
>>>>> and PetscLogStagePop().
>>>>>         %T - percent time in this phase         %F - percent flop in this
>>>>> phase
>>>>>         %M - percent messages in this phase     %L - percent message
>>>>> lengths in this phase
>>>>>         %R - percent reductions in this phase
>>>>>      Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time
>>>>> over all processors)
>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>> Event                Count      Time (sec)
>>>>> Flop                              --- Global ---  --- Stage ---- Total
>>>>>                      Max Ratio  Max     Ratio   Max  Ratio  Mess AvgLen
>>>>> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>> 
>>>>> --- Event Stage 0: Main Stage
>>>>> 
>>>>> BuildTwoSidedF         2 1.0 7.9003e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> SNESSolve              1 1.0 7.3396e+00 1.0 7.50e+05 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00 22100  0  0  0  22100  0  0  0     0
>>>>> SNESFunctionEval       2 1.0 3.8894e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> SNESJacobianEval       1 1.0 5.1611e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00 15  0  0  0  0  15  0  0  0  0     0
>>>> Most of the time here is spent in your code computing the Jacobian, not
>>>> in the solver.
>>>> 
>>>>> SNESLineSearch         1 1.0 1.7655e-02 1.0 3.60e+05 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0 48  0  0  0   0 48  0  0  0    20
>>>>> VecDot                 1 1.0 1.1546e-05 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  2  0  0  0   0  2  0  0  0  1039
>>>>> VecMDot                1 1.0 2.3093e-05 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  2  0  0  0   0  2  0  0  0   520
>>>>> VecNorm                5 1.0 4.9769e-01 1.0 6.00e+04 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  1  8  0  0  0   1  8  0  0  0     0
>>>>> VecScale               2 1.0 9.3291e-03 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  2  0  0  0   0  2  0  0  0     1
>>>>> VecCopy                3 1.0 2.3093e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> VecSet                 4 1.0 1.5395e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> VecAXPY                1 1.0 2.5734e-02 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  2  0  0  0   0  2  0  0  0     0
>>>>> VecWAXPY               1 1.0 1.3257e-05 1.0 6.00e+03 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  1  0  0  0   0  1  0  0  0   453
>>>>> VecMAXPY               2 1.0 2.3521e-05 1.0 2.40e+04 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  3  0  0  0   0  3  0  0  0  1020
>>>>> VecAssemblyBegin       2 1.0 7.9040e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> VecAssemblyEnd         2 1.0 1.7961e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> VecReduceArith         2 1.0 9.8359e-06 1.0 2.40e+04 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  3  0  0  0   0  3  0  0  0  2440
>>>>> VecReduceComm          1 1.0 4.7041e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> VecNormalize           2 1.0 9.3552e-03 1.0 3.60e+04 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  5  0  0  0   0  5  0  0  0     4
>>>>> MatMult                2 1.0 4.4860e-04 1.0 5.88e+05 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0 78  0  0  0   0 78  0  0  0  1311
>>>>> MatSolve               2 1.0 1.7876e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>>>> MatLUFactorSym         1 1.0 7.6157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>>>> MatLUFactorNum         1 1.0 6.5977e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>>>> MatAssemblyBegin       2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> MatAssemblyEnd         2 1.0 1.2962e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> MatGetRowIJ            1 1.0 1.1534e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> MatGetOrdering         1 1.0 5.1809e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> MatZeroEntries         1 1.0 1.1298e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> MatView                3 1.0 3.8344e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>>>> KSPSetUp               2 1.0 3.2843e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> KSPSolve               1 1.0 1.6414e+00 1.0 3.78e+05 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  5 50  0  0  0   5 50  0  0  0     0
>>>>> KSPGMRESOrthog         1 1.0 5.9015e-05 1.0 2.40e+04 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  3  0  0  0   0  3  0  0  0   407
>>>>> PCSetUp                2 1.0 1.4267e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>>>>> PCSetUpOnBlocks        1 1.0 1.4266e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>>>>> PCApply                2 1.0 1.7882e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>> 
>>>>> Memory usage is given in bytes:
>>>>> 
>>>>> Object Type          Creations   Destructions     Memory Descendants' Mem.
>>>>> Reports information only for process 0.
>>>>> 
>>>>> --- Event Stage 0: Main Stage
>>>>> 
>>>>>                   SNES     1              0            0     0.
>>>>>                 DMSNES     1              0            0     0.
>>>>>         SNESLineSearch     1              0            0     0.
>>>>>                 Vector    12              0            0     0.
>>>>>                 Matrix     2              0            0     0.
>>>>>       Distributed Mesh     2              0            0     0.
>>>>>              Index Set     2              0            0     0.
>>>>>      Star Forest Graph     4              0            0     0.
>>>>>        Discrete System     2              0            0     0.
>>>>>          Krylov Solver     2              0            0     0.
>>>>>        DMKSP interface     1              0            0     0.
>>>>>         Preconditioner     2              0            0     0.
>>>>>                 Viewer     1              0            0     0.
>>>>> ========================================================================================================================
>>>>> Average time to get PetscTime(): 4.27647e-08
>>>>> #PETSc Option Table entries:
>>>>> -ksp_atol 1e-6
>>>>> -ksp_rtol 1e-5
>>>>> -snes_rtol 1e-4
>>>>> -sub_ksp_type preonly
>>>>> -sub_pc_factor_mat_solver_type mkl_pardiso
>>>>> -sub_pc_type lu
>>>>> #End of PETSc Option Table entries
>>>>> Compiled without FORTRAN kernels
>>>>> Compiled with full precision matrices (default)
>>>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8
>>>>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>>>> Configure options: --prefix=/home/alireza/PetscGit
>>>>> --with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl
>>>>> --with-hypre-incl
>>>>> ude=/cygdrive/E/hypre-2.11.2/Builds/Bins/include
>>>>> --with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
>>>>> --with-ml-include=/cygdrive/E/Trilinos-master/Bins/in
>>>>> clude --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib
>>>>> ظ€ôwith-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort"
>>>>> --with-mpi-include=/cygdrive/E/Program_Fi
>>>>> les_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
>>>>> --with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/int
>>>>> el64/lib/impi.lib
>>>>> --with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe
>>>>> --with-debugging=0 --with-blas
>>>>> -lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>>>> --with-lapack-lib=/cygdrive/E/Program_Files_x86/IntelSWTool
>>>>> s/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>>>> -CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp"
>>>>> -FFLAGS="-MT -O2 -Qopenmp"
>>>>> -----------------------------------------
>>>>> Libraries compiled on 2018-08-29 08:35:21 on AliReza-PC
>>>>> Machine characteristics: CYGWIN_NT-6.1-2.10.0-0.325-5-3-x86_64-64bit
>>>>> Using PETSc directory: /home/alireza/PetscGit
>>>>> Using PETSc arch:
>>>>> -----------------------------------------
>>>>> 
>>>>> Using C compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
>>>>> -O2 -MT -wd4996 -Qopenmp
>>>>> Using Fortran compiler:
>>>>> /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort -MT -O2 -Qopenmp
>>>>> -fpp
>>>>> -----------------------------------------
>>>>> 
>>>>> Using include paths: -I/home/alireza/PetscGit/include
>>>>> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/include
>>>>> -I/cygdrive/E/hypre-2.11.2/
>>>>> Builds/Bins/include -I/cygdrive/E/Trilinos-master/Bins/include
>>>>> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
>>>>> -----------------------------------------
>>>>> 
>>>>> Using C linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
>>>>> Using Fortran linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe
>>>>> ifort
>>>>> Using libraries: -L/home/alireza/PetscGit/lib
>>>>> -L/home/alireza/PetscGit/lib -lpetsc
>>>>> /cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
>>>>> /cygdrive/E/Trilinos-master/Bins/lib
>>>>> /ml.lib
>>>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and
>>>>> _libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib
>>>>> Gdi32.lib User32.lib
>>>>>    Advapi32.lib Kernel32.lib Ws2_32.lib
>>>>> -----------------------------------------
>>>>> 
>>>>> 
>>>>> 
>>>>> On 8/29/2018 7:40 PM, Smith, Barry F. wrote:
>>>>>>      You need to add the line
>>>>>> 
>>>>>>     ierr = MatSetVariableBlockSizes(J,3,lens);CHKERRQ(ierr);
>>>>>> 
>>>>>> with the number of blocks replacing 3 and the sizes of each block in lens
>>>>>> 
>>>>>>      Barry
>>>>>> 
>>>>>> 
>>>>>>> On Aug 29, 2018, at 4:27 AM, Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> wrote:
>>>>>>> 
>>>>>>> Barry,
>>>>>>> 
>>>>>>> Thanks a lot for your efforts. Using PCVPBJACOBI causes SNESSolve to throw the following error (the code and the system being simulated is the same, just PCBJACOBI changed to PCVPBJACOBI):
>>>>>>> 
>>>>>>> 
>>>>>>> [0]PETSC ERROR: --------------------- Error Message ---------------------------------------
>>>>>>> -----------------------
>>>>>>> [0]PETSC ERROR: Nonconforming object sizes
>>>>>>> [0]PETSC ERROR: Total blocksizes 0 doesn't match number matrix rows 108000
>>>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5  GIT Date: 2018-08-28 19:19:26 -0500
>>>>>>> [0]PETSC ERROR: E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe on a  named ALIREZA-PC by AliReza Wed Aug 29 13:46:38 2018
>>>>>>> [0]PETSC ERROR: Configure options --prefix=/home/alireza/PetscGit --with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl --
>>>>>>> with-hypre-include=/cygdrive/E/hypre-2.11.2/Builds/Bins/include --with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib --with-ml-include=/cygdrive/E/Trilinos
>>>>>>> -master/Bins/include --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib ظ€ôwith-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort" --with-mpi-include=/cygdri
>>>>>>> ve/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include --with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/
>>>>>>> windows/mpi/intel64/lib/impi.lib --with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe --with-debuggin
>>>>>>> g=0 --with-blas-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib --with-lapack-lib=/cygdrive/E/Program_Files_
>>>>>>> x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib -CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp" -FFLAGS="-MT -O2 -
>>>>>>> Qopenmp"
>>>>>>> [0]PETSC ERROR: #1 MatInvertVariableBlockDiagonal_SeqAIJ() line 1569 in E:\cygwin64\home\alireza\PETSc\src\mat\impls\aij\seq\aij.c
>>>>>>> [0]PETSC ERROR: #2 MatInvertVariableBlockDiagonal() line 10482 in E:\cygwin64\home\alireza\PETSc\src\mat\interface\matrix.c
>>>>>>> [0]PETSC ERROR: #3 PCSetUp_VPBJacobi() line 120 in E:\cygwin64\home\alireza\PETSc\src\ksp\pc\impls\vpbjacobi\vpbjacobi.c
>>>>>>> [0]PETSC ERROR: #4 PCSetUp() line 932 in E:\cygwin64\home\alireza\PETSc\src\ksp\pc\interface\precon.c
>>>>>>> [0]PETSC ERROR: #5 KSPSetUp() line 381 in E:\cygwin64\home\alireza\PETSc\src\ksp\ksp\interface\itfunc.c
>>>>>>> [0]PETSC ERROR: #6 KSPSolve() line 612 in E:\cygwin64\home\alireza\PETSc\src\ksp\ksp\interface\itfunc.c
>>>>>>> [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 224 in E:\cygwin64\home\alireza\PETSc\src\snes\impls\ls\ls.c
>>>>>>> [0]PETSC ERROR: #8 SNESSolve() line 4355 in E:\cygwin64\home\alireza\PETSc\src\snes\interface\snes.c
>>>>>>> 
>>>>>>> Many thanks,
>>>>>>> Ali
>>>>>>> 
>>>>>>> On 8/29/2018 2:24 AM, Smith, Barry F. wrote:
>>>>>>>>     Ali,
>>>>>>>> 
>>>>>>>>       In the branch barry/feature-PCVPBJACOBI (see src/snes/examples/tutorials/ex5.c) I have implemented PCVPBJACOBI that is a point block Jacobi with variable size blocks. It has very little testing.
>>>>>>>> 
>>>>>>>>       Could you please try your code with it; you should see very very similar convergence history with this branch and the previous approach you used (the only numerical difference between the two approaches is that this one uses a dense LU factorization for each small block while the previous used a sparse factorization). This new one should also be slightly faster in the KSPSolve().
>>>>>>>> 
>>>>>>>>      Thanks
>>>>>>>> 
>>>>>>>>       Barry
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Aug 27, 2018, at 4:04 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>>>>>>> 
>>>>>>>>> "Smith, Barry F." <bsmith at mcs.anl.gov> writes:
>>>>>>>>> 
>>>>>>>>>>      I have added new functionality within SNES to allow one to do as you desire in the branch barry/feature-snessetkspsetupcallback. Please see src/snes/examples/tests/ex3.c
>>>>>>>>> Note that this example is block Jacobi with O(1) sparse blocks per
>>>>>>>>> process, not variable-sized point-block Jacobi which I think is what Ali
>>>>>>>>> had in mind.
>>>>>>>>> 
>>>>>>>>>>      Please let us know if it works for you and if you have any questions or problems.
>>>>>>>>>> 
>>>>>>>>>>      Barry
>>>>>>>>>> 
>>>>>>>>>>     Note that it has to be handled by a callback called from within the SNES solver because that is the first time the matrix exists in a form that the block sizes may be set.
>>>>>>>>>> 
>>>>>>>>>>      Sorry for the runaround with so many emails.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Aug 24, 2018, at 3:48 PM, Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> I am trying to use block Jacobi preconditioner in SNES (SNESNEWTONLS). However, PCBJacobiGetSubKSP function returns an error stating "Object is in wrong state, Must call KSPSetUp() or PCSetUp() first". When I add KSPSetUp, I got and error from them as: "Must call DMShellSetGlobalVector() or DMShellSetCreateGlobalVector()", and if PCSetUp is added, "Object is in wrong state, Matrix must be set first" error is printed.
>>>>>>>>>>> 
>>>>>>>>>>> Below is a part of my code. It is run serially. Any help is much appreciated.
>>>>>>>>>>> 
>>>>>>>>>>>      ierr = SNESGetKSP(snes, &Petsc_ksp);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>>      ierr = KSPGetPC(Petsc_ksp, &Petsc_pc);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>>      ierr = KSPSetTolerances(Petsc_ksp, 1.e-3, 1e-3, PETSC_DEFAULT, 20000);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>>      ierr = SNESSetTolerances(snes, 1e-1, 1e-1, 1e-1, 2000, 2000);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>>      ierr = SNESSetType(snes, SNESNEWTONLS);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>>      ierr = KSPSetType(Petsc_ksp, KSPGMRES);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>>      ierr = PCSetType(Petsc_pc, PCBJACOBI);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>>      ierr = PCSetType(Petsc_pc, PCBJACOBI);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>>      ierr = PCBJacobiSetTotalBlocks(Petsc_pc, 2*Nx*Ny*Nz, SadeqSize);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>> 
>>>>>>>>>>>      SNESSetUp(snes);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>>      ierr = PCBJacobiGetSubKSP(Petsc_pc, &nLocal, &Firstly, &subKSP);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>> 
>>>>>>>>>>>      for (i = 0; i < nLocal; i++) {
>>>>>>>>>>>          ierr = KSPGetPC(subKSP[i], &SubPc);
>>>>>>>>>>>          CHKERRQ(ierr);
>>>>>>>>>>>          ierr = PCSetType(SubPc, PCLU);
>>>>>>>>>>>          CHKERRQ(ierr);
>>>>>>>>>>>          ierr = PCFactorSetMatSolverPackage(SubPc, "mkl_pardiso");
>>>>>>>>>>>          CHKERRQ(ierr);
>>>>>>>>>>>          ierr = KSPSetType(subKSP[i], KSPPREONLY);
>>>>>>>>>>>          CHKERRQ(ierr);
>>>>>>>>>>>          ierr = KSPSetTolerances(subKSP[i], 1.e-6, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);
>>>>>>>>>>>          CHKERRQ(ierr);
>>>>>>>>>>>      }
>>>>>>>>>>>      ierr = SNESSolve(snes, NULL, Petsc_X);
>>>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>> -- 
>>> Ali Reza Khaz’ali
>>> Assistant Professor of Petroleum Engineering,
>>> Department of Chemical Engineering
>>> Isfahan University of Technology
>>> Isfahan, Iran
> 
> -- 
> Ali Reza Khaz’ali
> Assistant Professor of Petroleum Engineering,
> Department of Chemical Engineering
> Isfahan University of Technology
> Isfahan, Iran
>