[petsc-users] Use block Jacobi preconditioner with SNES

Wed Aug 29 14:08:41 CDT 2018

Thanks again,

> You might want to use a memory profiler to see if you have important
> leaks.  See comment below about destroying objects and PETSc memory
> logging.
That's a very good suggestion. I'll do it.

> This matrix takes just a little over 100 MB (only about 35 MB of which
> is needed).  If this code is using a lot more memory than that, you
> should use a memory profiler to find out where (e.g., some data
> structures in your code).
The memory usage was high for the previous (and wrong) run with 
bjacobi/ilu. My code with the vpbjacobi preconditioner requires only 150K, 
most of which, as you stated, is not needed by PETSc.
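
As a rough cross-check of the "a little over 100 MB" and "about 35 MB" figures quoted above, here is a minimal Python sketch. It assumes the matrix storage is dominated by values, column indices, and row offsets (a simplified CSR-style picture of seqaij, not PETSc's exact bookkeeping); the helper name aij_bytes is made up for illustration. The sizes come from the log in this message.

```python
# Rough AIJ (CSR-style) storage estimate; an assumed simplification,
# not PETSc's exact accounting. Sizes are taken from the log in this message.
SCALAR_BYTES = 8  # sizeof(PetscScalar) reported in the log
INT_BYTES = 4     # sizeof(PetscInt) reported in the log


def aij_bytes(rows, nonzeros):
    """Bytes for values + column indices + row offsets (hypothetical helper)."""
    return nonzeros * (SCALAR_BYTES + INT_BYTES) + (rows + 1) * INT_BYTES


rows = 108000
allocated_nnz = 8640000  # "allocated nonzeros" in the Mat view
used_nnz = 2868000       # "total: nonzeros" in the Mat view

allocated = aij_bytes(rows, allocated_nnz)
needed = aij_bytes(rows, used_nnz)

print(f"allocated: {allocated / 1e6:.1f} MB")
print(f"needed:    {needed / 1e6:.1f} MB")
print(f"preallocated per row: {allocated_nnz / rows:.0f}, "
      f"used per row: {used_nnz / rows:.1f}")
```

Under these assumptions the preallocated storage comes to roughly 104 MB and the storage for the actual nonzeros to roughly 35 MB, which matches the order of the Matrix entry in the memory table below. Tightening the per-row preallocation toward the number actually used would recover most of the difference.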

> All of your time is in function and Jacobian evaluation.
> The entire linear solve is less than 50 milliseconds.
Thanks for your comment. You are right. The Jacobian construction 
involves about 40K lines of code, and a lot of phase equilibrium 
calculations, which are very time intensive.
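
To put numbers on that, a small Python sketch using times copied from the event table in the log below; the dictionary and the helper name share are only for illustration.

```python
# Times in seconds, copied from the PETSc log in this message.
total_time = 1.190e+02

events = {
    "SNESJacobianEval": 9.4788e+01,
    "SNESFunctionEval": 5.2605e+00,
    "KSPSolve": 3.7389e-02,
}


def share(event_time, total):
    """Percentage of the total run time spent in one event (hypothetical helper)."""
    return 100.0 * event_time / total


for name, seconds in events.items():
    print(f"{name:18s} {seconds:10.4f} s  {share(seconds, total_time):6.2f} %")
```

The Jacobian evaluation accounts for about 80% of the run, while the linear solve stays below 50 milliseconds, so any further speedup has to come from the Jacobian code rather than from the solver options.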

> It looks like you have forgotten to destroy all of your objects.  If you
> clean up after yourself, you'll also see the total memory used by each
> of the PETSc classes that you use.
That is because PetscLogView was called before the destroy functions. 
I moved it after them, and here is the new log. (Note that this is the 
log of one time step only; in a full run, the objects are not destroyed 
until all time steps have finished.)

SNES Object: 1 MPI processes
   type: newtonls
   maximum iterations=2000, maximum function evaluations=2000
   tolerances: relative=0.0001, absolute=1e-05, solution=1e-05
   total number of linear solver iterations=3
   total number of function evaluations=2
   norm schedule ALWAYS
   SNESLineSearch Object: 1 MPI processes
     type: bt
       interpolation: cubic
       alpha=1.000000e-04
     maxstep=1.000000e+08, minlambda=1.000000e-12
     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
     maximum iterations=40
   KSP Object: 1 MPI processes
     type: gmres
       restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
       happy breakdown tolerance 1e-30
     maximum iterations=5000, initial guess is zero
     tolerances:  relative=1e-05, absolute=1e-06, divergence=10000.
     left preconditioning
     using PRECONDITIONED norm type for convergence test
   PC Object: 1 MPI processes
     type: vpbjacobi
     linear system matrix = precond matrix:
     Mat Object: 1 MPI processes
       type: seqaij
       rows=108000, cols=108000
       total: nonzeros=2868000, allocated nonzeros=8640000
       total number of mallocs used during MatSetValues calls =0
         not using I-node routines
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe on a  named ALIREZA-PC with 1 processor, by AliReza Wed Aug 29 23:33:39 2018
Using Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5  GIT Date: 2018-08-28 19:19:26 -0500

                          Max       Max/Min     Avg       Total
Time (sec):           1.190e+02     1.000   1.190e+02
Objects:              3.600e+01     1.000   3.600e+01
Flop:                 2.867e+07     1.000   2.867e+07  2.867e+07
Flop/sec:             2.408e+05     1.000   2.408e+05  2.408e+05
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                             e.g., VecAXPY() for real vectors of length N --> 2N flop
                             and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                         Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
  0:      Main Stage: 1.1905e+02 100.0%  2.8668e+07 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
    Count: number of times phase was executed
    Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
    Mess: number of messages sent
    AvgLen: average message length (bytes)
    Reduct: number of global reductions
    Global: entire computation
    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
       %T - percent time in this phase         %F - percent flop in this phase
       %M - percent messages in this phase     %L - percent message lengths in this phase
       %R - percent reductions in this phase
    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           1 1.0 2.1382e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
BuildTwoSidedF         2 1.0 4.2337e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SNESSolve              1 1.0 1.0010e+02 1.0 2.87e+07 1.0 0.0e+00 0.0e+00 0.0e+00 84100  0  0  0  84100  0  0  0     0
SNESFunctionEval       2 1.0 5.2605e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
SNESJacobianEval       1 1.0 9.4788e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 80  0  0  0  0  80  0  0  0  0     0
SNESLineSearch         1 1.0 2.6283e+00 1.0 6.82e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2 24  0  0  0   2 24  0  0  0     3
VecDot                 1 1.0 1.8261e-04 1.0 2.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1183
VecMDot                3 1.0 8.1381e-04 1.0 1.30e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1592
VecNorm                7 1.0 6.5849e-03 1.0 1.51e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0   230
VecScale               4 1.0 1.6208e-04 1.0 4.32e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  2665
VecCopy                3 1.0 4.4561e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                 2 1.0 1.4369e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                1 1.0 1.5823e-04 1.0 2.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1365
VecWAXPY               1 1.0 1.9501e-04 1.0 1.08e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   554
VecMAXPY               4 1.0 7.7190e-04 1.0 1.94e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0  2518
VecAssemblyBegin       2 1.0 7.9970e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         2 1.0 1.2402e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecReduceArith         2 1.0 2.8567e-04 1.0 4.32e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1512
VecReduceComm          1 1.0 5.1318e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize           4 1.0 4.6357e-04 1.0 1.30e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  2796
MatMult                4 1.0 1.6927e-02 1.0 2.25e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 79  0  0  0   0 79  0  0  0  1330
MatAssemblyBegin       2 1.0 8.5529e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 2.4032e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         1 1.0 2.8857e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 3.3998e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               1 1.0 1.2325e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 3.7389e-02 1.0 2.16e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 75  0  0  0   0 75  0  0  0   579
KSPGMRESOrthog         3 1.0 1.3552e-03 1.0 2.59e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  9  0  0  0   0  9  0  0  0  1913
PCSetUp                1 1.0 1.4842e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply                4 1.0 3.1607e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

                 SNES     1              1         1372     0.
               DMSNES     1              1          672     0.
       SNESLineSearch     1              1         1000     0.
               Vector    20             20     17313120     0.
               Matrix     1              1    105843212     0.
     Distributed Mesh     2              2         9504     0.
    Star Forest Graph     4              4         3200     0.
      Discrete System     2              2         1856     0.
        Krylov Solver     1              1        18416     0.
      DMKSP interface     1              1          656     0.
       Preconditioner     1              1          832     0.
               Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 0.
#PETSc Option Table entries:
-ksp_atol 1e-6
-ksp_rtol 1e-5
-snes_rtol 1e-4
-sub_ksp_type preonly
-sub_pc_factor_mat_solver_type mkl_pardiso
-sub_pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/home/alireza/PetscGit 
--with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl 
--with-hypre-include=/cygdrive/E/hypre-2.11.2/Builds/Bins/include 
--with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib 
--with-ml-include=/cygdrive/E/Trilinos-master/Bins/include 
--with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib 
--with-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort" 
--with-mpi-include=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include 
--with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib 
--with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe 
--with-debugging=0 
--with-blas-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib 
--with-lapack-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib 
-CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp" 
-FFLAGS="-MT -O2 -Qopenmp"
-----------------------------------------
Libraries compiled on 2018-08-29 08:35:21 on AliReza-PC
Machine characteristics: CYGWIN_NT-6.1-2.10.0-0.325-5-3-x86_64-64bit
Using PETSc directory: /home/alireza/PetscGit
Using PETSc arch:
-----------------------------------------

Using C compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl -O2 -MT -wd4996 -Qopenmp
Using Fortran compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort -MT -O2 -Qopenmp -fpp
-----------------------------------------

Using include paths: -I/home/alireza/PetscGit/include 
-I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/include 
-I/cygdrive/E/hypre-2.11.2/Builds/Bins/include 
-I/cygdrive/E/Trilinos-master/Bins/include 
-I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
-----------------------------------------

Using C linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
Using Fortran linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort
Using libraries: -L/home/alireza/PetscGit/lib -L/home/alireza/PetscGit/lib -lpetsc 
/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib 
/cygdrive/E/Trilinos-master/Bins/lib/ml.lib 
/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib 
/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib 
/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib 
Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib
-----------------------------------------

Many thanks
Ali

>
>> ========================================================================================================================
>> Average time to get PetscTime(): 8.55292e-08
>>
>>
>>
>>
>>
>> On 8/29/2018 10:29 PM, Jed Brown wrote:
>>> This is bjacobi/lu, not vpbjacobi.
>>>
>>> How are you measuring memory usage?  See a couple inline comments below.
>>>
>>> Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> writes:
>>>
>>>> I noticed that I forgot to include the log for 10x10x5 system. Here it is:
>>>>
>>>>
>>>> SNES Object: 1 MPI processes
>>>>      type: newtonls
>>>>      maximum iterations=2000, maximum function evaluations=2000
>>>>      tolerances: relative=0.0001, absolute=1e-05, solution=1e-05
>>>>      total number of linear solver iterations=1
>>>>      total number of function evaluations=2
>>>>      norm schedule ALWAYS
>>>>      SNESLineSearch Object: 1 MPI processes
>>>>        type: bt
>>>>          interpolation: cubic
>>>>          alpha=1.000000e-04
>>>>        maxstep=1.000000e+08, minlambda=1.000000e-12
>>>>        tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>>>>        maximum iterations=40
>>>>      KSP Object: 1 MPI processes
>>>>        type: gmres
>>>>          restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>          happy breakdown tolerance 1e-30
>>>>        maximum iterations=5000, initial guess is zero
>>>>        tolerances:  relative=1e-05, absolute=1e-06, divergence=10000.
>>>>        left preconditioning
>>>>        using PRECONDITIONED norm type for convergence test
>>>>      PC Object: 1 MPI processes
>>>>        type: bjacobi
>>>>          number of blocks = 1
>>>>          Local solve is same for all blocks, in the following KSP and PC objects:
>>>>          KSP Object: (sub_) 1 MPI processes
>>>>            type: preonly
>>>>            maximum iterations=10000, initial guess is zero
>>>>            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>            left preconditioning
>>>>            using NONE norm type for convergence test
>>>>          PC Object: (sub_) 1 MPI processes
>>>>            type: lu
>>>>              out-of-place factorization
>>>>              tolerance for zero pivot 2.22045e-14
>>>>              matrix ordering: nd
>>>>              factor fill ratio given 0., needed 0.
>>>>                Factored matrix follows:
>>>>                  Mat Object: 1 MPI processes
>>>>                    type: mkl_pardiso
>>>>                    rows=6000, cols=6000
>>>>                    package used to perform factorization: mkl_pardiso
>>>>                    total: nonzeros=150000, allocated nonzeros=150000
>>>>                    total number of mallocs used during MatSetValues calls =0
>>>>                      MKL_PARDISO run parameters:
>>>>                      MKL_PARDISO phase:             33
>>>>                      MKL_PARDISO iparm[1]:     1
>>>>                      MKL_PARDISO iparm[2]:     2
>>>>                      MKL_PARDISO iparm[3]:     2
>>>>                      MKL_PARDISO iparm[4]:     0
>>>>                      MKL_PARDISO iparm[5]:     0
>>>>                      MKL_PARDISO iparm[6]:     0
>>>>                      MKL_PARDISO iparm[7]:     0
>>>>                      MKL_PARDISO iparm[8]:     0
>>>>                      MKL_PARDISO iparm[9]:     0
>>>>                      MKL_PARDISO iparm[10]:     13
>>>>                      MKL_PARDISO iparm[11]:     1
>>>>                      MKL_PARDISO iparm[12]:     0
>>>>                      MKL_PARDISO iparm[13]:     1
>>>>                      MKL_PARDISO iparm[14]:     0
>>>>                      MKL_PARDISO iparm[15]:     11228
>>>>                      MKL_PARDISO iparm[16]:     7563
>>>>                      MKL_PARDISO iparm[17]:     37035
>>>>                      MKL_PARDISO iparm[18]:     4204238
>>>>                      MKL_PARDISO iparm[19]:     2659
>>>>                      MKL_PARDISO iparm[20]:     0
>>>>                      MKL_PARDISO iparm[21]:     0
>>>>                      MKL_PARDISO iparm[22]:     0
>>>>                      MKL_PARDISO iparm[23]:     0
>>>>                      MKL_PARDISO iparm[24]:     0
>>>>                      MKL_PARDISO iparm[25]:     0
>>>>                      MKL_PARDISO iparm[26]:     0
>>>>                      MKL_PARDISO iparm[27]:     0
>>>>                      MKL_PARDISO iparm[28]:     0
>>>>                      MKL_PARDISO iparm[29]:     0
>>>>                      MKL_PARDISO iparm[30]:     0
>>>>                      MKL_PARDISO iparm[31]:     0
>>>>                      MKL_PARDISO iparm[32]:     0
>>>>                      MKL_PARDISO iparm[33]:     0
>>>>                      MKL_PARDISO iparm[34]:     -1
>>>>                      MKL_PARDISO iparm[35]:     1
>>>>                      MKL_PARDISO iparm[36]:     0
>>>>                      MKL_PARDISO iparm[37]:     0
>>>>                      MKL_PARDISO iparm[38]:     0
>>>>                      MKL_PARDISO iparm[39]:     0
>>>>                      MKL_PARDISO iparm[40]:     0
>>>>                      MKL_PARDISO iparm[41]:     0
>>>>                      MKL_PARDISO iparm[42]:     0
>>>>                      MKL_PARDISO iparm[43]:     0
>>>>                      MKL_PARDISO iparm[44]:     0
>>>>                      MKL_PARDISO iparm[45]:     0
>>>>                      MKL_PARDISO iparm[46]:     0
>>>>                      MKL_PARDISO iparm[47]:     0
>>>>                      MKL_PARDISO iparm[48]:     0
>>>>                      MKL_PARDISO iparm[49]:     0
>>>>                      MKL_PARDISO iparm[50]:     0
>>>>                      MKL_PARDISO iparm[51]:     0
>>>>                      MKL_PARDISO iparm[52]:     0
>>>>                      MKL_PARDISO iparm[53]:     0
>>>>                      MKL_PARDISO iparm[54]:     0
>>>>                      MKL_PARDISO iparm[55]:     0
>>>>                      MKL_PARDISO iparm[56]:     0
>>>>                      MKL_PARDISO iparm[57]:     -1
>>>>                      MKL_PARDISO iparm[58]:     0
>>>>                      MKL_PARDISO iparm[59]:     0
>>>>                      MKL_PARDISO iparm[60]:     0
>>>>                      MKL_PARDISO iparm[61]:     11228
>>>>                      MKL_PARDISO iparm[62]:     7868
>>>>                      MKL_PARDISO iparm[63]:     3629
>>>>                      MKL_PARDISO iparm[64]:     0
>>>>                      MKL_PARDISO maxfct:     1
>>>>                      MKL_PARDISO mnum:     1
>>>>                      MKL_PARDISO mtype:     11
>>>>                      MKL_PARDISO n:     6000
>>>>                      MKL_PARDISO nrhs:     1
>>>>                      MKL_PARDISO msglvl:     0
>>>>            linear system matrix = precond matrix:
>>>>            Mat Object: 1 MPI processes
>>>>              type: seqaij
>>>>              rows=6000, cols=6000
>>>>              total: nonzeros=150000, allocated nonzeros=480000
>>>>              total number of mallocs used during MatSetValues calls =0
>>>>                not using I-node routines
>>>>        linear system matrix = precond matrix:
>>>>        Mat Object: 1 MPI processes
>>>>          type: seqaij
>>>>          rows=6000, cols=6000
>>>>          total: nonzeros=150000, allocated nonzeros=480000
>>> Note that you preallocated more than necessary here.
>>>
>>>>          total number of mallocs used during MatSetValues calls =0
>>>>            not using I-node routines
>>>> ************************************************************************************************************************
>>>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
>>>> ************************************************************************************************************************
>>>>
>>>> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>>>>
>>>> E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe on a  named ALIREZA-PC with 1 processor, by AliReza Wed Aug 29 22:12:30 2018
>>>> Using Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5  GIT Date: 2018-08-28 19:19:26 -0500
>>>>
>>>>                             Max       Max/Min     Avg       Total
>>>> Time (sec):           3.373e+01     1.000   3.373e+01
>>>> Objects:              3.300e+01     1.000   3.300e+01
>>>> Flop:                 7.500e+05     1.000   7.500e+05  7.500e+05
>>>> Flop/sec:             2.224e+04     1.000   2.224e+04  2.224e+04
>>>> MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
>>>> MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
>>>> MPI Reductions:       0.000e+00     0.000
>>>>
>>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>>>>                                e.g., VecAXPY() for real vectors of length N --> 2N flop
>>>>                                and VecAXPY() for complex vectors of length N --> 8N flop
>>>>
>>>> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
>>>>                            Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
>>>>     0:      Main Stage: 3.3730e+01 100.0%  7.5000e+05 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>>>
>>>> ------------------------------------------------------------------------------------------------------------------------
>>>> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>>>> Phase summary info:
>>>>       Count: number of times phase was executed
>>>>       Time and Flop: Max - maximum over all processors
>>>>                      Ratio - ratio of maximum to minimum over all processors
>>>>       Mess: number of messages sent
>>>>       AvgLen: average message length (bytes)
>>>>       Reduct: number of global reductions
>>>>       Global: entire computation
>>>>       Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>>>>          %T - percent time in this phase         %F - percent flop in this phase
>>>>          %M - percent messages in this phase     %L - percent message lengths in this phase
>>>>          %R - percent reductions in this phase
>>>>       Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
>>>> ------------------------------------------------------------------------------------------------------------------------
>>>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
>>>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>
>>>> --- Event Stage 0: Main Stage
>>>>
>>>> BuildTwoSidedF         2 1.0 7.9003e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> SNESSolve              1 1.0 7.3396e+00 1.0 7.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 22100  0  0  0  22100  0  0  0     0
>>>> SNESFunctionEval       2 1.0 3.8894e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> SNESJacobianEval       1 1.0 5.1611e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 15  0  0  0  0  15  0  0  0  0     0
>>> Most of the time here is spent in your code computing the Jacobian, not
>>> in the solver.
>>>
>>>> SNESLineSearch         1 1.0 1.7655e-02 1.0 3.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0 48  0  0  0   0 48  0  0  0    20
>>>> VecDot                 1 1.0 1.1546e-05 1.0 1.20e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1039
>>>> VecMDot                1 1.0 2.3093e-05 1.0 1.20e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   520
>>>> VecNorm                5 1.0 4.9769e-01 1.0 6.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00  1  8  0  0  0   1  8  0  0  0     0
>>>> VecScale               2 1.0 9.3291e-03 1.0 1.20e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0     1
>>>> VecCopy                3 1.0 2.3093e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> VecSet                 4 1.0 1.5395e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> VecAXPY                1 1.0 2.5734e-02 1.0 1.20e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0     0
>>>> VecWAXPY               1 1.0 1.3257e-05 1.0 6.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   453
>>>> VecMAXPY               2 1.0 2.3521e-05 1.0 2.40e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  1020
>>>> VecAssemblyBegin       2 1.0 7.9040e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> VecAssemblyEnd         2 1.0 1.7961e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> VecReduceArith         2 1.0 9.8359e-06 1.0 2.40e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  2440
>>>> VecReduceComm          1 1.0 4.7041e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> VecNormalize           2 1.0 9.3552e-03 1.0 3.60e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0     4
>>>> MatMult                2 1.0 4.4860e-04 1.0 5.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0 78  0  0  0   0 78  0  0  0  1311
>>>> MatSolve               2 1.0 1.7876e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>>> MatLUFactorSym         1 1.0 7.6157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>>> MatLUFactorNum         1 1.0 6.5977e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>>> MatAssemblyBegin       2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> MatAssemblyEnd         2 1.0 1.2962e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> MatGetRowIJ            1 1.0 1.1534e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> MatGetOrdering         1 1.0 5.1809e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> MatZeroEntries         1 1.0 1.1298e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> MatView                3 1.0 3.8344e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>>> KSPSetUp               2 1.0 3.2843e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> KSPSolve               1 1.0 1.6414e+00 1.0 3.78e+05 1.0 0.0e+00 0.0e+00 0.0e+00  5 50  0  0  0   5 50  0  0  0     0
>>>> KSPGMRESOrthog         1 1.0 5.9015e-05 1.0 2.40e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0   407
>>>> PCSetUp                2 1.0 1.4267e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>>>> PCSetUpOnBlocks        1 1.0 1.4266e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>>>> PCApply                2 1.0 1.7882e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>
>>>> Memory usage is given in bytes:
>>>>
>>>> Object Type          Creations   Destructions     Memory Descendants' Mem.
>>>> Reports information only for process 0.
>>>>
>>>> --- Event Stage 0: Main Stage
>>>>
>>>>                    SNES     1              0            0     0.
>>>>                  DMSNES     1              0            0     0.
>>>>          SNESLineSearch     1              0            0     0.
>>>>                  Vector    12              0            0     0.
>>>>                  Matrix     2              0            0     0.
>>>>        Distributed Mesh     2              0            0     0.
>>>>               Index Set     2              0            0     0.
>>>>       Star Forest Graph     4              0            0     0.
>>>>         Discrete System     2              0            0     0.
>>>>           Krylov Solver     2              0            0     0.
>>>>         DMKSP interface     1              0            0     0.
>>>>          Preconditioner     2              0            0     0.
>>>>                  Viewer     1              0            0     0.
>>>> ========================================================================================================================
>>>> Average time to get PetscTime(): 4.27647e-08
>>>> #PETSc Option Table entries:
>>>> -ksp_atol 1e-6
>>>> -ksp_rtol 1e-5
>>>> -snes_rtol 1e-4
>>>> -sub_ksp_type preonly
>>>> -sub_pc_factor_mat_solver_type mkl_pardiso
>>>> -sub_pc_type lu
>>>> #End of PETSc Option Table entries
>>>> Compiled without FORTRAN kernels
>>>> Compiled with full precision matrices (default)
>>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8
>>>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>>> Configure options: --prefix=/home/alireza/PetscGit
>>>> --with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl
>>>> --with-hypre-include=/cygdrive/E/hypre-2.11.2/Builds/Bins/include
>>>> --with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
>>>> --with-ml-include=/cygdrive/E/Trilinos-master/Bins/include
>>>> --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib
>>>> –with-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort"
>>>> --with-mpi-include=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
>>>> --with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib
>>>> --with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe
>>>> --with-debugging=0
>>>> --with-blas-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>>> --with-lapack-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>>> -CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp"
>>>> -FFLAGS="-MT -O2 -Qopenmp"
>>>> -----------------------------------------
>>>> Libraries compiled on 2018-08-29 08:35:21 on AliReza-PC
>>>> Machine characteristics: CYGWIN_NT-6.1-2.10.0-0.325-5-3-x86_64-64bit
>>>> Using PETSc directory: /home/alireza/PetscGit
>>>> Using PETSc arch:
>>>> -----------------------------------------
>>>>
>>>> Using C compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
>>>> -O2 -MT -wd4996 -Qopenmp
>>>> Using Fortran compiler:
>>>> /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort -MT -O2 -Qopenmp
>>>> -fpp
>>>> -----------------------------------------
>>>>
>>>> Using include paths: -I/home/alireza/PetscGit/include
>>>> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/include
>>>> -I/cygdrive/E/hypre-2.11.2/Builds/Bins/include
>>>> -I/cygdrive/E/Trilinos-master/Bins/include
>>>> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
>>>> -----------------------------------------
>>>>
>>>> Using C linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
>>>> Using Fortran linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort
>>>> Using libraries: -L/home/alireza/PetscGit/lib
>>>> -L/home/alireza/PetscGit/lib -lpetsc
>>>> /cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
>>>> /cygdrive/E/Trilinos-master/Bins/lib/ml.lib
>>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib
>>>> Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib
>>>> -----------------------------------------
>>>>
>>>>
>>>>
>>>> On 8/29/2018 7:40 PM, Smith, Barry F. wrote:
>>>>>       You need to add the line
>>>>>
>>>>>      ierr = MatSetVariableBlockSizes(J,3,lens);CHKERRQ(ierr);
>>>>>
>>>>> with the number of blocks replacing 3 and the sizes of each block in lens
>>>>>
>>>>>       Barry
>>>>>
>>>>>
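As a rough illustration of the advice above: the sizes passed in `lens` have to add up to the number of matrix rows, which is the condition the "Total blocksizes ... doesn't match number matrix rows" message quoted further down complains about. The sketch below is plain C with no PETSc dependency; the helper names `total_block_rows` and `block_sizes_cover_matrix` are hypothetical, and `int` stands in for `PetscInt`. It only shows a sanity check one might run on `lens` before handing it to `MatSetVariableBlockSizes`.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of the block sizes in lens[0..nblocks-1].
   Returns -1 if any block size is not positive.
   (Hypothetical helper, not part of PETSc.) */
long total_block_rows(int nblocks, const int *lens)
{
    assert(nblocks >= 0 && (nblocks == 0 || lens != NULL));
    long total = 0;
    for (int i = 0; i < nblocks; i++) {
        if (lens[i] <= 0) return -1;
        total += lens[i];
    }
    return total;
}

/* Returns 1 when the blocks exactly tile the diagonal of an
   nrows x nrows matrix, 0 otherwise.
   (Hypothetical helper, not part of PETSc.) */
int block_sizes_cover_matrix(int nblocks, const int *lens, long nrows)
{
    return total_block_rows(nblocks, lens) == nrows;
}
```

With three blocks of sizes 2, 4 and 3, the check passes for a 9-row matrix. With no block sizes set at all, the total is 0 and the check fails for a 108000-row matrix, which matches the situation in the error report below.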
>>>>>> On Aug 29, 2018, at 4:27 AM, Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> wrote:
>>>>>>
>>>>>> Barry,
>>>>>>
>>>>>> Thanks a lot for your efforts. Using PCVPBJACOBI causes SNESSolve to throw the following error (the code and the system being simulated are the same; only PCBJACOBI was changed to PCVPBJACOBI):
>>>>>>
>>>>>>
>>>>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>>> [0]PETSC ERROR: Nonconforming object sizes
>>>>>> [0]PETSC ERROR: Total blocksizes 0 doesn't match number matrix rows 108000
>>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5  GIT Date: 2018-08-28 19:19:26 -0500
>>>>>> [0]PETSC ERROR: E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe on a  named ALIREZA-PC by AliReza Wed Aug 29 13:46:38 2018
>>>>>> [0]PETSC ERROR: Configure options --prefix=/home/alireza/PetscGit --with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl --
>>>>>> with-hypre-include=/cygdrive/E/hypre-2.11.2/Builds/Bins/include --with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib --with-ml-include=/cygdrive/E/Trilinos
>>>>>> -master/Bins/include --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib –with-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort" --with-mpi-include=/cygdri
>>>>>> ve/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include --with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/
>>>>>> windows/mpi/intel64/lib/impi.lib --with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe --with-debuggin
>>>>>> g=0 --with-blas-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib --with-lapack-lib=/cygdrive/E/Program_Files_
>>>>>> x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib -CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp" -FFLAGS="-MT -O2 -
>>>>>> Qopenmp"
>>>>>> [0]PETSC ERROR: #1 MatInvertVariableBlockDiagonal_SeqAIJ() line 1569 in E:\cygwin64\home\alireza\PETSc\src\mat\impls\aij\seq\aij.c
>>>>>> [0]PETSC ERROR: #2 MatInvertVariableBlockDiagonal() line 10482 in E:\cygwin64\home\alireza\PETSc\src\mat\interface\matrix.c
>>>>>> [0]PETSC ERROR: #3 PCSetUp_VPBJacobi() line 120 in E:\cygwin64\home\alireza\PETSc\src\ksp\pc\impls\vpbjacobi\vpbjacobi.c
>>>>>> [0]PETSC ERROR: #4 PCSetUp() line 932 in E:\cygwin64\home\alireza\PETSc\src\ksp\pc\interface\precon.c
>>>>>> [0]PETSC ERROR: #5 KSPSetUp() line 381 in E:\cygwin64\home\alireza\PETSc\src\ksp\ksp\interface\itfunc.c
>>>>>> [0]PETSC ERROR: #6 KSPSolve() line 612 in E:\cygwin64\home\alireza\PETSc\src\ksp\ksp\interface\itfunc.c
>>>>>> [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 224 in E:\cygwin64\home\alireza\PETSc\src\snes\impls\ls\ls.c
>>>>>> [0]PETSC ERROR: #8 SNESSolve() line 4355 in E:\cygwin64\home\alireza\PETSc\src\snes\interface\snes.c
>>>>>>
>>>>>> Many thanks,
>>>>>> Ali
>>>>>>
>>>>>> On 8/29/2018 2:24 AM, Smith, Barry F. wrote:
>>>>>>>      Ali,
>>>>>>>
>>>>>>>        In the branch barry/feature-PCVPBJACOBI (see src/snes/examples/tutorials/ex5.c) I have implemented PCVPBJACOBI, a point-block Jacobi preconditioner with variable-size blocks. It has had very little testing.
>>>>>>>
>>>>>>>        Could you please try your code with it? You should see a very similar convergence history with this branch and the previous approach you used (the only numerical difference between the two approaches is that this one uses a dense LU factorization for each small block while the previous one used a sparse factorization). This new one should also be slightly faster in the KSPSolve().
>>>>>>>
>>>>>>>       Thanks
>>>>>>>
>>>>>>>        Barry
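To make the "dense factorization for each small block" idea concrete, here is a minimal sketch of applying a variable-size point-block Jacobi preconditioner, z = D^{-1} r, where D is the block diagonal of the matrix. It is not PETSc's implementation: the function names `dense_solve` and `vpbjacobi_apply`, the back-to-back row-major block storage, and the use of Gaussian elimination with partial pivoting (in place of a stored LU factorization or explicit inverse) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Solve the dense n x n system A x = b by Gaussian elimination with
   partial pivoting. A is row-major and is overwritten; b holds the
   solution on return. Returns 0 on success, -1 on a zero pivot. */
int dense_solve(int n, double *A, double *b)
{
    for (int k = 0; k < n; k++) {
        int p = k;                       /* row with the largest pivot */
        for (int i = k + 1; i < n; i++)
            if (abs_d(A[i * n + k]) > abs_d(A[p * n + k])) p = i;
        if (A[p * n + k] == 0.0) return -1;
        if (p != k) {                    /* swap rows k and p */
            for (int j = 0; j < n; j++) {
                double t = A[k * n + j];
                A[k * n + j] = A[p * n + j];
                A[p * n + j] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (int i = k + 1; i < n; i++) { /* eliminate below the pivot */
            double f = A[i * n + k] / A[k * n + k];
            for (int j = k; j < n; j++) A[i * n + j] -= f * A[k * n + j];
            b[i] -= f * b[k];
        }
    }
    for (int i = n - 1; i >= 0; i--) {   /* back substitution */
        for (int j = i + 1; j < n; j++) b[i] -= A[i * n + j] * b[j];
        b[i] /= A[i * n + i];
    }
    return 0;
}

/* Overwrite r with D^{-1} r. blocks holds the diagonal blocks back to
   back (block i is lens[i] x lens[i], row-major) and is overwritten. */
int vpbjacobi_apply(int nblocks, const int *lens, double *blocks, double *r)
{
    size_t boff = 0, roff = 0;
    for (int i = 0; i < nblocks; i++) {
        int n = lens[i];
        assert(n > 0);
        if (dense_solve(n, blocks + boff, r + roff) != 0) return -1;
        boff += (size_t)n * (size_t)n;
        roff += (size_t)n;
    }
    return 0;
}
```

Each block is solved independently, so the cost per application depends only on the individual block sizes. A production code would factor or invert the blocks once during setup and reuse the result on every application, rather than eliminating from scratch as this sketch does.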
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 27, 2018, at 4:04 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>>>>>>
>>>>>>>> "Smith, Barry F." <bsmith at mcs.anl.gov> writes:
>>>>>>>>
>>>>>>>>>       I have added new functionality within SNES to allow one to do as you desire in the branch barry/feature-snessetkspsetupcallback. Please see src/snes/examples/tests/ex3.c
>>>>>>>> Note that this example is block Jacobi with O(1) sparse blocks per
>>>>>>>> process, not variable-sized point-block Jacobi which I think is what Ali
>>>>>>>> had in mind.
>>>>>>>>
>>>>>>>>>       Please let us know if it works for you and if you have any questions or problems.
>>>>>>>>>
>>>>>>>>>       Barry
>>>>>>>>>
>>>>>>>>>      Note that it has to be handled by a callback called from within the SNES solver because that is the first time the matrix exists in a form that the block sizes may be set.
>>>>>>>>>
>>>>>>>>>       Sorry for the runaround with so many emails.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Aug 24, 2018, at 3:48 PM, Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am trying to use the block Jacobi preconditioner in SNES (SNESNEWTONLS). However, PCBJacobiGetSubKSP returns an error stating "Object is in wrong state, Must call KSPSetUp() or PCSetUp() first". When I add KSPSetUp, I get the error "Must call DMShellSetGlobalVector() or DMShellSetCreateGlobalVector()", and when I add PCSetUp instead, the error "Object is in wrong state, Matrix must be set first" is printed.
>>>>>>>>>>
>>>>>>>>>> Below is a part of my code. It is run serially. Any help is much appreciated.
>>>>>>>>>>
>>>>>>>>>>       ierr = SNESGetKSP(snes, &Petsc_ksp);
>>>>>>>>>>       CHKERRQ(ierr);
>>>>>>>>>>       ierr = KSPGetPC(Petsc_ksp, &Petsc_pc);
>>>>>>>>>>       CHKERRQ(ierr);
>>>>>>>>>>       ierr = KSPSetTolerances(Petsc_ksp, 1.e-3, 1e-3, PETSC_DEFAULT, 20000);
>>>>>>>>>>       CHKERRQ(ierr);
>>>>>>>>>>       ierr = SNESSetTolerances(snes, 1e-1, 1e-1, 1e-1, 2000, 2000);
>>>>>>>>>>       CHKERRQ(ierr);
>>>>>>>>>>       ierr = SNESSetType(snes, SNESNEWTONLS);
>>>>>>>>>>       CHKERRQ(ierr);
>>>>>>>>>>       ierr = KSPSetType(Petsc_ksp, KSPGMRES);
>>>>>>>>>>       CHKERRQ(ierr);
>>>>>>>>>>       ierr = PCSetType(Petsc_pc, PCBJACOBI);
>>>>>>>>>>       CHKERRQ(ierr);
>>>>>>>>>>       ierr = PCBJacobiSetTotalBlocks(Petsc_pc, 2*Nx*Ny*Nz, SadeqSize);
>>>>>>>>>>       CHKERRQ(ierr);
>>>>>>>>>>
>>>>>>>>>>       ierr = SNESSetUp(snes);
>>>>>>>>>>       CHKERRQ(ierr);
>>>>>>>>>>       ierr = PCBJacobiGetSubKSP(Petsc_pc, &nLocal, &Firstly, &subKSP);
>>>>>>>>>>       CHKERRQ(ierr);
>>>>>>>>>>
>>>>>>>>>>       for (i = 0; i < nLocal; i++) {
>>>>>>>>>>           ierr = KSPGetPC(subKSP[i], &SubPc);
>>>>>>>>>>           CHKERRQ(ierr);
>>>>>>>>>>           ierr = PCSetType(SubPc, PCLU);
>>>>>>>>>>           CHKERRQ(ierr);
>>>>>>>>>>           ierr = PCFactorSetMatSolverPackage(SubPc, "mkl_pardiso");
>>>>>>>>>>           CHKERRQ(ierr);
>>>>>>>>>>           ierr = KSPSetType(subKSP[i], KSPPREONLY);
>>>>>>>>>>           CHKERRQ(ierr);
>>>>>>>>>>           ierr = KSPSetTolerances(subKSP[i], 1.e-6, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);
>>>>>>>>>>           CHKERRQ(ierr);
>>>>>>>>>>       }
>>>>>>>>>>       ierr = SNESSolve(snes, NULL, Petsc_X);
>>>>>>>>>>       CHKERRQ(ierr);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>

-- 
Ali Reza Khaz’ali
Assistant Professor of Petroleum Engineering,
Department of Chemical Engineering
Isfahan University of Technology
Isfahan, Iran