[petsc-users] Use block Jacobi preconditioner with SNES

Jed Brown jed at jedbrown.org
Wed Aug 29 13:35:45 CDT 2018


Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> writes:

> Oh, I'm sorry. There have been a lot of changes in my code, and I lost 
> track of them. I've fixed the code, and the log for the 30x30x10 system is 
> as follows. I measure the memory usage with the Windows Task Manager. Both 
> of your comments are true. This is just a test run, and I'll optimize it 
> ASAP.

You might want to use a memory profiler to see if you have important
leaks.  See comment below about destroying objects and PETSc memory
logging.
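
If you want to cross-check Task Manager against PETSc's own accounting from inside the code, a minimal sketch (variable names are illustrative) would be:

    PetscLogDouble rss, mal;
    ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr);  /* process memory as seen by PETSc */
    ierr = PetscMallocGetCurrentUsage(&mal);CHKERRQ(ierr);  /* bytes currently obtained through PetscMalloc() */
    ierr = PetscPrintf(PETSC_COMM_WORLD,"process %g bytes, PetscMalloc %g bytes\n",rss,mal);CHKERRQ(ierr);

Running with -memory_view should also print a summary at PetscFinalize().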

>
> 1 MPI processes
>    type: newtonls
>    maximum iterations=2000, maximum function evaluations=2000
>    tolerances: relative=0.0001, absolute=1e-05, solution=1e-05
>    total number of linear solver iterations=3
>    total number of function evaluations=2
>    norm schedule ALWAYS
>    SNESLineSearch Object: 1 MPI processes
>      type: bt
>        interpolation: cubic
>        alpha=1.000000e-04
>      maxstep=1.000000e+08, minlambda=1.000000e-12
>      tolerances: relative=1.000000e-08, absolute=1.000000e-15, 
> lambda=1.000000e-08
>      maximum iterations=40
>    KSP Object: 1 MPI processes
>      type: gmres
>        restart=30, using Classical (unmodified) Gram-Schmidt 
> Orthogonalization with no iterative refinement
>        happy breakdown tolerance 1e-30
>      maximum iterations=5000, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-06, divergence=10000.
>      left preconditioning
>      using PRECONDITIONED norm type for convergence test
>    PC Object: 1 MPI processes
>      type: vpbjacobi
>      linear system matrix = precond matrix:
>      Mat Object: 1 MPI processes
>        type: seqaij
>        rows=108000, cols=108000
>        total: nonzeros=2868000, allocated nonzeros=8640000

This matrix takes just a little over 100 MB (only about 35 MB of which
is needed).  If this code is using a lot more memory than that, you
should use a memory profiler to find out where (e.g., some data
structures in your code).
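
(For reference, the estimate is roughly 12 bytes per nonzero for SeqAIJ with 4-byte PetscInt, i.e., an 8-byte value plus a 4-byte column index: 2,868,000 x 12 B is about 34 MB actually used, while the 8,640,000 allocated nonzeros account for about 104 MB.)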

>        total number of mallocs used during MatSetValues calls =0
>          not using I-node routines
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
> -fCourier9' to print this document            ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance 
> Summary: ----------------------------------------------
>
> E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe 
> on a  named ALIREZA-PC with 1 processor, by AliReza Wed Aug 29 22:43:35 2018
> Using Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5  GIT Date: 
> 2018-08-28 19:19:26 -0500
>
>                           Max       Max/Min     Avg       Total
> Time (sec):           1.187e+02     1.000   1.187e+02
> Objects:              3.600e+01     1.000   3.600e+01
> Flop:                 2.867e+07     1.000   2.867e+07  2.867e+07
> Flop/sec:             2.414e+05     1.000   2.414e+05  2.414e+05
> MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
> MPI Reductions:       0.000e+00     0.000
>
> Flop counting convention: 1 flop = 1 real number operation of type 
> (multiply/divide/add/subtract)
>                              e.g., VecAXPY() for real vectors of length 
> N --> 2N flop
>                              and VecAXPY() for complex vectors of length 
> N --> 8N flop
>
> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages 
> ---  -- Message Lengths --  -- Reductions --
>                          Avg     %Total     Avg     %Total Count   
> %Total     Avg         %Total    Count   %Total
>   0:      Main Stage: 1.1875e+02 100.0%  2.8668e+07 100.0% 0.000e+00   
> 0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on 
> interpreting output.
> Phase summary info:
>     Count: number of times phase was executed
>     Time and Flop: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>     Mess: number of messages sent
>     AvgLen: average message length (bytes)
>     Reduct: number of global reductions
>     Global: entire computation
>     Stage: stages of a computation. Set stages with PetscLogStagePush() 
> and PetscLogStagePop().
>        %T - percent time in this phase         %F - percent flop in this 
> phase
>        %M - percent messages in this phase     %L - percent message 
> lengths in this phase
>        %R - percent reductions in this phase
>     Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time 
> over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec) 
> Flop                              --- Global ---  --- Stage ---- Total
>                     Max Ratio  Max     Ratio   Max  Ratio  Mess AvgLen  
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> BuildTwoSidedF         2 1.0 2.7369e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> SNESSolve              1 1.0 9.8642e+01 1.0 2.87e+07 1.0 0.0e+00 0.0e+00 
> 0.0e+00 83100  0  0  0  83100  0  0  0     0
> SNESFunctionEval       2 1.0 5.4137e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
> SNESJacobianEval       1 1.0 9.3167e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00 78  0  0  0  0  78  0  0  0  0     0

All of your time is in function and Jacobian evaluation.
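
If you want -log_view to break down where that time goes inside your own routines, you can register a user event around the expensive parts; a minimal sketch (the event name and placement are illustrative):

    PetscLogEvent USER_JAC;
    ierr = PetscLogEventRegister("UserJacAssembly",MAT_CLASSID,&USER_JAC);CHKERRQ(ierr);
    /* inside your Jacobian routine */
    ierr = PetscLogEventBegin(USER_JAC,0,0,0,0);CHKERRQ(ierr);
    /* ... compute coefficients and call MatSetValues() ... */
    ierr = PetscLogEventEnd(USER_JAC,0,0,0,0);CHKERRQ(ierr);

The event then shows up as its own line in the log summary above.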

> SNESLineSearch         1 1.0 2.6980e+00 1.0 6.82e+06 1.0 0.0e+00 0.0e+00 
> 0.0e+00  2 24  0  0  0   2 24  0  0  0     3
> VecDot                 1 1.0 1.7705e-04 1.0 2.16e+05 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  1  0  0  0   0  1  0  0  0  1220
> VecMDot                3 1.0 7.4411e-04 1.0 1.30e+06 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  5  0  0  0   0  5  0  0  0  1742
> VecNorm                7 1.0 5.9443e-03 1.0 1.51e+06 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  5  0  0  0   0  5  0  0  0   254
> VecScale               4 1.0 1.4882e-04 1.0 4.32e+05 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  2  0  0  0   0  2  0  0  0  2903
> VecCopy                3 1.0 4.6827e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                 2 1.0 1.3899e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY                1 1.0 1.5652e-04 1.0 2.16e+05 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  1  0  0  0   0  1  0  0  0  1380
> VecWAXPY               1 1.0 1.7448e-04 1.0 1.08e+05 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0   619
> VecMAXPY               4 1.0 7.7233e-04 1.0 1.94e+06 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  7  0  0  0   0  7  0  0  0  2517
> VecAssemblyBegin       2 1.0 4.6614e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyEnd         2 1.0 7.2700e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecReduceArith         2 1.0 2.3948e-04 1.0 4.32e+05 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  2  0  0  0   0  2  0  0  0  1804
> VecReduceComm          1 1.0 2.1382e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecNormalize           4 1.0 4.1995e-04 1.0 1.30e+06 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  5  0  0  0   0  5  0  0  0  3086
> MatMult                4 1.0 1.6703e-02 1.0 2.25e+07 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0 79  0  0  0   0 79  0  0  0  1348
> MatAssemblyBegin       2 1.0 1.2829e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         2 1.0 2.6648e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries         1 1.0 5.4123e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatView                1 1.0 6.0384e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetUp               1 1.0 4.3723e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 4.9647e-02 1.0 2.16e+07 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0 75  0  0  0   0 75  0  0  0   436

The entire linear solve is less than 50 milliseconds.

> KSPGMRESOrthog         3 1.0 1.2919e-03 1.0 2.59e+06 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  9  0  0  0   0  9  0  0  0  2006
> PCSetUp                1 1.0 1.8846e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> PCApply                4 1.0 2.9807e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>                  SNES     1              0            0     0.
>                DMSNES     1              0            0     0.
>        SNESLineSearch     1              0            0     0.
>                Vector    20              0            0     0.
>                Matrix     1              0            0     0.
>      Distributed Mesh     2              0            0     0.
>     Star Forest Graph     4              0            0     0.
>       Discrete System     2              0            0     0.
>         Krylov Solver     1              0            0     0.
>       DMKSP interface     1              0            0     0.
>        Preconditioner     1              0            0     0.
>                Viewer     1              0            0     0.

It looks like you have forgotten to destroy all of your objects.  If you
clean up after yourself, you'll also see the total memory used by each
of the PETSc classes that you use.
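
As a rough sketch, using the object names from your earlier snippet (anything else here is illustrative), the clean-up after the last SNESSolve() would look like:

    ierr = VecDestroy(&Petsc_X);CHKERRQ(ierr);
    ierr = MatDestroy(&J);CHKERRQ(ierr);       /* the Jacobian matrix you created */
    ierr = SNESDestroy(&snes);CHKERRQ(ierr);   /* also destroys the KSP and PC it owns */
    ierr = PetscFinalize();

With that in place, the Destructions column should match the Creations column and the per-class memory totals get reported.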

> ========================================================================================================================
> Average time to get PetscTime(): 8.55292e-08
> #PETSc Option Table entries:
> -ksp_atol 1e-6
> -ksp_rtol 1e-5
> -snes_rtol 1e-4
> -sub_ksp_type preonly
> -sub_pc_factor_mat_solver_type mkl_pardiso
> -sub_pc_type lu
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8 
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --prefix=/home/alireza/PetscGit 
> --with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl 
> --with-hypre-incl
> ude=/cygdrive/E/hypre-2.11.2/Builds/Bins/include 
> --with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib 
> --with-ml-include=/cygdrive/E/Trilinos-master/Bins/in
> clude --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib 
> --with-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort" 
> --with-mpi-include=/cygdrive/E/Program_Fi
> les_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include 
> --with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/int
> el64/lib/impi.lib 
> --with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe 
> --with-debugging=0 --with-blas
> -lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib 
> --with-lapack-lib=/cygdrive/E/Program_Files_x86/IntelSWTool
> s/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib 
> -CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp" 
> -FFLAGS="-MT -O2 -Qopenmp"
> -----------------------------------------
> Libraries compiled on 2018-08-29 08:35:21 on AliReza-PC
> Machine characteristics: CYGWIN_NT-6.1-2.10.0-0.325-5-3-x86_64-64bit
> Using PETSc directory: /home/alireza/PetscGit
> Using PETSc arch:
> -----------------------------------------
>
> Using C compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl 
> -O2 -MT -wd4996 -Qopenmp
> Using Fortran compiler: 
> /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort -MT -O2 -Qopenmp 
> -fpp
> -----------------------------------------
>
> Using include paths: -I/home/alireza/PetscGit/include 
> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/include 
> -I/cygdrive/E/hypre-2.11.2/
> Builds/Bins/include -I/cygdrive/E/Trilinos-master/Bins/include 
> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
> -----------------------------------------
>
> Using C linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
> Using Fortran linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe 
> ifort
> Using libraries: -L/home/alireza/PetscGit/lib 
> -L/home/alireza/PetscGit/lib -lpetsc 
> /cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib 
> /cygdrive/E/Trilinos-master/Bins/lib
> /ml.lib 
> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib 
> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and
> _libraries/windows/mkl/lib/intel64_win/mkl_rt.lib 
> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib 
> Gdi32.lib User32.lib
>   Advapi32.lib Kernel32.lib Ws2_32.lib
> -----------------------------------------
>
>
>
>
>
> On 8/29/2018 10:29 PM, Jed Brown wrote:
>> This is bjacobi/lu, not vpbjacobi.
>>
>> How are you measuring memory usage?  See a couple inline comments below.
>>
>> Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> writes:
>>
>>> I noticed that I forgot to include the log for 10x10x5 system. Here it is:
>>>
>>>
>>> 1 MPI processes
>>>     type: newtonls
>>>     maximum iterations=2000, maximum function evaluations=2000
>>>     tolerances: relative=0.0001, absolute=1e-05, solution=1e-05
>>>     total number of linear solver iterations=1
>>>     total number of function evaluations=2
>>>     norm schedule ALWAYS
>>>     SNESLineSearch Object: 1 MPI processes
>>>       type: bt
>>>         interpolation: cubic
>>>         alpha=1.000000e-04
>>>       maxstep=1.000000e+08, minlambda=1.000000e-12
>>>       tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>>> lambda=1.000000e-08
>>>       maximum iterations=40
>>>     KSP Object: 1 MPI processes
>>>       type: gmres
>>>         restart=30, using Classical (unmodified) Gram-Schmidt
>>> Orthogonalization with no iterative refinement
>>>         happy breakdown tolerance 1e-30
>>>       maximum iterations=5000, initial guess is zero
>>>       tolerances:  relative=1e-05, absolute=1e-06, divergence=10000.
>>>       left preconditioning
>>>       using PRECONDITIONED norm type for convergence test
>>>     PC Object: 1 MPI processes
>>>       type: bjacobi
>>>         number of blocks = 1
>>>         Local solve is same for all blocks, in the following KSP and PC
>>> objects:
>>>         KSP Object: (sub_) 1 MPI processes
>>>           type: preonly
>>>           maximum iterations=10000, initial guess is zero
>>>           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>           left preconditioning
>>>           using NONE norm type for convergence test
>>>         PC Object: (sub_) 1 MPI processes
>>>           type: lu
>>>             out-of-place factorization
>>>             tolerance for zero pivot 2.22045e-14
>>>             matrix ordering: nd
>>>             factor fill ratio given 0., needed 0.
>>>               Factored matrix follows:
>>>                 Mat Object: 1 MPI processes
>>>                   type: mkl_pardiso
>>>                   rows=6000, cols=6000
>>>                   package used to perform factorization: mkl_pardiso
>>>                   total: nonzeros=150000, allocated nonzeros=150000
>>>                   total number of mallocs used during MatSetValues calls =0
>>>                     MKL_PARDISO run parameters:
>>>                     MKL_PARDISO phase:             33
>>>                     MKL_PARDISO iparm[1]:     1
>>>                     MKL_PARDISO iparm[2]:     2
>>>                     MKL_PARDISO iparm[3]:     2
>>>                     MKL_PARDISO iparm[4]:     0
>>>                     MKL_PARDISO iparm[5]:     0
>>>                     MKL_PARDISO iparm[6]:     0
>>>                     MKL_PARDISO iparm[7]:     0
>>>                     MKL_PARDISO iparm[8]:     0
>>>                     MKL_PARDISO iparm[9]:     0
>>>                     MKL_PARDISO iparm[10]:     13
>>>                     MKL_PARDISO iparm[11]:     1
>>>                     MKL_PARDISO iparm[12]:     0
>>>                     MKL_PARDISO iparm[13]:     1
>>>                     MKL_PARDISO iparm[14]:     0
>>>                     MKL_PARDISO iparm[15]:     11228
>>>                     MKL_PARDISO iparm[16]:     7563
>>>                     MKL_PARDISO iparm[17]:     37035
>>>                     MKL_PARDISO iparm[18]:     4204238
>>>                     MKL_PARDISO iparm[19]:     2659
>>>                     MKL_PARDISO iparm[20]:     0
>>>                     MKL_PARDISO iparm[21]:     0
>>>                     MKL_PARDISO iparm[22]:     0
>>>                     MKL_PARDISO iparm[23]:     0
>>>                     MKL_PARDISO iparm[24]:     0
>>>                     MKL_PARDISO iparm[25]:     0
>>>                     MKL_PARDISO iparm[26]:     0
>>>                     MKL_PARDISO iparm[27]:     0
>>>                     MKL_PARDISO iparm[28]:     0
>>>                     MKL_PARDISO iparm[29]:     0
>>>                     MKL_PARDISO iparm[30]:     0
>>>                     MKL_PARDISO iparm[31]:     0
>>>                     MKL_PARDISO iparm[32]:     0
>>>                     MKL_PARDISO iparm[33]:     0
>>>                     MKL_PARDISO iparm[34]:     -1
>>>                     MKL_PARDISO iparm[35]:     1
>>>                     MKL_PARDISO iparm[36]:     0
>>>                     MKL_PARDISO iparm[37]:     0
>>>                     MKL_PARDISO iparm[38]:     0
>>>                     MKL_PARDISO iparm[39]:     0
>>>                     MKL_PARDISO iparm[40]:     0
>>>                     MKL_PARDISO iparm[41]:     0
>>>                     MKL_PARDISO iparm[42]:     0
>>>                     MKL_PARDISO iparm[43]:     0
>>>                     MKL_PARDISO iparm[44]:     0
>>>                     MKL_PARDISO iparm[45]:     0
>>>                     MKL_PARDISO iparm[46]:     0
>>>                     MKL_PARDISO iparm[47]:     0
>>>                     MKL_PARDISO iparm[48]:     0
>>>                     MKL_PARDISO iparm[49]:     0
>>>                     MKL_PARDISO iparm[50]:     0
>>>                     MKL_PARDISO iparm[51]:     0
>>>                     MKL_PARDISO iparm[52]:     0
>>>                     MKL_PARDISO iparm[53]:     0
>>>                     MKL_PARDISO iparm[54]:     0
>>>                     MKL_PARDISO iparm[55]:     0
>>>                     MKL_PARDISO iparm[56]:     0
>>>                     MKL_PARDISO iparm[57]:     -1
>>>                     MKL_PARDISO iparm[58]:     0
>>>                     MKL_PARDISO iparm[59]:     0
>>>                     MKL_PARDISO iparm[60]:     0
>>>                     MKL_PARDISO iparm[61]:     11228
>>>                     MKL_PARDISO iparm[62]:     7868
>>>                     MKL_PARDISO iparm[63]:     3629
>>>                     MKL_PARDISO iparm[64]:     0
>>>                     MKL_PARDISO maxfct:     1
>>>                     MKL_PARDISO mnum:     1
>>>                     MKL_PARDISO mtype:     11
>>>                     MKL_PARDISO n:     6000
>>>                     MKL_PARDISO nrhs:     1
>>>                     MKL_PARDISO msglvl:     0
>>>           linear system matrix = precond matrix:
>>>           Mat Object: 1 MPI processes
>>>             type: seqaij
>>>             rows=6000, cols=6000
>>>             total: nonzeros=150000, allocated nonzeros=480000
>>>             total number of mallocs used during MatSetValues calls =0
>>>               not using I-node routines
>>>       linear system matrix = precond matrix:
>>>       Mat Object: 1 MPI processes
>>>         type: seqaij
>>>         rows=6000, cols=6000
>>>         total: nonzeros=150000, allocated nonzeros=480000
>> Note that you preallocated more than necessary here.
>>
>>>         total number of mallocs used during MatSetValues calls =0
>>>           not using I-node routines
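
(To tighten the preallocation noted above: if the exact per-row counts are known, something like the following avoids the unused allocated space; nnz_per_row is an illustrative array you would fill from your stencil/block structure.)

    /* nnz_per_row[i] = exact number of nonzeros in row i */
    ierr = MatSeqAIJSetPreallocation(J,0,nnz_per_row);CHKERRQ(ierr);
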
>>> ************************************************************************************************************************
>>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
>>> -fCourier9' to print this document            ***
>>> ************************************************************************************************************************
>>>
>>> ---------------------------------------------- PETSc Performance
>>> Summary: ----------------------------------------------
>>>
>>> E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe
>>> on a  named ALIREZA-PC with 1 processor, by AliReza Wed Aug 29 22:12:30 2018
>>> Using Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5  GIT Date:
>>> 2018-08-28 19:19:26 -0500
>>>
>>>                            Max       Max/Min     Avg       Total
>>> Time (sec):           3.373e+01     1.000   3.373e+01
>>> Objects:              3.300e+01     1.000   3.300e+01
>>> Flop:                 7.500e+05     1.000   7.500e+05  7.500e+05
>>> Flop/sec:             2.224e+04     1.000   2.224e+04  2.224e+04
>>> MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
>>> MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
>>> MPI Reductions:       0.000e+00     0.000
>>>
>>> Flop counting convention: 1 flop = 1 real number operation of type
>>> (multiply/divide/add/subtract)
>>>                               e.g., VecAXPY() for real vectors of length
>>> N --> 2N flop
>>>                               and VecAXPY() for complex vectors of length
>>> N --> 8N flop
>>>
>>> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages
>>> ---  -- Message Lengths --  -- Reductions --
>>>                           Avg     %Total     Avg     %Total Count
>>> %Total     Avg         %Total    Count   %Total
>>>    0:      Main Stage: 3.3730e+01 100.0%  7.5000e+05 100.0% 0.000e+00
>>> 0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>> See the 'Profiling' chapter of the users' manual for details on
>>> interpreting output.
>>> Phase summary info:
>>>      Count: number of times phase was executed
>>>      Time and Flop: Max - maximum over all processors
>>>                     Ratio - ratio of maximum to minimum over all processors
>>>      Mess: number of messages sent
>>>      AvgLen: average message length (bytes)
>>>      Reduct: number of global reductions
>>>      Global: entire computation
>>>      Stage: stages of a computation. Set stages with PetscLogStagePush()
>>> and PetscLogStagePop().
>>>         %T - percent time in this phase         %F - percent flop in this
>>> phase
>>>         %M - percent messages in this phase     %L - percent message
>>> lengths in this phase
>>>         %R - percent reductions in this phase
>>>      Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time
>>> over all processors)
>>> ------------------------------------------------------------------------------------------------------------------------
>>> Event                Count      Time (sec)
>>> Flop                              --- Global ---  --- Stage ---- Total
>>>                      Max Ratio  Max     Ratio   Max  Ratio  Mess AvgLen
>>> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> BuildTwoSidedF         2 1.0 7.9003e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> SNESSolve              1 1.0 7.3396e+00 1.0 7.50e+05 1.0 0.0e+00 0.0e+00
>>> 0.0e+00 22100  0  0  0  22100  0  0  0     0
>>> SNESFunctionEval       2 1.0 3.8894e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> SNESJacobianEval       1 1.0 5.1611e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 15  0  0  0  0  15  0  0  0  0     0
>> Most of the time here is spent in your code computing the Jacobian, not
>> in the solver.
>>
>>> SNESLineSearch         1 1.0 1.7655e-02 1.0 3.60e+05 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  0 48  0  0  0   0 48  0  0  0    20
>>> VecDot                 1 1.0 1.1546e-05 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  2  0  0  0   0  2  0  0  0  1039
>>> VecMDot                1 1.0 2.3093e-05 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  2  0  0  0   0  2  0  0  0   520
>>> VecNorm                5 1.0 4.9769e-01 1.0 6.00e+04 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  1  8  0  0  0   1  8  0  0  0     0
>>> VecScale               2 1.0 9.3291e-03 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  2  0  0  0   0  2  0  0  0     1
>>> VecCopy                3 1.0 2.3093e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecSet                 4 1.0 1.5395e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAXPY                1 1.0 2.5734e-02 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  2  0  0  0   0  2  0  0  0     0
>>> VecWAXPY               1 1.0 1.3257e-05 1.0 6.00e+03 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  1  0  0  0   0  1  0  0  0   453
>>> VecMAXPY               2 1.0 2.3521e-05 1.0 2.40e+04 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  3  0  0  0   0  3  0  0  0  1020
>>> VecAssemblyBegin       2 1.0 7.9040e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAssemblyEnd         2 1.0 1.7961e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecReduceArith         2 1.0 9.8359e-06 1.0 2.40e+04 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  3  0  0  0   0  3  0  0  0  2440
>>> VecReduceComm          1 1.0 4.7041e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecNormalize           2 1.0 9.3552e-03 1.0 3.60e+04 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  5  0  0  0   0  5  0  0  0     4
>>> MatMult                2 1.0 4.4860e-04 1.0 5.88e+05 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  0 78  0  0  0   0 78  0  0  0  1311
>>> MatSolve               2 1.0 1.7876e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> MatLUFactorSym         1 1.0 7.6157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>> MatLUFactorNum         1 1.0 6.5977e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>> MatAssemblyBegin       2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyEnd         2 1.0 1.2962e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetRowIJ            1 1.0 1.1534e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetOrdering         1 1.0 5.1809e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatZeroEntries         1 1.0 1.1298e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatView                3 1.0 3.8344e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> KSPSetUp               2 1.0 3.2843e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPSolve               1 1.0 1.6414e+00 1.0 3.78e+05 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  5 50  0  0  0   5 50  0  0  0     0
>>> KSPGMRESOrthog         1 1.0 5.9015e-05 1.0 2.40e+04 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  3  0  0  0   0  3  0  0  0   407
>>> PCSetUp                2 1.0 1.4267e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>>> PCSetUpOnBlocks        1 1.0 1.4266e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>>> PCApply                2 1.0 1.7882e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> Memory usage is given in bytes:
>>>
>>> Object Type          Creations   Destructions     Memory Descendants' Mem.
>>> Reports information only for process 0.
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>>                   SNES     1              0            0     0.
>>>                 DMSNES     1              0            0     0.
>>>         SNESLineSearch     1              0            0     0.
>>>                 Vector    12              0            0     0.
>>>                 Matrix     2              0            0     0.
>>>       Distributed Mesh     2              0            0     0.
>>>              Index Set     2              0            0     0.
>>>      Star Forest Graph     4              0            0     0.
>>>        Discrete System     2              0            0     0.
>>>          Krylov Solver     2              0            0     0.
>>>        DMKSP interface     1              0            0     0.
>>>         Preconditioner     2              0            0     0.
>>>                 Viewer     1              0            0     0.
>>> ========================================================================================================================
>>> Average time to get PetscTime(): 4.27647e-08
>>> #PETSc Option Table entries:
>>> -ksp_atol 1e-6
>>> -ksp_rtol 1e-5
>>> -snes_rtol 1e-4
>>> -sub_ksp_type preonly
>>> -sub_pc_factor_mat_solver_type mkl_pardiso
>>> -sub_pc_type lu
>>> #End of PETSc Option Table entries
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8
>>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>> Configure options: --prefix=/home/alireza/PetscGit
>>> --with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl
>>> --with-hypre-incl
>>> ude=/cygdrive/E/hypre-2.11.2/Builds/Bins/include
>>> --with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
>>> --with-ml-include=/cygdrive/E/Trilinos-master/Bins/in
>>> clude --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib
>>> --with-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort"
>>> --with-mpi-include=/cygdrive/E/Program_Fi
>>> les_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
>>> --with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/int
>>> el64/lib/impi.lib
>>> --with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe
>>> --with-debugging=0 --with-blas
>>> -lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>> --with-lapack-lib=/cygdrive/E/Program_Files_x86/IntelSWTool
>>> s/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>> -CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp"
>>> -FFLAGS="-MT -O2 -Qopenmp"
>>> -----------------------------------------
>>> Libraries compiled on 2018-08-29 08:35:21 on AliReza-PC
>>> Machine characteristics: CYGWIN_NT-6.1-2.10.0-0.325-5-3-x86_64-64bit
>>> Using PETSc directory: /home/alireza/PetscGit
>>> Using PETSc arch:
>>> -----------------------------------------
>>>
>>> Using C compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
>>> -O2 -MT -wd4996 -Qopenmp
>>> Using Fortran compiler:
>>> /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort -MT -O2 -Qopenmp
>>> -fpp
>>> -----------------------------------------
>>>
>>> Using include paths: -I/home/alireza/PetscGit/include
>>> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/include
>>> -I/cygdrive/E/hypre-2.11.2/
>>> Builds/Bins/include -I/cygdrive/E/Trilinos-master/Bins/include
>>> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
>>> -----------------------------------------
>>>
>>> Using C linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
>>> Using Fortran linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe
>>> ifort
>>> Using libraries: -L/home/alireza/PetscGit/lib
>>> -L/home/alireza/PetscGit/lib -lpetsc
>>> /cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
>>> /cygdrive/E/Trilinos-master/Bins/lib
>>> /ml.lib
>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and
>>> _libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib
>>> Gdi32.lib User32.lib
>>>    Advapi32.lib Kernel32.lib Ws2_32.lib
>>> -----------------------------------------
>>>
>>>
>>>
>>> On 8/29/2018 7:40 PM, Smith, Barry F. wrote:
>>>>      You need to add the line
>>>>
>>>>     ierr = MatSetVariableBlockSizes(J,3,lens);CHKERRQ(ierr);
>>>>
>>>> with the number of blocks replacing 3 and the sizes of each block in lens
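
(As a sketch, using the block description you already pass to PCBJacobiSetTotalBlocks, that is 2*Nx*Ny*Nz blocks with per-block sizes in SadeqSize, the call on your Jacobian J, once it has been created, would be roughly:)

    ierr = MatSetVariableBlockSizes(J, 2*Nx*Ny*Nz, SadeqSize);CHKERRQ(ierr);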
>>>>
>>>>      Barry
>>>>
>>>>
>>>>> On Aug 29, 2018, at 4:27 AM, Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> wrote:
>>>>>
>>>>> Barry,
>>>>>
>>>>> Thanks a lot for your efforts. Using PCVPBJACOBI causes SNESSolve to throw the following error (the code and the system being simulated is the same, just PCBJACOBI changed to PCVPBJACOBI):
>>>>>
>>>>>
>>>>> [0]PETSC ERROR: --------------------- Error Message ---------------------------------------
>>>>> -----------------------
>>>>> [0]PETSC ERROR: Nonconforming object sizes
>>>>> [0]PETSC ERROR: Total blocksizes 0 doesn't match number matrix rows 108000
>>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5  GIT Date: 2018-08-28 19:19:26 -0500
>>>>> [0]PETSC ERROR: E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe on a  named ALIREZA-PC by AliReza Wed Aug 29 13:46:38 2018
>>>>> [0]PETSC ERROR: Configure options --prefix=/home/alireza/PetscGit --with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl --
>>>>> with-hypre-include=/cygdrive/E/hypre-2.11.2/Builds/Bins/include --with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib --with-ml-include=/cygdrive/E/Trilinos
>>>>> -master/Bins/include --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib --with-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort" --with-mpi-include=/cygdri
>>>>> ve/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include --with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/
>>>>> windows/mpi/intel64/lib/impi.lib --with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe --with-debuggin
>>>>> g=0 --with-blas-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib --with-lapack-lib=/cygdrive/E/Program_Files_
>>>>> x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib -CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp" -FFLAGS="-MT -O2 -
>>>>> Qopenmp"
>>>>> [0]PETSC ERROR: #1 MatInvertVariableBlockDiagonal_SeqAIJ() line 1569 in E:\cygwin64\home\alireza\PETSc\src\mat\impls\aij\seq\aij.c
>>>>> [0]PETSC ERROR: #2 MatInvertVariableBlockDiagonal() line 10482 in E:\cygwin64\home\alireza\PETSc\src\mat\interface\matrix.c
>>>>> [0]PETSC ERROR: #3 PCSetUp_VPBJacobi() line 120 in E:\cygwin64\home\alireza\PETSc\src\ksp\pc\impls\vpbjacobi\vpbjacobi.c
>>>>> [0]PETSC ERROR: #4 PCSetUp() line 932 in E:\cygwin64\home\alireza\PETSc\src\ksp\pc\interface\precon.c
>>>>> [0]PETSC ERROR: #5 KSPSetUp() line 381 in E:\cygwin64\home\alireza\PETSc\src\ksp\ksp\interface\itfunc.c
>>>>> [0]PETSC ERROR: #6 KSPSolve() line 612 in E:\cygwin64\home\alireza\PETSc\src\ksp\ksp\interface\itfunc.c
>>>>> [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 224 in E:\cygwin64\home\alireza\PETSc\src\snes\impls\ls\ls.c
>>>>> [0]PETSC ERROR: #8 SNESSolve() line 4355 in E:\cygwin64\home\alireza\PETSc\src\snes\interface\snes.c
>>>>>
>>>>> Many thanks,
>>>>> Ali
>>>>>
>>>>> On 8/29/2018 2:24 AM, Smith, Barry F. wrote:
>>>>>>     Ali,
>>>>>>
>>>>>>       In the branch barry/feature-PCVPBJACOBI (see src/snes/examples/tutorials/ex5.c) I have implemented PCVPBJACOBI that is a point block Jacobi with variable size blocks. It has very little testing.
>>>>>>
>>>>>>       Could you please try your code with it; you should see very very similar convergence history with this branch and the previous approach you used (the only numerical difference between the two approaches is that this one uses a dense LU factorization for each small block while the previous used a sparse factorization). This new one should also be slightly faster in the KSPSolve().
>>>>>>
>>>>>>      Thanks
>>>>>>
>>>>>>       Barry
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Aug 27, 2018, at 4:04 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>>>>>
>>>>>>> "Smith, Barry F." <bsmith at mcs.anl.gov> writes:
>>>>>>>
>>>>>>>>      I have added new functionality within SNES to allow one to do as you desire in the branch barry/feature-snessetkspsetupcallback. Please see src/snes/examples/tests/ex3.c
>>>>>>> Note that this example is block Jacobi with O(1) sparse blocks per
>>>>>>> process, not variable-sized point-block Jacobi which I think is what Ali
>>>>>>> had in mind.
>>>>>>>
>>>>>>>>      Please let us know if it works for you and if you have any questions or problems.
>>>>>>>>
>>>>>>>>      Barry
>>>>>>>>
>>>>>>>>     Note that it has to be handled by a callback called from within the SNES solver because that is the first time the matrix exists in a form that the block sizes may be set.
>>>>>>>>
>>>>>>>>      Sorry for the runaround with so many emails.
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Aug 24, 2018, at 3:48 PM, Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am trying to use block Jacobi preconditioner in SNES (SNESNEWTONLS). However, PCBJacobiGetSubKSP function returns an error stating "Object is in wrong state, Must call KSPSetUp() or PCSetUp() first". When I add KSPSetUp, I got and error from them as: "Must call DMShellSetGlobalVector() or DMShellSetCreateGlobalVector()", and if PCSetUp is added, "Object is in wrong state, Matrix must be set first" error is printed.
>>>>>>>>>
>>>>>>>>> Below is a part of my code. It is run serially. Any help is much appreciated.
>>>>>>>>>
>>>>>>>>>      ierr = SNESGetKSP(snes, &Petsc_ksp);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>      ierr = KSPGetPC(Petsc_ksp, &Petsc_pc);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>      ierr = KSPSetTolerances(Petsc_ksp, 1.e-3, 1e-3, PETSC_DEFAULT, 20000);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>      ierr = SNESSetTolerances(snes, 1e-1, 1e-1, 1e-1, 2000, 2000);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>      ierr = SNESSetType(snes, SNESNEWTONLS);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>      ierr = KSPSetType(Petsc_ksp, KSPGMRES);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>      ierr = PCSetType(Petsc_pc, PCBJACOBI);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>      ierr = PCSetType(Petsc_pc, PCBJACOBI);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>      ierr = PCBJacobiSetTotalBlocks(Petsc_pc, 2*Nx*Ny*Nz, SadeqSize);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>
>>>>>>>>>      SNESSetUp(snes);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>      ierr = PCBJacobiGetSubKSP(Petsc_pc, &nLocal, &Firstly, &subKSP);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>
>>>>>>>>>      for (i = 0; i < nLocal; i++) {
>>>>>>>>>          ierr = KSPGetPC(subKSP[i], &SubPc);
>>>>>>>>>          CHKERRQ(ierr);
>>>>>>>>>          ierr = PCSetType(SubPc, PCLU);
>>>>>>>>>          CHKERRQ(ierr);
>>>>>>>>>          ierr = PCFactorSetMatSolverPackage(SubPc, "mkl_pardiso");
>>>>>>>>>          CHKERRQ(ierr);
>>>>>>>>>          ierr = KSPSetType(subKSP[i], KSPPREONLY);
>>>>>>>>>          CHKERRQ(ierr);
>>>>>>>>>          ierr = KSPSetTolerances(subKSP[i], 1.e-6, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);
>>>>>>>>>          CHKERRQ(ierr);
>>>>>>>>>      }
>>>>>>>>>      ierr = SNESSolve(snes, NULL, Petsc_X);
>>>>>>>>>      CHKERRQ(ierr);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>
> -- 
> Ali Reza Khaz’ali
> Assistant Professor of Petroleum Engineering,
> Department of Chemical Engineering
> Isfahan University of Technology
> Isfahan, Iran

