[petsc-users] Use block Jacobi preconditioner with SNES
Ali Reza Khaz'ali
arkhazali at cc.iut.ac.ir
Wed Aug 29 13:18:04 CDT 2018
Oh, I'm sorry. There have been a lot of changes in my code, and I lost
track of them. I've fixed the code, and the log for the 30x30x10 system
is as follows. I measure the memory usage with the Windows Task Manager.
Both of your comments are true. This is just a test run, and I'll
optimize it ASAP.
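
On the memory numbers: so far I have only looked at the Windows Task
Manager, but PETSc's own queries can report the same thing from inside
the code. A minimal sketch (assuming the usual ierr/CHKERRQ error
handling I use elsewhere):

  PetscLogDouble rss, petsc_alloc;
  ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr);         /* resident set size of this process */
  ierr = PetscMallocGetCurrentUsage(&petsc_alloc);CHKERRQ(ierr); /* bytes currently obtained via PetscMalloc */
  ierr = PetscPrintf(PETSC_COMM_WORLD, "RSS: %g bytes, PetscMalloc: %g bytes\n", rss, petsc_alloc);CHKERRQ(ierr);

Running with -memory_view also prints a memory summary at the end of the run.
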
Thanks again.
Best wishes,
Ali
1 MPI processes
type: newtonls
maximum iterations=2000, maximum function evaluations=2000
tolerances: relative=0.0001, absolute=1e-05, solution=1e-05
total number of linear solver iterations=3
total number of function evaluations=2
norm schedule ALWAYS
SNESLineSearch Object: 1 MPI processes
type: bt
interpolation: cubic
alpha=1.000000e-04
maxstep=1.000000e+08, minlambda=1.000000e-12
tolerances: relative=1.000000e-08, absolute=1.000000e-15,
lambda=1.000000e-08
maximum iterations=40
KSP Object: 1 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=5000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-06, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: vpbjacobi
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=108000, cols=108000
total: nonzeros=2868000, allocated nonzeros=8640000
total number of mallocs used during MatSetValues calls =0
not using I-node routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance
Summary: ----------------------------------------------
E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe
on a named ALIREZA-PC with 1 processor, by AliReza Wed Aug 29 22:43:35 2018
Using Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5 GIT Date:
2018-08-28 19:19:26 -0500
Max Max/Min Avg Total
Time (sec): 1.187e+02 1.000 1.187e+02
Objects: 3.600e+01 1.000 3.600e+01
Flop: 2.867e+07 1.000 2.867e+07 2.867e+07
Flop/sec: 2.414e+05 1.000 2.414e+05 2.414e+05
MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length
N --> 2N flop
and VecAXPY() for complex vectors of length
N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages
--- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count
%Total Avg %Total Count %Total
0: Main Stage: 1.1875e+02 100.0% 2.8668e+07 100.0% 0.000e+00
0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush()
and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this
phase
%M - percent messages in this phase %L - percent message
lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time
over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec)
Flop --- Global --- --- Stage ---- Total
Max Ratio Max Ratio Max Ratio Mess AvgLen
Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSidedF 2 1.0 2.7369e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESSolve 1 1.0 9.8642e+01 1.0 2.87e+07 1.0 0.0e+00 0.0e+00
0.0e+00 83100 0 0 0 83100 0 0 0 0
SNESFunctionEval 2 1.0 5.4137e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 5 0 0 0 0 5 0 0 0 0 0
SNESJacobianEval 1 1.0 9.3167e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 78 0 0 0 0 78 0 0 0 0 0
SNESLineSearch 1 1.0 2.6980e+00 1.0 6.82e+06 1.0 0.0e+00 0.0e+00
0.0e+00 2 24 0 0 0 2 24 0 0 0 3
VecDot 1 1.0 1.7705e-04 1.0 2.16e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 1220
VecMDot 3 1.0 7.4411e-04 1.0 1.30e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 5 0 0 0 0 5 0 0 0 1742
VecNorm 7 1.0 5.9443e-03 1.0 1.51e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 5 0 0 0 0 5 0 0 0 254
VecScale 4 1.0 1.4882e-04 1.0 4.32e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 2 0 0 0 0 2 0 0 0 2903
VecCopy 3 1.0 4.6827e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 2 1.0 1.3899e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1 1.0 1.5652e-04 1.0 2.16e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 1380
VecWAXPY 1 1.0 1.7448e-04 1.0 1.08e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 619
VecMAXPY 4 1.0 7.7233e-04 1.0 1.94e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 7 0 0 0 0 7 0 0 0 2517
VecAssemblyBegin 2 1.0 4.6614e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 7.2700e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 2 1.0 2.3948e-04 1.0 4.32e+05 1.0 0.0e+00 0.0e+00
0.0e+00 0 2 0 0 0 0 2 0 0 0 1804
VecReduceComm 1 1.0 2.1382e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 4 1.0 4.1995e-04 1.0 1.30e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 5 0 0 0 0 5 0 0 0 3086
MatMult 4 1.0 1.6703e-02 1.0 2.25e+07 1.0 0.0e+00 0.0e+00
0.0e+00 0 79 0 0 0 0 79 0 0 0 1348
MatAssemblyBegin 2 1.0 1.2829e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 2.6648e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 1 1.0 5.4123e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 6.0384e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 1 1.0 4.3723e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 4.9647e-02 1.0 2.16e+07 1.0 0.0e+00 0.0e+00
0.0e+00 0 75 0 0 0 0 75 0 0 0 436
KSPGMRESOrthog 3 1.0 1.2919e-03 1.0 2.59e+06 1.0 0.0e+00 0.0e+00
0.0e+00 0 9 0 0 0 0 9 0 0 0 2006
PCSetUp 1 1.0 1.8846e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 4 1.0 2.9807e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
SNES 1 0 0 0.
DMSNES 1 0 0 0.
SNESLineSearch 1 0 0 0.
Vector 20 0 0 0.
Matrix 1 0 0 0.
Distributed Mesh 2 0 0 0.
Star Forest Graph 4 0 0 0.
Discrete System 2 0 0 0.
Krylov Solver 1 0 0 0.
DMKSP interface 1 0 0 0.
Preconditioner 1 0 0 0.
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 8.55292e-08
#PETSc Option Table entries:
-ksp_atol 1e-6
-ksp_rtol 1e-5
-snes_rtol 1e-4
-sub_ksp_type preonly
-sub_pc_factor_mat_solver_type mkl_pardiso
-sub_pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/home/alireza/PetscGit
--with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl
--with-hypre-incl
ude=/cygdrive/E/hypre-2.11.2/Builds/Bins/include
--with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
--with-ml-include=/cygdrive/E/Trilinos-master/Bins/in
clude --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib
--with-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort"
--with-mpi-include=/cygdrive/E/Program_Fi
les_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
--with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/int
el64/lib/impi.lib
--with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe
--with-debugging=0 --with-blas
-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
--with-lapack-lib=/cygdrive/E/Program_Files_x86/IntelSWTool
s/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
-CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp"
-FFLAGS="-MT -O2 -Qopenmp"
-----------------------------------------
Libraries compiled on 2018-08-29 08:35:21 on AliReza-PC
Machine characteristics: CYGWIN_NT-6.1-2.10.0-0.325-5-3-x86_64-64bit
Using PETSc directory: /home/alireza/PetscGit
Using PETSc arch:
-----------------------------------------
Using C compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
-O2 -MT -wd4996 -Qopenmp
Using Fortran compiler:
/home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort -MT -O2 -Qopenmp
-fpp
-----------------------------------------
Using include paths: -I/home/alireza/PetscGit/include
-I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/include
-I/cygdrive/E/hypre-2.11.2/
Builds/Bins/include -I/cygdrive/E/Trilinos-master/Bins/include
-I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
-----------------------------------------
Using C linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
Using Fortran linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe
ifort
Using libraries: -L/home/alireza/PetscGit/lib
-L/home/alireza/PetscGit/lib -lpetsc
/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
/cygdrive/E/Trilinos-master/Bins/lib
/ml.lib
/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and
_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib
Gdi32.lib User32.lib
Advapi32.lib Kernel32.lib Ws2_32.lib
-----------------------------------------
On 8/29/2018 10:29 PM, Jed Brown wrote:
> This is bjacobi/lu, not vpbjacobi.
>
> How are you measuring memory usage? See a couple inline comments below.
>
> Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> writes:
>
>> I noticed that I forgot to include the log for 10x10x5 system. Here it is:
>>
>>
>> 1 MPI processes
>> type: newtonls
>> maximum iterations=2000, maximum function evaluations=2000
>> tolerances: relative=0.0001, absolute=1e-05, solution=1e-05
>> total number of linear solver iterations=1
>> total number of function evaluations=2
>> norm schedule ALWAYS
>> SNESLineSearch Object: 1 MPI processes
>> type: bt
>> interpolation: cubic
>> alpha=1.000000e-04
>> maxstep=1.000000e+08, minlambda=1.000000e-12
>> tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>> lambda=1.000000e-08
>> maximum iterations=40
>> KSP Object: 1 MPI processes
>> type: gmres
>> restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>> happy breakdown tolerance 1e-30
>> maximum iterations=5000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-06, divergence=10000.
>> left preconditioning
>> using PRECONDITIONED norm type for convergence test
>> PC Object: 1 MPI processes
>> type: bjacobi
>> number of blocks = 1
>> Local solve is same for all blocks, in the following KSP and PC
>> objects:
>> KSP Object: (sub_) 1 MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (sub_) 1 MPI processes
>> type: lu
>> out-of-place factorization
>> tolerance for zero pivot 2.22045e-14
>> matrix ordering: nd
>> factor fill ratio given 0., needed 0.
>> Factored matrix follows:
>> Mat Object: 1 MPI processes
>> type: mkl_pardiso
>> rows=6000, cols=6000
>> package used to perform factorization: mkl_pardiso
>> total: nonzeros=150000, allocated nonzeros=150000
>> total number of mallocs used during MatSetValues calls =0
>> MKL_PARDISO run parameters:
>> MKL_PARDISO phase: 33
>> MKL_PARDISO iparm[1]: 1
>> MKL_PARDISO iparm[2]: 2
>> MKL_PARDISO iparm[3]: 2
>> MKL_PARDISO iparm[4]: 0
>> MKL_PARDISO iparm[5]: 0
>> MKL_PARDISO iparm[6]: 0
>> MKL_PARDISO iparm[7]: 0
>> MKL_PARDISO iparm[8]: 0
>> MKL_PARDISO iparm[9]: 0
>> MKL_PARDISO iparm[10]: 13
>> MKL_PARDISO iparm[11]: 1
>> MKL_PARDISO iparm[12]: 0
>> MKL_PARDISO iparm[13]: 1
>> MKL_PARDISO iparm[14]: 0
>> MKL_PARDISO iparm[15]: 11228
>> MKL_PARDISO iparm[16]: 7563
>> MKL_PARDISO iparm[17]: 37035
>> MKL_PARDISO iparm[18]: 4204238
>> MKL_PARDISO iparm[19]: 2659
>> MKL_PARDISO iparm[20]: 0
>> MKL_PARDISO iparm[21]: 0
>> MKL_PARDISO iparm[22]: 0
>> MKL_PARDISO iparm[23]: 0
>> MKL_PARDISO iparm[24]: 0
>> MKL_PARDISO iparm[25]: 0
>> MKL_PARDISO iparm[26]: 0
>> MKL_PARDISO iparm[27]: 0
>> MKL_PARDISO iparm[28]: 0
>> MKL_PARDISO iparm[29]: 0
>> MKL_PARDISO iparm[30]: 0
>> MKL_PARDISO iparm[31]: 0
>> MKL_PARDISO iparm[32]: 0
>> MKL_PARDISO iparm[33]: 0
>> MKL_PARDISO iparm[34]: -1
>> MKL_PARDISO iparm[35]: 1
>> MKL_PARDISO iparm[36]: 0
>> MKL_PARDISO iparm[37]: 0
>> MKL_PARDISO iparm[38]: 0
>> MKL_PARDISO iparm[39]: 0
>> MKL_PARDISO iparm[40]: 0
>> MKL_PARDISO iparm[41]: 0
>> MKL_PARDISO iparm[42]: 0
>> MKL_PARDISO iparm[43]: 0
>> MKL_PARDISO iparm[44]: 0
>> MKL_PARDISO iparm[45]: 0
>> MKL_PARDISO iparm[46]: 0
>> MKL_PARDISO iparm[47]: 0
>> MKL_PARDISO iparm[48]: 0
>> MKL_PARDISO iparm[49]: 0
>> MKL_PARDISO iparm[50]: 0
>> MKL_PARDISO iparm[51]: 0
>> MKL_PARDISO iparm[52]: 0
>> MKL_PARDISO iparm[53]: 0
>> MKL_PARDISO iparm[54]: 0
>> MKL_PARDISO iparm[55]: 0
>> MKL_PARDISO iparm[56]: 0
>> MKL_PARDISO iparm[57]: -1
>> MKL_PARDISO iparm[58]: 0
>> MKL_PARDISO iparm[59]: 0
>> MKL_PARDISO iparm[60]: 0
>> MKL_PARDISO iparm[61]: 11228
>> MKL_PARDISO iparm[62]: 7868
>> MKL_PARDISO iparm[63]: 3629
>> MKL_PARDISO iparm[64]: 0
>> MKL_PARDISO maxfct: 1
>> MKL_PARDISO mnum: 1
>> MKL_PARDISO mtype: 11
>> MKL_PARDISO n: 6000
>> MKL_PARDISO nrhs: 1
>> MKL_PARDISO msglvl: 0
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=6000, cols=6000
>> total: nonzeros=150000, allocated nonzeros=480000
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=6000, cols=6000
>> total: nonzeros=150000, allocated nonzeros=480000
> Note that you preallocated more than necessary here.
>
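
Right -- the preallocation is looser than it needs to be (150000
nonzeros used out of 480000 allocated). Tightening it is part of the
cleanup I mentioned; roughly, per-row counts instead of a single
overestimate (nrows, d_nnz and the value 25 below are placeholders
standing in for my actual sizes, and J is the Jacobian matrix):

  PetscInt row, nrows = 6000, *d_nnz;
  ierr = PetscMalloc1(nrows, &d_nnz);CHKERRQ(ierr);
  for (row = 0; row < nrows; row++) d_nnz[row] = 25;   /* replace with the actual nonzeros per row from the stencil */
  ierr = MatSeqAIJSetPreallocation(J, 0, d_nnz);CHKERRQ(ierr);
  ierr = PetscFree(d_nnz);CHKERRQ(ierr);
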
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> ************************************************************************************************************************
>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
>> -fCourier9' to print this document ***
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance
>> Summary: ----------------------------------------------
>>
>> E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe
>> on a named ALIREZA-PC with 1 processor, by AliReza Wed Aug 29 22:12:30 2018
>> Using Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5 GIT Date:
>> 2018-08-28 19:19:26 -0500
>>
>> Max Max/Min Avg Total
>> Time (sec): 3.373e+01 1.000 3.373e+01
>> Objects: 3.300e+01 1.000 3.300e+01
>> Flop: 7.500e+05 1.000 7.500e+05 7.500e+05
>> Flop/sec: 2.224e+04 1.000 2.224e+04 2.224e+04
>> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00
>> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00
>> MPI Reductions: 0.000e+00 0.000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type
>> (multiply/divide/add/subtract)
>> e.g., VecAXPY() for real vectors of length
>> N --> 2N flop
>> and VecAXPY() for complex vectors of length
>> N --> 8N flop
>>
>> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages
>> --- -- Message Lengths -- -- Reductions --
>> Avg %Total Avg %Total Count
>> %Total Avg %Total Count %Total
>> 0: Main Stage: 3.3730e+01 100.0% 7.5000e+05 100.0% 0.000e+00
>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on
>> interpreting output.
>> Phase summary info:
>> Count: number of times phase was executed
>> Time and Flop: Max - maximum over all processors
>> Ratio - ratio of maximum to minimum over all processors
>> Mess: number of messages sent
>> AvgLen: average message length (bytes)
>> Reduct: number of global reductions
>> Global: entire computation
>> Stage: stages of a computation. Set stages with PetscLogStagePush()
>> and PetscLogStagePop().
>> %T - percent time in this phase %F - percent flop in this
>> phase
>> %M - percent messages in this phase %L - percent message
>> lengths in this phase
>> %R - percent reductions in this phase
>> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time
>> over all processors)
>> ------------------------------------------------------------------------------------------------------------------------
>> Event Count Time (sec)
>> Flop --- Global --- --- Stage ---- Total
>> Max Ratio Max Ratio Max Ratio Mess AvgLen
>> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> BuildTwoSidedF 2 1.0 7.9003e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> SNESSolve 1 1.0 7.3396e+00 1.0 7.50e+05 1.0 0.0e+00 0.0e+00
>> 0.0e+00 22100 0 0 0 22100 0 0 0 0
>> SNESFunctionEval 2 1.0 3.8894e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> SNESJacobianEval 1 1.0 5.1611e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 15 0 0 0 0 15 0 0 0 0 0
> Most of the time here is spent in your code computing the Jacobian, not
> in the solver.
>
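
That matches what I see: almost all of the solve time is my own
Jacobian assembly, not the linear solver. To break that down further I
plan to wrap the assembly in a registered log event (a sketch; the
event name and placement are mine):

  PetscLogEvent JAC_FILL;
  ierr = PetscLogEventRegister("UserJacFill", MAT_CLASSID, &JAC_FILL);CHKERRQ(ierr);
  /* inside the Jacobian evaluation routine */
  ierr = PetscLogEventBegin(JAC_FILL, 0, 0, 0, 0);CHKERRQ(ierr);
  /* ... compute and MatSetValues() the entries ... */
  ierr = PetscLogEventEnd(JAC_FILL, 0, 0, 0, 0);CHKERRQ(ierr);
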
>> SNESLineSearch 1 1.0 1.7655e-02 1.0 3.60e+05 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 48 0 0 0 0 48 0 0 0 20
>> VecDot 1 1.0 1.1546e-05 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 2 0 0 0 0 2 0 0 0 1039
>> VecMDot 1 1.0 2.3093e-05 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 2 0 0 0 0 2 0 0 0 520
>> VecNorm 5 1.0 4.9769e-01 1.0 6.00e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00 1 8 0 0 0 1 8 0 0 0 0
>> VecScale 2 1.0 9.3291e-03 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 2 0 0 0 0 2 0 0 0 1
>> VecCopy 3 1.0 2.3093e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecSet 4 1.0 1.5395e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAXPY 1 1.0 2.5734e-02 1.0 1.20e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 2 0 0 0 0 2 0 0 0 0
>> VecWAXPY 1 1.0 1.3257e-05 1.0 6.00e+03 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 453
>> VecMAXPY 2 1.0 2.3521e-05 1.0 2.40e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 3 0 0 0 0 3 0 0 0 1020
>> VecAssemblyBegin 2 1.0 7.9040e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAssemblyEnd 2 1.0 1.7961e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecReduceArith 2 1.0 9.8359e-06 1.0 2.40e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 3 0 0 0 0 3 0 0 0 2440
>> VecReduceComm 1 1.0 4.7041e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecNormalize 2 1.0 9.3552e-03 1.0 3.60e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 5 0 0 0 0 5 0 0 0 4
>> MatMult 2 1.0 4.4860e-04 1.0 5.88e+05 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 78 0 0 0 0 78 0 0 0 1311
>> MatSolve 2 1.0 1.7876e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>> MatLUFactorSym 1 1.0 7.6157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
>> MatLUFactorNum 1 1.0 6.5977e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
>> MatAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatAssemblyEnd 2 1.0 1.2962e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetRowIJ 1 1.0 1.1534e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetOrdering 1 1.0 5.1809e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatZeroEntries 1 1.0 1.1298e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatView 3 1.0 3.8344e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>> KSPSetUp 2 1.0 3.2843e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSolve 1 1.0 1.6414e+00 1.0 3.78e+05 1.0 0.0e+00 0.0e+00
>> 0.0e+00 5 50 0 0 0 5 50 0 0 0 0
>> KSPGMRESOrthog 1 1.0 5.9015e-05 1.0 2.40e+04 1.0 0.0e+00 0.0e+00
>> 0.0e+00 0 3 0 0 0 0 3 0 0 0 407
>> PCSetUp 2 1.0 1.4267e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
>> PCSetUpOnBlocks 1 1.0 1.4266e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
>> PCApply 2 1.0 1.7882e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type Creations Destructions Memory Descendants' Mem.
>> Reports information only for process 0.
>>
>> --- Event Stage 0: Main Stage
>>
>> SNES 1 0 0 0.
>> DMSNES 1 0 0 0.
>> SNESLineSearch 1 0 0 0.
>> Vector 12 0 0 0.
>> Matrix 2 0 0 0.
>> Distributed Mesh 2 0 0 0.
>> Index Set 2 0 0 0.
>> Star Forest Graph 4 0 0 0.
>> Discrete System 2 0 0 0.
>> Krylov Solver 2 0 0 0.
>> DMKSP interface 1 0 0 0.
>> Preconditioner 2 0 0 0.
>> Viewer 1 0 0 0.
>> ========================================================================================================================
>> Average time to get PetscTime(): 4.27647e-08
>> #PETSc Option Table entries:
>> -ksp_atol 1e-6
>> -ksp_rtol 1e-5
>> -snes_rtol 1e-4
>> -sub_ksp_type preonly
>> -sub_pc_factor_mat_solver_type mkl_pardiso
>> -sub_pc_type lu
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8
>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>> Configure options: --prefix=/home/alireza/PetscGit
>> --with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl
>> --with-hypre-incl
>> ude=/cygdrive/E/hypre-2.11.2/Builds/Bins/include
>> --with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
>> --with-ml-include=/cygdrive/E/Trilinos-master/Bins/in
>> clude --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib
>> --with-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort"
>> --with-mpi-include=/cygdrive/E/Program_Fi
>> les_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
>> --with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/int
>> el64/lib/impi.lib
>> --with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe
>> --with-debugging=0 --with-blas
>> -lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>> --with-lapack-lib=/cygdrive/E/Program_Files_x86/IntelSWTool
>> s/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>> -CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp"
>> -FFLAGS="-MT -O2 -Qopenmp"
>> -----------------------------------------
>> Libraries compiled on 2018-08-29 08:35:21 on AliReza-PC
>> Machine characteristics: CYGWIN_NT-6.1-2.10.0-0.325-5-3-x86_64-64bit
>> Using PETSc directory: /home/alireza/PetscGit
>> Using PETSc arch:
>> -----------------------------------------
>>
>> Using C compiler: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
>> -O2 -MT -wd4996 -Qopenmp
>> Using Fortran compiler:
>> /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe ifort -MT -O2 -Qopenmp
>> -fpp
>> -----------------------------------------
>>
>> Using include paths: -I/home/alireza/PetscGit/include
>> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/include
>> -I/cygdrive/E/hypre-2.11.2/
>> Builds/Bins/include -I/cygdrive/E/Trilinos-master/Bins/include
>> -I/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include
>> -----------------------------------------
>>
>> Using C linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe icl
>> Using Fortran linker: /home/alireza/PETSc/lib/petsc/bin/win32fe/win32fe
>> ifort
>> Using libraries: -L/home/alireza/PetscGit/lib
>> -L/home/alireza/PetscGit/lib -lpetsc
>> /cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib
>> /cygdrive/E/Trilinos-master/Bins/lib
>> /ml.lib
>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and
>> _libraries/windows/mkl/lib/intel64_win/mkl_rt.lib
>> /cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/lib/impi.lib
>> Gdi32.lib User32.lib
>> Advapi32.lib Kernel32.lib Ws2_32.lib
>> -----------------------------------------
>>
>>
>>
>> On 8/29/2018 7:40 PM, Smith, Barry F. wrote:
>>> You need to add the line
>>>
>>> ierr = MatSetVariableBlockSizes(J,3,lens);CHKERRQ(ierr);
>>>
>>> with the number of blocks replacing 3 and the sizes of each block in lens
>>>
>>> Barry
>>>
>>>
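
For the record, this is essentially the fix that produced the vpbjacobi
log at the top of this message; in my code it looks roughly like the
following (nb and lens come from my grid setup):

  PetscInt nb, *lens;
  /* nb = number of variable-sized blocks, lens[i] = size of block i */
  /* ... fill nb and lens from the grid ... */
  ierr = MatSetVariableBlockSizes(J, nb, lens);CHKERRQ(ierr);
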
>>>> On Aug 29, 2018, at 4:27 AM, Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> wrote:
>>>>
>>>> Barry,
>>>>
>>>> Thanks a lot for your efforts. Using PCVPBJACOBI causes SNESSolve to throw the following error (the code and the system being simulated is the same, just PCBJACOBI changed to PCVPBJACOBI):
>>>>
>>>>
>>>> [0]PETSC ERROR: --------------------- Error Message ---------------------------------------
>>>> -----------------------
>>>> [0]PETSC ERROR: Nonconforming object sizes
>>>> [0]PETSC ERROR: Total blocksizes 0 doesn't match number matrix rows 108000
>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.9.3-1238-g434eb0c8b5 GIT Date: 2018-08-28 19:19:26 -0500
>>>> [0]PETSC ERROR: E:\Documents\Visual Studio 2015\Projects\compsim\x64\Release\compsim.exe on a named ALIREZA-PC by AliReza Wed Aug 29 13:46:38 2018
>>>> [0]PETSC ERROR: Configure options --prefix=/home/alireza/PetscGit --with-mkl_pardiso-dir=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl --
>>>> with-hypre-include=/cygdrive/E/hypre-2.11.2/Builds/Bins/include --with-hypre-lib=/cygdrive/E/hypre-2.11.2/Builds/Bins/lib/HYPRE.lib --with-ml-include=/cygdrive/E/Trilinos
>>>> -master/Bins/include --with-ml-lib=/cygdrive/E/Trilinos-master/Bins/lib/ml.lib --with-openmp --with-cc="win32fe icl" --with-fc="win32fe ifort" --with-mpi-include=/cygdri
>>>> ve/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/include --with-mpi-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/
>>>> windows/mpi/intel64/lib/impi.lib --with-mpi-mpiexec=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mpi/intel64/bin/mpiexec.exe --with-debuggin
>>>> g=0 --with-blas-lib=/cygdrive/E/Program_Files_x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib --with-lapack-lib=/cygdrive/E/Program_Files_
>>>> x86/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64_win/mkl_rt.lib -CFLAGS="-O2 -MT -wd4996 -Qopenmp" -CXXFLAGS="-O2 -MT -wd4996 -Qopenmp" -FFLAGS="-MT -O2 -
>>>> Qopenmp"
>>>> [0]PETSC ERROR: #1 MatInvertVariableBlockDiagonal_SeqAIJ() line 1569 in E:\cygwin64\home\alireza\PETSc\src\mat\impls\aij\seq\aij.c
>>>> [0]PETSC ERROR: #2 MatInvertVariableBlockDiagonal() line 10482 in E:\cygwin64\home\alireza\PETSc\src\mat\interface\matrix.c
>>>> [0]PETSC ERROR: #3 PCSetUp_VPBJacobi() line 120 in E:\cygwin64\home\alireza\PETSc\src\ksp\pc\impls\vpbjacobi\vpbjacobi.c
>>>> [0]PETSC ERROR: #4 PCSetUp() line 932 in E:\cygwin64\home\alireza\PETSc\src\ksp\pc\interface\precon.c
>>>> [0]PETSC ERROR: #5 KSPSetUp() line 381 in E:\cygwin64\home\alireza\PETSc\src\ksp\ksp\interface\itfunc.c
>>>> [0]PETSC ERROR: #6 KSPSolve() line 612 in E:\cygwin64\home\alireza\PETSc\src\ksp\ksp\interface\itfunc.c
>>>> [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 224 in E:\cygwin64\home\alireza\PETSc\src\snes\impls\ls\ls.c
>>>> [0]PETSC ERROR: #8 SNESSolve() line 4355 in E:\cygwin64\home\alireza\PETSc\src\snes\interface\snes.c
>>>>
>>>> Many thanks,
>>>> Ali
>>>>
>>>> On 8/29/2018 2:24 AM, Smith, Barry F. wrote:
>>>>> Ali,
>>>>>
>>>>> In the branch barry/feature-PCVPBJACOBI (see src/snes/examples/tutorials/ex5.c) I have implemented PCVPBJACOBI that is a point block Jacobi with variable size blocks. It has very little testing.
>>>>>
>>>>> Could you please try your code with it; you should see very very similar convergence history with this branch and the previous approach you used (the only numerical difference between the two approaches is that this one uses a dense LU factorization for each small block while the previous used a sparse factorization). This new one should also be slightly faster in the KSPSolve().
>>>>>
>>>>> Thanks
>>>>>
>>>>> Barry
>>>>>
>>>>>
>>>>>
>>>>>
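
In my code the switch itself was just changing the PC type (plus the
MatSetVariableBlockSizes call above), either in code or at run time:

  ierr = PCSetType(Petsc_pc, PCVPBJACOBI);CHKERRQ(ierr);   /* was PCBJACOBI */
  /* or on the command line: -pc_type vpbjacobi */
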
>>>>>> On Aug 27, 2018, at 4:04 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>>>>
>>>>>> "Smith, Barry F." <bsmith at mcs.anl.gov> writes:
>>>>>>
>>>>>>> I have added new functionality within SNES to allow one to do as you desire in the branch barry/feature-snessetkspsetupcallback. Please see src/snes/examples/tests/ex3.c
>>>>>> Note that this example is block Jacobi with O(1) sparse blocks per
>>>>>> process, not variable-sized point-block Jacobi which I think is what Ali
>>>>>> had in mind.
>>>>>>
>>>>>>> Please let us know if it works for you and if you have any questions or problems.
>>>>>>>
>>>>>>> Barry
>>>>>>>
>>>>>>> Note that it has to be handled by a callback called from within the SNES solver because that is the first time the matrix exists in a form that the block sizes may be set.
>>>>>>>
>>>>>>> Sorry for the runaround with so many emails.
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 24, 2018, at 3:48 PM, Ali Reza Khaz'ali <arkhazali at cc.iut.ac.ir> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am trying to use block Jacobi preconditioner in SNES (SNESNEWTONLS). However, PCBJacobiGetSubKSP function returns an error stating "Object is in wrong state, Must call KSPSetUp() or PCSetUp() first". When I add KSPSetUp, I got and error from them as: "Must call DMShellSetGlobalVector() or DMShellSetCreateGlobalVector()", and if PCSetUp is added, "Object is in wrong state, Matrix must be set first" error is printed.
>>>>>>>>
>>>>>>>> Below is a part of my code. It is run serially. Any help is much appreciated.
>>>>>>>>
>>>>>>>> ierr = SNESGetKSP(snes, &Petsc_ksp);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = KSPGetPC(Petsc_ksp, &Petsc_pc);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = KSPSetTolerances(Petsc_ksp, 1.e-3, 1e-3, PETSC_DEFAULT, 20000);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = SNESSetTolerances(snes, 1e-1, 1e-1, 1e-1, 2000, 2000);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = SNESSetType(snes, SNESNEWTONLS);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = KSPSetType(Petsc_ksp, KSPGMRES);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = PCSetType(Petsc_pc, PCBJACOBI);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = PCBJacobiSetTotalBlocks(Petsc_pc, 2*Nx*Ny*Nz, SadeqSize);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>>
>>>>>>>> SNESSetUp(snes);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = PCBJacobiGetSubKSP(Petsc_pc, &nLocal, &Firstly, &subKSP);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>>
>>>>>>>> for (i = 0; i < nLocal; i++) {
>>>>>>>> ierr = KSPGetPC(subKSP[i], &SubPc);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = PCSetType(SubPc, PCLU);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = PCFactorSetMatSolverPackage(SubPc, "mkl_pardiso");
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = KSPSetType(subKSP[i], KSPPREONLY);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> ierr = KSPSetTolerances(subKSP[i], 1.e-6, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>> }
>>>>>>>> ierr = SNESSolve(snes, NULL, Petsc_X);
>>>>>>>> CHKERRQ(ierr);
>>>>>>>>
>>>>>>>>
>>>>>>>>
--
Ali Reza Khaz’ali
Assistant Professor of Petroleum Engineering,
Department of Chemical Engineering
Isfahan University of Technology
Isfahan, Iran