# [petsc-users] [petsc-maint] Iterative Solver Problem

Mon Apr 28 14:59:09 CDT 2014

Options used (as I mentioned before, -ksp_monitor_singular_value did not
work):

-ksp_type gmres -ksp_max_it 500 -ksp_gmres_restart 500
-ksp_monitor_true_residual -pc_type asm -sub_pc_type lu
-ksp_converged_reason -ksp_view -log_summary

Linear solve did not converge due to DIVERGED_ITS iterations 500
KSP Object: 8 MPI processes
type: gmres
GMRES: restart=500, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=500, initial guess is zero
tolerances:  relative=1e-12, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
type: asm
Additive Schwarz: total subdomain blocks = 8, amount of overlap = 1
Additive Schwarz: restriction/interpolation type - RESTRICT
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object:  (sub_)   1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
left preconditioning
using NONE norm type for convergence test
PC Object:  (sub_)   1 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 1e-12
matrix ordering: nd
factor fill ratio given 5, needed 3.70575
Factored matrix follows:
Matrix Object:           1 MPI processes
type: seqaij
rows=5630, cols=5630
package used to perform factorization: petsc
total: nonzeros=877150, allocated nonzeros=877150
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 1126 nodes, limit used is 5
linear system matrix = precond matrix:
Matrix Object:     1 MPI processes
type: seqaij
rows=5630, cols=5630
total: nonzeros=236700, allocated nonzeros=236700
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 1126 nodes, limit used is 5
linear system matrix = precond matrix:
Matrix Object:   8 MPI processes
type: mpiaij
rows=41000, cols=41000
total: nonzeros=1817800, allocated nonzeros=2555700
total number of mallocs used during MatSetValues calls =121180
using I-node (on process 0) routines: found 1025 nodes, limit used is 5

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/home/u14/umhassa5/mecfd/gas-code/TLEC2CCP/ctf/bin/Linux_p/ctf_Linux on a arch-linu named mecfd02 with 8 processors, by umhassa5 Mon Apr 28 14:55:39 2014
Using Petsc Release Version 3.2.0, Patch 5, Sat Oct 29 13:45:54 CDT 2011

                         Max       Max/Min        Avg      Total
Time (sec):           7.395e+01      1.04834   7.331e+01
Objects:              1.541e+03      1.00000   1.541e+03
Flops:                5.413e+09      1.03192   5.360e+09  4.288e+10
Flops/sec:            7.655e+07      1.07860   7.314e+07  5.851e+08
MPI Messages:         3.108e+03      1.95718   2.732e+03  2.185e+04
MPI Message Lengths:  3.360e+07      2.89490   7.101e+03  1.552e+08
MPI Reductions:       1.559e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 7.3305e+01 100.0%  4.2880e+10 100.0%  2.185e+04 100.0%  7.101e+03      100.0%  1.558e+03  99.9%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1001 1.0 2.0220e+00 1.2 4.54e+08 1.0 1.4e+04 4.0e+03 0.0e+00  3  8 64 36  0   3  8 64 36  0  1779
MatSolve             501 1.0 3.0825e+00 1.2 1.00e+09 1.1 0.0e+00 0.0e+00 0.0e+00  4 18  0  0  0   4 18  0  0  0  2510
MatLUFactorSym         1 1.0 3.1987e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 9.0718e-02 1.6 8.21e+07 1.5 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  5825
MatAssemblyBegin       2 1.0 1.1914e+01 190.5 0.00e+00 0.0 4.2e+01 1.2e+06 2.0e+00 14  0  0 33  0  14  0  0 33  0     0
MatAssemblyEnd         2 1.0 3.0220e+01 1.0 0.00e+00 0.0 5.6e+01 1.0e+03 9.0e+00 41  0  0  0  1  41  0  0  0  1     0
MatGetRowIJ            1 1.0 2.6202e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       1 1.0 2.6700e-02 1.1 0.00e+00 0.0 1.4e+02 5.5e+04 7.0e+00  0  0  1  5  0   0  0  1  5  0     0
MatGetOrdering         1 1.0 1.0290e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       1 1.0 3.5419e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                3 3.0 3.3307e-04 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot              500 1.0 3.7091e+00 1.2 1.28e+09 1.0 0.0e+00 0.0e+00 5.0e+02  5 24  0  0 32   5 24  0  0 32  2769
VecNorm             1003 1.0 9.0194e-01 5.3 1.03e+07 1.0 0.0e+00 0.0e+00 1.0e+03  1  0  0  0 64   1  0  0  0 64    91
VecScale             501 1.0 4.4811e-03 1.1 2.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4584
VecCopy              502 1.0 2.6488e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1505 1.0 3.8517e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              501 1.0 6.1393e-03 1.1 5.14e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  6692
VecAYPX              501 1.0 1.7209e-02 1.2 2.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1194
VecMAXPY            1001 1.0 6.7261e+00 1.1 2.57e+09 1.0 0.0e+00 0.0e+00 0.0e+00  9 48  0  0  0   9 48  0  0  0  3060
VecAssemblyBegin       1 1.0 1.2943e-02 1.2 0.00e+00 0.0 2.2e+02 3.1e+04 3.0e+00  0  0  1  4  0   0  0  1  4  0     0
VecAssemblyEnd         1 1.0 1.7521e-03 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     2004 1.0 9.6959e-02 1.3 0.00e+00 0.0 2.1e+04 4.1e+03 0.0e+00  0  0 96 56  0   0  0 96 56  0     0
VecScatterEnd       2004 1.0 5.1422e-01 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize         501 1.0 5.3980e-01 5.8 7.70e+06 1.0 0.0e+00 0.0e+00 5.0e+02  0  0  0  0 32   0  0  0  0 32   114
KSPGMRESOrthog       500 1.0 6.9229e+00 1.1 2.57e+09 1.0 0.0e+00 0.0e+00 5.0e+02  9 48  0  0 32   9 48  0  0 32  2967
KSPSetup               2 1.0 2.8701e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 1.5673e+01 1.0 5.41e+09 1.0 2.1e+04 4.4e+03 1.5e+03 21 100 97 60 98  21 100 97 60 98  2736
PCSetUp                2 1.0 1.5435e-01 1.4 8.21e+07 1.5 2.0e+02 4.0e+04 2.6e+01  0  1  1  5  2   0  1  1  5  2  3423
PCSetUpOnBlocks        1 1.0 1.2381e-01 1.5 8.21e+07 1.5 0.0e+00 0.0e+00 7.0e+00  0  1  0  0  0   0  1  0  0  0  4268
PCApply              501 1.0 3.2335e+00 1.2 1.00e+09 1.1 7.0e+03 4.0e+03 0.0e+00  4 18 32 18  0   4 18 32 18  0  2393
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

Matrix     5              5     14741588     0
Vector  1515           1515     64610424     0
Vector Scatter     3              3         3180     0
Index Set    12             12       413568     0
Krylov Solver     2              2      4048456     0
Preconditioner     2              2         1832     0
Viewer     2              1          720     0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 8.2016e-06
Average time for zero size MPI_Send(): 1.05202e-05
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_gmres_restart 500
-ksp_max_it 500
-ksp_monitor_true_residual
-ksp_type gmres
-ksp_view
-log_summary
-pc_type asm
-sub_pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Sat Dec 31 07:53:05 2011
Configure options: --with-mpi-dir=/home/mecfd/common/openmpi-p
--PETSC_DIR=/home/mecfd/common/sw/petsc-3.2-p5-pgi --with-debugging=0
-----------------------------------------
Libraries compiled on Sat Dec 31 07:53:05 2011 on mecfd02
Machine characteristics:
Linux-2.6.18-238.19.1.el5-x86_64-with-redhat-5.7-Final
Using PETSc directory: /home/mecfd/common/sw/petsc-3.2-p5-pgi
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /home/mecfd/common/openmpi-p/bin/mpicc  -fPIC -O
\${COPTFLAGS} \${CFLAGS}
Using Fortran compiler: /home/mecfd/common/openmpi-p/bin/mpif90  -fPIC
-O   \${FOPTFLAGS} \${FFLAGS}
-----------------------------------------

Using include paths:
-I/home/mecfd/common/sw/petsc-3.2-p5-pgi/arch-linux2-c-opt/include
-I/home/mecfd/common/sw/petsc-3.2-p5-pgi/include
-I/home/mecfd/common/sw/petsc-3.2-p5-pgi/include
-I/home/mecfd/common/sw/petsc-3.2-p5-pgi/arch-linux2-c-opt/include
-I/home/mecfd/common/openmpi-p/include
-I/home/mecfd/common/sw/openmpi-1.4.4-pgi/include
-----------------------------------------

Using C linker: /home/mecfd/common/openmpi-p/bin/mpicc
Using Fortran linker: /home/mecfd/common/openmpi-p/bin/mpif90
Using libraries:
-Wl,-rpath,/home/mecfd/common/sw/petsc-3.2-p5-pgi/arch-linux2-c-opt/lib
-L/home/mecfd/common/sw/petsc-3.2-p5-pgi/arch-linux2-c-opt/lib -lpetsc
-Wl,-rpath,/home/mecfd/common/sw/petsc-3.2-p5-pgi/arch-linux2-c-opt/lib
-L/home/mecfd/common/sw/petsc-3.2-p5-pgi/arch-linux2-c-opt/lib
-lsuperlu_dist_2.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common
-lpord -lparmetis -lmetis -lHYPRE
-Wl,-rpath,/home/mecfd/common/sw/openmpi-1.4.4-pgi/lib
-Wl,-rpath,/local/linux-local/pgi/linux86-64/10.0/libso
-Wl,-rpath,/local/linux-local/pgi/linux86-64/10.0/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -lmpi_cxx -lstd -lC
-lspooles -lscalapack -lblacs -lflapack -lfblas
/local/linux-local/pgi/linux86-64/10.0/lib/pgi.ld
-L/home/mecfd/common/sw/openmpi-1.4.4-pgi/lib
-L/local/linux-local/pgi/linux86-64/10.0/libso
-L/local/linux-local/pgi/linux86-64/10.0/lib
-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpi -lopen-rte
-lopen-pal -lnsl -lutil -lpthread -lnspgc -lpgc -lmpi_f90 -lmpi_f77
-lpgf90 -lpgf90_rpm1 -lpgf902 -lpgftnrtl -lpgf90rtl -lrt -lm -lmpi_cxx
-lstd -lC -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lpthread
-lnspgc -lpgc -ldl
-----------------------------------------
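
As a rough cross-check of the summary above, the short sketch below recomputes three of the reported figures from numbers copied out of the log. It assumes that the legend's "10e-6" factor is meant as 1e-6, and that the %T column is close to an event's maximum time divided by the maximum run time (which should only hold when the Max/Min time ratio is near 1.0). The variable names are only for illustration.

```python
# Numbers copied from the -log_summary output above.
total_flops = 4.288e10              # "Flops" row, Total column (sum over 8 processes)
run_time_max = 7.395e+01            # "Time (sec)" row, Max column
ksp_solve_time = 1.5673e+01         # KSPSolve, max time
mat_assembly_end_time = 3.0220e+01  # MatAssemblyEnd, max time

# Assumption: the legend's "10e-6" is a factor of 1e-6 (flop/s -> Mflop/s).
# KSPSolve is listed with 100% of the flops, so the run total is used
# as its flop count.
ksp_solve_mflops = 1e-6 * total_flops / ksp_solve_time

# Approximate share of wall time. This is only a fair estimate because
# both events have a Max/Min time ratio of 1.0 in the table.
ksp_solve_share = 100.0 * ksp_solve_time / run_time_max
assembly_end_share = 100.0 * mat_assembly_end_time / run_time_max

print(f"KSPSolve rate: {ksp_solve_mflops:.0f} Mflop/s")
print(f"KSPSolve share of run time: {ksp_solve_share:.0f}%")
print(f"MatAssemblyEnd share of run time: {assembly_end_share:.0f}%")
```

The results agree with the 2736 Mflop/s, 21 %T and 41 %T entries in the table: the Krylov solve takes about a fifth of the run, while MatAssemblyEnd alone takes about two fifths.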

--
With Best Regards;

Quoting Barry Smith <bsmith at mcs.anl.gov>:

>
>   Please run with the additional options -ksp_max_it 500
> -ksp_gmres_restart 500 -ksp_monitor_true_residual
> -ksp_monitor_singular_value and send back all the output (that would
> include the 500 residual norms as it tries to converge.)
>
>   Barry
>
> On Apr 28, 2014, at 1:21 PM, Foad Hassaninejadfarahani
> <umhassa5 at cc.umanitoba.ca> wrote:
>
>> Hello Again;
>>
>> I used -ksp_rtol 1.e-12. It took much longer to get the
>> result for one iteration, and it did not converge:
>>
>> Linear solve did not converge due to DIVERGED_ITS iterations 10000
>> KSP Object: 8 MPI processes
>>  type: gmres
>>    GMRES: restart=300, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>    GMRES: happy breakdown tolerance 1e-30
>>  maximum iterations=10000, initial guess is zero
>>  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000
>>  left preconditioning
>>  using PRECONDITIONED norm type for convergence test
>> PC Object: 8 MPI processes
>>  type: asm
>>    Additive Schwarz: total subdomain blocks = 8, amount of overlap = 1
>>    Additive Schwarz: restriction/interpolation type - RESTRICT
>>    Local solve is same for all blocks, in the following KSP and PC objects:
>>  KSP Object:  (sub_)   1 MPI processes
>>    type: preonly
>>    maximum iterations=10000, initial guess is zero
>>    tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>    left preconditioning
>>    using NONE norm type for convergence test
>>  PC Object:  (sub_)   1 MPI processes
>>    type: lu
>>      LU: out-of-place factorization
>>      tolerance for zero pivot 1e-12
>>      matrix ordering: nd
>>      factor fill ratio given 5, needed 3.70575
>>        Factored matrix follows:
>>          Matrix Object:           1 MPI processes
>>            type: seqaij
>>            rows=5630, cols=5630
>>            package used to perform factorization: petsc
>>            total: nonzeros=877150, allocated nonzeros=877150
>>            total number of mallocs used during MatSetValues calls =0
>>              using I-node routines: found 1126 nodes, limit used is 5
>>    linear system matrix = precond matrix:
>>    Matrix Object:     1 MPI processes
>>      type: seqaij
>>      rows=5630, cols=5630
>>      total: nonzeros=236700, allocated nonzeros=236700
>>      total number of mallocs used during MatSetValues calls =0
>>        using I-node routines: found 1126 nodes, limit used is 5
>>  linear system matrix = precond matrix:
>>  Matrix Object:   8 MPI processes
>>    type: mpiaij
>>    rows=41000, cols=41000
>>    total: nonzeros=1817800, allocated nonzeros=2555700
>>    total number of mallocs used during MatSetValues calls =121180
>>      using I-node (on process 0) routines: found 1025 nodes, limit used is 5
>>
>>
>> Let me clarify everything. I am solving the whole system (air
>> and water) coupled at once. The original system is nonlinear,
>> but I linearized the equations, so I have some lagged
>> terms. In addition, the interface location (between the two phases)
>> is wrong at the beginning and must be corrected in each iteration
>> after getting the solution. Therefore, I solve the whole domain,
>> move the interface, and solve the whole domain again. This
>> continues until the interface movement is on the order of
>> 1E-12.
>>
>> My problem appears after I have the converged solution. Restarting
>> from the converged solution, if I use SuperLU, it gives me back the
>> converged solution and stops after one iteration. But if I use any
>> iterative solver, it does not give me back the converged solution
>> and starts moving the interface, because the wrong solution asks for
>> a new interface location. This leads to endless oscillation and, in
>> some cases, divergence.
>>
>> --
>> With Best Regards;
>>
>>
>> Quoting Barry Smith <bsmith at mcs.anl.gov>:
>>
>>>
>>> On Apr 28, 2014, at 12:59 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>>>
>>>> First try a much tighter tolerance on the linear solver. Use
>>>> -ksp_rtol 1.e-12
>>>>
>>>> I don't fully understand. Is the coupled system nonlinear? If
>>>> you are solving a nonlinear system, how are you doing that, since
>>>> you seem to be solving only a single linear system? Does the linear
>>>> system involve all unknowns in the fluid and air?
>>>>
>>>> Barry
>>>>
>>>>
>>>>
>>>> On Apr 28, 2014, at 11:19 AM, Foad Hassaninejadfarahani
>>>> <umhassa5 at cc.umanitoba.ca> wrote:
>>>>
>>>>> Hello PETSc team;
>>>>>
>>>>> The PETSc setup in my code is working now. I have issues with
>>>>> using an iterative solver instead of a direct solver.
>>>>>
>>>>> I am solving a 2D, two-phase flow. Two fluids (air and water)
>>>>> flow into a channel, and the two phases interact.
>>>>> I am solving for the velocities in the x and y directions,
>>>>> pressure, and two scalars. They are all coupled together. I am
>>>>> looking for the steady-state solution. Since the interface between
>>>>> the phases needs updating, many iterations are needed to reach
>>>>> the steady-state solution. "A" is a nine-banded non-symmetric
>>>>> matrix and each node has five unknowns. I am storing the
>>>>> non-zero coefficients and their locations in three separate
>>>>> vectors.
>>>>>
>>>>> I started with a direct solver. SuperLU works fine and gives
>>>>> me good results compared to previous work. However, it is
>>>>> not cheap and not practical for fine grids. The iterative
>>>>> solver did not work; here is what I did:
>>>>>
>>>>> I got the converged solution by using SuperLU. After that I
>>>>> restarted from the converged solution and did one iteration
>>>>> using  -pc_type lu -pc_factor_mat_solver_package superlu_dist
>>>>> -log_summary. Again, it gave me the same converged solution.
>>>>>
>>>>> After that I started from the converged solution once more, and
>>>>> this time I tried different combinations of iterative solvers
>>>>> and preconditioners, such as the following:
>>>>> -ksp_type gmres -ksp_gmres_restart 300 -pc_type asm -sub_pc_type
>>>>> lu ksp_monitor_true_residual -ksp_converged_reason -ksp_view
>>>>> -log_summary
>>>>>
>>>>> and here is the report:
>>>>> Linear solve converged due to CONVERGED_RTOL iterations 41
>>>>> KSP Object: 8 MPI processes
>>>>> type: gmres
>>>>>  GMRES: restart=300, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>  GMRES: happy breakdown tolerance 1e-30
>>>>> maximum iterations=10000, initial guess is zero
>>>>> tolerances:  relative=1e-06, absolute=1e-50, divergence=10000
>>>>> left preconditioning
>>>>> using PRECONDITIONED norm type for convergence test
>>>>> PC Object: 8 MPI processes
>>>>> type: asm
>>>>>  Additive Schwarz: total subdomain blocks = 8, amount of overlap = 1
>>>>>  Additive Schwarz: restriction/interpolation type - RESTRICT
>>>>>  Local solve is same for all blocks, in the following KSP and PC objects:
>>>>> KSP Object:  (sub_)   1 MPI processes
>>>>>  type: preonly
>>>>>  maximum iterations=10000, initial guess is zero
>>>>>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>  left preconditioning
>>>>>  using NONE norm type for convergence test
>>>>> PC Object:  (sub_)   1 MPI processes
>>>>>  type: lu
>>>>>    LU: out-of-place factorization
>>>>>    tolerance for zero pivot 1e-12
>>>>>    matrix ordering: nd
>>>>>    factor fill ratio given 5, needed 3.70575
>>>>>      Factored matrix follows:
>>>>>        Matrix Object:           1 MPI processes
>>>>>          type: seqaij
>>>>>          rows=5630, cols=5630
>>>>>          package used to perform factorization: petsc
>>>>>          total: nonzeros=877150, allocated nonzeros=877150
>>>>>          total number of mallocs used during MatSetValues calls =0
>>>>>            using I-node routines: found 1126 nodes, limit used is 5
>>>>>  linear system matrix = precond matrix:
>>>>>  Matrix Object:     1 MPI processes
>>>>>    type: seqaij
>>>>>    rows=5630, cols=5630
>>>>>    total: nonzeros=236700, allocated nonzeros=236700
>>>>>    total number of mallocs used during MatSetValues calls =0
>>>>>      using I-node routines: found 1126 nodes, limit used is 5
>>>>> linear system matrix = precond matrix:
>>>>> Matrix Object:   8 MPI processes
>>>>>  type: mpiaij
>>>>>  rows=41000, cols=41000
>>>>>  total: nonzeros=1817800, allocated nonzeros=2555700
>>>>>  total number of mallocs used during MatSetValues calls =121180
>>>>>    using I-node (on process 0) routines: found 1025 nodes, limit used is 5
>>>>>
>>>>> But the results are far from the converged solution. For
>>>>> example, the pressure at two reference nodes is compared below:
>>>>>
>>>>> Based on SuperLU:
>>>>> Channel Inlet pressure (MIXTURE):      0.38890D-01
>>>>> Channel Inlet pressure (LIQUID):       0.38416D-01
>>>>>
>>>>> Based on GMRES:
>>>>> Channel Inlet pressure (MIXTURE):     -0.87214D+00
>>>>> Channel Inlet pressure (LIQUID):      -0.87301D+00
>>>>>
>>>>>
>>>>> I also tried this:
>>>>> -ksp_type gcr -pc_type asm -ksp_diagonal_scale
>>>>> -ksp_diagonal_scale_fix -ksp_monitor_true_residual
>>>>> -ksp_converged_reason -ksp_view -log_summary
>>>>>
>>>>> and here is the report:
>>>>> 0 KSP unpreconditioned resid norm 2.248340888101e+05 true resid norm 2.248340888101e+05 ||r(i)||/||b|| 1.000000000000e+00
>>>>> 1 KSP unpreconditioned resid norm 4.900010460179e+04 true resid norm 4.900010460179e+04 ||r(i)||/||b|| 2.179389471637e-01
>>>>> 2 KSP unpreconditioned resid norm 4.267761572746e+04 true resid norm 4.267761572746e+04 ||r(i)||/||b|| 1.898182608933e-01
>>>>> 3 KSP unpreconditioned resid norm 2.041242251471e+03 true resid norm 2.041242251471e+03 ||r(i)||/||b|| 9.078882398457e-03
>>>>> 4 KSP unpreconditioned resid norm 1.852885420564e+03 true resid norm 1.852885420564e+03 ||r(i)||/||b|| 8.241123178296e-03
>>>>> 5 KSP unpreconditioned resid norm 1.748965594395e+02 true resid norm 1.748965594395e+02 ||r(i)||/||b|| 7.778916460804e-04
>>>>> 6 KSP unpreconditioned resid norm 5.664539353996e+01 true resid norm 5.664539353996e+01 ||r(i)||/||b|| 2.519430831852e-04
>>>>> 7 KSP unpreconditioned resid norm 3.607535692806e+01 true resid norm 3.607535692806e+01 ||r(i)||/||b|| 1.604532351788e-04
>>>>> 8 KSP unpreconditioned resid norm 1.041501303366e+01 true resid norm 1.041501303366e+01 ||r(i)||/||b|| 4.632310468924e-05
>>>>> 9 KSP unpreconditioned resid norm 3.089920380322e+00 true resid norm 3.089920380322e+00 ||r(i)||/||b|| 1.374311340720e-05
>>>>> 10 KSP unpreconditioned resid norm 1.456883209806e+00 true resid norm 1.456883209806e+00 ||r(i)||/||b|| 6.479814593583e-06
>>>>> 11 KSP unpreconditioned resid norm 5.566902714391e-01 true resid norm 5.566902714391e-01 ||r(i)||/||b|| 2.476004748147e-06
>>>>> 12 KSP unpreconditioned resid norm 2.403913756663e-01 true resid norm 2.403913756663e-01 ||r(i)||/||b|| 1.069194520006e-06
>>>>> 13 KSP unpreconditioned resid norm 1.650435118839e-01 true resid norm 1.650435118839e-01 ||r(i)||/||b|| 7.340680088032e-07
>>>>> Linear solve converged due to CONVERGED_RTOL iterations 13
>>>>> KSP Object: 8 MPI processes
>>>>> type: gcr
>>>>>  GCR: restart = 30
>>>>>  GCR: restarts performed = 1
>>>>> maximum iterations=10000, initial guess is zero
>>>>> tolerances:  relative=1e-06, absolute=1e-50, divergence=10000
>>>>> right preconditioning
>>>>> diagonally scaled system
>>>>> using UNPRECONDITIONED norm type for convergence test
>>>>> PC Object: 8 MPI processes
>>>>> type: asm
>>>>>  Additive Schwarz: total subdomain blocks = 8, amount of overlap = 1
>>>>>  Additive Schwarz: restriction/interpolation type - RESTRICT
>>>>>  Local solve is same for all blocks, in the following KSP and PC objects:
>>>>> KSP Object:  (sub_)   1 MPI processes
>>>>>  type: preonly
>>>>>  maximum iterations=10000, initial guess is zero
>>>>>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>  left preconditioning
>>>>>  using NONE norm type for convergence test
>>>>> PC Object:  (sub_)   1 MPI processes
>>>>>  type: ilu
>>>>>    ILU: out-of-place factorization
>>>>>    0 levels of fill
>>>>>    tolerance for zero pivot 1e-12
>>>>>    using diagonal shift to prevent zero pivot
>>>>>    matrix ordering: natural
>>>>>    factor fill ratio given 1, needed 1
>>>>>      Factored matrix follows:
>>>>>        Matrix Object:           1 MPI processes
>>>>>          type: seqaij
>>>>>          rows=5630, cols=5630
>>>>>          package used to perform factorization: petsc
>>>>>          total: nonzeros=236700, allocated nonzeros=236700
>>>>>          total number of mallocs used during MatSetValues calls =0
>>>>>            using I-node routines: found 1126 nodes, limit used is 5
>>>>>  linear system matrix = precond matrix:
>>>>>  Matrix Object:     1 MPI processes
>>>>>    type: seqaij
>>>>>    rows=5630, cols=5630
>>>>>    total: nonzeros=236700, allocated nonzeros=236700
>>>>>    total number of mallocs used during MatSetValues calls =0
>>>>>      using I-node routines: found 1126 nodes, limit used is 5
>>>>> linear system matrix = precond matrix:
>>>>> Matrix Object:   8 MPI processes
>>>>>  type: mpiaij
>>>>>  rows=41000, cols=41000
>>>>>  total: nonzeros=1817800, allocated nonzeros=2555700
>>>>>  total number of mallocs used during MatSetValues calls =121180
>>>>>    using I-node (on process 0) routines: found 1025 nodes, limit used is 5
>>>>>
>>>>> Channel Inlet pressure (MIXTURE):      -0.90733D+00
>>>>> Channel Inlet pressure (LIQUID):      -0.10118D+01
>>>>>
>>>>>
>>>>> As you can see, these are completely different results that are
>>>>> not close to the converged solution.
>>>>>
>>>>> Since I want to use fine grids, I need to use an iterative solver.
>>>>> I wonder if I am missing something or using the wrong
>>>>> solver/preconditioner/option. I would appreciate it if you could
>>>>> help me (as always).
>>>>>
>>>>> --
>>>>> With Best Regards;
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
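
A note for readers following the thread: the request for a tighter -ksp_rtol and for singular-value monitoring fits a general fact about iterative solvers, namely that a small relative residual does not guarantee a small error when the matrix is ill-conditioned. Whether that is what happens with the matrix in this thread is not established here. The sketch below uses a made-up 2x2 system (not the poster's matrix) in plain Python to show the effect; all names and values are hypothetical.

```python
import math


def matvec(a, x):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1]]


def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))


# Hypothetical, nearly singular matrix: the two rows are almost identical.
eps = 1e-8
A = [[1.0, 1.0],
     [1.0, 1.0 + eps]]

x_exact = [1.0, 1.0]
b = matvec(A, x_exact)

# A candidate "solution" that is far from x_exact.
x_approx = [2.0, 0.0]
Ax = matvec(A, x_approx)

residual = [b[i] - Ax[i] for i in range(2)]
error = [x_approx[i] - x_exact[i] for i in range(2)]

rel_residual = norm2(residual) / norm2(b)   # what a relative-tolerance test sees
rel_error = norm2(error) / norm2(x_exact)   # what the application cares about

rtol = 1e-6  # same value as the default relative tolerance in the reports above
print(f"relative residual = {rel_residual:.3e} (passes rtol: {rel_residual < rtol})")
print(f"relative error    = {rel_error:.3e}")
```

In this toy case the relative residual is far below 1e-6 while the relative error is about 1, so a solver stopping on the residual test alone would accept a badly wrong answer. An estimate of the conditioning, which is what -ksp_monitor_singular_value is meant to provide, would help judge whether something similar applies to the real system.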

