[petsc-users] Question concerning ilu and bcgs
Sun, Hui
hus003 at ucsd.edu
Wed Feb 18 12:51:58 CST 2015
Thank you Dave. In fact I tried fieldsplit several months ago, and today I went back to the previous code and ran it again. How can I tell whether it is doing what I want it to do? Here are the options:
-pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type jacobi -pc_fieldsplit_type SCHUR -ksp_monitor_short -ksp_converged_reason -ksp_rtol 1e-4 -fieldsplit_1_ksp_rtol 1e-2 -fieldsplit_0_ksp_rtol 1e-4 -fieldsplit_1_ksp_max_it 10 -fieldsplit_0_ksp_max_it 10 -ksp_type fgmres -ksp_max_it 10 -ksp_view
And here is the output:
Starting...
0 KSP Residual norm 17.314
1 KSP Residual norm 10.8324
2 KSP Residual norm 10.8312
3 KSP Residual norm 10.7726
4 KSP Residual norm 10.7642
5 KSP Residual norm 10.7634
6 KSP Residual norm 10.7399
7 KSP Residual norm 10.7159
8 KSP Residual norm 10.6602
9 KSP Residual norm 10.5756
10 KSP Residual norm 10.5224
Linear solve did not converge due to DIVERGED_ITS iterations 10
KSP Object: 1 MPI processes
type: fgmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=0.0001, absolute=1e-50, divergence=10000
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: fieldsplit
FieldSplit with Schur preconditioner, factorization FULL
Preconditioner for the Schur complement formed from A11
Split info:
Split number 0 Defined by IS
Split number 1 Defined by IS
KSP solver for A00 block
KSP Object: (fieldsplit_0_) 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=0.0001, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (fieldsplit_0_) 1 MPI processes
type: jacobi
linear system matrix = precond matrix:
Mat Object: (fieldsplit_0_) 1 MPI processes
type: mpiaij
rows=20000, cols=20000
total: nonzeros=85580, allocated nonzeros=760000
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
KSP solver for S = A11 - A10 inv(A00) A01
KSP Object: (fieldsplit_1_) 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=0.01, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (fieldsplit_1_) 1 MPI processes
type: jacobi
linear system matrix followed by preconditioner matrix:
Mat Object: (fieldsplit_1_) 1 MPI processes
type: schurcomplement
rows=10000, cols=10000
Schur complement A11 - A10 inv(A00) A01
A11
Mat Object: (fieldsplit_1_) 1 MPI processes
type: mpiaij
rows=10000, cols=10000
total: nonzeros=2110, allocated nonzeros=80000
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 3739 nodes, limit used is 5
A10
Mat Object: (a10_) 1 MPI processes
type: mpiaij
rows=10000, cols=20000
total: nonzeros=31560, allocated nonzeros=80000
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
KSP of A00
KSP Object: (fieldsplit_0_) 1 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=0.0001, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (fieldsplit_0_) 1 MPI processes
type: jacobi
linear system matrix = precond matrix:
Mat Object: (fieldsplit_0_) 1 MPI processes
type: mpiaij
rows=20000, cols=20000
total: nonzeros=85580, allocated nonzeros=760000
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
A01
Mat Object: (a01_) 1 MPI processes
type: mpiaij
rows=20000, cols=10000
total: nonzeros=32732, allocated nonzeros=240000
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Mat Object: (fieldsplit_1_) 1 MPI processes
type: mpiaij
rows=10000, cols=10000
total: nonzeros=2110, allocated nonzeros=80000
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 3739 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: nest
rows=30000, cols=30000
Matrix object:
type=nest, rows=2, cols=2
MatNest structure:
(0,0) : prefix="fieldsplit_0_", type=mpiaij, rows=20000, cols=20000
(0,1) : prefix="a01_", type=mpiaij, rows=20000, cols=10000
(1,0) : prefix="a10_", type=mpiaij, rows=10000, cols=20000
(1,1) : prefix="fieldsplit_1_", type=mpiaij, rows=10000, cols=10000
residual u = 10.3528
residual p = 1.88199
residual [u,p] = 10.5224
L^2 discretization error u = 0.698386
L^2 discretization error p = 1.0418
L^2 discretization error [u,p] = 1.25423
number of processors = 1 0
Time cost for creating solver context 0.100217 s, and for solving 3.78879 s, and for printing 0.0908558 s.
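One quick sanity check on the output above can be sketched in Python, with the printed values copied in by hand. It assumes the per-field residuals printed by the application are true residuals of b - Ax; the variable names are illustrative only and not part of PETSc. Because the outer FGMRES uses right preconditioning and the unpreconditioned norm, the last monitor value should match the combined norm of the two field residuals, and the split sizes should add up to the size of the nest matrix.

```python
import math

# Values copied from the -ksp_monitor_short / -ksp_view output above.
r0 = 17.314                        # KSP residual norm at iteration 0
r10 = 10.5224                      # KSP residual norm at iteration 10
res_u, res_p = 10.3528, 1.88199    # per-field residuals printed by the application
rtol = 1e-4                        # value given to -ksp_rtol

# Combined norm of the two field residuals; it should reproduce the last
# monitor value if the monitor shows the unpreconditioned residual norm.
combined = math.hypot(res_u, res_p)

# Relative reduction reached after 10 outer iterations.
reduction = r10 / r0

# Split sizes reported by -ksp_view versus the size of the nest matrix.
n_u, n_p, n_total = 20000, 10000, 30000

print(f"combined residual  = {combined:.4f} (monitor: {r10})")
print(f"relative reduction = {reduction:.3f} (target: {rtol})")
print(f"split sizes consistent: {n_u + n_p == n_total}")
```

With these numbers the combined residual agrees with the monitor to within rounding, and the reduction of roughly 0.6 is far from the requested 1e-4, which is consistent with the DIVERGED_ITS report. So the splits are set up as requested, but the Jacobi-preconditioned inner solves are not reducing the residual much.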
________________________________
From: Dave May [dave.mayhem23 at gmail.com]
Sent: Wednesday, February 18, 2015 10:00 AM
To: Sun, Hui
Cc: Matthew Knepley; petsc-users at mcs.anl.gov; hong at aspiritech.org
Subject: Re: [petsc-users] Question concerning ilu and bcgs
Fieldsplit will not work if you just set pc_type fieldsplit and you have an operator with a block size of 1. In this case, you will need to define the splits using index sets.
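As a rough illustration of what defining the splits by index sets involves, the sketch below builds the two index lists in plain Python for two hypothetical dof orderings. The thread does not say how the velocity and pressure unknowns are actually numbered, so both layouts and the function names are assumptions; the sizes 20000 and 10000 are taken from the -ksp_view output at the top of this page. In a PETSc code, lists like these would be wrapped in IS objects (for example with ISCreateGeneral) and attached with PCFieldSplitSetIS.

```python
def contiguous_splits(n_u, n_p):
    """Index lists when all velocity dofs come first, then all pressure dofs."""
    u_idx = list(range(0, n_u))
    p_idx = list(range(n_u, n_u + n_p))
    return u_idx, p_idx


def interlaced_splits(n_nodes, dofs_per_node, pressure_slot):
    """Index lists when each node stores its dofs together, e.g. (u, v, p)."""
    u_idx, p_idx = [], []
    for node in range(n_nodes):
        for slot in range(dofs_per_node):
            g = node * dofs_per_node + slot   # global row index of this dof
            if slot == pressure_slot:
                p_idx.append(g)
            else:
                u_idx.append(g)
    return u_idx, p_idx


# Hypothetical layout matching the block sizes seen in the -ksp_view output.
u_idx, p_idx = contiguous_splits(20000, 10000)
print(len(u_idx), len(p_idx), u_idx[:3], p_idx[:3])
```

Whatever the real numbering is, the two lists must be disjoint and together cover every row of the matrix.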
I cannot believe that defining all the v and p dofs is really hard. Certainly it is far easier than trying to understand the difference between the PETSc, MATLAB and hypre implementations of ilut. Even if you did happen to find one implementation of ilu you were "happy" with, as soon as you refine the mesh a couple of times the iterations will increase.
I second Matt's opinion - forget about ilu and focus your time on trying to make fieldsplit work. Fieldsplit will generate spectrally equivalent operators of your flow problem; ilu won't.
Cheers
Dave
On Wednesday, 18 February 2015, Sun, Hui <hus003 at ucsd.edu> wrote:
I tried fieldsplitting several months ago, but it didn't work because of the complicated, coupled, irregular boundary conditions. So I tried a direct solver, and I have now modified the PDE system a little so that ILU/bcgs works in MATLAB. Thank you for the suggestions; although I doubt it will work, I may still try fieldsplitting with my new system.
________________________________
From: Matthew Knepley [knepley at gmail.com]
Sent: Wednesday, February 18, 2015 8:54 AM
To: Sun, Hui
Cc: hong at aspiritech.org; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question concerning ilu and bcgs
On Wed, Feb 18, 2015 at 10:47 AM, Sun, Hui <hus003 at ucsd.edu> wrote:
The matrix is from a 3D fluid problem with complicated irregular boundary conditions. I've tried direct solvers such as UMFPACK, SuperLU_dist and MUMPS. It seems that SuperLU_dist does not solve my linear system; UMFPACK solves the system but runs into memory issues even with small matrices, and it cannot run in parallel; MUMPS does solve the system, but it also fails when the size is big and it takes a long time. That's why I'm looking for an iterative method.
I guess the direct method is faster than an iterative method for a small A, but that may not be true for bigger A.
If this is a Stokes flow, you should use PCFIELDSPLIT and multigrid. If it is advection dominated, I know of nothing better
than sparse direct or perhaps Block-Jacobi with sparse direct blocks. Since MUMPS solved your system, I would consider
using BJacobi/ASM and MUMPS or UMFPACK as the block solver.
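To illustrate the idea of block Jacobi with direct block solves, here is a minimal pure-Python sketch on a hypothetical 4x4 matrix. It is not PETSc's implementation: the small Gaussian elimination routine stands in for a sparse direct package such as MUMPS or UMFPACK, and the matrix, block layout and Richardson iteration are chosen only for illustration. In PETSc the corresponding starting point would be options along the lines of -pc_type bjacobi -sub_pc_type lu.

```python
def solve_dense(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [M[i][:] + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = s / aug[i][i]
    return y


def block_jacobi_apply(A, r, blocks):
    """Apply z = M^{-1} r, where M keeps only the diagonal blocks of A."""
    z = [0.0] * len(r)
    for idx in blocks:
        sub = [[A[i][j] for j in idx] for i in idx]
        sol = solve_dense(sub, [r[i] for i in idx])   # exact solve on the block
        for local, i in enumerate(idx):
            z[i] = sol[local]
    return z


# Hypothetical matrix: two 2x2 blocks with weak coupling between them.
A = [[4.0, 1.0, 0.1, 0.0],
     [1.0, 3.0, 0.0, 0.1],
     [0.1, 0.0, 5.0, 2.0],
     [0.0, 0.1, 2.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
blocks = [[0, 1], [2, 3]]

# Preconditioned Richardson iteration: x <- x + M^{-1} (b - A x).
x = [0.0] * 4
for _ in range(20):
    r = [b[i] - sum(A[i][j] * x[j] for j in range(4)) for i in range(4)]
    z = block_jacobi_apply(A, r, blocks)
    x = [x[i] + z[i] for i in range(4)]

residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(4))) for i in range(4))
print(f"max residual after 20 sweeps: {residual:.2e}")
```

The iteration converges quickly here because the coupling between the blocks is weak; how well it works on a real flow matrix depends on how much of the coupling the blocks capture.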
Thanks,
Matt
________________________________
From: Matthew Knepley [knepley at gmail.com]
Sent: Wednesday, February 18, 2015 8:33 AM
To: Sun, Hui
Cc: hong at aspiritech.org; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question concerning ilu and bcgs
On Wed, Feb 18, 2015 at 10:31 AM, Sun, Hui <hus003 at ucsd.edu> wrote:
So far I have just been trying things out; I haven't looked into the literature yet.
However, both MATLAB's ilu+gmres and ilu+bcgs work. Is it possible that some parameters or options need to be tuned when using PETSc's ilu or hypre's ilu? Besides, is there a way to view how well the pc performs and to output the matrices L and U, so that I can do some tests in MATLAB?
1) It's not clear exactly what MATLAB is doing
2) PETSc uses ILU(0) by default (you can set it to use ILU(k))
3) I don't know what Hypre's ILU can do
I would really discourage you from using ILU. I cannot imagine it is faster than sparse direct factorization
for your system, such as from SuperLU or MUMPS.
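To make point 2 concrete, here is a minimal sketch of ILU(0) in pure Python on a dense-stored toy matrix. It is not PETSc's implementation, and the 5-point Laplacian is only a stand-in, not the matrix from this thread. ILU(0) performs Gaussian elimination but only updates entries that are already nonzero in A, so all fill is discarded.

```python
def ilu0(A):
    """Return (L, U) from ILU(0): elimination restricted to A's nonzero pattern."""
    n = len(A)
    pattern = [[A[i][j] != 0.0 for j in range(n)] for i in range(n)]
    W = [row[:] for row in A]              # work on a copy of A
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue                   # no entry here, so no elimination step
            W[i][k] /= W[k][k]             # multiplier l_ik
            for j in range(k + 1, n):
                if pattern[i][j]:          # update only entries that already exist
                    W[i][j] -= W[i][k] * W[k][j]
    L = [[W[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[W[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def laplacian_2d(m):
    """5-point Laplacian on an m x m grid, stored densely for illustration only."""
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for r in range(m):
        for c in range(m):
            i = r * m + c
            A[i][i] = 4.0
            if c > 0:
                A[i][i - 1] = -1.0
            if c < m - 1:
                A[i][i + 1] = -1.0
            if r > 0:
                A[i][i - m] = -1.0
            if r < m - 1:
                A[i][i + m] = -1.0
    return A


A = laplacian_2d(3)
L, U = ilu0(A)
LU = matmul(L, U)
n = len(A)
# Largest mismatch between L*U and A on A's nonzero pattern, and off it.
err_on = max(abs(LU[i][j] - A[i][j])
             for i in range(n) for j in range(n) if A[i][j] != 0.0)
err_off = max(abs(LU[i][j])
              for i in range(n) for j in range(n) if A[i][j] == 0.0)
print(f"max |LU - A| on the pattern of A:  {err_on:.1e}")
print(f"max |LU - A| off the pattern of A: {err_off:.1e}")
```

The product L*U reproduces A on its nonzero pattern but differs from A where exact elimination would have created fill. ILU(k) keeps more of that fill as k grows, which is what -pc_factor_levels controls.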
Thanks,
Matt
Hui
________________________________
From: Matthew Knepley [knepley at gmail.com]
Sent: Wednesday, February 18, 2015 8:09 AM
To: Sun, Hui
Cc: hong at aspiritech.org; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question concerning ilu and bcgs
On Wed, Feb 18, 2015 at 10:02 AM, Sun, Hui <hus003 at ucsd.edu> wrote:
Yes, I've tried other solvers: gmres/ilu does not work, and neither does bcgs/ilu. Here are the options:
-pc_type ilu -pc_factor_nonzeros_along_diagonal -pc_factor_levels 0 -pc_factor_reuse_ordering -ksp_type bcgs -ksp_rtol 1e-6 -ksp_max_it 10 -ksp_monitor_short -ksp_view
Note here that ILU(0) is an unreliable and generally crappy preconditioner. Have you looked in the
literature for the kinds of preconditioners that are effective for your problem?
Thanks,
Matt
Here is the output:
0 KSP Residual norm 211292
1 KSP Residual norm 13990.2
2 KSP Residual norm 9870.08
3 KSP Residual norm 9173.9
4 KSP Residual norm 9121.94
5 KSP Residual norm 7386.1
6 KSP Residual norm 6222.55
7 KSP Residual norm 7192.94
8 KSP Residual norm 33964
9 KSP Residual norm 33960.4
10 KSP Residual norm 1068.54
KSP Object: 1 MPI processes
type: bcgs
maximum iterations=10, initial guess is zero
tolerances: relative=1e-06, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: ilu
ILU: out-of-place factorization
ILU: Reusing reordering from past factorization
0 levels of fill
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: natural
factor fill ratio given 1, needed 1
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=62500, cols=62500
package used to perform factorization: petsc
total: nonzeros=473355, allocated nonzeros=473355
total number of mallocs used during MatSetValues calls =0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=62500, cols=62500
total: nonzeros=473355, allocated nonzeros=7.8125e+06
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Time cost: 0.307149, 0.268402, 0.0990018
________________________________
From: hong at aspiritech.org [hong at aspiritech.org]
Sent: Wednesday, February 18, 2015 7:49 AM
To: Sun, Hui
Cc: Matthew Knepley; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question concerning ilu and bcgs
Have you tried other solvers, e.g., PETSc default gmres/ilu, bcgs/ilu, etc.?
The matrix is small. If it is ill-conditioned, then pc_type lu would work the best.
Hong
On Wed, Feb 18, 2015 at 9:34 AM, Sun, Hui <hus003 at ucsd.edu> wrote:
With options:
-pc_type hypre -pc_hypre_type pilut -pc_hypre_pilut_maxiter 1000 -pc_hypre_pilut_tol 1e-3 -ksp_type bcgs -ksp_rtol 1e-10 -ksp_max_it 10 -ksp_monitor_short -ksp_converged_reason -ksp_view
Here is the full output:
0 KSP Residual norm 1404.62
1 KSP Residual norm 88.9068
2 KSP Residual norm 64.73
3 KSP Residual norm 71.0224
4 KSP Residual norm 69.5044
5 KSP Residual norm 455.458
6 KSP Residual norm 174.876
7 KSP Residual norm 183.031
8 KSP Residual norm 650.675
9 KSP Residual norm 79.2441
10 KSP Residual norm 84.1985
Linear solve did not converge due to DIVERGED_ITS iterations 10
KSP Object: 1 MPI processes
type: bcgs
maximum iterations=10, initial guess is zero
tolerances: relative=1e-10, absolute=1e-50, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: hypre
HYPRE Pilut preconditioning
HYPRE Pilut: maximum number of iterations 1000
HYPRE Pilut: drop tolerance 0.001
HYPRE Pilut: default factor row size
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=62500, cols=62500
total: nonzeros=473355, allocated nonzeros=7.8125e+06
total number of mallocs used during MatSetValues calls =0
not using I-node routines
Time cost: 0.756198, 0.662984, 0.105672
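A point to keep in mind when comparing this output with the MATLAB run: the -ksp_view output above reports left preconditioning with the PRECONDITIONED norm type, so the monitored values are norms of the preconditioned residual, whereas MATLAB's bicgstab returns relres = norm(b-A*x)/norm(b). The sketch below uses a hypothetical, badly scaled 2x2 system and a Jacobi preconditioner (both chosen only for illustration, unrelated to the matrix in this thread) to show how far apart the two norms can be. In PETSc, -ksp_monitor_true_residual prints both norms side by side.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


# Hypothetical badly scaled 2x2 system, chosen only for illustration.
A = [[1000.0, 1.0],
     [1.0, 0.001]]
b = [1.0, 1.0]
x = [0.0, 0.0]                      # zero initial guess

# True residual r = b - A x.
r = [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

# Jacobi (diagonal) preconditioner applied from the left: z = D^{-1} r.
z = [r[i] / A[i][i] for i in range(2)]

true_norm = norm(r)
prec_norm = norm(z)
print(f"unpreconditioned residual norm: {true_norm:.4g}")
print(f"preconditioned residual norm:   {prec_norm:.4g}")
```

The two norms differ by almost three orders of magnitude for the same iterate, so a residual history monitored in one norm cannot be compared directly with a tolerance or history expressed in the other.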
________________________________
From: Matthew Knepley [knepley at gmail.com]
Sent: Wednesday, February 18, 2015 3:30 AM
To: Sun, Hui
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question concerning ilu and bcgs
On Wed, Feb 18, 2015 at 12:33 AM, Sun, Hui <hus003 at ucsd.edu> wrote:
I have a matrix system Ax = b, A is of type MatSeqAIJ or MatMPIAIJ, depending on the number of cores.
I tried to solve this problem with pc_type ilu and ksp_type bcgs, but it does not converge. The options I specify are:
-pc_type hypre -pc_hypre_type pilut -pc_hypre_pilut_maxiter 1000 -pc_hypre_pilut_tol 1e-3 -ksp_type bcgs -ksp_rtol 1e-10 -ksp_max_it 1000 -ksp_monitor_short -ksp_converged_reason
1) Run with -ksp_view, so we can see exactly what was used
2) ILUT is unfortunately not a well-defined algorithm, and I believe the parallel version makes different decisions
than the serial version.
Thanks,
Matt
The first few lines of the output are:
0 KSP Residual norm 1404.62
1 KSP Residual norm 88.9068
2 KSP Residual norm 64.73
3 KSP Residual norm 71.0224
4 KSP Residual norm 69.5044
5 KSP Residual norm 455.458
6 KSP Residual norm 174.876
7 KSP Residual norm 183.031
8 KSP Residual norm 650.675
9 KSP Residual norm 79.2441
10 KSP Residual norm 84.1985
This clearly indicates non-convergence. However, I output the sparse matrix A and vector b to MATLAB and ran the following commands:
[L,U] = ilu(A,struct('type','ilutp','droptol',1e-3));
[ux1,fl1,rr1,it1,rv1] = bicgstab(A,b,1e-10,1000,L,U);
And it converges in MATLAB, with flag fl1=0, relative residual rr1=8.2725e-11, and iteration count it1=89.5. I'm wondering how I can figure out what's wrong.
Best,
Hui
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener