[petsc-users] Question concerning ilu and bcgs

Wed Feb 18 12:51:58 CST 2015

Thank you Dave. In fact I have tried fieldsplit several months ago, and today I go back to the previous code and ran it again. How can I tell it is doing what I want it to do? Here are the options:

-pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type jacobi  -pc_fieldsplit_type SC\

HUR -ksp_monitor_short -ksp_converged_reason -ksp_rtol 1e-4 -fieldsplit_1_ksp_rtol 1e-2 -fieldsplit_0_ksp_rtol 1e-4 -fieldsplit_1_ksp_max_it 10 -fieldsplit_0_ksp_max_it 10 -ksp_type fgmres -ksp_max_it 10 -ksp_view

And here is the output:

Starting...

  0 KSP Residual norm 17.314

  1 KSP Residual norm 10.8324

  2 KSP Residual norm 10.8312

  3 KSP Residual norm 10.7726

  4 KSP Residual norm 10.7642

  5 KSP Residual norm 10.7634

  6 KSP Residual norm 10.7399

  7 KSP Residual norm 10.7159

  8 KSP Residual norm 10.6602

  9 KSP Residual norm 10.5756

 10 KSP Residual norm 10.5224

Linear solve did not converge due to DIVERGED_ITS iterations 10

KSP Object: 1 MPI processes

  type: fgmres

    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement

    GMRES: happy breakdown tolerance 1e-30

  maximum iterations=10, initial guess is zero

  tolerances:  relative=0.0001, absolute=1e-50, divergence=10000

  right preconditioning

  using UNPRECONDITIONED norm type for convergence test

PC Object: 1 MPI processes

  type: fieldsplit

    FieldSplit with Schur preconditioner, factorization FULL

    Preconditioner for the Schur complement formed from A11

    Split info:

    Split number 0 Defined by IS

    Split number 1 Defined by IS

    KSP solver for A00 block

      KSP Object:      (fieldsplit_0_)       1 MPI processes

        type: gmres

          GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement

          GMRES: happy breakdown tolerance 1e-30

        maximum iterations=10, initial guess is zero

        tolerances:  relative=0.0001, absolute=1e-50, divergence=10000

        left preconditioning

        using PRECONDITIONED norm type for convergence test

      PC Object:      (fieldsplit_0_)       1 MPI processes

        type: jacobi

        linear system matrix = precond matrix:

        Mat Object:        (fieldsplit_0_)         1 MPI processes

          type: mpiaij

          rows=20000, cols=20000

          total: nonzeros=85580, allocated nonzeros=760000

          total number of mallocs used during MatSetValues calls =0

            not using I-node (on process 0) routines

    KSP solver for S = A11 - A10 inv(A00) A01

      KSP Object:      (fieldsplit_1_)       1 MPI processes

        type: gmres

          GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement

          GMRES: happy breakdown tolerance 1e-30

        maximum iterations=10, initial guess is zero

        tolerances:  relative=0.01, absolute=1e-50, divergence=10000

        left preconditioning

        using PRECONDITIONED norm type for convergence test

      PC Object:      (fieldsplit_1_)       1 MPI processes

        type: jacobi

        linear system matrix followed by preconditioner matrix:

        Mat Object:        (fieldsplit_1_)         1 MPI processes

          type: schurcomplement

          rows=10000, cols=10000

            Schur complement A11 - A10 inv(A00) A01

            A11

              Mat Object:              (fieldsplit_1_)               1 MPI processes

                type: mpiaij

                rows=10000, cols=10000

                total: nonzeros=2110, allocated nonzeros=80000

                total number of mallocs used during MatSetValues calls =0

                  using I-node (on process 0) routines: found 3739 nodes, limit used is 5

            A10

              Mat Object:              (a10_)               1 MPI processes

                type: mpiaij

                rows=10000, cols=20000

                total: nonzeros=31560, allocated nonzeros=80000

                total number of mallocs used during MatSetValues calls =0

                  not using I-node (on process 0) routines

            KSP of A00

              KSP Object:              (fieldsplit_0_)               1 MPI processes

                type: gmres

                  GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement

                  GMRES: happy breakdown tolerance 1e-30

                maximum iterations=10, initial guess is zero

                tolerances:  relative=0.0001, absolute=1e-50, divergence=10000

                left preconditioning

                using PRECONDITIONED norm type for convergence test

              PC Object:              (fieldsplit_0_)               1 MPI processes

                type: jacobi

                linear system matrix = precond matrix:

                Mat Object:                (fieldsplit_0_)                 1 MPI processes

                  type: mpiaij

                  rows=20000, cols=20000

                  total: nonzeros=85580, allocated nonzeros=760000

                  total number of mallocs used during MatSetValues calls =0

                    not using I-node (on process 0) routines

            A01

              Mat Object:              (a01_)               1 MPI processes

                type: mpiaij

                rows=20000, cols=10000

                total: nonzeros=32732, allocated nonzeros=240000

                total number of mallocs used during MatSetValues calls =0

                  not using I-node (on process 0) routines

        Mat Object:        (fieldsplit_1_)         1 MPI processes

          type: mpiaij

          rows=10000, cols=10000

          total: nonzeros=2110, allocated nonzeros=80000

          total number of mallocs used during MatSetValues calls =0

            using I-node (on process 0) routines: found 3739 nodes, limit used is 5

  linear system matrix = precond matrix:

  Mat Object:   1 MPI processes

    type: nest

    rows=30000, cols=30000

      Matrix object:

        type=nest, rows=2, cols=2

        MatNest structure:

        (0,0) : prefix="fieldsplit_0_", type=mpiaij, rows=20000, cols=20000

        (0,1) : prefix="a01_", type=mpiaij, rows=20000, cols=10000

        (1,0) : prefix="a10_", type=mpiaij, rows=10000, cols=20000

        (1,1) : prefix="fieldsplit_1_", type=mpiaij, rows=10000, cols=10000

 residual u = 10.3528

 residual p = 1.88199

 residual [u,p] = 10.5224

 L^2 discretization error u = 0.698386

 L^2 discretization error p = 1.0418

 L^2 discretization error [u,p] = 1.25423

 number of processors = 1 0

Time cost for creating solver context 0.100217 s, and for solving 3.78879 s, and for printing 0.0908558 s.

________________________________
From: Dave May [dave.mayhem23 at gmail.com]
Sent: Wednesday, February 18, 2015 10:00 AM
To: Sun, Hui
Cc: Matthew Knepley; petsc-users at mcs.anl.gov; hong at aspiritech.org
Subject: Re: [petsc-users] Question concerning ilu and bcgs

Fieldsplit will not work if you just set pc_type fieldsplit and you have an operator with a block size if 1. In this case, you will need to define the splits using index sets.

I cannot believe that defining all the v and p dofs is really hard. Certainly it is far easier than trying to understand the difference between the petsc, matlab and the hypre implementations of ilut. Even if you did happen to find one implemtation of ilu you were "happy" with, as soon as you refine the mesh a couple of times the iterations will  increase.

I second Matt's opinion - forget about ilu and focus time on trying to make fieldsplit work. Fieldsplit will generate spectrally equivalent operators of your flow problem, ilu won't

Cheers
  Dave

On Wednesday, 18 February 2015, Sun, Hui <hus003 at ucsd.edu<mailto:hus003 at ucsd.edu>> wrote:
I tried fieldsplitting several months ago, it didn't work due to the complicated coupled irregular bdry conditions. So I tried direct solver and now I modified the PDE system a little bit so that the ILU/bcgs works in MATLAB. But thank you for the suggestions, although I doubt it would work, maybe I will still try fieldsplitting with my new system.

________________________________
From: Matthew Knepley [knepley at gmail.com<UrlBlockedError.aspx>]
Sent: Wednesday, February 18, 2015 8:54 AM
To: Sun, Hui
Cc: hong at aspiritech.org<UrlBlockedError.aspx>; petsc-users at mcs.anl.gov<UrlBlockedError.aspx>
Subject: Re: [petsc-users] Question concerning ilu and bcgs

On Wed, Feb 18, 2015 at 10:47 AM, Sun, Hui <hus003 at ucsd.edu<UrlBlockedError.aspx>> wrote:
The matrix is from a 3D fluid problem, with complicated irregular boundary conditions. I've tried using direct solvers such as UMFPACK, SuperLU_dist and MUMPS. It seems that SuperLU_dist does not solve for my linear system; UMFPACK solves the system but would run into memory issue even with small size matrices and it cannot parallelize; MUMPS does solve the system but it also fails when the size is big and it takes much time. That's why I'm seeking an iterative method.

I guess the direct method is faster than an iterative method for a small A, but that may not be true for bigger A.

If this is a Stokes flow, you should use PCFIELDSPLIT and multigrid. If it is advection dominated, I know of nothing better
than sparse direct or perhaps Block-Jacobi with sparse direct blocks. Since MUMPS solved your system, I would consider
using BJacobi/ASM and MUMPS or UMFPACK as the block solver.

  Thanks,

     Matt

________________________________
From: Matthew Knepley [knepley at gmail.com<UrlBlockedError.aspx>]
Sent: Wednesday, February 18, 2015 8:33 AM
To: Sun, Hui
Cc: hong at aspiritech.org<UrlBlockedError.aspx>; petsc-users at mcs.anl.gov<UrlBlockedError.aspx>
Subject: Re: [petsc-users] Question concerning ilu and bcgs

On Wed, Feb 18, 2015 at 10:31 AM, Sun, Hui <hus003 at ucsd.edu<UrlBlockedError.aspx>> wrote:
So far I just try around, I haven't looked into literature yet.

However, both MATLAB's ilu+gmres and ilu+bcgs work. Is it possible that some parameter or options need to be tuned in using PETSc's ilu or hypre's ilu? Besides, is there a way to view how good the performance of the pc is and output the matrices L and U, so that I can do some test in MATLAB?

1) Its not clear exactly what Matlab is doing

2) PETSc uses ILU(0) by default (you can set it to use ILU(k))

3) I don't know what Hypre's ILU can do

I would really discourage from using ILU. I cannot imagine it is faster than sparse direct factorization
for your system, such as from SuperLU or MUMPS.

  Thanks,

     Matt

Hui

________________________________
From: Matthew Knepley [knepley at gmail.com<UrlBlockedError.aspx>]
Sent: Wednesday, February 18, 2015 8:09 AM
To: Sun, Hui
Cc: hong at aspiritech.org<UrlBlockedError.aspx>; petsc-users at mcs.anl.gov<UrlBlockedError.aspx>
Subject: Re: [petsc-users] Question concerning ilu and bcgs

On Wed, Feb 18, 2015 at 10:02 AM, Sun, Hui <hus003 at ucsd.edu<UrlBlockedError.aspx>> wrote:
Yes I've tried other solvers, gmres/ilu does not work, neither does bcgs/ilu. Here are the options:

-pc_type ilu -pc_factor_nonzeros_along_diagonal -pc_factor_levels 0 -pc_factor_reuse_ordering -ksp_ty\

pe bcgs -ksp_rtol 1e-6 -ksp_max_it 10 -ksp_monitor_short -ksp_view

Note here that ILU(0) is an unreliable and generally crappy preconditioner. Have you looked in the
literature for the kinds of preconditioners that are effective for your problem?

  Thanks,

     Matt

Here is the output:

  0 KSP Residual norm 211292

  1 KSP Residual norm 13990.2

  2 KSP Residual norm 9870.08

  3 KSP Residual norm 9173.9

  4 KSP Residual norm 9121.94

  5 KSP Residual norm 7386.1

  6 KSP Residual norm 6222.55

  7 KSP Residual norm 7192.94

  8 KSP Residual norm 33964

  9 KSP Residual norm 33960.4

 10 KSP Residual norm 1068.54

KSP Object: 1 MPI processes

  type: bcgs

  maximum iterations=10, initial guess is zero

  tolerances:  relative=1e-06, absolute=1e-50, divergence=10000

  left preconditioning

  using PRECONDITIONED norm type for convergence test

PC Object: 1 MPI processes

  type: ilu

    ILU: out-of-place factorization

    ILU: Reusing reordering from past factorization

    0 levels of fill

    tolerance for zero pivot 2.22045e-14

    using diagonal shift on blocks to prevent zero pivot [INBLOCKS]

    matrix ordering: natural

    factor fill ratio given 1, needed 1

      Factored matrix follows:

        Mat Object:         1 MPI processes

          type: seqaij

          rows=62500, cols=62500

          package used to perform factorization: petsc

          total: nonzeros=473355, allocated nonzeros=473355

          total number of mallocs used during MatSetValues calls =0

            not using I-node routines

  linear system matrix = precond matrix:

  Mat Object:   1 MPI processes

    type: seqaij

    rows=62500, cols=62500

    total: nonzeros=473355, allocated nonzeros=7.8125e+06

    total number of mallocs used during MatSetValues calls =0

      not using I-node routines

Time cost: 0.307149,  0.268402,  0.0990018

________________________________
From: hong at aspiritech.org<UrlBlockedError.aspx> [hong at aspiritech.org<UrlBlockedError.aspx>]
Sent: Wednesday, February 18, 2015 7:49 AM
To: Sun, Hui
Cc: Matthew Knepley; petsc-users at mcs.anl.gov<UrlBlockedError.aspx>
Subject: Re: [petsc-users] Question concerning ilu and bcgs

 Have you tried other solvers, e.g., PETSc default gmres/ilu, bcgs/ilu etc.
The matrix is small. If it is ill-conditioned, then pc_type lu would work the best.

Hong

On Wed, Feb 18, 2015 at 9:34 AM, Sun, Hui <hus003 at ucsd.edu<UrlBlockedError.aspx>> wrote:
With options:

-pc_type hypre -pc_hypre_type pilut -pc_hypre_pilut_maxiter 1000 -pc_hypre_pilut_tol 1e-3 -ksp_type bcgs -ksp_rtol 1e-10 -ksp_max_it 10 -ksp_monitor_short -ksp_converged_reason -ksp_view

Here is the full output:

  0 KSP Residual norm 1404.62

  1 KSP Residual norm 88.9068

  2 KSP Residual norm 64.73

  3 KSP Residual norm 71.0224

  4 KSP Residual norm 69.5044

  5 KSP Residual norm 455.458

  6 KSP Residual norm 174.876

  7 KSP Residual norm 183.031

  8 KSP Residual norm 650.675

  9 KSP Residual norm 79.2441

 10 KSP Residual norm 84.1985

Linear solve did not converge due to DIVERGED_ITS iterations 10

KSP Object: 1 MPI processes

  type: bcgs

  maximum iterations=10, initial guess is zero

  tolerances:  relative=1e-10, absolute=1e-50, divergence=10000

  left preconditioning

  using PRECONDITIONED norm type for convergence test

PC Object: 1 MPI processes

  type: hypre

    HYPRE Pilut preconditioning

    HYPRE Pilut: maximum number of iterations 1000

    HYPRE Pilut: drop tolerance 0.001

    HYPRE Pilut: default factor row size

  linear system matrix = precond matrix:

  Mat Object:   1 MPI processes

    type: seqaij

    rows=62500, cols=62500

    total: nonzeros=473355, allocated nonzeros=7.8125e+06

    total number of mallocs used during MatSetValues calls =0

      not using I-node routines

Time cost: 0.756198,  0.662984,  0.105672

________________________________
From: Matthew Knepley [knepley at gmail.com<UrlBlockedError.aspx>]
Sent: Wednesday, February 18, 2015 3:30 AM
To: Sun, Hui
Cc: petsc-users at mcs.anl.gov<UrlBlockedError.aspx>
Subject: Re: [petsc-users] Question concerning ilu and bcgs

On Wed, Feb 18, 2015 at 12:33 AM, Sun, Hui <hus003 at ucsd.edu<UrlBlockedError.aspx>> wrote:
I have a matrix system Ax = b, A is of type MatSeqAIJ or MatMPIAIJ, depending on the number of cores.

I try to solve this problem by pc_type ilu and ksp_type bcgs, it does not converge. The options I specify are:

-pc_type hypre -pc_hypre_type pilut -pc_hypre_pilut_maxiter 1000 -pc_hypre_pilut_tol 1e-3 -ksp_type b\

cgs -ksp_rtol 1e-10 -ksp_max_it 1000 -ksp_monitor_short -ksp_converged_reason

1) Run with -ksp_view, so we can see exactly what was used

2) ILUT is unfortunately not a well-defined algorithm, and I believe the parallel version makes different decisions
    than the serial version.

  Thanks,

    Matt

The first a few lines of the output are:

  0 KSP Residual norm 1404.62

  1 KSP Residual norm 88.9068

  2 KSP Residual norm 64.73

  3 KSP Residual norm 71.0224

  4 KSP Residual norm 69.5044

  5 KSP Residual norm 455.458

  6 KSP Residual norm 174.876

  7 KSP Residual norm 183.031

  8 KSP Residual norm 650.675

  9 KSP Residual norm 79.2441

 10 KSP Residual norm 84.1985

This clearly indicates non-convergence. However, I output the sparse matrix A and vector b to MATLAB, and run the following command:

[L,U] = ilu(A,struct('type','ilutp','droptol',1e-3));

[ux1,fl1,rr1,it1,rv1] = bicgstab(A,b,1e-10,1000,L,U);

And it converges in MATLAB, with flag fl1=0, relative residue rr1=8.2725e-11, and iteration it1=89.5. I'm wondering how can I figure out what's wrong.

Best,

Hui

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20150218/1db3c046/attachment-0001.html>