[petsc-users] Fwd: Fieldsplit with sub pc MUMPS in parallel

Barry Smith bsmith at mcs.anl.gov
Wed Jan 4 18:33:18 CST 2017


   Dave,

    When I run your example with what I think are the same options, I do get the same convergence independent of the number of processes (with exact solvers on the splits):

./ex42 -stokes_pc_type fieldsplit -stokes_ksp_monitor -stokes_fieldsplit_u_pc_type lu -stokes_fieldsplit_p_pc_type lu -stokes_pc_fieldsplit_type schur -stokes_pc_fieldsplit_schur_factorization_type full -stokes_pc_fieldsplit_schur_precondition selfp
  Residual norms for stokes_ solve.
  0 KSP Residual norm 2.219697707545e-01 
  1 KSP Residual norm 2.160497527164e-01 
  2 KSP Residual norm 2.931344898250e-02 
  3 KSP Residual norm 3.504825774118e-03 
  4 KSP Residual norm 5.626318301896e-04 
  5 KSP Residual norm 9.099204283519e-05 
  6 KSP Residual norm 1.731194762517e-05 
  7 KSP Residual norm 2.920732887195e-06 
  8 KSP Residual norm 4.154207455723e-07 
~/Src/petsc/src/ksp/ksp/examples/tutorials (master=) arch-mpich
$ petscmpiexec -n 2 ./ex42 -stokes_pc_type fieldsplit -stokes_ksp_monitor -stokes_fieldsplit_u_pc_type lu -stokes_fieldsplit_p_pc_type lu -stokes_pc_fieldsplit_type schur -stokes_pc_fieldsplit_schur_factorization_type full -stokes_pc_fieldsplit_schur_precondition selfp
  Residual norms for stokes_ solve.
  0 KSP Residual norm 2.219697707545e-01 
  1 KSP Residual norm 2.160497527164e-01 
  2 KSP Residual norm 2.931344898250e-02 
  3 KSP Residual norm 3.504825774118e-03 
  4 KSP Residual norm 5.626318301897e-04 
  5 KSP Residual norm 9.099204283514e-05 
  6 KSP Residual norm 1.731194762514e-05 
  7 KSP Residual norm 2.920732887196e-06 
  8 KSP Residual norm 4.154207455766e-07 
~/Src/petsc/src/ksp/ksp/examples/tutorials (master=) arch-mpich
$ petscmpiexec -n 3 ./ex42 -stokes_pc_type fieldsplit -stokes_ksp_monitor -stokes_fieldsplit_u_pc_type lu -stokes_fieldsplit_p_pc_type lu -stokes_pc_fieldsplit_type schur -stokes_pc_fieldsplit_schur_factorization_type full -stokes_pc_fieldsplit_schur_precondition selfp
  Residual norms for stokes_ solve.
  0 KSP Residual norm 2.219697707545e-01 
  1 KSP Residual norm 2.160497527164e-01 
  2 KSP Residual norm 2.931344898250e-02 
  3 KSP Residual norm 3.504825774117e-03 
  4 KSP Residual norm 5.626318301897e-04 
  5 KSP Residual norm 9.099204283517e-05 
  6 KSP Residual norm 1.731194762515e-05 
  7 KSP Residual norm 2.920732887202e-06 
  8 KSP Residual norm 4.154207455736e-07 
~/Src/petsc/src/ksp/ksp/examples/tutorials (master=) arch-mpich
$ petscmpiexec -n 4 ./ex42 -stokes_pc_type fieldsplit -stokes_ksp_monitor -stokes_fieldsplit_u_pc_type lu -stokes_fieldsplit_p_pc_type lu -stokes_pc_fieldsplit_type schur -stokes_pc_fieldsplit_schur_factorization_type full -stokes_pc_fieldsplit_schur_precondition selfp
  Residual norms for stokes_ solve.
  0 KSP Residual norm 2.219697707545e-01 
  1 KSP Residual norm 2.160497527164e-01 
  2 KSP Residual norm 2.931344898250e-02 
  3 KSP Residual norm 3.504825774118e-03 
  4 KSP Residual norm 5.626318301897e-04 
  5 KSP Residual norm 9.099204283517e-05 
  6 KSP Residual norm 1.731194762513e-05 
  7 KSP Residual norm 2.920732887190e-06 
  8 KSP Residual norm 4.154207455781e-07 

KSP Object: (stokes_) 4 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: (stokes_) 4 MPI processes
  type: fieldsplit
    FieldSplit with Schur preconditioner, blocksize = 4, factorization FULL
    Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's diagonal's inverse
    Split info:
    Split number 0 Fields  0, 1, 2
    Split number 1 Fields  3
    KSP solver for A00 block
      KSP Object: (stokes_fieldsplit_u_) 4 MPI processes
        type: gmres
          GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
          GMRES: happy breakdown tolerance 1e-30
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using PRECONDITIONED norm type for convergence test
      PC Object: (stokes_fieldsplit_u_) 4 MPI processes
        type: lu
          out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          matrix ordering: natural
          factor fill ratio given 0., needed 0.
            Factored matrix follows:
              Mat Object: 4 MPI processes
                type: superlu_dist
                rows=3993, cols=3993
                package used to perform factorization: superlu_dist
                total: nonzeros=0, allocated nonzeros=0
                total number of mallocs used during MatSetValues calls =0
                  SuperLU_DIST run parameters:
                    Process grid nprow 2 x npcol 2 
                    Equilibrate matrix TRUE 
                    Matrix input mode 1 
                    Replace tiny pivots FALSE 
                    Use iterative refinement FALSE 
                    Processors in row 2 col partition 2 
                    Row permutation LargeDiag 
                    Column permutation METIS_AT_PLUS_A
                    Parallel symbolic factorization FALSE 
                    Repeated factorization SamePattern
        linear system matrix = precond matrix:
        Mat Object: (stokes_fieldsplit_u_) 4 MPI processes
          type: mpiaij
          rows=3993, cols=3993, bs=3
          total: nonzeros=268119, allocated nonzeros=268119
          total number of mallocs used during MatSetValues calls =0
            using I-node (on process 0) routines: found 396 nodes, limit used is 5
    KSP solver for S = A11 - A10 inv(A00) A01 
      KSP Object: (stokes_fieldsplit_p_) 4 MPI processes
        type: gmres
          GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
          GMRES: happy breakdown tolerance 1e-30
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using PRECONDITIONED norm type for convergence test
      PC Object: (stokes_fieldsplit_p_) 4 MPI processes
        type: lu
          out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          matrix ordering: natural
          factor fill ratio given 0., needed 0.
            Factored matrix follows:
              Mat Object: 4 MPI processes
                type: superlu_dist
                rows=1331, cols=1331
                package used to perform factorization: superlu_dist
                total: nonzeros=0, allocated nonzeros=0
                total number of mallocs used during MatSetValues calls =0
                  SuperLU_DIST run parameters:
                    Process grid nprow 2 x npcol 2 
                    Equilibrate matrix TRUE 
                    Matrix input mode 1 
                    Replace tiny pivots FALSE 
                    Use iterative refinement FALSE 
                    Processors in row 2 col partition 2 
                    Row permutation LargeDiag 
                    Column permutation METIS_AT_PLUS_A
                    Parallel symbolic factorization FALSE 
                    Repeated factorization SamePattern
        linear system matrix followed by preconditioner matrix:
        Mat Object: (stokes_fieldsplit_p_) 4 MPI processes
          type: schurcomplement
          rows=1331, cols=1331
            Schur complement A11 - A10 inv(A00) A01
            A11
              Mat Object: (stokes_fieldsplit_p_) 4 MPI processes
                type: mpiaij
                rows=1331, cols=1331
                total: nonzeros=29791, allocated nonzeros=29791
                total number of mallocs used during MatSetValues calls =0
                  not using I-node (on process 0) routines
            A10
              Mat Object: 4 MPI processes
                type: mpiaij
                rows=1331, cols=3993
                total: nonzeros=89373, allocated nonzeros=89373
                total number of mallocs used during MatSetValues calls =0
                  not using I-node (on process 0) routines
            KSP of A00
              KSP Object: (stokes_fieldsplit_u_) 4 MPI processes
                type: gmres
                  GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
                  GMRES: happy breakdown tolerance 1e-30
                maximum iterations=10000, initial guess is zero
                tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
                left preconditioning
                using PRECONDITIONED norm type for convergence test
              PC Object: (stokes_fieldsplit_u_) 4 MPI processes
                type: lu
                  out-of-place factorization
                  tolerance for zero pivot 2.22045e-14
                  matrix ordering: natural
                  factor fill ratio given 0., needed 0.
                    Factored matrix follows:
                      Mat Object: 4 MPI processes
                        type: superlu_dist
                        rows=3993, cols=3993
                        package used to perform factorization: superlu_dist
                        total: nonzeros=0, allocated nonzeros=0
                        total number of mallocs used during MatSetValues calls =0
                          SuperLU_DIST run parameters:
                            Process grid nprow 2 x npcol 2 
                            Equilibrate matrix TRUE 
                            Matrix input mode 1 
                            Replace tiny pivots FALSE 
                            Use iterative refinement FALSE 
                            Processors in row 2 col partition 2 
                            Row permutation LargeDiag 
                            Column permutation METIS_AT_PLUS_A
                            Parallel symbolic factorization FALSE 
                            Repeated factorization SamePattern
                linear system matrix = precond matrix:
                Mat Object: (stokes_fieldsplit_u_) 4 MPI processes
                  type: mpiaij
                  rows=3993, cols=3993, bs=3
                  total: nonzeros=268119, allocated nonzeros=268119
                  total number of mallocs used during MatSetValues calls =0
                    using I-node (on process 0) routines: found 396 nodes, limit used is 5
            A01
              Mat Object: 4 MPI processes
                type: mpiaij
                rows=3993, cols=1331, rbs=3, cbs = 1
                total: nonzeros=89373, allocated nonzeros=89373
                total number of mallocs used during MatSetValues calls =0
                  using I-node (on process 0) routines: found 396 nodes, limit used is 5
        Mat Object: 4 MPI processes
          type: mpiaij
          rows=1331, cols=1331
          total: nonzeros=117649, allocated nonzeros=117649
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
  linear system matrix followed by preconditioner matrix:
  Mat Object: 4 MPI processes
    type: mpiaij
    rows=5324, cols=5324, bs=4
    total: nonzeros=476656, allocated nonzeros=476656
    total number of mallocs used during MatSetValues calls =0
  Mat Object: 4 MPI processes
    type: mpiaij
    rows=5324, cols=5324, bs=4
    total: nonzeros=476656, allocated nonzeros=476656
    total number of mallocs used during MatSetValues calls =0
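
For reference, the parallel LU in the runs above went through SuperLU_DIST (the default for this build, as the -ksp_view output shows). The same exact-solver setup should behave the same way with MUMPS selected for the factorizations; here is a sketch of that command, untested here, and it assumes a PETSc build configured with MUMPS (e.g. --download-mumps):

$ petscmpiexec -n 4 ./ex42 -stokes_pc_type fieldsplit -stokes_ksp_monitor -stokes_fieldsplit_u_pc_type lu -stokes_fieldsplit_u_pc_factor_mat_solver_package mumps -stokes_fieldsplit_p_pc_type lu -stokes_fieldsplit_p_pc_factor_mat_solver_package mumps -stokes_pc_fieldsplit_type schur -stokes_pc_fieldsplit_schur_factorization_type full -stokes_pc_fieldsplit_schur_precondition selfp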

> On Jan 4, 2017, at 4:06 PM, Dave May <dave.mayhem23 at gmail.com> wrote:
> 
> The issue is your fieldsplit_1 solve. You are applying MUMPS to an approximate Schur complement, not the true Schur complement. Seemingly, the approximation depends on the communicator size.
> 
> If you want to see iteration counts of 2, independent of mesh size and communicator size, you need to solve the true Schur complement system (fieldsplit_1) to a specified tolerance (e.g. 1e-10) - don't use preonly.
> 
> In practice you probably don't want to iterate on the Schur complement either, as it is likely too expensive. If you provided fieldsplit with a spectrally equivalent approximation to S, the iteration counts would be larger than two, but they would be independent of the number of elements and the comm size.
> 
> Thanks,
>   Dave
> 
> 
> 
> 
> On Wed, 4 Jan 2017 at 22:39, Karin&NiKo <niko.karin at gmail.com> wrote:
> Dear Petsc team,
> 
> I am (still) trying to solve Biot's poroelasticity problem:
> [inline image: Biot's poroelasticity equations]
> 
> I am using a mixed P2-P1 finite element discretization. The matrix of the discretized system in binary format is attached to this email.
> 
> I am using the fieldsplit framework to solve the linear system. Since I am running into some trouble, I have decided to go back to basics. Here are the options I am using:
> 
> -ksp_rtol 1.0e-5
> -ksp_type fgmres
> -pc_type fieldsplit
> -pc_fieldsplit_schur_factorization_type full
> -pc_fieldsplit_type schur
> -pc_fieldsplit_schur_precondition selfp
> -fieldsplit_0_pc_type lu
> -fieldsplit_0_pc_factor_mat_solver_package mumps
> -fieldsplit_0_ksp_type preonly
> -fieldsplit_0_ksp_converged_reason
> -fieldsplit_1_pc_type lu
> -fieldsplit_1_pc_factor_mat_solver_package mumps
> -fieldsplit_1_ksp_type preonly
> -fieldsplit_1_ksp_converged_reason
> 
> On a single proc, everything runs fine: the solver converges in 3 iterations, as the theory predicts (see Run-1-proc.txt [contains -log_view]).
> 
> On 2 procs, the solver converges in 28 iterations (see Run-2-proc.txt).
> 
> On 3 procs, the solver converges in 91 iterations (see Run-3-proc.txt).
> 
> I do not understand this behavior: since MUMPS is a parallel direct solver, shouldn't the solver converge in at most 3 iterations, whatever the number of procs?
> 
> 
> Thanks for your precious help,
> Nicolas
> 
> 
> 
> 
> 
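
To spell out Dave's suggestion in terms of the option list quoted above (a minimal sketch; it assumes the same option prefixes as in the original list, and the 1e-10 tolerance is only an illustrative value): keep -pc_fieldsplit_schur_precondition selfp, but let fieldsplit_1 actually iterate on the true Schur complement S = A11 - A10 inv(A00) A01, with the MUMPS factorization of the assembled selfp matrix used only as its preconditioner:

-fieldsplit_1_ksp_type gmres
-fieldsplit_1_ksp_rtol 1.0e-10
-fieldsplit_1_ksp_converged_reason
-fieldsplit_1_pc_type lu
-fieldsplit_1_pc_factor_mat_solver_package mumps

The -ksp_view output above has the same structure: the stokes_fieldsplit_p_ KSP is GMRES applied to the schurcomplement matrix, preconditioned by an exact factorization of Sp, which is why its convergence does not depend on the number of processes.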


