[petsc-users] Fwd: Fieldsplit with sub pc MUMPS in parallel
Barry Smith
bsmith at mcs.anl.gov
Wed Jan 4 18:33:18 CST 2017
Dave,
When I run your example with what I think are the same options, I do get the same convergence independent of the number of processes (with exact solvers):
./ex42 -stokes_pc_type fieldsplit -stokes_ksp_monitor -stokes_fieldsplit_u_pc_type lu -stokes_fieldsplit_p_pc_type lu -stokes_pc_fieldsplit_type schur -stokes_pc_fieldsplit_schur_factorization_type full -stokes_pc_fieldsplit_schur_precondition selfp
Residual norms for stokes_ solve.
0 KSP Residual norm 2.219697707545e-01
1 KSP Residual norm 2.160497527164e-01
2 KSP Residual norm 2.931344898250e-02
3 KSP Residual norm 3.504825774118e-03
4 KSP Residual norm 5.626318301896e-04
5 KSP Residual norm 9.099204283519e-05
6 KSP Residual norm 1.731194762517e-05
7 KSP Residual norm 2.920732887195e-06
8 KSP Residual norm 4.154207455723e-07
~/Src/petsc/src/ksp/ksp/examples/tutorials (master=) arch-mpich
$ petscmpiexec -n 2 ./ex42 -stokes_pc_type fieldsplit -stokes_ksp_monitor -stokes_fieldsplit_u_pc_type lu -stokes_fieldsplit_p_pc_type lu -stokes_pc_fieldsplit_type schur -stokes_pc_fieldsplit_schur_factorization_type full -stokes_pc_fieldsplit_schur_precondition selfp
Residual norms for stokes_ solve.
0 KSP Residual norm 2.219697707545e-01
1 KSP Residual norm 2.160497527164e-01
2 KSP Residual norm 2.931344898250e-02
3 KSP Residual norm 3.504825774118e-03
4 KSP Residual norm 5.626318301897e-04
5 KSP Residual norm 9.099204283514e-05
6 KSP Residual norm 1.731194762514e-05
7 KSP Residual norm 2.920732887196e-06
8 KSP Residual norm 4.154207455766e-07
~/Src/petsc/src/ksp/ksp/examples/tutorials (master=) arch-mpich
$ petscmpiexec -n 3 ./ex42 -stokes_pc_type fieldsplit -stokes_ksp_monitor -stokes_fieldsplit_u_pc_type lu -stokes_fieldsplit_p_pc_type lu -stokes_pc_fieldsplit_type schur -stokes_pc_fieldsplit_schur_factorization_type full -stokes_pc_fieldsplit_schur_precondition selfp
Residual norms for stokes_ solve.
0 KSP Residual norm 2.219697707545e-01
1 KSP Residual norm 2.160497527164e-01
2 KSP Residual norm 2.931344898250e-02
3 KSP Residual norm 3.504825774117e-03
4 KSP Residual norm 5.626318301897e-04
5 KSP Residual norm 9.099204283517e-05
6 KSP Residual norm 1.731194762515e-05
7 KSP Residual norm 2.920732887202e-06
8 KSP Residual norm 4.154207455736e-07
~/Src/petsc/src/ksp/ksp/examples/tutorials (master=) arch-mpich
$ petscmpiexec -n 4 ./ex42 -stokes_pc_type fieldsplit -stokes_ksp_monitor -stokes_fieldsplit_u_pc_type lu -stokes_fieldsplit_p_pc_type lu -stokes_pc_fieldsplit_type schur -stokes_pc_fieldsplit_schur_factorization_type full -stokes_pc_fieldsplit_schur_precondition selfp
Residual norms for stokes_ solve.
0 KSP Residual norm 2.219697707545e-01
1 KSP Residual norm 2.160497527164e-01
2 KSP Residual norm 2.931344898250e-02
3 KSP Residual norm 3.504825774118e-03
4 KSP Residual norm 5.626318301897e-04
5 KSP Residual norm 9.099204283517e-05
6 KSP Residual norm 1.731194762513e-05
7 KSP Residual norm 2.920732887190e-06
8 KSP Residual norm 4.154207455781e-07
KSP Object: (stokes_) 4 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (stokes_) 4 MPI processes
type: fieldsplit
FieldSplit with Schur preconditioner, blocksize = 4, factorization FULL
Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's diagonal's inverse
Split info:
Split number 0 Fields 0, 1, 2
Split number 1 Fields 3
KSP solver for A00 block
KSP Object: (stokes_fieldsplit_u_) 4 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (stokes_fieldsplit_u_) 4 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0., needed 0.
Factored matrix follows:
Mat Object: 4 MPI processes
type: superlu_dist
rows=3993, cols=3993
package used to perform factorization: superlu_dist
total: nonzeros=0, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
SuperLU_DIST run parameters:
Process grid nprow 2 x npcol 2
Equilibrate matrix TRUE
Matrix input mode 1
Replace tiny pivots FALSE
Use iterative refinement FALSE
Processors in row 2 col partition 2
Row permutation LargeDiag
Column permutation METIS_AT_PLUS_A
Parallel symbolic factorization FALSE
Repeated factorization SamePattern
linear system matrix = precond matrix:
Mat Object: (stokes_fieldsplit_u_) 4 MPI processes
type: mpiaij
rows=3993, cols=3993, bs=3
total: nonzeros=268119, allocated nonzeros=268119
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 396 nodes, limit used is 5
KSP solver for S = A11 - A10 inv(A00) A01
KSP Object: (stokes_fieldsplit_p_) 4 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (stokes_fieldsplit_p_) 4 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0., needed 0.
Factored matrix follows:
Mat Object: 4 MPI processes
type: superlu_dist
rows=1331, cols=1331
package used to perform factorization: superlu_dist
total: nonzeros=0, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
SuperLU_DIST run parameters:
Process grid nprow 2 x npcol 2
Equilibrate matrix TRUE
Matrix input mode 1
Replace tiny pivots FALSE
Use iterative refinement FALSE
Processors in row 2 col partition 2
Row permutation LargeDiag
Column permutation METIS_AT_PLUS_A
Parallel symbolic factorization FALSE
Repeated factorization SamePattern
linear system matrix followed by preconditioner matrix:
Mat Object: (stokes_fieldsplit_p_) 4 MPI processes
type: schurcomplement
rows=1331, cols=1331
Schur complement A11 - A10 inv(A00) A01
A11
Mat Object: (stokes_fieldsplit_p_) 4 MPI processes
type: mpiaij
rows=1331, cols=1331
total: nonzeros=29791, allocated nonzeros=29791
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
A10
Mat Object: 4 MPI processes
type: mpiaij
rows=1331, cols=3993
total: nonzeros=89373, allocated nonzeros=89373
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
KSP of A00
KSP Object: (stokes_fieldsplit_u_) 4 MPI processes
type: gmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: (stokes_fieldsplit_u_) 4 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0., needed 0.
Factored matrix follows:
Mat Object: 4 MPI processes
type: superlu_dist
rows=3993, cols=3993
package used to perform factorization: superlu_dist
total: nonzeros=0, allocated nonzeros=0
total number of mallocs used during MatSetValues calls =0
SuperLU_DIST run parameters:
Process grid nprow 2 x npcol 2
Equilibrate matrix TRUE
Matrix input mode 1
Replace tiny pivots FALSE
Use iterative refinement FALSE
Processors in row 2 col partition 2
Row permutation LargeDiag
Column permutation METIS_AT_PLUS_A
Parallel symbolic factorization FALSE
Repeated factorization SamePattern
linear system matrix = precond matrix:
Mat Object: (stokes_fieldsplit_u_) 4 MPI processes
type: mpiaij
rows=3993, cols=3993, bs=3
total: nonzeros=268119, allocated nonzeros=268119
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 396 nodes, limit used is 5
A01
Mat Object: 4 MPI processes
type: mpiaij
rows=3993, cols=1331, rbs=3, cbs = 1
total: nonzeros=89373, allocated nonzeros=89373
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 396 nodes, limit used is 5
Mat Object: 4 MPI processes
type: mpiaij
rows=1331, cols=1331
total: nonzeros=117649, allocated nonzeros=117649
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
linear system matrix followed by preconditioner matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=5324, cols=5324, bs=4
total: nonzeros=476656, allocated nonzeros=476656
total number of mallocs used during MatSetValues calls =0
Mat Object: 4 MPI processes
type: mpiaij
rows=5324, cols=5324, bs=4
total: nonzeros=476656, allocated nonzeros=476656
total number of mallocs used during MatSetValues calls =0
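
For what it is worth, the -ksp_view output above shows the parallel LU for both splits being carried out by SuperLU_DIST. To have the sub-factorizations handled by MUMPS instead, as in the options further down in the thread, the same test could be run with something like the following (a sketch, assuming MUMPS is available in the PETSc build; these options were not part of the run above):

  -stokes_fieldsplit_u_pc_factor_mat_solver_package mumps
  -stokes_fieldsplit_p_pc_factor_mat_solver_package mumps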
> On Jan 4, 2017, at 4:06 PM, Dave May <dave.mayhem23 at gmail.com> wrote:
>
> The issue is your fieldsplit_1 solve. You are applying MUMPS to an approximate Schur complement - not the true Schur complement. The approximation appears to depend on the communicator size.
>
> If you want to see iteration counts of 2, independent of mesh size and communicator size, you need to solve the true Schur complement system (fieldsplit_1) to a specified tolerance (e.g. 1e-10) - don't use preonly.
>
> In practice you probably don't want to iterate on the Schur complement either, as it is likely too expensive. If you provided fieldsplit with a spectrally equivalent approximation to S, the iteration counts would be larger than two, but they would be independent of the number of elements and the communicator size.
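
With the option prefixes used in the message below, that suggestion would look roughly like this sketch (not taken from the thread; the tolerance is illustrative, and the MUMPS factorization of the assembled selfp matrix still acts as the preconditioner for the inner Schur solve):

  -fieldsplit_1_ksp_type gmres
  -fieldsplit_1_ksp_rtol 1.0e-10
  -fieldsplit_1_ksp_monitor
  -fieldsplit_1_pc_type lu
  -fieldsplit_1_pc_factor_mat_solver_package mumps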
>
> Thanks,
> Dave
>
>
>
>
> On Wed, 4 Jan 2017 at 22:39, Karin&NiKo <niko.karin at gmail.com> wrote:
> Dear Petsc team,
>
> I am (still) trying to solve Biot's poroelasticity problem:
> [inline image: the Biot poroelasticity equations]
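
For readers without the attachment, the standard quasi-static Biot system reads as follows; this is the generic textbook form and is only assumed to correspond to the attached image:

  \[
  \begin{aligned}
  -\nabla\cdot\bigl(2\mu\,\varepsilon(\mathbf{u}) + \lambda(\nabla\cdot\mathbf{u})\,I\bigr) + \alpha\,\nabla p &= \mathbf{f}, \\
  \frac{\partial}{\partial t}\Bigl(\frac{p}{M} + \alpha\,\nabla\cdot\mathbf{u}\Bigr) - \nabla\cdot\Bigl(\frac{\kappa}{\mu_f}\,\nabla p\Bigr) &= g,
  \end{aligned}
  \]

with displacement u, pore pressure p, Biot coefficient alpha, Biot modulus M, permeability kappa, and fluid viscosity mu_f.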
>
> I am using a mixed P2-P1 finite element discretization. The matrix of the discretized system in binary format is attached to this email.
>
> I am using the fieldsplit framework to solve the linear system. Since I am running into some trouble, I have decided to go back to simple things. Here are the options I am using:
>
> -ksp_rtol 1.0e-5
> -ksp_type fgmres
> -pc_type fieldsplit
> -pc_fieldsplit_schur_factorization_type full
> -pc_fieldsplit_type schur
> -pc_fieldsplit_schur_precondition selfp
> -fieldsplit_0_pc_type lu
> -fieldsplit_0_pc_factor_mat_solver_package mumps
> -fieldsplit_0_ksp_type preonly
> -fieldsplit_0_ksp_converged_reason
> -fieldsplit_1_pc_type lu
> -fieldsplit_1_pc_factor_mat_solver_package mumps
> -fieldsplit_1_ksp_type preonly
> -fieldsplit_1_ksp_converged_reason
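
To confirm what MUMPS is actually being applied to with this option set, it helps to add -ksp_view (the resulting output, analogous to the one at the top of this message, shows that the split-1 LU is a factorization of Sp, the assembled selfp approximation built from the inverse of A00's diagonal, not of the true Schur complement) and -options_left (which flags any misspelled or unused options). A minimal addition:

  -ksp_view
  -options_left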
>
> On a single proc, everything runs fine: the solver converges in 3 iterations, as the theory predicts (see Run-1-proc.txt, which contains -log_view output).
>
> On 2 procs, the solver converges in 28 iterations (see Run-2-proc.txt).
>
> On 3 procs, the solver converges in 91 iterations (see Run-3-proc.txt).
>
> I do not understand this behavior: since MUMPS is a parallel direct solver, shouldn't the solver converge in at most 3 iterations regardless of the number of procs?
>
>
> Thanks for your valuable help,
> Nicolas
>
>
>
>
>