[petsc-users] Tuning the parallel performance of a 3D FEM CFD code
Henning Sauerland
uerland at gmail.com
Fri May 20 01:31:06 CDT 2011
On 18.05.2011, at 19:46, Barry Smith wrote:
>
> So interlacing the variables makes ILU() much worse in both iteration count and time?
Yes. I guess the reason is that the I-node routines are not used with the interlaced ordering (a small sketch of a possible workaround follows the interlaced listing below):
interlaced:
KSP Object:
  type: lgmres
    GMRES: restart=200, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
    LGMRES: aug. dimension=2
    LGMRES: number of matvecs=592
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-08, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: asm
    Additive Schwarz: total subdomain blocks = 16, amount of overlap = 1
    Additive Schwarz: restriction/interpolation type - RESTRICT
  Local solve is same for all blocks, in the following KSP and PC objects:
  KSP Object:(sub_)
    type: preonly
    maximum iterations=10000, initial guess is zero
    tolerances: relative=1e-05, absolute=1e-50, divergence=10000
    left preconditioning
    using PRECONDITIONED norm type for convergence test
  PC Object:(sub_)
    type: ilu
      ILU: out-of-place factorization
      0 levels of fill
      tolerance for zero pivot 1e-12
      using diagonal shift to prevent zero pivot
      matrix ordering: natural
      factor fill ratio given 1, needed 1
        Factored matrix follows:
          Matrix Object:
            type=seqaij, rows=144120, cols=144120
            package used to perform factorization: petsc
            total: nonzeros=14238265, allocated nonzeros=14238265
              not using I-node routines
    linear system matrix = precond matrix:
    Matrix Object:
      type=seqaij, rows=144120, cols=144120
      total: nonzeros=14238265, allocated nonzeros=14238265
        not using I-node routines
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpiaij, rows=548908, cols=548908
    total: nonzeros=55971327, allocated nonzeros=93314360
      not using I-node (on process 0) routines
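
As an aside, one workaround might be to tell PETSc about the interlaced block structure explicitly, e.g. by using a BAIJ matrix with the block size set to the number of unknowns per node, so that the block kernels take over the role the I-node routines play in the non-interlaced case. A minimal sketch only (the field count and sizes below are made up, and I haven't checked that every node really carries the same number of unknowns here):

#include <petscksp.h>

/* Sketch with hypothetical sizes: declare the interlaced block structure so
   that PETSc can use block (BAIJ) kernels instead of relying on I-node
   detection. "nfields" is an assumed number of unknowns per node.           */
int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       nfields = 4;              /* assumed block size             */
  PetscInt       nlocal  = 1000*nfields;   /* assumed local number of rows   */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetType(A, MATBAIJ);CHKERRQ(ierr);       /* block AIJ format      */
  ierr = MatSetBlockSize(A, nfields);CHKERRQ(ierr);  /* interlaced block size */
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  /* ... preallocate and assemble the interlaced system as usual ... */
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

ILU(0) inside the ASM blocks works on BAIJ matrices as well, so the rest of the solver options could stay as they are.
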
non-interlaced:
KSP Object:
  type: lgmres
    GMRES: restart=200, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
    LGMRES: aug. dimension=2
    LGMRES: number of matvecs=334
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-08, absolute=1e-50, divergence=10000
  left preconditioning
  diagonally scaled system
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: asm
    Additive Schwarz: total subdomain blocks = 16, amount of overlap = 1
    Additive Schwarz: restriction/interpolation type - RESTRICT
  Local solve is same for all blocks, in the following KSP and PC objects:
  KSP Object:(sub_)
    type: preonly
    maximum iterations=10000, initial guess is zero
    tolerances: relative=1e-05, absolute=1e-50, divergence=10000
    left preconditioning
    using PRECONDITIONED norm type for convergence test
  PC Object:(sub_)
    type: ilu
      ILU: out-of-place factorization
      0 levels of fill
      tolerance for zero pivot 1e-12
      using diagonal shift to prevent zero pivot
      matrix ordering: natural
      factor fill ratio given 1, needed 1
        Factored matrix follows:
          Matrix Object:
            type=seqaij, rows=41200, cols=41200
            package used to perform factorization: petsc
            total: nonzeros=3651205, allocated nonzeros=3651205
              using I-node routines: found 15123 nodes, limit used is 5
    linear system matrix = precond matrix:
    Matrix Object:
      type=seqaij, rows=41200, cols=41200
      total: nonzeros=3651205, allocated nonzeros=3651205
        using I-node routines: found 15123 nodes, limit used is 5
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpiaij, rows=548908, cols=548908
    total: nonzeros=55971327, allocated nonzeros=112526140
      using I-node (on process 0) routines: found 13156 nodes, limit used is 5
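
For completeness, the non-interlaced configuration above should correspond roughly to the following run-time options (my reconstruction from the -ksp_view output, not necessarily the exact flags that were passed):

  -ksp_type lgmres -ksp_gmres_restart 200 -ksp_lgmres_augment 2 -ksp_rtol 1e-8
  -ksp_diagonal_scale -pc_type asm -pc_asm_overlap 1
  -sub_ksp_type preonly -sub_pc_type ilu -sub_pc_factor_levels 0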