[petsc-users] Tuning the parallel performance of a 3D FEM CFD code

Henning Sauerland uerland at gmail.com
Fri May 20 01:31:06 CDT 2011


Am 18.05.2011 um 19:46 schrieb Barry Smith:
> 
>  So interlacing the variables makes ILU(0) much worse in both iteration count and time?


Yes. I suspect the reason is that the inode routines are not used with the interlaced ordering (see the two -ksp_view listings below, and the notes after them):

interlaced:
KSP Object:
 type: lgmres
   GMRES: restart=200, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
   GMRES: happy breakdown tolerance 1e-30
   LGMRES: aug. dimension=2
   LGMRES: number of matvecs=592
 maximum iterations=10000, initial guess is zero
 tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
 left preconditioning
 using PRECONDITIONED norm type for convergence test
PC Object:
 type: asm
   Additive Schwarz: total subdomain blocks = 16, amount of overlap = 1
   Additive Schwarz: restriction/interpolation type - RESTRICT
   Local solve is same for all blocks, in the following KSP and PC objects:
 KSP Object:(sub_)
   type: preonly
   maximum iterations=10000, initial guess is zero
   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
   left preconditioning
   using PRECONDITIONED norm type for convergence test
 PC Object:(sub_)
   type: ilu
     ILU: out-of-place factorization
     0 levels of fill
     tolerance for zero pivot 1e-12
     using diagonal shift to prevent zero pivot
     matrix ordering: natural
     factor fill ratio given 1, needed 1
       Factored matrix follows:
         Matrix Object:
           type=seqaij, rows=144120, cols=144120
           package used to perform factorization: petsc
           total: nonzeros=14238265, allocated nonzeros=14238265
             not using I-node routines
   linear system matrix = precond matrix:
   Matrix Object:
     type=seqaij, rows=144120, cols=144120
     total: nonzeros=14238265, allocated nonzeros=14238265
       not using I-node routines
 linear system matrix = precond matrix:
 Matrix Object:
   type=mpiaij, rows=548908, cols=548908
   total: nonzeros=55971327, allocated nonzeros=93314360
     not using I-node (on process 0) routines


non-interlaced:
KSP Object:
 type: lgmres
   GMRES: restart=200, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
   GMRES: happy breakdown tolerance 1e-30
   LGMRES: aug. dimension=2
   LGMRES: number of matvecs=334
 maximum iterations=10000, initial guess is zero
 tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
 left preconditioning
 diagonally scaled system
 using PRECONDITIONED norm type for convergence test
PC Object:
 type: asm
   Additive Schwarz: total subdomain blocks = 16, amount of overlap = 1
   Additive Schwarz: restriction/interpolation type - RESTRICT
   Local solve is same for all blocks, in the following KSP and PC objects:
 KSP Object:(sub_)
   type: preonly
   maximum iterations=10000, initial guess is zero
   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
   left preconditioning
   using PRECONDITIONED norm type for convergence test
 PC Object:(sub_)
   type: ilu
     ILU: out-of-place factorization
     0 levels of fill
     tolerance for zero pivot 1e-12
     using diagonal shift to prevent zero pivot
     matrix ordering: natural
     factor fill ratio given 1, needed 1
       Factored matrix follows:
         Matrix Object:
           type=seqaij, rows=41200, cols=41200
           package used to perform factorization: petsc
           total: nonzeros=3651205, allocated nonzeros=3651205
             using I-node routines: found 15123 nodes, limit used is 5
   linear system matrix = precond matrix:
   Matrix Object:
     type=seqaij, rows=41200, cols=41200
     total: nonzeros=3651205, allocated nonzeros=3651205
       using I-node routines: found 15123 nodes, limit used is 5
 linear system matrix = precond matrix:
 Matrix Object:
   type=mpiaij, rows=548908, cols=548908
   total: nonzeros=55971327, allocated nonzeros=112526140
     using I-node (on process 0) routines: found 13156 nodes, limit used is 5
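
For reference, the solver configuration shown in the two listings should roughly correspond to the runtime options below. This is a reconstruction from the -ksp_view output, not the exact command line that was used; the executable name and process count are placeholders, and the non-interlaced run additionally reports "diagonally scaled system", i.e. -ksp_diagonal_scale:

  mpiexec -n 16 ./cfd_app \
    -ksp_type lgmres -ksp_gmres_restart 200 -ksp_lgmres_augment 2 \
    -ksp_rtol 1e-8 \
    -pc_type asm -pc_asm_overlap 1 \
    -sub_ksp_type preonly -sub_pc_type ilu -sub_pc_factor_levels 0 \
    -ksp_view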

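On the inode point: inode detection on AIJ matrices relies on consecutive rows sharing an identical nonzero pattern, which apparently does not happen here once the fields are interlaced. If the number of unknowns per node were constant (say 4: u, v, w, p), an alternative would be to assemble the operator in a blocked format (BAIJ) with an explicit block size, so the block structure is used directly instead of being rediscovered. A minimal sketch, with the block size, local sizes and preallocation figures as placeholders rather than values from the actual code:

  /* Hypothetical sketch -- block size (4), local size and preallocation
     figures are placeholders, not taken from the actual application. */
  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    PetscInt       bs     = 4;        /* assumed unknowns per node (u,v,w,p)  */
    PetscInt       nlocal = 4*1000;   /* local rows; must be a multiple of bs */
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
    ierr = MatSetType(A, MATMPIBAIJ);CHKERRQ(ierr);
    /* rough preallocation: 25 block nonzeros per block row in the diagonal
       part, 5 in the off-diagonal part -- tune to the actual stencil */
    ierr = MatMPIBAIJSetPreallocation(A, bs, 25, NULL, 5, NULL);CHKERRQ(ierr);

    /* ... element assembly with MatSetValuesBlocked() would go here ... */

    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

Whether this actually pays off for the ASM/ILU setup here would of course have to be tested, and it does not apply if the number of unknowns per node varies.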


