[petsc-users] Tuning the parallel performance of a 3D FEM CFD code

Barry Smith bsmith at mcs.anl.gov
Fri May 20 09:06:56 CDT 2011


   Be sure you are using the same definition of interlaced as we are. Generally when interlacing inodes will exist, when not interlacing they will not exist. From you print outs you have the opposite effect which is unlikely.

   Say we have a PDE with three components at each grid point,   u, v, and p. Then interlaced is storing the values   [u0 v0 p0 u1 v1 p1 ....] noninterlaced is storing it [u0 u1 ...   v0 v1 .... p0 p1 ....]


   Barry


On May 20, 2011, at 1:31 AM, Henning Sauerland wrote:

> 
> Am 18.05.2011 um 19:46 schrieb Barry Smith:
>> 
>> So interlacing the variables makes ILU() much worse in both iteration count and time?
> 
> 
> Yes. I guess the reason is that the inode routines are not used with the interlaced ordering:
> 
> interlaced:
> KSP Object:
> type: lgmres
>   GMRES: restart=200, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>   GMRES: happy breakdown tolerance 1e-30
>   LGMRES: aug. dimension=2
>   LGMRES: number of matvecs=592
> maximum iterations=10000, initial guess is zero
> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object:
> type: asm
>   Additive Schwarz: total subdomain blocks = 16, amount of overlap = 1
>   Additive Schwarz: restriction/interpolation type - RESTRICT
>   Local solve is same for all blocks, in the following KSP and PC objects:
> KSP Object:(sub_)
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object:(sub_)
>   type: ilu
>     ILU: out-of-place factorization
>     0 levels of fill
>     tolerance for zero pivot 1e-12
>     using diagonal shift to prevent zero pivot
>     matrix ordering: natural
>     factor fill ratio given 1, needed 1
>       Factored matrix follows:
>         Matrix Object:
>           type=seqaij, rows=144120, cols=144120
>           package used to perform factorization: petsc
>           total: nonzeros=14238265, allocated nonzeros=14238265
>             not using I-node routines
>   linear system matrix = precond matrix:
>   Matrix Object:
>     type=seqaij, rows=144120, cols=144120
>     total: nonzeros=14238265, allocated nonzeros=14238265
>       not using I-node routines
> linear system matrix = precond matrix:
> Matrix Object:
>   type=mpiaij, rows=548908, cols=548908
>   total: nonzeros=55971327, allocated nonzeros=93314360
>     not using I-node (on process 0) routines
> 
> 
> non-interlaced:
> KSP Object:
> type: lgmres
>   GMRES: restart=200, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>   GMRES: happy breakdown tolerance 1e-30
>   LGMRES: aug. dimension=2
>   LGMRES: number of matvecs=334
> maximum iterations=10000, initial guess is zero
> tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
> left preconditioning
> diagonally scaled system
> using PRECONDITIONED norm type for convergence test
> PC Object:
> type: asm
>   Additive Schwarz: total subdomain blocks = 16, amount of overlap = 1
>   Additive Schwarz: restriction/interpolation type - RESTRICT
>   Local solve is same for all blocks, in the following KSP and PC objects:
> KSP Object:(sub_)
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object:(sub_)
>   type: ilu
>     ILU: out-of-place factorization
>     0 levels of fill
>     tolerance for zero pivot 1e-12
>     using diagonal shift to prevent zero pivot
>     matrix ordering: natural
>     factor fill ratio given 1, needed 1
>       Factored matrix follows:
>         Matrix Object:
>           type=seqaij, rows=41200, cols=41200
>           package used to perform factorization: petsc
>           total: nonzeros=3651205, allocated nonzeros=3651205
>             using I-node routines: found 15123 nodes, limit used is 5
>   linear system matrix = precond matrix:
>   Matrix Object:
>     type=seqaij, rows=41200, cols=41200
>     total: nonzeros=3651205, allocated nonzeros=3651205
>       using I-node routines: found 15123 nodes, limit used is 5
> linear system matrix = precond matrix:
> Matrix Object:
>   type=mpiaij, rows=548908, cols=548908
>   total: nonzeros=55971327, allocated nonzeros=112526140
>     using I-node (on process 0) routines: found 13156 nodes, limit used is 5
> 
> 



More information about the petsc-users mailing list