[petsc-users] KSPSetUp does not scale

Thomas Witkowski thomas.witkowski at tu-dresden.de
Mon Nov 19 04:26:48 CST 2012


I have a scaling problem in KSPSetUp; maybe some of you can help me 
to fix it. It takes 4.5 seconds on 64 cores and 4.0 seconds on 128 cores. 
The matrix has around 11 million rows and is not perfectly balanced, but 
the maximum number of rows per core in the 128-core case is exactly half 
of the number in the 64-core case. Besides the scaling, why does the 
setup take so long? I thought that only some objects are created and no 
computation is going on!
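
For reference, here is a minimal sketch (not my production code; the stage 
label is arbitrary) of how the KSPSetUp time can be isolated in its own 
logging stage so that -log_summary attributes it cleanly:

#include <petscksp.h>

/* Push a dedicated logging stage around KSPSetUp so its cost shows up
   separately in -log_summary.  Sketch only; error handling follows the
   usual ierr/CHKERRQ pattern. */
static PetscErrorCode TimeKSPSetUp(KSP ksp)
{
  PetscLogStage  stage;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscLogStageRegister("KSPSetUp only", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);   /* the call whose scaling is in question */
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  PetscFunctionReturn(0);
}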

The KSPView output for the corresponding solver object is as follows:

KSP Object:(ns_) 64 MPI processes
   type: fgmres
     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
     GMRES: happy breakdown tolerance 1e-30
   maximum iterations=100, initial guess is zero
   tolerances:  relative=1e-06, absolute=1e-08, divergence=10000
   right preconditioning
   has attached null space
   using UNPRECONDITIONED norm type for convergence test
PC Object:(ns_) 64 MPI processes
   type: fieldsplit
     FieldSplit with Schur preconditioner, factorization FULL
     Preconditioner for the Schur complement formed from the block diagonal part of A11
     Split info:
     Split number 0 Defined by IS
     Split number 1 Defined by IS
     KSP solver for A00 block
       KSP Object:      (ns_fieldsplit_velocity_)       64 MPI processes
         type: preonly
         maximum iterations=10000, initial guess is zero
         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
         left preconditioning
         using DEFAULT norm type for convergence test
       PC Object:      (ns_fieldsplit_velocity_)       64 MPI processes
         type: none
         linear system matrix = precond matrix:
         Matrix Object:         64 MPI processes
           type: mpiaij
           rows=11068107, cols=11068107
           total: nonzeros=315206535, allocated nonzeros=315206535
           total number of mallocs used during MatSetValues calls =0
             not using I-node (on process 0) routines
     KSP solver for S = A11 - A10 inv(A00) A01
       KSP Object:      (ns_fieldsplit_pressure_)       64 MPI processes
         type: gmres
            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
           GMRES: happy breakdown tolerance 1e-30
         maximum iterations=10000, initial guess is zero
         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
         left preconditioning
         using DEFAULT norm type for convergence test
       PC Object:      (ns_fieldsplit_pressure_)       64 MPI processes
         type: none
         linear system matrix followed by preconditioner matrix:
         Matrix Object:         64 MPI processes
           type: schurcomplement
           rows=469678, cols=469678
             Schur complement A11 - A10 inv(A00) A01
             A11
               Matrix Object:               64 MPI processes
                 type: mpiaij
                 rows=469678, cols=469678
                 total: nonzeros=0, allocated nonzeros=0
                 total number of mallocs used during MatSetValues calls =0
                    using I-node (on process 0) routines: found 1304 nodes, limit used is 5
             A10
               Matrix Object:               64 MPI processes
                 type: mpiaij
                 rows=469678, cols=11068107
                 total: nonzeros=89122957, allocated nonzeros=89122957
                 total number of mallocs used during MatSetValues calls =0
                   not using I-node (on process 0) routines
             KSP of A00
                KSP Object: (ns_fieldsplit_velocity_)               64 MPI processes
                 type: preonly
                 maximum iterations=10000, initial guess is zero
                  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
                 left preconditioning
                 using DEFAULT norm type for convergence test
                PC Object: (ns_fieldsplit_velocity_)               64 MPI processes
                 type: none
                 linear system matrix = precond matrix:
                 Matrix Object:                 64 MPI processes
                   type: mpiaij
                   rows=11068107, cols=11068107
                   total: nonzeros=315206535, allocated nonzeros=315206535
                   total number of mallocs used during MatSetValues calls =0
                     not using I-node (on process 0) routines
             A01
               Matrix Object:               64 MPI processes
                 type: mpiaij
                 rows=11068107, cols=469678
                 total: nonzeros=88821041, allocated nonzeros=88821041
                 total number of mallocs used during MatSetValues calls =0
                   not using I-node (on process 0) routines
         Matrix Object:         64 MPI processes
           type: mpiaij
           rows=469678, cols=469678
           total: nonzeros=0, allocated nonzeros=0
           total number of mallocs used during MatSetValues calls =0
              using I-node (on process 0) routines: found 1304 nodes, limit used is 5
   linear system matrix = precond matrix:
   Matrix Object:   64 MPI processes
     type: mpiaij
     rows=11537785, cols=11537785
     total: nonzeros=493150533, allocated nonzeros=510309207
     total number of mallocs used during MatSetValues calls =0
       not using I-node (on process 0) routines
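
For context, the solver is set up along these lines. This is only a sketch, 
not the exact code: the split names and the "ns_" prefix follow the output 
above, while the variable names and the options mentioned in the comments 
are illustrative (it targets the petsc-3.3-era API, where KSPSetOperators 
still takes a MatStructure flag):

#include <petscksp.h>

PetscErrorCode BuildNSSolver(Mat A, IS isVelocity, IS isPressure, KSP *ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPCreate(PETSC_COMM_WORLD, ksp);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(*ksp, "ns_");CHKERRQ(ierr);
  ierr = KSPSetOperators(*ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(*ksp, KSPFGMRES);CHKERRQ(ierr);

  ierr = KSPGetPC(*ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
  ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "velocity", isVelocity);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "pressure", isPressure);CHKERRQ(ierr);

  /* The remaining choices visible in the view (full Schur factorization,
     right preconditioning, inner solver and tolerance settings) are taken
     from the options database, e.g.
       -ns_pc_fieldsplit_schur_fact_type full
       -ns_ksp_pc_side right
     Sketch only; adjust to the actual code.                              */
  ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}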




Thomas

