[petsc-users] KSPSetUp does not scale
Jed Brown
jedbrown at mcs.anl.gov
Mon Nov 19 06:33:27 CST 2012
Always, always, always send -log_summary when asking about performance.
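For example (the executable name and launcher below are placeholders; adjust to your own setup), run something like

    mpiexec -n 64 ./your_app <your usual options> -log_summary

and send the table that PETSc prints at the end of the run.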
On Mon, Nov 19, 2012 at 11:26 AM, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
> I have a scaling problem in KSPSetUp, maybe some of you can help me to
> fix it. It takes 4.5 seconds on 64 cores and 4.0 seconds on 128 cores. The
> matrix has around 11 million rows and is not perfectly balanced, but the
> maximum number of rows per core in the 128-core case is exactly half of
> the number in the 64-core case. Besides the scaling, why does the setup
> take so long? I thought that only some objects are created and no
> computation takes place!
>
> The KSPView output of the corresponding solver objects is as follows:
>
> KSP Object:(ns_) 64 MPI processes
> type: fgmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=100, initial guess is zero
> tolerances: relative=1e-06, absolute=1e-08, divergence=10000
> right preconditioning
> has attached null space
> using UNPRECONDITIONED norm type for convergence test
> PC Object:(ns_) 64 MPI processes
> type: fieldsplit
> FieldSplit with Schur preconditioner, factorization FULL
> Preconditioner for the Schur complement formed from the block diagonal part of A11
> Split info:
> Split number 0 Defined by IS
> Split number 1 Defined by IS
> KSP solver for A00 block
> KSP Object: (ns_fieldsplit_velocity_) 64 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> left preconditioning
> using DEFAULT norm type for convergence test
> PC Object: (ns_fieldsplit_velocity_) 64 MPI processes
> type: none
> linear system matrix = precond matrix:
> Matrix Object: 64 MPI processes
> type: mpiaij
> rows=11068107, cols=11068107
> total: nonzeros=315206535, allocated nonzeros=315206535
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
> KSP solver for S = A11 - A10 inv(A00) A01
> KSP Object: (ns_fieldsplit_pressure_) 64 MPI processes
> type: gmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> left preconditioning
> using DEFAULT norm type for convergence test
> PC Object: (ns_fieldsplit_pressure_) 64 MPI processes
> type: none
> linear system matrix followed by preconditioner matrix:
> Matrix Object: 64 MPI processes
> type: schurcomplement
> rows=469678, cols=469678
> Schur complement A11 - A10 inv(A00) A01
> A11
> Matrix Object: 64 MPI processes
> type: mpiaij
> rows=469678, cols=469678
> total: nonzeros=0, allocated nonzeros=0
> total number of mallocs used during MatSetValues calls =0
> using I-node (on process 0) routines: found 1304 nodes, limit used is 5
> A10
> Matrix Object: 64 MPI processes
> type: mpiaij
> rows=469678, cols=11068107
> total: nonzeros=89122957, allocated nonzeros=89122957
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
> KSP of A00
> KSP Object: (ns_fieldsplit_velocity_) 64 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> left preconditioning
> using DEFAULT norm type for convergence test
> PC Object: (ns_fieldsplit_velocity_) 64 MPI processes
> type: none
> linear system matrix = precond matrix:
> Matrix Object: 64 MPI processes
> type: mpiaij
> rows=11068107, cols=11068107
> total: nonzeros=315206535, allocated nonzeros=315206535
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
> A01
> Matrix Object: 64 MPI processes
> type: mpiaij
> rows=11068107, cols=469678
> total: nonzeros=88821041, allocated nonzeros=88821041
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
> Matrix Object: 64 MPI processes
> type: mpiaij
> rows=469678, cols=469678
> total: nonzeros=0, allocated nonzeros=0
> total number of mallocs used during MatSetValues calls =0
> using I-node (on process 0) routines: found 1304 nodes, limit used is 5
> linear system matrix = precond matrix:
> Matrix Object: 64 MPI processes
> type: mpiaij
> rows=11537785, cols=11537785
> total: nonzeros=493150533, allocated nonzeros=510309207
> total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
>
>
>
>
> Thomas
>
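As a rough sketch (not from the original thread): one way to see where the setup time goes in the -log_summary table is to put KSPSetUp in its own logging stage, separate from the solve. The function name SolveWithStages below is made up for illustration, and ksp, b, and x are assumed to be already created and assembled.

#include <petscksp.h>

/* Sketch: wrap KSPSetUp and KSPSolve in separate log stages so that
   -log_summary reports the time and events of each phase separately.
   Assumes ksp, b, and x have already been created and assembled. */
PetscErrorCode SolveWithStages(KSP ksp, Vec b, Vec x)
{
  PetscLogStage  setup_stage, solve_stage;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscLogStageRegister("KSP setup", &setup_stage);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("KSP solve", &solve_stage);CHKERRQ(ierr);

  /* Preconditioner/KSP setup only: the 4.5 s reported in the question
     would be attributed to this stage. */
  ierr = PetscLogStagePush(setup_stage);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  /* Actual solve. */
  ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  PetscFunctionReturn(0);
}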