[petsc-users] KSPSetUp does not scale
Thomas Witkowski
thomas.witkowski at tu-dresden.de
Mon Nov 19 06:36:41 CST 2012
I can do this! Should I stop the run after KSPSetUp? Or do you want to
see the log_summary file from the whole run?
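If it helps, I can wrap KSPSetUp in its own logging stage so that -log_summary reports it separately. A minimal sketch (assuming the solver object is called ksp; error checking omitted):

    #include <petscksp.h>

    PetscLogStage setup_stage;

    /* Register a separate stage so -log_summary breaks out the time,
       flops and messages spent inside KSPSetUp. */
    PetscLogStageRegister("KSPSetUp", &setup_stage);
    PetscLogStagePush(setup_stage);
    KSPSetUp(ksp);
    PetscLogStagePop();

The rest of the run then stays in the main stage, so the summary of the whole run still shows how much of it is the setup.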
Thomas
On 19.11.2012 13:33, Jed Brown wrote:
> Always, always, always send -log_summary when asking about performance.
>
>
> On Mon, Nov 19, 2012 at 11:26 AM, Thomas Witkowski
> <thomas.witkowski at tu-dresden.de> wrote:
>
>     I have a scaling problem in KSPSetUp; maybe some of you can
>     help me to fix it. It takes 4.5 seconds on 64 cores and 4.0 seconds
>     on 128 cores. The matrix has around 11 million rows and is not
>     perfectly balanced, but the maximum number of rows per core in the
>     128-core case is exactly half of the number in the 64-core case.
>     Besides the scaling, why does the setup take so long? I thought
>     that only some objects are created and no computation is done!
>
>     The KSPView of the corresponding solver object is as follows:
>
>     KSP Object:(ns_) 64 MPI processes
>       type: fgmres
>         GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>         GMRES: happy breakdown tolerance 1e-30
>       maximum iterations=100, initial guess is zero
>       tolerances: relative=1e-06, absolute=1e-08, divergence=10000
>       right preconditioning
>       has attached null space
>       using UNPRECONDITIONED norm type for convergence test
>     PC Object:(ns_) 64 MPI processes
>       type: fieldsplit
>         FieldSplit with Schur preconditioner, factorization FULL
>         Preconditioner for the Schur complement formed from the block diagonal part of A11
>         Split info:
>         Split number 0 Defined by IS
>         Split number 1 Defined by IS
>         KSP solver for A00 block
>           KSP Object:  (ns_fieldsplit_velocity_)  64 MPI processes
>             type: preonly
>             maximum iterations=10000, initial guess is zero
>             tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>             left preconditioning
>             using DEFAULT norm type for convergence test
>           PC Object:  (ns_fieldsplit_velocity_)  64 MPI processes
>             type: none
>             linear system matrix = precond matrix:
>             Matrix Object:  64 MPI processes
>               type: mpiaij
>               rows=11068107, cols=11068107
>               total: nonzeros=315206535, allocated nonzeros=315206535
>               total number of mallocs used during MatSetValues calls =0
>                 not using I-node (on process 0) routines
>         KSP solver for S = A11 - A10 inv(A00) A01
>           KSP Object:  (ns_fieldsplit_pressure_)  64 MPI processes
>             type: gmres
>               GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>               GMRES: happy breakdown tolerance 1e-30
>             maximum iterations=10000, initial guess is zero
>             tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>             left preconditioning
>             using DEFAULT norm type for convergence test
>           PC Object:  (ns_fieldsplit_pressure_)  64 MPI processes
>             type: none
>             linear system matrix followed by preconditioner matrix:
>             Matrix Object:  64 MPI processes
>               type: schurcomplement
>               rows=469678, cols=469678
>                 Schur complement A11 - A10 inv(A00) A01
>                 A11
>                   Matrix Object:  64 MPI processes
>                     type: mpiaij
>                     rows=469678, cols=469678
>                     total: nonzeros=0, allocated nonzeros=0
>                     total number of mallocs used during MatSetValues calls =0
>                       using I-node (on process 0) routines: found 1304 nodes, limit used is 5
>                 A10
>                   Matrix Object:  64 MPI processes
>                     type: mpiaij
>                     rows=469678, cols=11068107
>                     total: nonzeros=89122957, allocated nonzeros=89122957
>                     total number of mallocs used during MatSetValues calls =0
>                       not using I-node (on process 0) routines
>                 KSP of A00
>                   KSP Object:  (ns_fieldsplit_velocity_)  64 MPI processes
>                     type: preonly
>                     maximum iterations=10000, initial guess is zero
>                     tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>                     left preconditioning
>                     using DEFAULT norm type for convergence test
>                   PC Object:  (ns_fieldsplit_velocity_)  64 MPI processes
>                     type: none
>                     linear system matrix = precond matrix:
>                     Matrix Object:  64 MPI processes
>                       type: mpiaij
>                       rows=11068107, cols=11068107
>                       total: nonzeros=315206535, allocated nonzeros=315206535
>                       total number of mallocs used during MatSetValues calls =0
>                         not using I-node (on process 0) routines
>                 A01
>                   Matrix Object:  64 MPI processes
>                     type: mpiaij
>                     rows=11068107, cols=469678
>                     total: nonzeros=88821041, allocated nonzeros=88821041
>                     total number of mallocs used during MatSetValues calls =0
>                       not using I-node (on process 0) routines
>             Matrix Object:  64 MPI processes
>               type: mpiaij
>               rows=469678, cols=469678
>               total: nonzeros=0, allocated nonzeros=0
>               total number of mallocs used during MatSetValues calls =0
>                 using I-node (on process 0) routines: found 1304 nodes, limit used is 5
>       linear system matrix = precond matrix:
>       Matrix Object:  64 MPI processes
>         type: mpiaij
>         rows=11537785, cols=11537785
>         total: nonzeros=493150533, allocated nonzeros=510309207
>         total number of mallocs used during MatSetValues calls =0
>           not using I-node (on process 0) routines
>
> Thomas
>
>
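PS: For reference, the solver shown in the KSPView above is created roughly along these lines (a sketch, not the exact code; it assumes the assembled system matrix A and the velocity/pressure index sets is_u and is_p already exist, and it uses the present-day two-matrix KSPSetOperators signature):

    #include <petscksp.h>

    KSP ksp;
    PC  pc;

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOptionsPrefix(ksp, "ns_");
    KSPSetOperators(ksp, A, A);                  /* 11537785 x 11537785 system */
    KSPSetType(ksp, KSPFGMRES);

    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCFIELDSPLIT);
    PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR); /* Schur-complement fieldsplit */
    PCFieldSplitSetIS(pc, "velocity", is_u);     /* splits are defined by IS */
    PCFieldSplitSetIS(pc, "pressure", is_p);

    KSPSetFromOptions(ksp);  /* remaining choices (factorization FULL, inner
                                KSP/PC types, tolerances) come from options */
    KSPSetUp(ksp);           /* the call that takes 4.5 s on 64 cores */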