[petsc-users] KSPSetUp does not scale

Thomas Witkowski thomas.witkowski at tu-dresden.de
Mon Nov 19 06:36:41 CST 2012


I can do this! Should I stop the run after KSPSetUp? Or do you want to 
see the -log_summary output from the whole run?

Thomas
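
(For isolating the setup cost, a minimal sketch, assuming a KSP object
ksp whose operators are already set, is to wrap KSPSetUp in its own
logging stage so that -log_summary reports it separately from the
solve; error checking is omitted for brevity:

    #include <petscksp.h>

    /* ... KSPSetOperators(ksp, A, A) and KSPSetFromOptions(ksp) done above ... */
    PetscLogStage setup;
    PetscLogStageRegister("KSPSetUp", &setup);   /* stage name is arbitrary */
    PetscLogStagePush(setup);
    KSPSetUp(ksp);                               /* the phase in question */
    PetscLogStagePop();
    /* KSPSolve(ksp, b, x) then runs in the default stage */

The run is launched as usual, e.g. "mpiexec -n 64 ./app -log_summary",
and the setup stage appears as a separate section in the summary.)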

On 19.11.2012 13:33, Jed Brown wrote:
> Always, always, always send -log_summary when asking about performance.
>
>
> On Mon, Nov 19, 2012 at 11:26 AM, Thomas Witkowski 
> <thomas.witkowski at tu-dresden.de 
> <mailto:thomas.witkowski at tu-dresden.de>> wrote:
>
>     I have a scaling problem in KSPSetUp; maybe some of you can
>     help me to fix it. It takes 4.5 seconds on 64 cores and 4.0
>     seconds on 128 cores. The matrix has around 11 million rows and
>     is not perfectly balanced, but the maximum number of rows per
>     core in the 128-core case is exactly half of the number in the
>     64-core case. Besides the scaling, why does the setup take so
>     long? I thought that only some objects are created and no
>     computation is going on!
>
>     The KSPView output for the corresponding solver objects is as follows:
>
>     KSP Object:(ns_) 64 MPI processes
>       type: fgmres
>         GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>     Orthogonalization with no iterative refinement
>         GMRES: happy breakdown tolerance 1e-30
>       maximum iterations=100, initial guess is zero
>       tolerances:  relative=1e-06, absolute=1e-08, divergence=10000
>       right preconditioning
>       has attached null space
>       using UNPRECONDITIONED norm type for convergence test
>     PC Object:(ns_) 64 MPI processes
>       type: fieldsplit
>         FieldSplit with Schur preconditioner, factorization FULL
>         Preconditioner for the Schur complement formed from the block
>     diagonal part of A11
>         Split info:
>         Split number 0 Defined by IS
>         Split number 1 Defined by IS
>         KSP solver for A00 block
>           KSP Object:      (ns_fieldsplit_velocity_)       64 MPI
>     processes
>             type: preonly
>             maximum iterations=10000, initial guess is zero
>             tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>             left preconditioning
>             using DEFAULT norm type for convergence test
>           PC Object:      (ns_fieldsplit_velocity_)       64 MPI processes
>             type: none
>             linear system matrix = precond matrix:
>             Matrix Object:         64 MPI processes
>               type: mpiaij
>               rows=11068107, cols=11068107
>               total: nonzeros=315206535, allocated nonzeros=315206535
>               total number of mallocs used during MatSetValues calls =0
>                 not using I-node (on process 0) routines
>         KSP solver for S = A11 - A10 inv(A00) A01
>           KSP Object:      (ns_fieldsplit_pressure_)       64 MPI
>     processes
>             type: gmres
>               GMRES: restart=30, using Classical (unmodified)
>     Gram-Schmidt Orthogonalization with no iterative refinement
>               GMRES: happy breakdown tolerance 1e-30
>             maximum iterations=10000, initial guess is zero
>             tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>             left preconditioning
>             using DEFAULT norm type for convergence test
>           PC Object:      (ns_fieldsplit_pressure_)       64 MPI processes
>             type: none
>             linear system matrix followed by preconditioner matrix:
>             Matrix Object:         64 MPI processes
>               type: schurcomplement
>               rows=469678, cols=469678
>                 Schur complement A11 - A10 inv(A00) A01
>                 A11
>                   Matrix Object:               64 MPI processes
>                     type: mpiaij
>                     rows=469678, cols=469678
>                     total: nonzeros=0, allocated nonzeros=0
>                     total number of mallocs used during MatSetValues
>     calls =0
>                       using I-node (on process 0) routines: found 1304
>     nodes, limit used is 5
>                 A10
>                   Matrix Object:               64 MPI processes
>                     type: mpiaij
>                     rows=469678, cols=11068107
>                     total: nonzeros=89122957, allocated nonzeros=89122957
>                     total number of mallocs used during MatSetValues
>     calls =0
>                       not using I-node (on process 0) routines
>                 KSP of A00
>                   KSP Object: (ns_fieldsplit_velocity_)       64 MPI
>     processes
>                     type: preonly
>                     maximum iterations=10000, initial guess is zero
>                     tolerances:  relative=1e-05, absolute=1e-50,
>     divergence=10000
>                     left preconditioning
>                     using DEFAULT norm type for convergence test
>                   PC Object: (ns_fieldsplit_velocity_)     64 MPI
>     processes
>                     type: none
>                     linear system matrix = precond matrix:
>                     Matrix Object:                 64 MPI processes
>                       type: mpiaij
>                       rows=11068107, cols=11068107
>                       total: nonzeros=315206535, allocated
>     nonzeros=315206535
>                       total number of mallocs used during MatSetValues
>     calls =0
>                         not using I-node (on process 0) routines
>                 A01
>                   Matrix Object:               64 MPI processes
>                     type: mpiaij
>                     rows=11068107, cols=469678
>                     total: nonzeros=88821041, allocated nonzeros=88821041
>                     total number of mallocs used during MatSetValues
>     calls =0
>                       not using I-node (on process 0) routines
>             Matrix Object:         64 MPI processes
>               type: mpiaij
>               rows=469678, cols=469678
>               total: nonzeros=0, allocated nonzeros=0
>               total number of mallocs used during MatSetValues calls =0
>                 using I-node (on process 0) routines: found 1304
>     nodes, limit used is 5
>       linear system matrix = precond matrix:
>       Matrix Object:   64 MPI processes
>         type: mpiaij
>         rows=11537785, cols=11537785
>         total: nonzeros=493150533, allocated nonzeros=510309207
>         total number of mallocs used during MatSetValues calls =0
>           not using I-node (on process 0) routines
>
>
>
>
>     Thomas
>
>

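(Judging from the KSPView above, the preconditioner is a Schur
fieldsplit with full factorization and unpreconditioned inner solves.
A hedged sketch of the runtime options that would produce such a
configuration, assuming the splits are attached in code via
PCFieldSplitSetIS() under the ns_ prefix; exact option names may
differ slightly between PETSc versions:

    -ns_pc_type fieldsplit
    -ns_pc_fieldsplit_type schur
    -ns_pc_fieldsplit_schur_factorization_type full
    -ns_fieldsplit_velocity_ksp_type preonly
    -ns_fieldsplit_velocity_pc_type none
    -ns_fieldsplit_pressure_ksp_type gmres
    -ns_fieldsplit_pressure_pc_type none
)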