[petsc-users] KSPSetUp does not scale

Jed Brown jedbrown at mcs.anl.gov
Mon Nov 19 06:41:32 CST 2012


Just have it do one or a few iterations.


On Mon, Nov 19, 2012 at 1:36 PM, Thomas Witkowski <
thomas.witkowski at tu-dresden.de> wrote:

>  I can do this! Should I stop the run after KSPSetUp? Or do you want to
> see the log_summary file from the whole run?
>
> Thomas
>
> On 19.11.2012 13:33, Jed Brown wrote:
>
> Always, always, always send -log_summary when asking about performance.
>
>
> On Mon, Nov 19, 2012 at 11:26 AM, Thomas Witkowski <
> thomas.witkowski at tu-dresden.de> wrote:
>
>> I have a scaling problem in KSPSetUp; maybe some of you can help me
>> fix it. It takes 4.5 seconds on 64 cores and 4.0 seconds on 128 cores. The
>> matrix has around 11 million rows and is not perfectly balanced, but the
>> maximum number of rows per core in the 128-core case is exactly half of the
>> number in the 64-core case. Besides the scaling, why does the
>> setup take so long? I thought that just some objects are created but no
>> calculation is going on!
>>
>> The KSPView on the corresponding solver objects is as follows:
>>
>> KSP Object:(ns_) 64 MPI processes
>>   type: fgmres
>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>>     GMRES: happy breakdown tolerance 1e-30
>>   maximum iterations=100, initial guess is zero
>>   tolerances:  relative=1e-06, absolute=1e-08, divergence=10000
>>   right preconditioning
>>   has attached null space
>>   using UNPRECONDITIONED norm type for convergence test
>> PC Object:(ns_) 64 MPI processes
>>   type: fieldsplit
>>     FieldSplit with Schur preconditioner, factorization FULL
>>     Preconditioner for the Schur complement formed from the block
>> diagonal part of A11
>>     Split info:
>>     Split number 0 Defined by IS
>>     Split number 1 Defined by IS
>>     KSP solver for A00 block
>>       KSP Object:      (ns_fieldsplit_velocity_)       64 MPI processes
>>         type: preonly
>>         maximum iterations=10000, initial guess is zero
>>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>         left preconditioning
>>         using DEFAULT norm type for convergence test
>>       PC Object:      (ns_fieldsplit_velocity_)       64 MPI processes
>>         type: none
>>         linear system matrix = precond matrix:
>>         Matrix Object:         64 MPI processes
>>           type: mpiaij
>>           rows=11068107, cols=11068107
>>           total: nonzeros=315206535, allocated nonzeros=315206535
>>           total number of mallocs used during MatSetValues calls =0
>>             not using I-node (on process 0) routines
>>     KSP solver for S = A11 - A10 inv(A00) A01
>>       KSP Object:      (ns_fieldsplit_pressure_)       64 MPI processes
>>         type: gmres
>>           GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>>           GMRES: happy breakdown tolerance 1e-30
>>         maximum iterations=10000, initial guess is zero
>>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>         left preconditioning
>>         using DEFAULT norm type for convergence test
>>       PC Object:      (ns_fieldsplit_pressure_)       64 MPI processes
>>         type: none
>>         linear system matrix followed by preconditioner matrix:
>>         Matrix Object:         64 MPI processes
>>           type: schurcomplement
>>           rows=469678, cols=469678
>>             Schur complement A11 - A10 inv(A00) A01
>>             A11
>>               Matrix Object:               64 MPI processes
>>                 type: mpiaij
>>                 rows=469678, cols=469678
>>                 total: nonzeros=0, allocated nonzeros=0
>>                 total number of mallocs used during MatSetValues calls =0
>>                   using I-node (on process 0) routines: found 1304 nodes,
>> limit used is 5
>>             A10
>>               Matrix Object:               64 MPI processes
>>                 type: mpiaij
>>                 rows=469678, cols=11068107
>>                 total: nonzeros=89122957, allocated nonzeros=89122957
>>                 total number of mallocs used during MatSetValues calls =0
>>                   not using I-node (on process 0) routines
>>             KSP of A00
>>               KSP Object: (ns_fieldsplit_velocity_)               64 MPI
>> processes
>>                 type: preonly
>>                 maximum iterations=10000, initial guess is zero
>>                 tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=10000
>>                 left preconditioning
>>                 using DEFAULT norm type for convergence test
>>               PC Object: (ns_fieldsplit_velocity_)               64 MPI
>> processes
>>                 type: none
>>                 linear system matrix = precond matrix:
>>                 Matrix Object:                 64 MPI processes
>>                   type: mpiaij
>>                   rows=11068107, cols=11068107
>>                   total: nonzeros=315206535, allocated nonzeros=315206535
>>                   total number of mallocs used during MatSetValues calls
>> =0
>>                     not using I-node (on process 0) routines
>>             A01
>>               Matrix Object:               64 MPI processes
>>                 type: mpiaij
>>                 rows=11068107, cols=469678
>>                 total: nonzeros=88821041, allocated nonzeros=88821041
>>                 total number of mallocs used during MatSetValues calls =0
>>                   not using I-node (on process 0) routines
>>         Matrix Object:         64 MPI processes
>>           type: mpiaij
>>           rows=469678, cols=469678
>>           total: nonzeros=0, allocated nonzeros=0
>>           total number of mallocs used during MatSetValues calls =0
>>             using I-node (on process 0) routines: found 1304 nodes, limit
>> used is 5
>>   linear system matrix = precond matrix:
>>   Matrix Object:   64 MPI processes
>>     type: mpiaij
>>     rows=11537785, cols=11537785
>>     total: nonzeros=493150533, allocated nonzeros=510309207
>>     total number of mallocs used during MatSetValues calls =0
>>       not using I-node (on process 0) routines
>>
>>
>>
>>
>> Thomas
>>
>
>
>
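
For reference, the timings quoted above (4.5 seconds on 64 cores, 4.0 seconds on 128 cores) can be turned into a speedup and a parallel efficiency. The sketch below is only arithmetic on those two numbers. The helper name strong_scaling is made up for illustration and is not part of PETSc, and it assumes the same problem size in both runs, as the thread describes.

```python
def strong_scaling(t_base, p_base, t_scaled, p_scaled):
    """Return (speedup, efficiency) for a fixed-size problem.

    speedup    = t_base / t_scaled
    efficiency = speedup / (p_scaled / p_base), where 1.0 is ideal scaling
    """
    speedup = t_base / t_scaled
    ideal_speedup = p_scaled / p_base
    return speedup, speedup / ideal_speedup


# KSPSetUp timings reported in the original question (seconds, cores).
speedup, efficiency = strong_scaling(4.5, 64, 4.0, 128)
print(f"speedup: {speedup:.3f}x (ideal: 2x), parallel efficiency: {efficiency:.2%}")
```

An efficiency well below 1.0 when doubling the core count is what "does not scale" means here. Which operation inside KSPSetUp is responsible cannot be read off from these two numbers, which is why the -log_summary output was requested.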