[petsc-users] KSPSetUp does not scale
Jed Brown
jedbrown at mcs.anl.gov
Mon Nov 19 06:41:32 CST 2012
Just have it do one or a few iterations.
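For example, something along these lines (a rough sketch: the "ns_" option
prefix is taken from the KSPView output below, and the launcher and
executable name are placeholders for your own setup):

    mpiexec -n 64  ./your_app -ns_ksp_max_it 2 -log_summary
    mpiexec -n 128 ./your_app -ns_ksp_max_it 2 -log_summary

That keeps the solve itself short while still going through KSPSetUp, and
the two -log_summary outputs can be compared directly. (A sketch for
putting KSPSetUp into its own logging stage follows below the quoted
message.)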
On Mon, Nov 19, 2012 at 1:36 PM, Thomas Witkowski <
thomas.witkowski at tu-dresden.de> wrote:
> I can do this! Should I stop the run after KSPSetUp? Or do you want to
> see the log_summary file from the whole run?
>
> Thomas
>
> On 19.11.2012 13:33, Jed Brown wrote:
>
> Always, always, always send -log_summary when asking about performance.
>
>
> On Mon, Nov 19, 2012 at 11:26 AM, Thomas Witkowski <
> thomas.witkowski at tu-dresden.de> wrote:
>
>> I have a scaling problem in KSPSetUp, maybe some of you can help me to
>> fix it. It takes 4.5 seconds on 64 cores, but still 4.0 seconds on 128
>> cores. The matrix has around 11 million rows and is not perfectly
>> balanced, but the maximum number of rows per core in the 128-core case
>> is exactly half of that in the 64-core case. Besides the scaling, why
>> does the setup take so long? I thought that only some objects are
>> created but no computation is going on!
>>
>> The KSPView output of the corresponding solver objects is as follows:
>>
>> KSP Object:(ns_) 64 MPI processes
>>   type: fgmres
>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>     GMRES: happy breakdown tolerance 1e-30
>>   maximum iterations=100, initial guess is zero
>>   tolerances: relative=1e-06, absolute=1e-08, divergence=10000
>>   right preconditioning
>>   has attached null space
>>   using UNPRECONDITIONED norm type for convergence test
>> PC Object:(ns_) 64 MPI processes
>>   type: fieldsplit
>>     FieldSplit with Schur preconditioner, factorization FULL
>>     Preconditioner for the Schur complement formed from the block diagonal part of A11
>>     Split info:
>>     Split number 0 Defined by IS
>>     Split number 1 Defined by IS
>>     KSP solver for A00 block
>>       KSP Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>         type: preonly
>>         maximum iterations=10000, initial guess is zero
>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>         left preconditioning
>>         using DEFAULT norm type for convergence test
>>       PC Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>         type: none
>>         linear system matrix = precond matrix:
>>         Matrix Object: 64 MPI processes
>>           type: mpiaij
>>           rows=11068107, cols=11068107
>>           total: nonzeros=315206535, allocated nonzeros=315206535
>>           total number of mallocs used during MatSetValues calls =0
>>             not using I-node (on process 0) routines
>>     KSP solver for S = A11 - A10 inv(A00) A01
>>       KSP Object: (ns_fieldsplit_pressure_) 64 MPI processes
>>         type: gmres
>>           GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>           GMRES: happy breakdown tolerance 1e-30
>>         maximum iterations=10000, initial guess is zero
>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>         left preconditioning
>>         using DEFAULT norm type for convergence test
>>       PC Object: (ns_fieldsplit_pressure_) 64 MPI processes
>>         type: none
>>         linear system matrix followed by preconditioner matrix:
>>         Matrix Object: 64 MPI processes
>>           type: schurcomplement
>>           rows=469678, cols=469678
>>             Schur complement A11 - A10 inv(A00) A01
>>               A11
>>                 Matrix Object: 64 MPI processes
>>                   type: mpiaij
>>                   rows=469678, cols=469678
>>                   total: nonzeros=0, allocated nonzeros=0
>>                   total number of mallocs used during MatSetValues calls =0
>>                     using I-node (on process 0) routines: found 1304 nodes, limit used is 5
>>               A10
>>                 Matrix Object: 64 MPI processes
>>                   type: mpiaij
>>                   rows=469678, cols=11068107
>>                   total: nonzeros=89122957, allocated nonzeros=89122957
>>                   total number of mallocs used during MatSetValues calls =0
>>                     not using I-node (on process 0) routines
>>               KSP of A00
>>                 KSP Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>                   type: preonly
>>                   maximum iterations=10000, initial guess is zero
>>                   tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>                   left preconditioning
>>                   using DEFAULT norm type for convergence test
>>                 PC Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>                   type: none
>>                   linear system matrix = precond matrix:
>>                   Matrix Object: 64 MPI processes
>>                     type: mpiaij
>>                     rows=11068107, cols=11068107
>>                     total: nonzeros=315206535, allocated nonzeros=315206535
>>                     total number of mallocs used during MatSetValues calls =0
>>                       not using I-node (on process 0) routines
>>               A01
>>                 Matrix Object: 64 MPI processes
>>                   type: mpiaij
>>                   rows=11068107, cols=469678
>>                   total: nonzeros=88821041, allocated nonzeros=88821041
>>                   total number of mallocs used during MatSetValues calls =0
>>                     not using I-node (on process 0) routines
>>         Matrix Object: 64 MPI processes
>>           type: mpiaij
>>           rows=469678, cols=469678
>>           total: nonzeros=0, allocated nonzeros=0
>>           total number of mallocs used during MatSetValues calls =0
>>             using I-node (on process 0) routines: found 1304 nodes, limit used is 5
>>   linear system matrix = precond matrix:
>>   Matrix Object: 64 MPI processes
>>     type: mpiaij
>>     rows=11537785, cols=11537785
>>     total: nonzeros=493150533, allocated nonzeros=510309207
>>     total number of mallocs used during MatSetValues calls =0
>>       not using I-node (on process 0) routines
>>
>>
>>
>>
>> Thomas
>>
>
>
>
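Regarding the question of stopping the run after KSPSetUp: instead of
stopping, you can put the setup into its own logging stage so that
-log_summary reports it separately from the solve. A minimal sketch in C
(the stage name is arbitrary, and "ksp" stands for the (ns_) solver
object in your code):

    #include <petscksp.h>

    /* around the existing call to KSPSetUp(), after KSPSetFromOptions() */
    PetscLogStage  setup_stage;
    PetscErrorCode ierr;

    ierr = PetscLogStageRegister("KSP setup", &setup_stage);CHKERRQ(ierr);
    ierr = PetscLogStagePush(setup_stage);CHKERRQ(ierr);
    ierr = KSPSetUp(ksp);CHKERRQ(ierr);  /* time spent here shows up under the "KSP setup" stage */
    ierr = PetscLogStagePop();CHKERRQ(ierr);

With that in place, the -log_summary tables break out the events inside
KSPSetUp per stage, which should show where the 4.5 seconds go and how
they change between 64 and 128 cores.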