<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">I can do this! Should I stop the run
after KSPSetUp? Or do you want to see the log_summary file from
the whole run?<br>
<br>
Thomas<br>
<br>
On 19.11.2012 13:33, Jed Brown wrote:<br>
</div>
<blockquote
cite="mid:CAM9tzSkL0N7SN-HNjpX-e_5PyqGWmcz_E2923nD0QYBTAO38cQ@mail.gmail.com"
type="cite">Always, always, always send -log_summary when asking
about performance.
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Nov 19, 2012 at 11:26 AM,
Thomas Witkowski <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:thomas.witkowski@tu-dresden.de"
target="_blank">thomas.witkowski@tu-dresden.de</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">I have
some scaling problem in KSPSetUp, maybe some of you can help
me to fix it. It takes 4.5 seconds on 64 cores and 4.0
seconds on 128 cores. The matrix has around 11 million rows
and is not perfectly balanced, but the maximum number of
rows per core in the 128-core case is exactly half of the
number in the 64-core case. Besides the scaling,
why does the setup take so long? I thought that just some
objects are created but no calculation is going on!<br>
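<br>
To put a number on the scaling described above, here is a minimal sketch of the strong-scaling arithmetic. It assumes both timings are wall-clock seconds for KSPSetUp and that ideal scaling means the time halves when the core count doubles; the function name <code>strong_scaling</code> is hypothetical and not part of PETSc.<br>
<pre>
```python
def strong_scaling(t_base, p_base, t_new, p_new):
    """Return (speedup, parallel efficiency) of the larger run relative to the base run."""
    speedup = t_base / t_new       # how much faster the larger run actually is
    ideal = p_new / p_base         # speedup expected under perfect strong scaling
    return speedup, speedup / ideal


# Timings quoted in the message: 4.5 s on 64 cores, 4.0 s on 128 cores (assumed seconds).
speedup, efficiency = strong_scaling(4.5, 64, 4.0, 128)
print(f"speedup: {speedup:.3f}, parallel efficiency: {efficiency:.2%}")
```
</pre>
With these two timings the measured speedup is well short of the ideal factor of 2, which is the scaling problem being asked about.<br>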
<br>
The KSPView on the corresponding solver objects is as
follows:<br>
<br>
KSP Object:(ns_) 64 MPI processes<br>
type: fgmres<br>
GMRES: restart=30, using Classical (unmodified)
Gram-Schmidt Orthogonalization with no iterative refinement<br>
GMRES: happy breakdown tolerance 1e-30<br>
maximum iterations=100, initial guess is zero<br>
tolerances: relative=1e-06, absolute=1e-08,
divergence=10000<br>
right preconditioning<br>
has attached null space<br>
using UNPRECONDITIONED norm type for convergence test<br>
PC Object:(ns_) 64 MPI processes<br>
type: fieldsplit<br>
FieldSplit with Schur preconditioner, factorization FULL<br>
Preconditioner for the Schur complement formed from the
block diagonal part of A11<br>
Split info:<br>
Split number 0 Defined by IS<br>
Split number 1 Defined by IS<br>
KSP solver for A00 block<br>
KSP Object: (ns_fieldsplit_velocity_) 64
MPI processes<br>
type: preonly<br>
maximum iterations=10000, initial guess is zero<br>
tolerances: relative=1e-05, absolute=1e-50,
divergence=10000<br>
left preconditioning<br>
using DEFAULT norm type for convergence test<br>
PC Object: (ns_fieldsplit_velocity_) 64 MPI
processes<br>
type: none<br>
linear system matrix = precond matrix:<br>
Matrix Object: 64 MPI processes<br>
type: mpiaij<br>
rows=11068107, cols=11068107<br>
total: nonzeros=315206535, allocated
nonzeros=315206535<br>
total number of mallocs used during MatSetValues
calls =0<br>
not using I-node (on process 0) routines<br>
KSP solver for S = A11 - A10 inv(A00) A01<br>
KSP Object: (ns_fieldsplit_pressure_) 64
MPI processes<br>
type: gmres<br>
GMRES: restart=30, using Classical (unmodified)
Gram-Schmidt Orthogonalization with no iterative refinement<br>
GMRES: happy breakdown tolerance 1e-30<br>
maximum iterations=10000, initial guess is zero<br>
tolerances: relative=1e-05, absolute=1e-50,
divergence=10000<br>
left preconditioning<br>
using DEFAULT norm type for convergence test<br>
PC Object: (ns_fieldsplit_pressure_) 64 MPI
processes<br>
type: none<br>
linear system matrix followed by preconditioner
matrix:<br>
Matrix Object: 64 MPI processes<br>
type: schurcomplement<br>
rows=469678, cols=469678<br>
Schur complement A11 - A10 inv(A00) A01<br>
A11<br>
Matrix Object: 64 MPI processes<br>
type: mpiaij<br>
rows=469678, cols=469678<br>
total: nonzeros=0, allocated nonzeros=0<br>
total number of mallocs used during
MatSetValues calls =0<br>
using I-node (on process 0) routines:
found 1304 nodes, limit used is 5<br>
A10<br>
Matrix Object: 64 MPI processes<br>
type: mpiaij<br>
rows=469678, cols=11068107<br>
total: nonzeros=89122957, allocated
nonzeros=89122957<br>
total number of mallocs used during
MatSetValues calls =0<br>
not using I-node (on process 0) routines<br>
KSP of A00<br>
KSP Object: (ns_fieldsplit_velocity_)
64 MPI processes<br>
type: preonly<br>
maximum iterations=10000, initial guess is
zero<br>
tolerances: relative=1e-05, absolute=1e-50,
divergence=10000<br>
left preconditioning<br>
using DEFAULT norm type for convergence test<br>
PC Object: (ns_fieldsplit_velocity_)
64 MPI processes<br>
type: none<br>
linear system matrix = precond matrix:<br>
Matrix Object: 64 MPI
processes<br>
type: mpiaij<br>
rows=11068107, cols=11068107<br>
total: nonzeros=315206535, allocated
nonzeros=315206535<br>
total number of mallocs used during
MatSetValues calls =0<br>
not using I-node (on process 0) routines<br>
A01<br>
Matrix Object: 64 MPI processes<br>
type: mpiaij<br>
rows=11068107, cols=469678<br>
total: nonzeros=88821041, allocated
nonzeros=88821041<br>
total number of mallocs used during
MatSetValues calls =0<br>
not using I-node (on process 0) routines<br>
Matrix Object: 64 MPI processes<br>
type: mpiaij<br>
rows=469678, cols=469678<br>
total: nonzeros=0, allocated nonzeros=0<br>
total number of mallocs used during MatSetValues
calls =0<br>
using I-node (on process 0) routines: found 1304
nodes, limit used is 5<br>
linear system matrix = precond matrix:<br>
Matrix Object: 64 MPI processes<br>
type: mpiaij<br>
rows=11537785, cols=11537785<br>
total: nonzeros=493150533, allocated nonzeros=510309207<br>
total number of mallocs used during MatSetValues calls
=0<br>
not using I-node (on process 0) routines<span
class="HOEnZb"><font color="#888888"><br>
<br>
<br>
<br>
<br>
Thomas<br>
</font></span></blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>