<div class="moz-cite-prefix">Here are the two files. In this case,
maybe you can also give me some hints, why the solver at all does
not scale here. The solver runtime for 64 cores is 206 seconds,
with the same problem size on 128 cores it takes 172 seconds. The
number of inner and outer solver iterations are the same for both
runs. I use CG with jacobi-preconditioner and hypre boomeramg for
inner solver. <br>
<br>
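For reference, a minimal sketch of how such a pairing is typically
selected through the PETSc C API; the function and object names here are
illustrative assumptions, not the actual application code:

    #include <petscksp.h>

    /* Illustrative sketch only: an outer CG solve with a Jacobi
     * preconditioner, and an inner solve that applies hypre BoomerAMG
     * once per outer iteration. */
    static PetscErrorCode ConfigureSolvers(KSP outer, KSP inner)
    {
      PC             pc;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      /* Outer Krylov solve: CG preconditioned by Jacobi (diagonal) */
      ierr = KSPSetType(outer, KSPCG);CHKERRQ(ierr);
      ierr = KSPGetPC(outer, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCJACOBI);CHKERRQ(ierr);

      /* Inner solve: a single BoomerAMG application (preonly + hypre) */
      ierr = KSPSetType(inner, KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(inner, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCHYPRE);CHKERRQ(ierr);
      ierr = PCHYPRESetType(pc, "boomeramg");CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

The same choices can also be made at runtime, e.g. with -ksp_type cg
-pc_type jacobi for the outer solve and -pc_type hypre -pc_hypre_type
boomeramg on the inner solver's option prefix.
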
On 19.11.2012 13:41, Jed Brown wrote:
> Just have it do one or a few iterations.

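(For such a profiling run, one way to cap the outer solve at a single
iteration; the ns_ prefix is taken from the KSPView further down and may
not match the actual option prefix:)

    #include <petscksp.h>

    /* Sketch: cap the solver at one outer iteration for a profiling run.
     * The equivalent runtime option would be -ksp_max_it 1, i.e.
     * -ns_ksp_max_it 1 with the prefix shown in the KSPView below. */
    static PetscErrorCode LimitToOneIteration(KSP ksp)
    {
      PetscErrorCode ierr;
      PetscFunctionBegin;
      ierr = KSPSetTolerances(ksp, PETSC_DEFAULT, PETSC_DEFAULT,
                              PETSC_DEFAULT, 1);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }
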
<div class="gmail_quote">On Mon, Nov 19, 2012 at 1:36 PM, Thomas
Witkowski <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:thomas.witkowski@tu-dresden.de"
target="_blank">thomas.witkowski@tu-dresden.de</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
>> I can do this! Should I stop the run after KSPSetUp? Or do you want to
>> see the log_summary file from the whole run?
>>
>> Thomas

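(There is no need to stop the run early: a user-defined logging stage
around KSPSetUp keeps the whole run while still reporting the setup cost
separately in -log_summary. A minimal sketch, with the stage names and
the surrounding solver code assumed:)

    #include <petscksp.h>

    /* Sketch: put KSPSetUp into its own logging stage so that
     * -log_summary reports its time, flops and messages separately from
     * the rest of the run. */
    static PetscErrorCode SolveWithStagedSetup(KSP ksp, Vec b, Vec x)
    {
      PetscLogStage  setup_stage, solve_stage;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = PetscLogStageRegister("KSPSetUp", &setup_stage);CHKERRQ(ierr);
      ierr = PetscLogStageRegister("KSPSolve", &solve_stage);CHKERRQ(ierr);

      ierr = PetscLogStagePush(setup_stage);CHKERRQ(ierr);
      ierr = KSPSetUp(ksp);CHKERRQ(ierr);   /* the call in question */
      ierr = PetscLogStagePop();CHKERRQ(ierr);

      ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = PetscLogStagePop();CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Each registered stage then gets its own section in the -log_summary
tables.
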
>> On 19.11.2012 13:33, Jed Brown wrote:
<blockquote type="cite">Always, always, always send
-log_summary when asking about performance.
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Nov 19, 2012 at
11:26 AM, Thomas Witkowski <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:thomas.witkowski@tu-dresden.de"
target="_blank">thomas.witkowski@tu-dresden.de</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">I have some scaling
problem in KSPSetUp, maybe some of you can
help me to fix it. It takes 4.5 seconds on 64
cores, and 4.0 cores on 128 cores. The matrix
has around 11 million rows and is not
perfectly balanced, but the number of maximum
rows per core in the 128 cases is exactly
halfe of the number in the case when using 64
cores. Besides the scaling, why does the setup
takes so long? I though that just some objects
are created but no calculation is going on!<br>
<br>
The KSPView on the corresponding solver
objects is as follows:<br>
<br>
>>>> KSP Object:(ns_) 64 MPI processes
>>>>   type: fgmres
>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>   maximum iterations=100, initial guess is zero
>>>>   tolerances: relative=1e-06, absolute=1e-08, divergence=10000
>>>>   right preconditioning
>>>>   has attached null space
>>>>   using UNPRECONDITIONED norm type for convergence test
>>>> PC Object:(ns_) 64 MPI processes
>>>>   type: fieldsplit
>>>>     FieldSplit with Schur preconditioner, factorization FULL
>>>>     Preconditioner for the Schur complement formed from the block diagonal part of A11
>>>>     Split info:
>>>>     Split number 0 Defined by IS
>>>>     Split number 1 Defined by IS
>>>>     KSP solver for A00 block
>>>>       KSP Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>>>         type: preonly
>>>>         maximum iterations=10000, initial guess is zero
>>>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>         left preconditioning
>>>>         using DEFAULT norm type for convergence test
>>>>       PC Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>>>         type: none
>>>>         linear system matrix = precond matrix:
>>>>         Matrix Object: 64 MPI processes
>>>>           type: mpiaij
>>>>           rows=11068107, cols=11068107
>>>>           total: nonzeros=315206535, allocated nonzeros=315206535
>>>>           total number of mallocs used during MatSetValues calls =0
>>>>           not using I-node (on process 0) routines
>>>>     KSP solver for S = A11 - A10 inv(A00) A01
>>>>       KSP Object: (ns_fieldsplit_pressure_) 64 MPI processes
>>>>         type: gmres
>>>>           GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>           GMRES: happy breakdown tolerance 1e-30
>>>>         maximum iterations=10000, initial guess is zero
>>>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>         left preconditioning
>>>>         using DEFAULT norm type for convergence test
>>>>       PC Object: (ns_fieldsplit_pressure_) 64 MPI processes
>>>>         type: none
>>>>         linear system matrix followed by preconditioner matrix:
>>>>         Matrix Object: 64 MPI processes
>>>>           type: schurcomplement
>>>>           rows=469678, cols=469678
>>>>           Schur complement A11 - A10 inv(A00) A01
>>>>             A11
>>>>               Matrix Object: 64 MPI processes
>>>>                 type: mpiaij
>>>>                 rows=469678, cols=469678
>>>>                 total: nonzeros=0, allocated nonzeros=0
>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>                 using I-node (on process 0) routines: found 1304 nodes, limit used is 5
>>>>             A10
>>>>               Matrix Object: 64 MPI processes
>>>>                 type: mpiaij
>>>>                 rows=469678, cols=11068107
>>>>                 total: nonzeros=89122957, allocated nonzeros=89122957
>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>                 not using I-node (on process 0) routines
>>>>             KSP of A00
>>>>               KSP Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>>>                 type: preonly
>>>>                 maximum iterations=10000, initial guess is zero
>>>>                 tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>                 left preconditioning
>>>>                 using DEFAULT norm type for convergence test
>>>>               PC Object: (ns_fieldsplit_velocity_) 64 MPI processes
>>>>                 type: none
>>>>                 linear system matrix = precond matrix:
>>>>                 Matrix Object: 64 MPI processes
>>>>                   type: mpiaij
>>>>                   rows=11068107, cols=11068107
>>>>                   total: nonzeros=315206535, allocated nonzeros=315206535
>>>>                   total number of mallocs used during MatSetValues calls =0
>>>>                   not using I-node (on process 0) routines
>>>>             A01
>>>>               Matrix Object: 64 MPI processes
>>>>                 type: mpiaij
>>>>                 rows=11068107, cols=469678
>>>>                 total: nonzeros=88821041, allocated nonzeros=88821041
>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>                 not using I-node (on process 0) routines
>>>>         Matrix Object: 64 MPI processes
>>>>           type: mpiaij
>>>>           rows=469678, cols=469678
>>>>           total: nonzeros=0, allocated nonzeros=0
>>>>           total number of mallocs used during MatSetValues calls =0
>>>>           using I-node (on process 0) routines: found 1304 nodes, limit used is 5
>>>>   linear system matrix = precond matrix:
>>>>   Matrix Object: 64 MPI processes
>>>>     type: mpiaij
>>>>     rows=11537785, cols=11537785
>>>>     total: nonzeros=493150533, allocated nonzeros=510309207
>>>>     total number of mallocs used during MatSetValues calls =0
>>>>     not using I-node (on process 0) routines
>>>>
>>>> Thomas

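For orientation, a rough sketch of how a Schur-complement fieldsplit
preconditioner like the one shown in the KSPView above is typically put
together; the index sets, split names, and option handling here are
assumptions, not the actual application code:

    #include <petscksp.h>

    /* Sketch: outer FGMRES with a Schur-complement fieldsplit PC, with
     * the two splits defined by index sets as in the KSPView above. */
    static PetscErrorCode SetupSchurFieldsplit(KSP ksp, IS isVelocity, IS isPressure)
    {
      PC             pc;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = KSPSetType(ksp, KSPFGMRES);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
      ierr = PCFieldSplitSetIS(pc, "velocity", isVelocity);CHKERRQ(ierr);
      ierr = PCFieldSplitSetIS(pc, "pressure", isPressure);CHKERRQ(ierr);
      ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
      /* "factorization FULL" in the KSPView corresponds to the full Schur
       * factorization; the runtime option name differs between PETSc
       * versions (e.g. -pc_fieldsplit_schur_fact_type full in recent
       * releases). */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

With -ksp_view (here, -ns_ksp_view) the resulting object tree can be
compared against the output above.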