[petsc-users] lying about nullspaces
Geoffrey Irving
irving at naml.us
Mon Jan 9 23:08:56 CST 2012
On Mon, Jan 9, 2012 at 8:30 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> On Mon, Jan 9, 2012 at 18:20, Geoffrey Irving <irving at naml.us> wrote:
>> The subspace is derived from
>> freezing the normal velocity of points involved in collisions, so it
>> has no useful algebraic properties.
> About how many in practice, both as absolute numbers and as fraction of the
> total number of nodes? Are the elastic bodies closely packed enough to
> undergo locking (as in granular media)? I ask because it affects the
> locality of the response to the constraints.
I don't have this simulation up and running yet, but roughly I'd
expect 0 to 10% of the nodes to be involved in collisions. I'm
dealing only with kinematic object collisions at the moment, so pairs
of close colliding nodes will have very similar collision normals, and
therefore very similar constraint subspaces, and therefore shouldn't
>> It's not too difficult to symbolically apply P to A (it won't change
>> the sparsity), but unfortunately that would make the sparsity pattern
>> change each iteration, which would significantly increase the cost of
>> ICC.
> It changes each time step or each nonlinear iteration, but as long as you
> need a few linear iterations, the cost of the fresh symbolic factorization
> is not likely to be high. I'm all for reusing data structures, but if you
> are just using ICC, it might not be worth it. Preallocating for the reduced
> matrix might be tricky.
For now, I believe I can get away with a single linear iteration.
Even if I need a few, the extra cost of the first linear solve appears
to be drastic. However, it appears you're right that this isn't due
to preconditioner setup. The first solve takes over 50 times as long
as the other solves:
step 1
dt = 0.00694444, time = 0
cg icc converged: iterations = 4, rtol = 0.001, error = 9.56519e-05
actual L2 residual = 1.10131e-05
max speed = 0.00728987
END step 1 0.6109 s
step 2
dt = 0.00694444, time = 0.00694444
cg icc converged: iterations = 3, rtol = 0.001, error = 0.000258359
actual L2 residual = 3.13442e-05
max speed = 0.0148876
END step 2 0.0089 s
Note that this is a very small problem, but even if it took 100x the
iterations the first solve would still be significantly more expensive
than the second. However, if I pretend the nonzero pattern changes
every iteration, I only see a 20% performance hit overall, so
something else is happening on the first iteration. Do you know what
it is? The results of -log_summary are attached if it helps.
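
As a rough cross-check of the numbers above, the short script below recomputes the step 1 to step 2 ratio and compares the step 1 time against the KSPSolve total from the attached log. It assumes the -log_summary output describes the same run as the step timings; the variable names are illustrative only.

```python
import re

# Step timings copied from the solver output above.
step_log = """\
END step 1 0.6109 s
END step 2 0.0089 s
"""

# Totals copied from the attached -log_summary output.
# Assumption: the summary covers the same run as the step timings.
total_time = 6.567e-01       # "Time (sec)" for the whole run
ksp_solve_time = 2.7211e-02  # KSPSolve, summed over all 6 calls

# Map step number -> wall-clock seconds.
step_times = {
    int(step): float(seconds)
    for step, seconds in re.findall(r"END step (\d+) ([\d.]+) s", step_log)
}

first_to_second = step_times[1] / step_times[2]
ksp_share_of_run = ksp_solve_time / total_time
# Even if every KSPSolve call happened in step 1, this is the most
# of step 1 that the linear solves could account for.
ksp_bound_on_step1 = ksp_solve_time / step_times[1]

print(f"step 1 / step 2 time ratio: {first_to_second:.1f}")
print(f"KSPSolve share of total run time: {ksp_share_of_run:.1%}")
print(f"upper bound on KSPSolve share of step 1: {ksp_bound_on_step1:.1%}")
```

Under that assumption, all six KSPSolve calls together cover only a few percent of the run, so most of the step 1 time would have to be spent outside the logged solver events.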
> Note that you can also enforce the constraints using Lagrange multipliers.
> If the effect of the Lagrange multipliers is local, then you can likely get
> away with an Uzawa-type algorithm (perhaps combined with some form of
> multigrid for the unconstrained system). If the contact constraints cause
> long-range response, Uzawa-type methods may not converge as quickly, but
> there are still lots of alternatives.
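
For reference, the Lagrange multiplier formulation can be written as a saddle-point system. The symbols here are assumptions for illustration and do not appear in the thread: $A$ is the symmetric positive definite system matrix, $C$ is an $m \times n$ constraint matrix with full row rank (one row per frozen normal velocity), $v$ is the velocity unknown, and $\lambda$ holds the multipliers.

```latex
\min_{v}\ \tfrac{1}{2} v^{T} A v - b^{T} v
\quad \text{subject to} \quad C v = 0
\qquad \Longrightarrow \qquad
\begin{pmatrix} A & C^{T} \\ C & 0 \end{pmatrix}
\begin{pmatrix} v \\ \lambda \end{pmatrix}
=
\begin{pmatrix} b \\ 0 \end{pmatrix}
```

With $A$ positive definite and $C$ of full row rank, the block matrix is symmetric but indefinite, with $n$ positive and $m$ negative eigenvalues, so plain CG with ICC no longer applies to it directly.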
Lagrange multipliers are unfortunate since the system is otherwise
definite. The effect of the constraints will in general be global,
since they will often be the only force combating the net effect of
gravity. In any case, if recomputing the preconditioner appears to be
cheap, symbolic elimination is probably the way to go.
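
A minimal sketch of what applying P to A block by block could look like, under assumptions that are not stated in the thread: P is taken to be block diagonal, with the block I - n n^T at each colliding node (n being that node's unit collision normal) and the identity elsewhere. The toy matrix, the 2D blocks, and all names are hypothetical.

```python
def matmul(X, Y):
    """Product of two small dense matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def projector(normal):
    """I - n n^T for a unit normal n: removes the normal component."""
    d = len(normal)
    return [[float(i == j) - normal[i] * normal[j] for j in range(d)]
            for i in range(d)]

identity = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical block-sparse matrix for a chain of 3 nodes in 2D.
# Only blocks between neighbouring nodes are stored.
diag = [[4.0, 1.0], [1.0, 3.0]]
off = [[-1.0, 0.0], [0.0, -1.0]]
A = {(0, 0): diag, (1, 1): diag, (2, 2): diag,
     (0, 1): off, (1, 0): off, (1, 2): off, (2, 1): off}

# Assume node 1 is in collision with unit normal (0, 1).
normals = {1: (0.0, 1.0)}
P = {i: projector(normals[i]) if i in normals else identity
     for i in range(3)}

# Block (i, j) of P A P is P_i A_ij P_j, so no new blocks appear.
PAP = {(i, j): matmul(matmul(P[i], block), P[j])
       for (i, j), block in A.items()}

print(sorted(PAP) == sorted(A))  # block pattern is unchanged
print(PAP[(1, 1)])               # normal row and column are zeroed
```

In this toy case the projected matrix keeps the block pattern of A, and the frozen normal direction at node 1 becomes a null direction of P A P, so the projected system is only semidefinite unless that direction is removed or handled separately.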
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./sim on a darwin named tile.local with 1 processor, by irving Mon Jan 9 20:50:19 2012
Using Petsc Release Version 3.2.0, Patch 5, Sat Oct 29 13:45:54 CDT 2011
Max Max/Min Avg Total
Time (sec): 6.567e-01 1.00000 6.567e-01
Objects: 4.100e+01 1.00000 4.100e+01
Flops: 1.248e+07 1.00000 1.248e+07 1.248e+07
Flops/sec: 1.901e+07 1.00000 1.901e+07 1.901e+07
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 4.100e+01 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 6.5669e-01 100.0% 1.2482e+07 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 4.000e+01 97.6%
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
--- Event Stage 0: Main Stage
MatMult 25 1.0 2.0399e-03 1.0 5.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 42 0 0 0 0 42 0 0 0 2575
MatSolve 31 1.0 4.3278e-03 1.0 6.51e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 52 0 0 0 1 52 0 0 0 1505
MatCholFctrNum 6 1.0 1.2992e-02 1.0 1.70e+04 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 1
MatICCFactorSym 1 1.0 3.5391e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 1 0 0 0 2 1 0 0 0 2 0
MatAssemblyBegin 6 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 6 1.0 7.3647e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 6 1.0 6.2156e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 6 1.0 3.7446e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 1 0 0 0 59 1 0 0 0 60 0
VecTDot 38 1.0 5.4598e-05 1.0 2.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 3954
VecNorm 31 1.0 1.0037e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1755
VecCopy 6 1.0 1.1921e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 38 1.0 6.2704e-05 1.0 2.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 3443
VecAYPX 19 1.0 6.8188e-05 1.0 9.09e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1333
KSPSetup 6 1.0 2.6941e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 7 0 0 0 0 8 0
KSPSolve 6 1.0 2.7211e-02 1.0 1.25e+07 1.0 0.0e+00 0.0e+00 3.6e+01 4 100 0 0 88 4 100 0 0 90 459
PCSetUp 6 1.0 2.0375e-02 1.0 1.70e+04 1.0 0.0e+00 0.0e+00 2.7e+01 3 0 0 0 66 3 0 0 0 68 1
PCApply 31 1.0 4.3375e-03 1.0 6.51e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 52 0 0 0 1 52 0 0 0 1502
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 2 2 1984184 0
Vector 11 11 265848 0
Krylov Solver 1 1 1144 0
Preconditioner 1 1 904 0
Index Set 25 25 200624 0
Viewer 1 0 0 0
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-ksp_max_it 100
-ksp_rtol 1e-3
-ksp_type cg
-pc_factor_levels 0
-pc_factor_mat_ordering_type nd
-pc_type icc
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Mon Nov 21 14:00:28 2011
Configure options: --prefix=/opt/local --with-python --with-debugging=0 --with-c-support=1 --with-c++-support=1 --with-pic=fPIC --with-shared-libraries=0 --with-mpi=1 --PETSC_ARCH=darwin --prefix=/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/destroot/opt/local/lib/petsc --with-cc=/opt/local/bin/openmpicc --with-cxx=/opt/local/bin/openmpicxx --with-mpiexec=/opt/local/bin/openmpiexec --with-fc=/opt/local/bin/openmpif90 --LIBS=-lstdc++
Libraries compiled on Mon Nov 21 14:00:28 2011 on tile.local
Machine characteristics: Darwin-11.2.0-x86_64-i386-64bit
Using PETSc directory: /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5
Using PETSc arch: darwin
Using C compiler: /opt/local/bin/openmpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/local/bin/openmpif90 -Wall -Wno-unused-variable -O ${FOPTFLAGS} ${FFLAGS}
Using include paths: -I/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/darwin/include -I/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/include -I/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/include -I/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/darwin/include -I/opt/local/include -I/opt/local/include/openmpi
Using C linker: /opt/local/bin/openmpicc
Using Fortran linker: /opt/local/bin/openmpif90
Using libraries: -L/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/darwin/lib -L/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/darwin/lib -lpetsc -L/opt/local/lib -lX11 -lpthread -llapack -lblas -ldl -lstdc++ -lmpi_f90 -lmpi_f77 -lmpi -lgfortran -L/opt/local/lib/gcc44/gcc/x86_64-apple-darwin11/4.4.6 -L/opt/local/lib/gcc44 -lgcc_s.10.5 -lSystem -ldl -lstdc++