[petsc-users] lying about nullspaces

Geoffrey Irving irving at naml.us
Mon Jan 9 23:08:56 CST 2012


On Mon, Jan 9, 2012 at 8:30 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> On Mon, Jan 9, 2012 at 18:20, Geoffrey Irving <irving at naml.us> wrote:
>>
>> The subspace is derived from
>> freezing the normal velocity of points involved in collisions, so it
>> has no useful algebraic properties.
>
>
> About how many in practice, both as absolute numbers and as fraction of the
> total number of nodes? Are the elastic bodies closely packed enough to
> undergo locking (as in granular media)? I ask because it affects the
> locality of the response to the constraints.

I don't have this simulation up and running yet, but roughly I'd
expect 0 to 10% of the nodes to be involved in collisions.  I'm
dealing only with kinematic object collisions at the moment, so pairs
of close colliding nodes will have very similar collision normals, and
therefore very similar constraint subspaces, and therefore shouldn't
lock.
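
For reference, the constraint at a single colliding node is just the usual
projector that removes the velocity component along the collision normal; a
minimal illustrative sketch of that per-node operation (not code from my
simulation):

    /* Illustrative only: apply P_i = I - n_i n_i^T to a node's velocity v,
       where n is the unit collision normal.  The global projection P is
       block diagonal, with blocks like this at colliding nodes and the
       identity elsewhere. */
    static void project_out_normal(const double n[3], double v[3])
    {
      const double vn = n[0]*v[0] + n[1]*v[1] + n[2]*v[2]; /* normal component v.n */
      v[0] -= vn*n[0];
      v[1] -= vn*n[1];
      v[2] -= vn*n[2];
    }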

>> It's not too difficult to symbolically apply P to A (it won't change
>> the sparsity), but unfortunately that would make the sparsity pattern
>> change each iteration, which would significantly increase the cost of
>> ICC.
>
> It changes each time step or each nonlinear iteration, but as long as you
> need a few linear iterations, the cost of the fresh symbolic factorization
> is not likely to be high. I'm all for reusing data structures, but if you
> are just using ICC, it might not be worth it. Preallocating for the reduced
> matrix might be tricky.

For now, I believe I can get away with a single linear iteration.
Even if I need a few, the extra cost of the first linear solve is
drastic.  However, it looks like you're right that this isn't due to
preconditioner setup.  The first solve takes over 50 times as long
as the other solves:

    step 1
      dt = 0.00694444, time = 0
      cg icc converged: iterations = 4, rtol = 0.001, error = 9.56519e-05
      actual L2 residual = 1.10131e-05
      max speed = 0.00728987
    END step 1                                      0.6109 s
    step 2
      dt = 0.00694444, time = 0.00694444
      cg icc converged: iterations = 3, rtol = 0.001, error = 0.000258359
      actual L2 residual = 3.13442e-05
      max speed = 0.0148876
    END step 2                                      0.0089 s

Note that this is a very small problem, but even if it took 100x the
iterations the first solve would still be significantly more expensive
than the second.  However, if I pretend the nonzero pattern changes
every iteration, I only see a 20% performance hit overall, so
something else is happening on the first iteration.  Do you know what
it is?  The results of -log_summary are attached if it helps.
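
If it would help to localize it, I can also wrap the first step in its own
logging stage so that -log_summary reports it separately from the rest; a
minimal sketch of what I have in mind (the stage name and placement around
my step 1 code are just illustrative):

    /* Sketch: give the first time step its own -log_summary stage. */
    PetscLogStage first_step;
    ierr = PetscLogStageRegister("First step", &first_step);CHKERRQ(ierr);
    ierr = PetscLogStagePush(first_step);CHKERRQ(ierr);
    /* ... assemble and solve step 1 ... */
    ierr = PetscLogStagePop();CHKERRQ(ierr);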

> Note that you can also enforce the constraints using Lagrange multipliers.
> If the effect of the Lagrange multipliers is local, then you can likely get
> away with an Uzawa-type algorithm (perhaps combined with some form of
> multigrid for the unconstrained system). If the contact constraints cause
> long-range response, Uzawa-type methods may not converge as quickly, but
> there are still lots of alternatives.
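
To make sure I follow, I take the Uzawa-type iteration to be the usual
alternation between an unconstrained solve and a multiplier update for the
saddle-point system K u + B^T lam = f, B u = g.  A rough sketch with PETSc
objects, where kspK, B, f, g, u, lam, rhs, r, omega, and max_it are all
assumed set up and the names are only illustrative:

    /* Sketch of an Uzawa-type sweep: kspK solves with the unconstrained
       (definite) operator K, B is the constraint matrix; all objects and
       the relaxation parameter omega are assumed to exist. */
    PetscInt k;
    for (k = 0; k < max_it; k++) {
      ierr = MatMultTranspose(B, lam, rhs);CHKERRQ(ierr); /* rhs = B^T lam     */
      ierr = VecAYPX(rhs, -1.0, f);CHKERRQ(ierr);         /* rhs = f - B^T lam */
      ierr = KSPSolve(kspK, rhs, u);CHKERRQ(ierr);        /* u = K^{-1} rhs    */
      ierr = MatMult(B, u, r);CHKERRQ(ierr);              /* r = B u           */
      ierr = VecAXPY(r, -1.0, g);CHKERRQ(ierr);           /* r = B u - g       */
      ierr = VecAXPY(lam, omega, r);CHKERRQ(ierr);        /* lam += omega * r  */
    }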

Lagrange multipliers are unfortunate since the system is otherwise
definite.  The effect of the constraints will in general be global,
since they will often be the only force combating the net effect of
gravity.  In any case, if recomputing the preconditioner turns out to be
cheap, symbolic elimination is probably the way to go.
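
Concretely, I'm imagining forming the reduced operator explicitly whenever
the constraint set changes and handing it to the CG/ICC solve; a sketch along
those lines, assuming the subspace basis is stored as an AIJ matrix Z, with A
the unconstrained operator and ksp the existing solver (names illustrative):

    /* Sketch: build C = Z^T A Z each time the set of constrained nodes (and
       hence the sparsity of Z) changes.  If the pattern is unchanged on the
       next step, the product could instead be reused with MAT_REUSE_MATRIX. */
    Mat C;
    ierr = MatPtAP(A, Z, MAT_INITIAL_MATRIX, 1.0, &C);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, C, C, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);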

Thanks,
Geoffrey
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./sim on a darwin named tile.local with 1 processor, by irving Mon Jan  9 20:50:19 2012
Using Petsc Release Version 3.2.0, Patch 5, Sat Oct 29 13:45:54 CDT 2011 

                         Max       Max/Min        Avg      Total 
Time (sec):           6.567e-01      1.00000   6.567e-01
Objects:              4.100e+01      1.00000   4.100e+01
Flops:                1.248e+07      1.00000   1.248e+07  1.248e+07
Flops/sec:            1.901e+07      1.00000   1.901e+07  1.901e+07
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       4.100e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 6.5669e-01 100.0%  1.2482e+07 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  4.000e+01  97.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult               25 1.0 2.0399e-03 1.0 5.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0 42  0  0  0   0 42  0  0  0  2575
MatSolve              31 1.0 4.3278e-03 1.0 6.51e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1 52  0  0  0   1 52  0  0  0  1505
MatCholFctrNum         6 1.0 1.2992e-02 1.0 1.70e+04 1.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     1
MatICCFactorSym        1 1.0 3.5391e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  1  0  0  0  2   1  0  0  0  2     0
MatAssemblyBegin       6 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         6 1.0 7.3647e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            6 1.0 6.2156e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         6 1.0 3.7446e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01  1  0  0  0 59   1  0  0  0 60     0
VecTDot               38 1.0 5.4598e-05 1.0 2.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  3954
VecNorm               31 1.0 1.0037e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1755
VecCopy                6 1.0 1.1921e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               38 1.0 6.2704e-05 1.0 2.16e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  3443
VecAYPX               19 1.0 6.8188e-05 1.0 9.09e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1333
KSPSetup               6 1.0 2.6941e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  7   0  0  0  0  8     0
KSPSolve               6 1.0 2.7211e-02 1.0 1.25e+07 1.0 0.0e+00 0.0e+00 3.6e+01  4100  0  0 88   4100  0  0 90   459
PCSetUp                6 1.0 2.0375e-02 1.0 1.70e+04 1.0 0.0e+00 0.0e+00 2.7e+01  3  0  0  0 66   3  0  0  0 68     1
PCApply               31 1.0 4.3375e-03 1.0 6.51e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1 52  0  0  0   1 52  0  0  0  1502
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     2              2      1984184     0
              Vector    11             11       265848     0
       Krylov Solver     1              1         1144     0
      Preconditioner     1              1          904     0
           Index Set    25             25       200624     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-ksp_max_it 100
-ksp_rtol 1e-3
-ksp_type cg
-log_summary
-pc_factor_levels 0
-pc_factor_mat_ordering_type nd
-pc_type icc
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Mon Nov 21 14:00:28 2011
Configure options: --prefix=/opt/local --with-python --with-debugging=0 --with-c-support=1 --with-c++-support=1 --with-pic=fPIC --with-shared-libraries=0 --with-mpi=1 --PETSC_ARCH=darwin --prefix=/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/destroot/opt/local/lib/petsc --with-cc=/opt/local/bin/openmpicc --with-cxx=/opt/local/bin/openmpicxx --with-mpiexec=/opt/local/bin/openmpiexec --with-fc=/opt/local/bin/openmpif90 --LIBS=-lstdc++
-----------------------------------------
Libraries compiled on Mon Nov 21 14:00:28 2011 on tile.local 
Machine characteristics: Darwin-11.2.0-x86_64-i386-64bit
Using PETSc directory: /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5
Using PETSc arch: darwin
-----------------------------------------

Using C compiler: /opt/local/bin/openmpicc  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/local/bin/openmpif90  -Wall -Wno-unused-variable -O   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/darwin/include -I/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/include -I/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/include -I/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/darwin/include -I/opt/local/include -I/opt/local/include/openmpi
-----------------------------------------

Using C linker: /opt/local/bin/openmpicc
Using Fortran linker: /opt/local/bin/openmpif90
Using libraries: -L/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/darwin/lib -L/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_math_petsc/petsc/work/petsc-3.2-p5/darwin/lib -lpetsc -L/opt/local/lib -lX11 -lpthread -llapack -lblas -ldl -lstdc++ -lmpi_f90 -lmpi_f77 -lmpi -lgfortran -L/opt/local/lib/gcc44/gcc/x86_64-apple-darwin11/4.4.6 -L/opt/local/lib/gcc44 -lgcc_s.10.5 -lSystem -ldl -lstdc++ 
-----------------------------------------

