[petsc-users] Speed up KSPSolve of dense matrix

Florian Lindner mailinglists at xgm.de
Tue Nov 4 03:28:41 CST 2014


Hello,

I have a full matrix of size e.g. 603x603, and I'm very disappointed with the runtime compared to a naive LU / forward / backward solve. My PETSc solution takes about 14 s, while the old one takes just 0.5 s (for sparse matrices the figures are almost reversed). I'm trying to investigate what the problem is.

Most of the time is spent in the KSPSolve.

Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult            10334 1.0 6.3801e+00 1.0 7.51e+09 1.0 0.0e+00 0.0e+00 0.0e+00 45 49  0  0  0  45 49  0  0  0  1177
MatSolve           10334 1.0 6.9187e+00 1.0 7.51e+09 1.0 0.0e+00 0.0e+00 0.0e+00 49 49  0  0  0  49 49  0  0  0  1085
MatCholFctrNum         1 1.0 1.8968e-01 1.0 6.03e+02 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
KSPGMRESOrthog     10000 1.0 2.1417e-01 1.0 3.73e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1744
KSPSolve               1 1.0 1.3763e+01 1.0 1.54e+10 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0  1120
Prealloc Matrix C       1 1.0 5.7458e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
Filling Matrix C       1 1.0 6.7984e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot            10000 1.0 8.8615e-02 1.0 1.87e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  2106
VecMAXPY           10334 1.0 1.2585e-01 1.0 1.99e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  1580
Prealloc Matrix A       1 1.0 1.1071e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
Filling Matrix A       1 1.0 1.2574e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
PCSetUp                1 1.0 1.9073e-01 1.0 6.03e+02 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
PCApply            10334 1.0 6.9233e+00 1.0 7.51e+09 1.0 0.0e+00 0.0e+00 0.0e+00 49 49  0  0  0  49 49  0  0  0  1085
------------------------------------------------------------------------------------------------------------------------

I removed all events with global %T == 0, except for my own ones.

Matrix type is sequential MATSBAIJ on PETSC_COMM_SELF.
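
For reference, a minimal sketch of how such a matrix could be set up (the size and the per-row preallocation here are illustrative, not the actual preCICE code):

  Mat C;
  PetscInt n = 603;
  /* SBAIJ stores only the upper triangle; preallocating n (block) entries
     per row is enough for a completely dense matrix */
  MatCreateSeqSBAIJ(PETSC_COMM_SELF, 1, n, n, n, NULL, &C);
  /* ... MatSetValues() for the upper triangle ... */
  MatAssemblyBegin(C, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(C, MAT_FINAL_ASSEMBLY);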

The KSP is initialized once:

  KSPSetOperators(_solver, _matrixC.matrix, _matrixC.matrix);
  KSPSetTolerances(_solver, _solverRtol, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);
  KSPSetFromOptions(_solver);

KSPSolve: 

  ierr = KSPSolve(_solver, in.vector, p.vector); CHKERRV(ierr)

KSPSolve may be executed multiple times with the same operators, according to the dimensionality of the data. Here it's just one solve.
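
So the call pattern is roughly the following (simplified; the names are placeholders for the actual preCICE structures):

  /* one KSPSetOperators, several solves with different right-hand sides */
  for (int d = 0; d < dimensions; d++) {
    ierr = KSPSolve(_solver, in[d].vector, p[d].vector); CHKERRV(ierr)
  }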

-ksp_view says:

KSP Object: 1 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-09, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: icc
    0 levels of fill
    tolerance for zero pivot 2.22045e-14
    using Manteuffel shift [POSITIVE_DEFINITE]
    matrix ordering: natural
    factor fill ratio given 1, needed 1
      Factored matrix follows:
        Mat Object:         1 MPI processes
          type: seqsbaij
          rows=603, cols=603
          package used to perform factorization: petsc
          total: nonzeros=182103, allocated nonzeros=182103
          total number of mallocs used during MatSetValues calls =0
              block size is 1
  linear system matrix = precond matrix:
  Mat Object:  C   1 MPI processes
    type: seqsbaij
    rows=603, cols=603
    total: nonzeros=182103, allocated nonzeros=182103
    total number of mallocs used during MatSetValues calls =0
        block size is 1


I've tried to use a direct solver, as suggested on p. 72, but:

./petBench 600 1 -ksp_type preonly -pc_type lu
Mesh of size: 600
Do mappings: 1
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: No support for this operation for this object type
[0]PETSC ERROR: Factor type not supported
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.5.2, unknown 
[0]PETSC ERROR: ./petBench on a arch-linux2-c-opt named helium by lindnefn Tue Nov  4 10:19:28 2014
[0]PETSC ERROR: Configure options --with-debugging=0
[0]PETSC ERROR: #1 MatGetFactor_seqsbaij_petsc() line 1833 in /data2/scratch/lindner/petsc/src/mat/impls/sbaij/seq/sbaij.c
[0]PETSC ERROR: #2 MatGetFactor() line 3963 in /data2/scratch/lindner/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: #3 PCSetUp_LU() line 125 in /data2/scratch/lindner/petsc/src/ksp/pc/impls/factor/lu/lu.c
[0]PETSC ERROR: #4 PCSetUp() line 902 in /data2/scratch/lindner/petsc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: #5 KSPSetUp() line 305 in /data2/scratch/lindner/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: #6 KSPSolve() line 417 in /data2/scratch/lindner/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: #7 map() line 391 in /data2/scratch/lindner/precice/src/mapping/PetRadialBasisFctMapping.hpp
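
If I read the error right, SBAIJ only supports a Cholesky factorization and no LU, so presumably the direct solve would have to be requested with -pc_type cholesky instead, e.g.:

./petBench 600 1 -ksp_type preonly -pc_type cholesky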


Any other ideas about how to speed this up?

Thanks for any suggestions!
Florian


