[petsc-users] Issue updating MUMPS ictnl after failed solve

David Knezevic david.knezevic at akselos.com
Mon Sep 19 14:05:13 CDT 2016


When I use MUMPS via PETSc, one issue is that it can sometimes fail with
MUMPS error -9, which means that MUMPS didn't allocate a big enough
workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g.
via the command line option -mat_mumps_icntl_14.

However, instead of having to run several times with different command line
options, I'd like to be able to automatically increment icntl 14 value in a
loop until the solve succeeds.

I have a saved matrix which fails when I use it for a solve with MUMPS with
4 MPI processes and the default ictnl values, so I'm using this to check
that I can achieve the automatic icntl 14 update, as described above. (The
matrix is 14MB so I haven't attached it here, but I'd be happy to send it
to anyone else who wants to try this test case out.)

I've pasted some test code below which provides a simple test of this idea
using two solves. The first solve uses the default value of icntl 14, which
fails, and then we update icntl 14 to 30 and solve again. The second solve
should succeed since icntl 14 of 30 is sufficient for MUMPS to succeed in
this case, but for some reason the second solve still fails.

Below I've also pasted the output from -ksp_view, and you can see that
ictnl 14 is being updated correctly (see the ICNTL(14) lines in the
output), so it's not clear to me why the second solve fails. It seems like
MUMPS is ignoring the update to the ictnl value?

Thanks,
David

------------------------------------------------------------
-----------------------------------------
Test code:

  Mat A;
  MatCreate(PETSC_COMM_WORLD,&A);
  MatSetType(A,MATMPIAIJ);

  PetscViewer petsc_viewer;
  PetscViewerBinaryOpen( PETSC_COMM_WORLD,
                         "matrix.dat",
                         FILE_MODE_READ,
                         &petsc_viewer);
  MatLoad(A, petsc_viewer);
  PetscViewerDestroy(&petsc_viewer);

  PetscInt m, n;
  MatGetSize(A, &m, &n);

  Vec x;
  VecCreate(PETSC_COMM_WORLD,&x);
  VecSetSizes(x,PETSC_DECIDE,m);
  VecSetFromOptions(x);
  VecSet(x,1.0);

  Vec b;
  VecDuplicate(x,&b);

  KSP ksp;
  PC pc;

  KSPCreate(PETSC_COMM_WORLD,&ksp);
  KSPSetOperators(ksp,A,A);

  KSPSetType(ksp,KSPPREONLY);
  KSPGetPC(ksp,&pc);

  PCSetType(pc,PCCHOLESKY);

  PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
  PCFactorSetUpMatSolverPackage(pc);

  KSPSetFromOptions(ksp);
  KSPSetUp(ksp);

  KSPSolve(ksp,b,x);

  {
    KSPConvergedReason reason;
    KSPGetConvergedReason(ksp, &reason);
    std::cout << "converged reason: " << reason << std::endl;
  }

  Mat F;
  PCFactorGetMatrix(pc,&F);
  MatMumpsSetIcntl(F,14,30);

  KSPSolve(ksp,b,x);

  {
    KSPConvergedReason reason;
    KSPGetConvergedReason(ksp, &reason);
    std::cout << "converged reason: " << reason << std::endl;
  }

------------------------------------------------------------
-----------------------------------------
-ksp_view output (ICNTL(14) changes from 20 to 30, but we get "converged
reason: -11" for both solves)

KSP Object: 4 MPI processes
  type: preonly
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using NONE norm type for convergence test
PC Object: 4 MPI processes
  type: cholesky
    Cholesky: out-of-place factorization
    tolerance for zero pivot 2.22045e-14
    matrix ordering: natural
    factor fill ratio given 0., needed 0.
      Factored matrix follows:
        Mat Object:         4 MPI processes
          type: mpiaij
          rows=22878, cols=22878
          package used to perform factorization: mumps
          total: nonzeros=3361617, allocated nonzeros=3361617
          total number of mallocs used during MatSetValues calls =0
            MUMPS run parameters:
              SYM (matrix type):                   2
              PAR (host participation):            1
              ICNTL(1) (output for error):         6
              ICNTL(2) (output of diagnostic msg): 0
              ICNTL(3) (output for global info):   0
              ICNTL(4) (level of printing):        0
              ICNTL(5) (input mat struct):         0
              ICNTL(6) (matrix prescaling):        7
              ICNTL(7) (sequentia matrix ordering):7
              ICNTL(8) (scalling strategy):        77
              ICNTL(10) (max num of refinements):  0
              ICNTL(11) (error analysis):          0
              ICNTL(12) (efficiency control):                         0
              ICNTL(13) (efficiency control):                         0
              ICNTL(14) (percentage of estimated workspace increase): 20
              ICNTL(18) (input mat struct):                           3
              ICNTL(19) (Shur complement info):                       0
              ICNTL(20) (rhs sparse pattern):                         0
              ICNTL(21) (solution struct):                            1
              ICNTL(22) (in-core/out-of-core facility):               0
              ICNTL(23) (max size of memory can be allocated locally):0
              ICNTL(24) (detection of null pivot rows):               0
              ICNTL(25) (computation of a null space basis):          0
              ICNTL(26) (Schur options for rhs or solution):          0
              ICNTL(27) (experimental parameter):                     -24
              ICNTL(28) (use parallel or sequential ordering):        1
              ICNTL(29) (parallel ordering):                          0
              ICNTL(30) (user-specified set of entries in inv(A)):    0
              ICNTL(31) (factors is discarded in the solve phase):    0
              ICNTL(33) (compute determinant):                        0
              CNTL(1) (relative pivoting threshold):      0.01
              CNTL(2) (stopping criterion of refinement): 1.49012e-08
              CNTL(3) (absolute pivoting threshold):      0.
              CNTL(4) (value of static pivoting):         -1.
              CNTL(5) (fixation for null pivots):         0.
              RINFO(1) (local estimated flops for the elimination after
analysis):
                [0] 1.84947e+08
                [1] 2.42065e+08
                [2] 2.53044e+08
                [3] 2.18441e+08
              RINFO(2) (local estimated flops for the assembly after
factorization):
                [0]  945938.
                [1]  906795.
                [2]  897815.
                [3]  998840.
              RINFO(3) (local estimated flops for the elimination after
factorization):
                [0]  1.59835e+08
                [1]  1.50867e+08
                [2]  2.27932e+08
                [3]  1.52037e+08
              INFO(15) (estimated size of (in MB) MUMPS internal data for
running numerical factorization):
              [0] 36
              [1] 37
              [2] 38
              [3] 39
              INFO(16) (size of (in MB) MUMPS internal data used during
numerical factorization):
                [0] 36
                [1] 37
                [2] 38
                [3] 39
              INFO(23) (num of pivots eliminated on this processor after
factorization):
                [0] 6450
                [1] 5442
                [2] 4386
                [3] 5526
              RINFOG(1) (global estimated flops for the elimination after
analysis): 8.98497e+08
              RINFOG(2) (global estimated flops for the assembly after
factorization): 3.74939e+06
              RINFOG(3) (global estimated flops for the elimination after
factorization): 6.9067e+08
              (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
(0.,0.)*(2^0)
              INFOG(3) (estimated real workspace for factors on all
processors after analysis): 4082184
              INFOG(4) (estimated integer workspace for factors on all
processors after analysis): 231846
              INFOG(5) (estimated maximum front size in the complete tree):
678
              INFOG(6) (number of nodes in the complete tree): 1380
              INFOG(7) (ordering option effectively use after analysis): 5
              INFOG(8) (structural symmetry in percent of the permuted
matrix after analysis): 100
              INFOG(9) (total real/complex workspace to store the matrix
factors after factorization): 3521904
              INFOG(10) (total integer space store the matrix factors after
factorization): 229416
              INFOG(11) (order of largest frontal matrix after
factorization): 678
              INFOG(12) (number of off-diagonal pivots): 0
              INFOG(13) (number of delayed pivots after factorization): 0
              INFOG(14) (number of memory compress after factorization): 0
              INFOG(15) (number of steps of iterative refinement after
solution): 0
              INFOG(16) (estimated size (in MB) of all MUMPS internal data
for factorization after analysis: value on the most memory consuming
processor): 39
              INFOG(17) (estimated size of all MUMPS internal data for
factorization after analysis: sum over all processors): 150
              INFOG(18) (size of all MUMPS internal data allocated during
factorization: value on the most memory consuming processor): 39
              INFOG(19) (size of all MUMPS internal data allocated during
factorization: sum over all processors): 150
              INFOG(20) (estimated number of entries in the factors):
3361617
              INFOG(21) (size in MB of memory effectively used during
factorization - value on the most memory consuming processor): 35
              INFOG(22) (size in MB of memory effectively used during
factorization - sum over all processors): 136
              INFOG(23) (after analysis: value of ICNTL(6) effectively
used): 0
              INFOG(24) (after analysis: value of ICNTL(12) effectively
used): 1
              INFOG(25) (after factorization: number of pivots modified by
static pivoting): 0
              INFOG(28) (after factorization: number of null pivots
encountered): 0
              INFOG(29) (after factorization: effective number of entries
in the factors (sum over all processors)): 2931438
              INFOG(30, 31) (after solution: size in Mbytes of memory used
during solution phase): 0, 0
              INFOG(32) (after analysis: type of analysis done): 1
              INFOG(33) (value used for ICNTL(8)): 7
              INFOG(34) (exponent of the determinant if determinant is
requested): 0
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpiaij
    rows=22878, cols=22878
    total: nonzeros=1219140, allocated nonzeros=1219140
    total number of mallocs used during MatSetValues calls =0
      using I-node (on process 0) routines: found 1889 nodes, limit used is
5
converged reason: -11
KSP Object: 4 MPI processes
  type: preonly
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using NONE norm type for convergence test
PC Object: 4 MPI processes
  type: cholesky
    Cholesky: out-of-place factorization
    tolerance for zero pivot 2.22045e-14
    matrix ordering: natural
    factor fill ratio given 0., needed 0.
      Factored matrix follows:
        Mat Object:         4 MPI processes
          type: mpiaij
          rows=22878, cols=22878
          package used to perform factorization: mumps
          total: nonzeros=3361617, allocated nonzeros=3361617
          total number of mallocs used during MatSetValues calls =0
            MUMPS run parameters:
              SYM (matrix type):                   2
              PAR (host participation):            1
              ICNTL(1) (output for error):         6
              ICNTL(2) (output of diagnostic msg): 0
              ICNTL(3) (output for global info):   0
              ICNTL(4) (level of printing):        0
              ICNTL(5) (input mat struct):         0
              ICNTL(6) (matrix prescaling):        7
              ICNTL(7) (sequentia matrix ordering):7
              ICNTL(8) (scalling strategy):        77
              ICNTL(10) (max num of refinements):  0
              ICNTL(11) (error analysis):          0
              ICNTL(12) (efficiency control):                         0
              ICNTL(13) (efficiency control):                         0
              ICNTL(14) (percentage of estimated workspace increase): 30
              ICNTL(18) (input mat struct):                           3
              ICNTL(19) (Shur complement info):                       0
              ICNTL(20) (rhs sparse pattern):                         0
              ICNTL(21) (solution struct):                            1
              ICNTL(22) (in-core/out-of-core facility):               0
              ICNTL(23) (max size of memory can be allocated locally):0
              ICNTL(24) (detection of null pivot rows):               0
              ICNTL(25) (computation of a null space basis):          0
              ICNTL(26) (Schur options for rhs or solution):          0
              ICNTL(27) (experimental parameter):                     -24
              ICNTL(28) (use parallel or sequential ordering):        1
              ICNTL(29) (parallel ordering):                          0
              ICNTL(30) (user-specified set of entries in inv(A)):    0
              ICNTL(31) (factors is discarded in the solve phase):    0
              ICNTL(33) (compute determinant):                        0
              CNTL(1) (relative pivoting threshold):      0.01
              CNTL(2) (stopping criterion of refinement): 1.49012e-08
              CNTL(3) (absolute pivoting threshold):      0.
              CNTL(4) (value of static pivoting):         -1.
              CNTL(5) (fixation for null pivots):         0.
              RINFO(1) (local estimated flops for the elimination after
analysis):
                [0] 1.84947e+08
                [1] 2.42065e+08
                [2] 2.53044e+08
                [3] 2.18441e+08
              RINFO(2) (local estimated flops for the assembly after
factorization):
                [0]  945938.
                [1]  906795.
                [2]  897815.
                [3]  998840.
              RINFO(3) (local estimated flops for the elimination after
factorization):
                [0]  1.59835e+08
                [1]  1.50867e+08
                [2]  2.27932e+08
                [3]  1.52037e+08
              INFO(15) (estimated size of (in MB) MUMPS internal data for
running numerical factorization):
              [0] 36
              [1] 37
              [2] 38
              [3] 39
              INFO(16) (size of (in MB) MUMPS internal data used during
numerical factorization):
                [0] 36
                [1] 37
                [2] 38
                [3] 39
              INFO(23) (num of pivots eliminated on this processor after
factorization):
                [0] 6450
                [1] 5442
                [2] 4386
                [3] 5526
              RINFOG(1) (global estimated flops for the elimination after
analysis): 8.98497e+08
              RINFOG(2) (global estimated flops for the assembly after
factorization): 3.74939e+06
              RINFOG(3) (global estimated flops for the elimination after
factorization): 6.9067e+08
              (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
(0.,0.)*(2^0)
              INFOG(3) (estimated real workspace for factors on all
processors after analysis): 4082184
              INFOG(4) (estimated integer workspace for factors on all
processors after analysis): 231846
              INFOG(5) (estimated maximum front size in the complete tree):
678
              INFOG(6) (number of nodes in the complete tree): 1380
              INFOG(7) (ordering option effectively use after analysis): 5
              INFOG(8) (structural symmetry in percent of the permuted
matrix after analysis): 100
              INFOG(9) (total real/complex workspace to store the matrix
factors after factorization): 3521904
              INFOG(10) (total integer space store the matrix factors after
factorization): 229416
              INFOG(11) (order of largest frontal matrix after
factorization): 678
              INFOG(12) (number of off-diagonal pivots): 0
              INFOG(13) (number of delayed pivots after factorization): 0
              INFOG(14) (number of memory compress after factorization): 0
              INFOG(15) (number of steps of iterative refinement after
solution): 0
              INFOG(16) (estimated size (in MB) of all MUMPS internal data
for factorization after analysis: value on the most memory consuming
processor): 39
              INFOG(17) (estimated size of all MUMPS internal data for
factorization after analysis: sum over all processors): 150
              INFOG(18) (size of all MUMPS internal data allocated during
factorization: value on the most memory consuming processor): 39
              INFOG(19) (size of all MUMPS internal data allocated during
factorization: sum over all processors): 150
              INFOG(20) (estimated number of entries in the factors):
3361617
              INFOG(21) (size in MB of memory effectively used during
factorization - value on the most memory consuming processor): 35
              INFOG(22) (size in MB of memory effectively used during
factorization - sum over all processors): 136
              INFOG(23) (after analysis: value of ICNTL(6) effectively
used): 0
              INFOG(24) (after analysis: value of ICNTL(12) effectively
used): 1
              INFOG(25) (after factorization: number of pivots modified by
static pivoting): 0
              INFOG(28) (after factorization: number of null pivots
encountered): 0
              INFOG(29) (after factorization: effective number of entries
in the factors (sum over all processors)): 2931438
              INFOG(30, 31) (after solution: size in Mbytes of memory used
during solution phase): 0, 0
              INFOG(32) (after analysis: type of analysis done): 1
              INFOG(33) (value used for ICNTL(8)): 7
              INFOG(34) (exponent of the determinant if determinant is
requested): 0
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpiaij
    rows=22878, cols=22878
    total: nonzeros=1219140, allocated nonzeros=1219140
    total number of mallocs used during MatSetValues calls =0
      using I-node (on process 0) routines: found 1889 nodes, limit used is
5
converged reason: -11

------------------------------------------------------------
-----------------------------------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160919/7775bf30/attachment-0001.html>


More information about the petsc-users mailing list