[petsc-users] Issue updating MUMPS ictnl after failed solve

Mon Sep 19 20:38:30 CDT 2016

Another guess: maybe you also need KSPSetUp(ksp); before the second
KSPSolve(ksp,b,x);.

Murat Keceli

On Mon, Sep 19, 2016 at 8:33 PM, David Knezevic <david.knezevic at akselos.com>
wrote:

> On Mon, Sep 19, 2016 at 7:26 PM, Dave May <dave.mayhem23 at gmail.com> wrote:
>
>>
>>
>> On 19 September 2016 at 21:05, David Knezevic <david.knezevic at akselos.com
>> > wrote:
>>
>>> When I use MUMPS via PETSc, one issue is that it can sometimes fail with
>>> MUMPS error -9, which means that MUMPS didn't allocate a big enough
>>> workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g.
>>> via the command line option -mat_mumps_icntl_14.
>>>
>>> However, instead of having to run several times with different command
>>> line options, I'd like to be able to automatically increment icntl 14 value
>>> in a loop until the solve succeeds.
>>>
>>> I have a saved matrix which fails when I use it for a solve with MUMPS
>>> with 4 MPI processes and the default ictnl values, so I'm using this to
>>> check that I can achieve the automatic icntl 14 update, as described above.
>>> (The matrix is 14MB so I haven't attached it here, but I'd be happy to send
>>> it to anyone else who wants to try this test case out.)
>>>
>>> I've pasted some test code below which provides a simple test of this
>>> idea using two solves. The first solve uses the default value of icntl 14,
>>> which fails, and then we update icntl 14 to 30 and solve again. The second
>>> solve should succeed since icntl 14 of 30 is sufficient for MUMPS to
>>> succeed in this case, but for some reason the second solve still fails.
>>>
>>> Below I've also pasted the output from -ksp_view, and you can see that
>>> ictnl 14 is being updated correctly (see the ICNTL(14) lines in the
>>> output), so it's not clear to me why the second solve fails. It seems like
>>> MUMPS is ignoring the update to the ictnl value?
>>>
>>
>> I believe this parameter is utilized during the numerical factorization
>> phase.
>> In your code, the operator hasn't changed, however you haven't signalled
>> to the KSP that you want to re-perform the numerical factorization.
>> You can do this by calling KSPSetOperators() before your second solve.
>> I think if you do this (please try it), the factorization will be
>> performed again and the new value of icntl will have an effect.
>>
>> Note this is a wild stab in the dark - I haven't dug through the
>> petsc-mumps code in detail...
>>
>
> That sounds like a plausible guess to me, but unfortunately it didn't
> work. I added KSPSetOperators(ksp,A,A); before the second solve and I got
> the same behavior as before.
>
> Thanks,
> David
>
>
>
>
>
>> ------------------------------------------------------------
>>> -----------------------------------------
>>> Test code:
>>>
>>>   Mat A;
>>>   MatCreate(PETSC_COMM_WORLD,&A);
>>>   MatSetType(A,MATMPIAIJ);
>>>
>>>   PetscViewer petsc_viewer;
>>>   PetscViewerBinaryOpen( PETSC_COMM_WORLD,
>>>                          "matrix.dat",
>>>                          FILE_MODE_READ,
>>>                          &petsc_viewer);
>>>   MatLoad(A, petsc_viewer);
>>>   PetscViewerDestroy(&petsc_viewer);
>>>
>>>   PetscInt m, n;
>>>   MatGetSize(A, &m, &n);
>>>
>>>   Vec x;
>>>   VecCreate(PETSC_COMM_WORLD,&x);
>>>   VecSetSizes(x,PETSC_DECIDE,m);
>>>   VecSetFromOptions(x);
>>>   VecSet(x,1.0);
>>>
>>>   Vec b;
>>>   VecDuplicate(x,&b);
>>>
>>>   KSP ksp;
>>>   PC pc;
>>>
>>>   KSPCreate(PETSC_COMM_WORLD,&ksp);
>>>   KSPSetOperators(ksp,A,A);
>>>
>>>   KSPSetType(ksp,KSPPREONLY);
>>>   KSPGetPC(ksp,&pc);
>>>
>>>   PCSetType(pc,PCCHOLESKY);
>>>
>>>   PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
>>>   PCFactorSetUpMatSolverPackage(pc);
>>>
>>>   KSPSetFromOptions(ksp);
>>>   KSPSetUp(ksp);
>>>
>>>   KSPSolve(ksp,b,x);
>>>
>>>   {
>>>     KSPConvergedReason reason;
>>>     KSPGetConvergedReason(ksp, &reason);
>>>     std::cout << "converged reason: " << reason << std::endl;
>>>   }
>>>
>>>   Mat F;
>>>   PCFactorGetMatrix(pc,&F);
>>>   MatMumpsSetIcntl(F,14,30);
>>>
>>>   KSPSolve(ksp,b,x);
>>>
>>>   {
>>>     KSPConvergedReason reason;
>>>     KSPGetConvergedReason(ksp, &reason);
>>>     std::cout << "converged reason: " << reason << std::endl;
>>>   }
>>>
>>> ------------------------------------------------------------
>>> -----------------------------------------
>>> -ksp_view output (ICNTL(14) changes from 20 to 30, but we get "converged
>>> reason: -11" for both solves)
>>>
>>> KSP Object: 4 MPI processes
>>>   type: preonly
>>>   maximum iterations=10000, initial guess is zero
>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>   left preconditioning
>>>   using NONE norm type for convergence test
>>> PC Object: 4 MPI processes
>>>   type: cholesky
>>>     Cholesky: out-of-place factorization
>>>     tolerance for zero pivot 2.22045e-14
>>>     matrix ordering: natural
>>>     factor fill ratio given 0., needed 0.
>>>       Factored matrix follows:
>>>         Mat Object:         4 MPI processes
>>>           type: mpiaij
>>>           rows=22878, cols=22878
>>>           package used to perform factorization: mumps
>>>           total: nonzeros=3361617, allocated nonzeros=3361617
>>>           total number of mallocs used during MatSetValues calls =0
>>>             MUMPS run parameters:
>>>               SYM (matrix type):                   2
>>>               PAR (host participation):            1
>>>               ICNTL(1) (output for error):         6
>>>               ICNTL(2) (output of diagnostic msg): 0
>>>               ICNTL(3) (output for global info):   0
>>>               ICNTL(4) (level of printing):        0
>>>               ICNTL(5) (input mat struct):         0
>>>               ICNTL(6) (matrix prescaling):        7
>>>               ICNTL(7) (sequentia matrix ordering):7
>>>               ICNTL(8) (scalling strategy):        77
>>>               ICNTL(10) (max num of refinements):  0
>>>               ICNTL(11) (error analysis):          0
>>>               ICNTL(12) (efficiency control):                         0
>>>               ICNTL(13) (efficiency control):                         0
>>>               ICNTL(14) (percentage of estimated workspace increase): 20
>>>               ICNTL(18) (input mat struct):                           3
>>>               ICNTL(19) (Shur complement info):                       0
>>>               ICNTL(20) (rhs sparse pattern):                         0
>>>               ICNTL(21) (solution struct):                            1
>>>               ICNTL(22) (in-core/out-of-core facility):               0
>>>               ICNTL(23) (max size of memory can be allocated locally):0
>>>               ICNTL(24) (detection of null pivot rows):               0
>>>               ICNTL(25) (computation of a null space basis):          0
>>>               ICNTL(26) (Schur options for rhs or solution):          0
>>>               ICNTL(27) (experimental parameter):
>>> -24
>>>               ICNTL(28) (use parallel or sequential ordering):        1
>>>               ICNTL(29) (parallel ordering):                          0
>>>               ICNTL(30) (user-specified set of entries in inv(A)):    0
>>>               ICNTL(31) (factors is discarded in the solve phase):    0
>>>               ICNTL(33) (compute determinant):                        0
>>>               CNTL(1) (relative pivoting threshold):      0.01
>>>               CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>>               CNTL(3) (absolute pivoting threshold):      0.
>>>               CNTL(4) (value of static pivoting):         -1.
>>>               CNTL(5) (fixation for null pivots):         0.
>>>               RINFO(1) (local estimated flops for the elimination after
>>> analysis):
>>>                 [0] 1.84947e+08
>>>                 [1] 2.42065e+08
>>>                 [2] 2.53044e+08
>>>                 [3] 2.18441e+08
>>>               RINFO(2) (local estimated flops for the assembly after
>>> factorization):
>>>                 [0]  945938.
>>>                 [1]  906795.
>>>                 [2]  897815.
>>>                 [3]  998840.
>>>               RINFO(3) (local estimated flops for the elimination after
>>> factorization):
>>>                 [0]  1.59835e+08
>>>                 [1]  1.50867e+08
>>>                 [2]  2.27932e+08
>>>                 [3]  1.52037e+08
>>>               INFO(15) (estimated size of (in MB) MUMPS internal data
>>> for running numerical factorization):
>>>               [0] 36
>>>               [1] 37
>>>               [2] 38
>>>               [3] 39
>>>               INFO(16) (size of (in MB) MUMPS internal data used during
>>> numerical factorization):
>>>                 [0] 36
>>>                 [1] 37
>>>                 [2] 38
>>>                 [3] 39
>>>               INFO(23) (num of pivots eliminated on this processor after
>>> factorization):
>>>                 [0] 6450
>>>                 [1] 5442
>>>                 [2] 4386
>>>                 [3] 5526
>>>               RINFOG(1) (global estimated flops for the elimination
>>> after analysis): 8.98497e+08
>>>               RINFOG(2) (global estimated flops for the assembly after
>>> factorization): 3.74939e+06
>>>               RINFOG(3) (global estimated flops for the elimination
>>> after factorization): 6.9067e+08
>>>               (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>>> (0.,0.)*(2^0)
>>>               INFOG(3) (estimated real workspace for factors on all
>>> processors after analysis): 4082184
>>>               INFOG(4) (estimated integer workspace for factors on all
>>> processors after analysis): 231846
>>>               INFOG(5) (estimated maximum front size in the complete
>>> tree): 678
>>>               INFOG(6) (number of nodes in the complete tree): 1380
>>>               INFOG(7) (ordering option effectively use after analysis):
>>> 5
>>>               INFOG(8) (structural symmetry in percent of the permuted
>>> matrix after analysis): 100
>>>               INFOG(9) (total real/complex workspace to store the matrix
>>> factors after factorization): 3521904
>>>               INFOG(10) (total integer space store the matrix factors
>>> after factorization): 229416
>>>               INFOG(11) (order of largest frontal matrix after
>>> factorization): 678
>>>               INFOG(12) (number of off-diagonal pivots): 0
>>>               INFOG(13) (number of delayed pivots after factorization):
>>> 0
>>>               INFOG(14) (number of memory compress after factorization):
>>> 0
>>>               INFOG(15) (number of steps of iterative refinement after
>>> solution): 0
>>>               INFOG(16) (estimated size (in MB) of all MUMPS internal
>>> data for factorization after analysis: value on the most memory consuming
>>> processor): 39
>>>               INFOG(17) (estimated size of all MUMPS internal data for
>>> factorization after analysis: sum over all processors): 150
>>>               INFOG(18) (size of all MUMPS internal data allocated
>>> during factorization: value on the most memory consuming processor): 39
>>>               INFOG(19) (size of all MUMPS internal data allocated
>>> during factorization: sum over all processors): 150
>>>               INFOG(20) (estimated number of entries in the factors):
>>> 3361617
>>>               INFOG(21) (size in MB of memory effectively used during
>>> factorization - value on the most memory consuming processor): 35
>>>               INFOG(22) (size in MB of memory effectively used during
>>> factorization - sum over all processors): 136
>>>               INFOG(23) (after analysis: value of ICNTL(6) effectively
>>> used): 0
>>>               INFOG(24) (after analysis: value of ICNTL(12) effectively
>>> used): 1
>>>               INFOG(25) (after factorization: number of pivots modified
>>> by static pivoting): 0
>>>               INFOG(28) (after factorization: number of null pivots
>>> encountered): 0
>>>               INFOG(29) (after factorization: effective number of
>>> entries in the factors (sum over all processors)): 2931438
>>>               INFOG(30, 31) (after solution: size in Mbytes of memory
>>> used during solution phase): 0, 0
>>>               INFOG(32) (after analysis: type of analysis done): 1
>>>               INFOG(33) (value used for ICNTL(8)): 7
>>>               INFOG(34) (exponent of the determinant if determinant is
>>> requested): 0
>>>   linear system matrix = precond matrix:
>>>   Mat Object:   4 MPI processes
>>>     type: mpiaij
>>>     rows=22878, cols=22878
>>>     total: nonzeros=1219140, allocated nonzeros=1219140
>>>     total number of mallocs used during MatSetValues calls =0
>>>       using I-node (on process 0) routines: found 1889 nodes, limit used
>>> is 5
>>> converged reason: -11
>>> KSP Object: 4 MPI processes
>>>   type: preonly
>>>   maximum iterations=10000, initial guess is zero
>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>   left preconditioning
>>>   using NONE norm type for convergence test
>>> PC Object: 4 MPI processes
>>>   type: cholesky
>>>     Cholesky: out-of-place factorization
>>>     tolerance for zero pivot 2.22045e-14
>>>     matrix ordering: natural
>>>     factor fill ratio given 0., needed 0.
>>>       Factored matrix follows:
>>>         Mat Object:         4 MPI processes
>>>           type: mpiaij
>>>           rows=22878, cols=22878
>>>           package used to perform factorization: mumps
>>>           total: nonzeros=3361617, allocated nonzeros=3361617
>>>           total number of mallocs used during MatSetValues calls =0
>>>             MUMPS run parameters:
>>>               SYM (matrix type):                   2
>>>               PAR (host participation):            1
>>>               ICNTL(1) (output for error):         6
>>>               ICNTL(2) (output of diagnostic msg): 0
>>>               ICNTL(3) (output for global info):   0
>>>               ICNTL(4) (level of printing):        0
>>>               ICNTL(5) (input mat struct):         0
>>>               ICNTL(6) (matrix prescaling):        7
>>>               ICNTL(7) (sequentia matrix ordering):7
>>>               ICNTL(8) (scalling strategy):        77
>>>               ICNTL(10) (max num of refinements):  0
>>>               ICNTL(11) (error analysis):          0
>>>               ICNTL(12) (efficiency control):                         0
>>>               ICNTL(13) (efficiency control):                         0
>>>               ICNTL(14) (percentage of estimated workspace increase): 30
>>>               ICNTL(18) (input mat struct):                           3
>>>               ICNTL(19) (Shur complement info):                       0
>>>               ICNTL(20) (rhs sparse pattern):                         0
>>>               ICNTL(21) (solution struct):                            1
>>>               ICNTL(22) (in-core/out-of-core facility):               0
>>>               ICNTL(23) (max size of memory can be allocated locally):0
>>>               ICNTL(24) (detection of null pivot rows):               0
>>>               ICNTL(25) (computation of a null space basis):          0
>>>               ICNTL(26) (Schur options for rhs or solution):          0
>>>               ICNTL(27) (experimental parameter):
>>> -24
>>>               ICNTL(28) (use parallel or sequential ordering):        1
>>>               ICNTL(29) (parallel ordering):                          0
>>>               ICNTL(30) (user-specified set of entries in inv(A)):    0
>>>               ICNTL(31) (factors is discarded in the solve phase):    0
>>>               ICNTL(33) (compute determinant):                        0
>>>               CNTL(1) (relative pivoting threshold):      0.01
>>>               CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>>               CNTL(3) (absolute pivoting threshold):      0.
>>>               CNTL(4) (value of static pivoting):         -1.
>>>               CNTL(5) (fixation for null pivots):         0.
>>>               RINFO(1) (local estimated flops for the elimination after
>>> analysis):
>>>                 [0] 1.84947e+08
>>>                 [1] 2.42065e+08
>>>                 [2] 2.53044e+08
>>>                 [3] 2.18441e+08
>>>               RINFO(2) (local estimated flops for the assembly after
>>> factorization):
>>>                 [0]  945938.
>>>                 [1]  906795.
>>>                 [2]  897815.
>>>                 [3]  998840.
>>>               RINFO(3) (local estimated flops for the elimination after
>>> factorization):
>>>                 [0]  1.59835e+08
>>>                 [1]  1.50867e+08
>>>                 [2]  2.27932e+08
>>>                 [3]  1.52037e+08
>>>               INFO(15) (estimated size of (in MB) MUMPS internal data
>>> for running numerical factorization):
>>>               [0] 36
>>>               [1] 37
>>>               [2] 38
>>>               [3] 39
>>>               INFO(16) (size of (in MB) MUMPS internal data used during
>>> numerical factorization):
>>>                 [0] 36
>>>                 [1] 37
>>>                 [2] 38
>>>                 [3] 39
>>>               INFO(23) (num of pivots eliminated on this processor after
>>> factorization):
>>>                 [0] 6450
>>>                 [1] 5442
>>>                 [2] 4386
>>>                 [3] 5526
>>>               RINFOG(1) (global estimated flops for the elimination
>>> after analysis): 8.98497e+08
>>>               RINFOG(2) (global estimated flops for the assembly after
>>> factorization): 3.74939e+06
>>>               RINFOG(3) (global estimated flops for the elimination
>>> after factorization): 6.9067e+08
>>>               (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>>> (0.,0.)*(2^0)
>>>               INFOG(3) (estimated real workspace for factors on all
>>> processors after analysis): 4082184
>>>               INFOG(4) (estimated integer workspace for factors on all
>>> processors after analysis): 231846
>>>               INFOG(5) (estimated maximum front size in the complete
>>> tree): 678
>>>               INFOG(6) (number of nodes in the complete tree): 1380
>>>               INFOG(7) (ordering option effectively use after analysis):
>>> 5
>>>               INFOG(8) (structural symmetry in percent of the permuted
>>> matrix after analysis): 100
>>>               INFOG(9) (total real/complex workspace to store the matrix
>>> factors after factorization): 3521904
>>>               INFOG(10) (total integer space store the matrix factors
>>> after factorization): 229416
>>>               INFOG(11) (order of largest frontal matrix after
>>> factorization): 678
>>>               INFOG(12) (number of off-diagonal pivots): 0
>>>               INFOG(13) (number of delayed pivots after factorization):
>>> 0
>>>               INFOG(14) (number of memory compress after factorization):
>>> 0
>>>               INFOG(15) (number of steps of iterative refinement after
>>> solution): 0
>>>               INFOG(16) (estimated size (in MB) of all MUMPS internal
>>> data for factorization after analysis: value on the most memory consuming
>>> processor): 39
>>>               INFOG(17) (estimated size of all MUMPS internal data for
>>> factorization after analysis: sum over all processors): 150
>>>               INFOG(18) (size of all MUMPS internal data allocated
>>> during factorization: value on the most memory consuming processor): 39
>>>               INFOG(19) (size of all MUMPS internal data allocated
>>> during factorization: sum over all processors): 150
>>>               INFOG(20) (estimated number of entries in the factors):
>>> 3361617
>>>               INFOG(21) (size in MB of memory effectively used during
>>> factorization - value on the most memory consuming processor): 35
>>>               INFOG(22) (size in MB of memory effectively used during
>>> factorization - sum over all processors): 136
>>>               INFOG(23) (after analysis: value of ICNTL(6) effectively
>>> used): 0
>>>               INFOG(24) (after analysis: value of ICNTL(12) effectively
>>> used): 1
>>>               INFOG(25) (after factorization: number of pivots modified
>>> by static pivoting): 0
>>>               INFOG(28) (after factorization: number of null pivots
>>> encountered): 0
>>>               INFOG(29) (after factorization: effective number of
>>> entries in the factors (sum over all processors)): 2931438
>>>               INFOG(30, 31) (after solution: size in Mbytes of memory
>>> used during solution phase): 0, 0
>>>               INFOG(32) (after analysis: type of analysis done): 1
>>>               INFOG(33) (value used for ICNTL(8)): 7
>>>               INFOG(34) (exponent of the determinant if determinant is
>>> requested): 0
>>>   linear system matrix = precond matrix:
>>>   Mat Object:   4 MPI processes
>>>     type: mpiaij
>>>     rows=22878, cols=22878
>>>     total: nonzeros=1219140, allocated nonzeros=1219140
>>>     total number of mallocs used during MatSetValues calls =0
>>>       using I-node (on process 0) routines: found 1889 nodes, limit used
>>> is 5
>>> converged reason: -11
>>>
>>> ------------------------------------------------------------
>>> -----------------------------------------
>>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160919/fc8ae8fa/attachment-0001.html>