[petsc-users] Issue updating MUMPS ictnl after failed solve
Fande Kong
fdkong.jd at gmail.com
Mon Sep 19 20:45:38 CDT 2016
Placing PCReset(PC pc) before the second kspsolve might works.
Fande Kong,
On Mon, Sep 19, 2016 at 7:38 PM, murat keçeli <keceli at gmail.com> wrote:
> Another guess: maybe you also need KSPSetUp(ksp); before the second
> KSPSolve(ksp,b,x);.
>
> Murat Keceli
>
>
> On Mon, Sep 19, 2016 at 8:33 PM, David Knezevic <
> david.knezevic at akselos.com> wrote:
>
>> On Mon, Sep 19, 2016 at 7:26 PM, Dave May <dave.mayhem23 at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On 19 September 2016 at 21:05, David Knezevic <
>>> david.knezevic at akselos.com> wrote:
>>>
>>>> When I use MUMPS via PETSc, one issue is that it can sometimes fail
>>>> with MUMPS error -9, which means that MUMPS didn't allocate a big enough
>>>> workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g.
>>>> via the command line option -mat_mumps_icntl_14.
>>>>
>>>> However, instead of having to run several times with different command
>>>> line options, I'd like to be able to automatically increment icntl 14 value
>>>> in a loop until the solve succeeds.
>>>>
>>>> I have a saved matrix which fails when I use it for a solve with MUMPS
>>>> with 4 MPI processes and the default ictnl values, so I'm using this to
>>>> check that I can achieve the automatic icntl 14 update, as described above.
>>>> (The matrix is 14MB so I haven't attached it here, but I'd be happy to send
>>>> it to anyone else who wants to try this test case out.)
>>>>
>>>> I've pasted some test code below which provides a simple test of this
>>>> idea using two solves. The first solve uses the default value of icntl 14,
>>>> which fails, and then we update icntl 14 to 30 and solve again. The second
>>>> solve should succeed since icntl 14 of 30 is sufficient for MUMPS to
>>>> succeed in this case, but for some reason the second solve still fails.
>>>>
>>>> Below I've also pasted the output from -ksp_view, and you can see that
>>>> ictnl 14 is being updated correctly (see the ICNTL(14) lines in the
>>>> output), so it's not clear to me why the second solve fails. It seems like
>>>> MUMPS is ignoring the update to the ictnl value?
>>>>
>>>
>>> I believe this parameter is utilized during the numerical factorization
>>> phase.
>>> In your code, the operator hasn't changed, however you haven't signalled
>>> to the KSP that you want to re-perform the numerical factorization.
>>> You can do this by calling KSPSetOperators() before your second solve.
>>> I think if you do this (please try it), the factorization will be
>>> performed again and the new value of icntl will have an effect.
>>>
>>> Note this is a wild stab in the dark - I haven't dug through the
>>> petsc-mumps code in detail...
>>>
>>
>> That sounds like a plausible guess to me, but unfortunately it didn't
>> work. I added KSPSetOperators(ksp,A,A); before the second solve and I
>> got the same behavior as before.
>>
>> Thanks,
>> David
>>
>>
>>
>>
>>
>>> ------------------------------------------------------------
>>>> -----------------------------------------
>>>> Test code:
>>>>
>>>> Mat A;
>>>> MatCreate(PETSC_COMM_WORLD,&A);
>>>> MatSetType(A,MATMPIAIJ);
>>>>
>>>> PetscViewer petsc_viewer;
>>>> PetscViewerBinaryOpen( PETSC_COMM_WORLD,
>>>> "matrix.dat",
>>>> FILE_MODE_READ,
>>>> &petsc_viewer);
>>>> MatLoad(A, petsc_viewer);
>>>> PetscViewerDestroy(&petsc_viewer);
>>>>
>>>> PetscInt m, n;
>>>> MatGetSize(A, &m, &n);
>>>>
>>>> Vec x;
>>>> VecCreate(PETSC_COMM_WORLD,&x);
>>>> VecSetSizes(x,PETSC_DECIDE,m);
>>>> VecSetFromOptions(x);
>>>> VecSet(x,1.0);
>>>>
>>>> Vec b;
>>>> VecDuplicate(x,&b);
>>>>
>>>> KSP ksp;
>>>> PC pc;
>>>>
>>>> KSPCreate(PETSC_COMM_WORLD,&ksp);
>>>> KSPSetOperators(ksp,A,A);
>>>>
>>>> KSPSetType(ksp,KSPPREONLY);
>>>> KSPGetPC(ksp,&pc);
>>>>
>>>> PCSetType(pc,PCCHOLESKY);
>>>>
>>>> PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
>>>> PCFactorSetUpMatSolverPackage(pc);
>>>>
>>>> KSPSetFromOptions(ksp);
>>>> KSPSetUp(ksp);
>>>>
>>>> KSPSolve(ksp,b,x);
>>>>
>>>> {
>>>> KSPConvergedReason reason;
>>>> KSPGetConvergedReason(ksp, &reason);
>>>> std::cout << "converged reason: " << reason << std::endl;
>>>> }
>>>>
>>>> Mat F;
>>>> PCFactorGetMatrix(pc,&F);
>>>> MatMumpsSetIcntl(F,14,30);
>>>>
>>>> KSPSolve(ksp,b,x);
>>>>
>>>> {
>>>> KSPConvergedReason reason;
>>>> KSPGetConvergedReason(ksp, &reason);
>>>> std::cout << "converged reason: " << reason << std::endl;
>>>> }
>>>>
>>>> ------------------------------------------------------------
>>>> -----------------------------------------
>>>> -ksp_view output (ICNTL(14) changes from 20 to 30, but we get
>>>> "converged reason: -11" for both solves)
>>>>
>>>> KSP Object: 4 MPI processes
>>>> type: preonly
>>>> maximum iterations=10000, initial guess is zero
>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>> left preconditioning
>>>> using NONE norm type for convergence test
>>>> PC Object: 4 MPI processes
>>>> type: cholesky
>>>> Cholesky: out-of-place factorization
>>>> tolerance for zero pivot 2.22045e-14
>>>> matrix ordering: natural
>>>> factor fill ratio given 0., needed 0.
>>>> Factored matrix follows:
>>>> Mat Object: 4 MPI processes
>>>> type: mpiaij
>>>> rows=22878, cols=22878
>>>> package used to perform factorization: mumps
>>>> total: nonzeros=3361617, allocated nonzeros=3361617
>>>> total number of mallocs used during MatSetValues calls =0
>>>> MUMPS run parameters:
>>>> SYM (matrix type): 2
>>>> PAR (host participation): 1
>>>> ICNTL(1) (output for error): 6
>>>> ICNTL(2) (output of diagnostic msg): 0
>>>> ICNTL(3) (output for global info): 0
>>>> ICNTL(4) (level of printing): 0
>>>> ICNTL(5) (input mat struct): 0
>>>> ICNTL(6) (matrix prescaling): 7
>>>> ICNTL(7) (sequentia matrix ordering):7
>>>> ICNTL(8) (scalling strategy): 77
>>>> ICNTL(10) (max num of refinements): 0
>>>> ICNTL(11) (error analysis): 0
>>>> ICNTL(12) (efficiency control): 0
>>>> ICNTL(13) (efficiency control): 0
>>>> ICNTL(14) (percentage of estimated workspace increase):
>>>> 20
>>>> ICNTL(18) (input mat struct): 3
>>>> ICNTL(19) (Shur complement info): 0
>>>> ICNTL(20) (rhs sparse pattern): 0
>>>> ICNTL(21) (solution struct): 1
>>>> ICNTL(22) (in-core/out-of-core facility): 0
>>>> ICNTL(23) (max size of memory can be allocated locally):0
>>>> ICNTL(24) (detection of null pivot rows): 0
>>>> ICNTL(25) (computation of a null space basis): 0
>>>> ICNTL(26) (Schur options for rhs or solution): 0
>>>> ICNTL(27) (experimental parameter):
>>>> -24
>>>> ICNTL(28) (use parallel or sequential ordering): 1
>>>> ICNTL(29) (parallel ordering): 0
>>>> ICNTL(30) (user-specified set of entries in inv(A)): 0
>>>> ICNTL(31) (factors is discarded in the solve phase): 0
>>>> ICNTL(33) (compute determinant): 0
>>>> CNTL(1) (relative pivoting threshold): 0.01
>>>> CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>>> CNTL(3) (absolute pivoting threshold): 0.
>>>> CNTL(4) (value of static pivoting): -1.
>>>> CNTL(5) (fixation for null pivots): 0.
>>>> RINFO(1) (local estimated flops for the elimination after
>>>> analysis):
>>>> [0] 1.84947e+08
>>>> [1] 2.42065e+08
>>>> [2] 2.53044e+08
>>>> [3] 2.18441e+08
>>>> RINFO(2) (local estimated flops for the assembly after
>>>> factorization):
>>>> [0] 945938.
>>>> [1] 906795.
>>>> [2] 897815.
>>>> [3] 998840.
>>>> RINFO(3) (local estimated flops for the elimination after
>>>> factorization):
>>>> [0] 1.59835e+08
>>>> [1] 1.50867e+08
>>>> [2] 2.27932e+08
>>>> [3] 1.52037e+08
>>>> INFO(15) (estimated size of (in MB) MUMPS internal data
>>>> for running numerical factorization):
>>>> [0] 36
>>>> [1] 37
>>>> [2] 38
>>>> [3] 39
>>>> INFO(16) (size of (in MB) MUMPS internal data used during
>>>> numerical factorization):
>>>> [0] 36
>>>> [1] 37
>>>> [2] 38
>>>> [3] 39
>>>> INFO(23) (num of pivots eliminated on this processor
>>>> after factorization):
>>>> [0] 6450
>>>> [1] 5442
>>>> [2] 4386
>>>> [3] 5526
>>>> RINFOG(1) (global estimated flops for the elimination
>>>> after analysis): 8.98497e+08
>>>> RINFOG(2) (global estimated flops for the assembly after
>>>> factorization): 3.74939e+06
>>>> RINFOG(3) (global estimated flops for the elimination
>>>> after factorization): 6.9067e+08
>>>> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>>>> (0.,0.)*(2^0)
>>>> INFOG(3) (estimated real workspace for factors on all
>>>> processors after analysis): 4082184
>>>> INFOG(4) (estimated integer workspace for factors on all
>>>> processors after analysis): 231846
>>>> INFOG(5) (estimated maximum front size in the complete
>>>> tree): 678
>>>> INFOG(6) (number of nodes in the complete tree): 1380
>>>> INFOG(7) (ordering option effectively use after
>>>> analysis): 5
>>>> INFOG(8) (structural symmetry in percent of the permuted
>>>> matrix after analysis): 100
>>>> INFOG(9) (total real/complex workspace to store the
>>>> matrix factors after factorization): 3521904
>>>> INFOG(10) (total integer space store the matrix factors
>>>> after factorization): 229416
>>>> INFOG(11) (order of largest frontal matrix after
>>>> factorization): 678
>>>> INFOG(12) (number of off-diagonal pivots): 0
>>>> INFOG(13) (number of delayed pivots after factorization):
>>>> 0
>>>> INFOG(14) (number of memory compress after
>>>> factorization): 0
>>>> INFOG(15) (number of steps of iterative refinement after
>>>> solution): 0
>>>> INFOG(16) (estimated size (in MB) of all MUMPS internal
>>>> data for factorization after analysis: value on the most memory consuming
>>>> processor): 39
>>>> INFOG(17) (estimated size of all MUMPS internal data for
>>>> factorization after analysis: sum over all processors): 150
>>>> INFOG(18) (size of all MUMPS internal data allocated
>>>> during factorization: value on the most memory consuming processor): 39
>>>> INFOG(19) (size of all MUMPS internal data allocated
>>>> during factorization: sum over all processors): 150
>>>> INFOG(20) (estimated number of entries in the factors):
>>>> 3361617
>>>> INFOG(21) (size in MB of memory effectively used during
>>>> factorization - value on the most memory consuming processor): 35
>>>> INFOG(22) (size in MB of memory effectively used during
>>>> factorization - sum over all processors): 136
>>>> INFOG(23) (after analysis: value of ICNTL(6) effectively
>>>> used): 0
>>>> INFOG(24) (after analysis: value of ICNTL(12) effectively
>>>> used): 1
>>>> INFOG(25) (after factorization: number of pivots modified
>>>> by static pivoting): 0
>>>> INFOG(28) (after factorization: number of null pivots
>>>> encountered): 0
>>>> INFOG(29) (after factorization: effective number of
>>>> entries in the factors (sum over all processors)): 2931438
>>>> INFOG(30, 31) (after solution: size in Mbytes of memory
>>>> used during solution phase): 0, 0
>>>> INFOG(32) (after analysis: type of analysis done): 1
>>>> INFOG(33) (value used for ICNTL(8)): 7
>>>> INFOG(34) (exponent of the determinant if determinant is
>>>> requested): 0
>>>> linear system matrix = precond matrix:
>>>> Mat Object: 4 MPI processes
>>>> type: mpiaij
>>>> rows=22878, cols=22878
>>>> total: nonzeros=1219140, allocated nonzeros=1219140
>>>> total number of mallocs used during MatSetValues calls =0
>>>> using I-node (on process 0) routines: found 1889 nodes, limit
>>>> used is 5
>>>> converged reason: -11
>>>> KSP Object: 4 MPI processes
>>>> type: preonly
>>>> maximum iterations=10000, initial guess is zero
>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>> left preconditioning
>>>> using NONE norm type for convergence test
>>>> PC Object: 4 MPI processes
>>>> type: cholesky
>>>> Cholesky: out-of-place factorization
>>>> tolerance for zero pivot 2.22045e-14
>>>> matrix ordering: natural
>>>> factor fill ratio given 0., needed 0.
>>>> Factored matrix follows:
>>>> Mat Object: 4 MPI processes
>>>> type: mpiaij
>>>> rows=22878, cols=22878
>>>> package used to perform factorization: mumps
>>>> total: nonzeros=3361617, allocated nonzeros=3361617
>>>> total number of mallocs used during MatSetValues calls =0
>>>> MUMPS run parameters:
>>>> SYM (matrix type): 2
>>>> PAR (host participation): 1
>>>> ICNTL(1) (output for error): 6
>>>> ICNTL(2) (output of diagnostic msg): 0
>>>> ICNTL(3) (output for global info): 0
>>>> ICNTL(4) (level of printing): 0
>>>> ICNTL(5) (input mat struct): 0
>>>> ICNTL(6) (matrix prescaling): 7
>>>> ICNTL(7) (sequentia matrix ordering):7
>>>> ICNTL(8) (scalling strategy): 77
>>>> ICNTL(10) (max num of refinements): 0
>>>> ICNTL(11) (error analysis): 0
>>>> ICNTL(12) (efficiency control): 0
>>>> ICNTL(13) (efficiency control): 0
>>>> ICNTL(14) (percentage of estimated workspace increase):
>>>> 30
>>>> ICNTL(18) (input mat struct): 3
>>>> ICNTL(19) (Shur complement info): 0
>>>> ICNTL(20) (rhs sparse pattern): 0
>>>> ICNTL(21) (solution struct): 1
>>>> ICNTL(22) (in-core/out-of-core facility): 0
>>>> ICNTL(23) (max size of memory can be allocated locally):0
>>>> ICNTL(24) (detection of null pivot rows): 0
>>>> ICNTL(25) (computation of a null space basis): 0
>>>> ICNTL(26) (Schur options for rhs or solution): 0
>>>> ICNTL(27) (experimental parameter):
>>>> -24
>>>> ICNTL(28) (use parallel or sequential ordering): 1
>>>> ICNTL(29) (parallel ordering): 0
>>>> ICNTL(30) (user-specified set of entries in inv(A)): 0
>>>> ICNTL(31) (factors is discarded in the solve phase): 0
>>>> ICNTL(33) (compute determinant): 0
>>>> CNTL(1) (relative pivoting threshold): 0.01
>>>> CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>>> CNTL(3) (absolute pivoting threshold): 0.
>>>> CNTL(4) (value of static pivoting): -1.
>>>> CNTL(5) (fixation for null pivots): 0.
>>>> RINFO(1) (local estimated flops for the elimination after
>>>> analysis):
>>>> [0] 1.84947e+08
>>>> [1] 2.42065e+08
>>>> [2] 2.53044e+08
>>>> [3] 2.18441e+08
>>>> RINFO(2) (local estimated flops for the assembly after
>>>> factorization):
>>>> [0] 945938.
>>>> [1] 906795.
>>>> [2] 897815.
>>>> [3] 998840.
>>>> RINFO(3) (local estimated flops for the elimination after
>>>> factorization):
>>>> [0] 1.59835e+08
>>>> [1] 1.50867e+08
>>>> [2] 2.27932e+08
>>>> [3] 1.52037e+08
>>>> INFO(15) (estimated size of (in MB) MUMPS internal data
>>>> for running numerical factorization):
>>>> [0] 36
>>>> [1] 37
>>>> [2] 38
>>>> [3] 39
>>>> INFO(16) (size of (in MB) MUMPS internal data used during
>>>> numerical factorization):
>>>> [0] 36
>>>> [1] 37
>>>> [2] 38
>>>> [3] 39
>>>> INFO(23) (num of pivots eliminated on this processor
>>>> after factorization):
>>>> [0] 6450
>>>> [1] 5442
>>>> [2] 4386
>>>> [3] 5526
>>>> RINFOG(1) (global estimated flops for the elimination
>>>> after analysis): 8.98497e+08
>>>> RINFOG(2) (global estimated flops for the assembly after
>>>> factorization): 3.74939e+06
>>>> RINFOG(3) (global estimated flops for the elimination
>>>> after factorization): 6.9067e+08
>>>> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>>>> (0.,0.)*(2^0)
>>>> INFOG(3) (estimated real workspace for factors on all
>>>> processors after analysis): 4082184
>>>> INFOG(4) (estimated integer workspace for factors on all
>>>> processors after analysis): 231846
>>>> INFOG(5) (estimated maximum front size in the complete
>>>> tree): 678
>>>> INFOG(6) (number of nodes in the complete tree): 1380
>>>> INFOG(7) (ordering option effectively use after
>>>> analysis): 5
>>>> INFOG(8) (structural symmetry in percent of the permuted
>>>> matrix after analysis): 100
>>>> INFOG(9) (total real/complex workspace to store the
>>>> matrix factors after factorization): 3521904
>>>> INFOG(10) (total integer space store the matrix factors
>>>> after factorization): 229416
>>>> INFOG(11) (order of largest frontal matrix after
>>>> factorization): 678
>>>> INFOG(12) (number of off-diagonal pivots): 0
>>>> INFOG(13) (number of delayed pivots after factorization):
>>>> 0
>>>> INFOG(14) (number of memory compress after
>>>> factorization): 0
>>>> INFOG(15) (number of steps of iterative refinement after
>>>> solution): 0
>>>> INFOG(16) (estimated size (in MB) of all MUMPS internal
>>>> data for factorization after analysis: value on the most memory consuming
>>>> processor): 39
>>>> INFOG(17) (estimated size of all MUMPS internal data for
>>>> factorization after analysis: sum over all processors): 150
>>>> INFOG(18) (size of all MUMPS internal data allocated
>>>> during factorization: value on the most memory consuming processor): 39
>>>> INFOG(19) (size of all MUMPS internal data allocated
>>>> during factorization: sum over all processors): 150
>>>> INFOG(20) (estimated number of entries in the factors):
>>>> 3361617
>>>> INFOG(21) (size in MB of memory effectively used during
>>>> factorization - value on the most memory consuming processor): 35
>>>> INFOG(22) (size in MB of memory effectively used during
>>>> factorization - sum over all processors): 136
>>>> INFOG(23) (after analysis: value of ICNTL(6) effectively
>>>> used): 0
>>>> INFOG(24) (after analysis: value of ICNTL(12) effectively
>>>> used): 1
>>>> INFOG(25) (after factorization: number of pivots modified
>>>> by static pivoting): 0
>>>> INFOG(28) (after factorization: number of null pivots
>>>> encountered): 0
>>>> INFOG(29) (after factorization: effective number of
>>>> entries in the factors (sum over all processors)): 2931438
>>>> INFOG(30, 31) (after solution: size in Mbytes of memory
>>>> used during solution phase): 0, 0
>>>> INFOG(32) (after analysis: type of analysis done): 1
>>>> INFOG(33) (value used for ICNTL(8)): 7
>>>> INFOG(34) (exponent of the determinant if determinant is
>>>> requested): 0
>>>> linear system matrix = precond matrix:
>>>> Mat Object: 4 MPI processes
>>>> type: mpiaij
>>>> rows=22878, cols=22878
>>>> total: nonzeros=1219140, allocated nonzeros=1219140
>>>> total number of mallocs used during MatSetValues calls =0
>>>> using I-node (on process 0) routines: found 1889 nodes, limit
>>>> used is 5
>>>> converged reason: -11
>>>>
>>>> ------------------------------------------------------------
>>>> -----------------------------------------
>>>>
>>>
>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160919/a3951855/attachment-0001.html>
More information about the petsc-users
mailing list