[petsc-users] Issue updating MUMPS ictnl after failed solve
David Knezevic
david.knezevic at akselos.com
Mon Sep 19 20:33:53 CDT 2016
On Mon, Sep 19, 2016 at 7:26 PM, Dave May <dave.mayhem23 at gmail.com> wrote:
>
>
> On 19 September 2016 at 21:05, David Knezevic <david.knezevic at akselos.com>
> wrote:
>
>> When I use MUMPS via PETSc, one issue is that it can sometimes fail with
>> MUMPS error -9, which means that MUMPS didn't allocate a big enough
>> workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g.
>> via the command line option -mat_mumps_icntl_14.
>>
>> However, instead of having to run several times with different command
>> line options, I'd like to be able to automatically increment icntl 14 value
>> in a loop until the solve succeeds.
>>
>> I have a saved matrix which fails when I use it for a solve with MUMPS
>> with 4 MPI processes and the default ictnl values, so I'm using this to
>> check that I can achieve the automatic icntl 14 update, as described above.
>> (The matrix is 14MB so I haven't attached it here, but I'd be happy to send
>> it to anyone else who wants to try this test case out.)
>>
>> I've pasted some test code below which provides a simple test of this
>> idea using two solves. The first solve uses the default value of icntl 14,
>> which fails, and then we update icntl 14 to 30 and solve again. The second
>> solve should succeed since icntl 14 of 30 is sufficient for MUMPS to
>> succeed in this case, but for some reason the second solve still fails.
>>
>> Below I've also pasted the output from -ksp_view, and you can see that
>> ictnl 14 is being updated correctly (see the ICNTL(14) lines in the
>> output), so it's not clear to me why the second solve fails. It seems like
>> MUMPS is ignoring the update to the ictnl value?
>>
>
> I believe this parameter is utilized during the numerical factorization
> phase.
> In your code, the operator hasn't changed, however you haven't signalled
> to the KSP that you want to re-perform the numerical factorization.
> You can do this by calling KSPSetOperators() before your second solve.
> I think if you do this (please try it), the factorization will be
> performed again and the new value of icntl will have an effect.
>
> Note this is a wild stab in the dark - I haven't dug through the
> petsc-mumps code in detail...
>
That sounds like a plausible guess to me, but unfortunately it didn't work.
I added KSPSetOperators(ksp,A,A); before the second solve and I got the
same behavior as before.
Thanks,
David
> ------------------------------------------------------------
>> -----------------------------------------
>> Test code:
>>
>> Mat A;
>> MatCreate(PETSC_COMM_WORLD,&A);
>> MatSetType(A,MATMPIAIJ);
>>
>> PetscViewer petsc_viewer;
>> PetscViewerBinaryOpen( PETSC_COMM_WORLD,
>> "matrix.dat",
>> FILE_MODE_READ,
>> &petsc_viewer);
>> MatLoad(A, petsc_viewer);
>> PetscViewerDestroy(&petsc_viewer);
>>
>> PetscInt m, n;
>> MatGetSize(A, &m, &n);
>>
>> Vec x;
>> VecCreate(PETSC_COMM_WORLD,&x);
>> VecSetSizes(x,PETSC_DECIDE,m);
>> VecSetFromOptions(x);
>> VecSet(x,1.0);
>>
>> Vec b;
>> VecDuplicate(x,&b);
>>
>> KSP ksp;
>> PC pc;
>>
>> KSPCreate(PETSC_COMM_WORLD,&ksp);
>> KSPSetOperators(ksp,A,A);
>>
>> KSPSetType(ksp,KSPPREONLY);
>> KSPGetPC(ksp,&pc);
>>
>> PCSetType(pc,PCCHOLESKY);
>>
>> PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
>> PCFactorSetUpMatSolverPackage(pc);
>>
>> KSPSetFromOptions(ksp);
>> KSPSetUp(ksp);
>>
>> KSPSolve(ksp,b,x);
>>
>> {
>> KSPConvergedReason reason;
>> KSPGetConvergedReason(ksp, &reason);
>> std::cout << "converged reason: " << reason << std::endl;
>> }
>>
>> Mat F;
>> PCFactorGetMatrix(pc,&F);
>> MatMumpsSetIcntl(F,14,30);
>>
>> KSPSolve(ksp,b,x);
>>
>> {
>> KSPConvergedReason reason;
>> KSPGetConvergedReason(ksp, &reason);
>> std::cout << "converged reason: " << reason << std::endl;
>> }
>>
>> ------------------------------------------------------------
>> -----------------------------------------
>> -ksp_view output (ICNTL(14) changes from 20 to 30, but we get "converged
>> reason: -11" for both solves)
>>
>> KSP Object: 4 MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: 4 MPI processes
>> type: cholesky
>> Cholesky: out-of-place factorization
>> tolerance for zero pivot 2.22045e-14
>> matrix ordering: natural
>> factor fill ratio given 0., needed 0.
>> Factored matrix follows:
>> Mat Object: 4 MPI processes
>> type: mpiaij
>> rows=22878, cols=22878
>> package used to perform factorization: mumps
>> total: nonzeros=3361617, allocated nonzeros=3361617
>> total number of mallocs used during MatSetValues calls =0
>> MUMPS run parameters:
>> SYM (matrix type): 2
>> PAR (host participation): 1
>> ICNTL(1) (output for error): 6
>> ICNTL(2) (output of diagnostic msg): 0
>> ICNTL(3) (output for global info): 0
>> ICNTL(4) (level of printing): 0
>> ICNTL(5) (input mat struct): 0
>> ICNTL(6) (matrix prescaling): 7
>> ICNTL(7) (sequentia matrix ordering):7
>> ICNTL(8) (scalling strategy): 77
>> ICNTL(10) (max num of refinements): 0
>> ICNTL(11) (error analysis): 0
>> ICNTL(12) (efficiency control): 0
>> ICNTL(13) (efficiency control): 0
>> ICNTL(14) (percentage of estimated workspace increase): 20
>> ICNTL(18) (input mat struct): 3
>> ICNTL(19) (Shur complement info): 0
>> ICNTL(20) (rhs sparse pattern): 0
>> ICNTL(21) (solution struct): 1
>> ICNTL(22) (in-core/out-of-core facility): 0
>> ICNTL(23) (max size of memory can be allocated locally):0
>> ICNTL(24) (detection of null pivot rows): 0
>> ICNTL(25) (computation of a null space basis): 0
>> ICNTL(26) (Schur options for rhs or solution): 0
>> ICNTL(27) (experimental parameter): -24
>> ICNTL(28) (use parallel or sequential ordering): 1
>> ICNTL(29) (parallel ordering): 0
>> ICNTL(30) (user-specified set of entries in inv(A)): 0
>> ICNTL(31) (factors is discarded in the solve phase): 0
>> ICNTL(33) (compute determinant): 0
>> CNTL(1) (relative pivoting threshold): 0.01
>> CNTL(2) (stopping criterion of refinement): 1.49012e-08
>> CNTL(3) (absolute pivoting threshold): 0.
>> CNTL(4) (value of static pivoting): -1.
>> CNTL(5) (fixation for null pivots): 0.
>> RINFO(1) (local estimated flops for the elimination after
>> analysis):
>> [0] 1.84947e+08
>> [1] 2.42065e+08
>> [2] 2.53044e+08
>> [3] 2.18441e+08
>> RINFO(2) (local estimated flops for the assembly after
>> factorization):
>> [0] 945938.
>> [1] 906795.
>> [2] 897815.
>> [3] 998840.
>> RINFO(3) (local estimated flops for the elimination after
>> factorization):
>> [0] 1.59835e+08
>> [1] 1.50867e+08
>> [2] 2.27932e+08
>> [3] 1.52037e+08
>> INFO(15) (estimated size of (in MB) MUMPS internal data for
>> running numerical factorization):
>> [0] 36
>> [1] 37
>> [2] 38
>> [3] 39
>> INFO(16) (size of (in MB) MUMPS internal data used during
>> numerical factorization):
>> [0] 36
>> [1] 37
>> [2] 38
>> [3] 39
>> INFO(23) (num of pivots eliminated on this processor after
>> factorization):
>> [0] 6450
>> [1] 5442
>> [2] 4386
>> [3] 5526
>> RINFOG(1) (global estimated flops for the elimination after
>> analysis): 8.98497e+08
>> RINFOG(2) (global estimated flops for the assembly after
>> factorization): 3.74939e+06
>> RINFOG(3) (global estimated flops for the elimination after
>> factorization): 6.9067e+08
>> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>> (0.,0.)*(2^0)
>> INFOG(3) (estimated real workspace for factors on all
>> processors after analysis): 4082184
>> INFOG(4) (estimated integer workspace for factors on all
>> processors after analysis): 231846
>> INFOG(5) (estimated maximum front size in the complete
>> tree): 678
>> INFOG(6) (number of nodes in the complete tree): 1380
>> INFOG(7) (ordering option effectively use after analysis):
>> 5
>> INFOG(8) (structural symmetry in percent of the permuted
>> matrix after analysis): 100
>> INFOG(9) (total real/complex workspace to store the matrix
>> factors after factorization): 3521904
>> INFOG(10) (total integer space store the matrix factors
>> after factorization): 229416
>> INFOG(11) (order of largest frontal matrix after
>> factorization): 678
>> INFOG(12) (number of off-diagonal pivots): 0
>> INFOG(13) (number of delayed pivots after factorization): 0
>> INFOG(14) (number of memory compress after factorization):
>> 0
>> INFOG(15) (number of steps of iterative refinement after
>> solution): 0
>> INFOG(16) (estimated size (in MB) of all MUMPS internal
>> data for factorization after analysis: value on the most memory consuming
>> processor): 39
>> INFOG(17) (estimated size of all MUMPS internal data for
>> factorization after analysis: sum over all processors): 150
>> INFOG(18) (size of all MUMPS internal data allocated during
>> factorization: value on the most memory consuming processor): 39
>> INFOG(19) (size of all MUMPS internal data allocated during
>> factorization: sum over all processors): 150
>> INFOG(20) (estimated number of entries in the factors):
>> 3361617
>> INFOG(21) (size in MB of memory effectively used during
>> factorization - value on the most memory consuming processor): 35
>> INFOG(22) (size in MB of memory effectively used during
>> factorization - sum over all processors): 136
>> INFOG(23) (after analysis: value of ICNTL(6) effectively
>> used): 0
>> INFOG(24) (after analysis: value of ICNTL(12) effectively
>> used): 1
>> INFOG(25) (after factorization: number of pivots modified
>> by static pivoting): 0
>> INFOG(28) (after factorization: number of null pivots
>> encountered): 0
>> INFOG(29) (after factorization: effective number of entries
>> in the factors (sum over all processors)): 2931438
>> INFOG(30, 31) (after solution: size in Mbytes of memory
>> used during solution phase): 0, 0
>> INFOG(32) (after analysis: type of analysis done): 1
>> INFOG(33) (value used for ICNTL(8)): 7
>> INFOG(34) (exponent of the determinant if determinant is
>> requested): 0
>> linear system matrix = precond matrix:
>> Mat Object: 4 MPI processes
>> type: mpiaij
>> rows=22878, cols=22878
>> total: nonzeros=1219140, allocated nonzeros=1219140
>> total number of mallocs used during MatSetValues calls =0
>> using I-node (on process 0) routines: found 1889 nodes, limit used
>> is 5
>> converged reason: -11
>> KSP Object: 4 MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: 4 MPI processes
>> type: cholesky
>> Cholesky: out-of-place factorization
>> tolerance for zero pivot 2.22045e-14
>> matrix ordering: natural
>> factor fill ratio given 0., needed 0.
>> Factored matrix follows:
>> Mat Object: 4 MPI processes
>> type: mpiaij
>> rows=22878, cols=22878
>> package used to perform factorization: mumps
>> total: nonzeros=3361617, allocated nonzeros=3361617
>> total number of mallocs used during MatSetValues calls =0
>> MUMPS run parameters:
>> SYM (matrix type): 2
>> PAR (host participation): 1
>> ICNTL(1) (output for error): 6
>> ICNTL(2) (output of diagnostic msg): 0
>> ICNTL(3) (output for global info): 0
>> ICNTL(4) (level of printing): 0
>> ICNTL(5) (input mat struct): 0
>> ICNTL(6) (matrix prescaling): 7
>> ICNTL(7) (sequentia matrix ordering):7
>> ICNTL(8) (scalling strategy): 77
>> ICNTL(10) (max num of refinements): 0
>> ICNTL(11) (error analysis): 0
>> ICNTL(12) (efficiency control): 0
>> ICNTL(13) (efficiency control): 0
>> ICNTL(14) (percentage of estimated workspace increase): 30
>> ICNTL(18) (input mat struct): 3
>> ICNTL(19) (Shur complement info): 0
>> ICNTL(20) (rhs sparse pattern): 0
>> ICNTL(21) (solution struct): 1
>> ICNTL(22) (in-core/out-of-core facility): 0
>> ICNTL(23) (max size of memory can be allocated locally):0
>> ICNTL(24) (detection of null pivot rows): 0
>> ICNTL(25) (computation of a null space basis): 0
>> ICNTL(26) (Schur options for rhs or solution): 0
>> ICNTL(27) (experimental parameter): -24
>> ICNTL(28) (use parallel or sequential ordering): 1
>> ICNTL(29) (parallel ordering): 0
>> ICNTL(30) (user-specified set of entries in inv(A)): 0
>> ICNTL(31) (factors is discarded in the solve phase): 0
>> ICNTL(33) (compute determinant): 0
>> CNTL(1) (relative pivoting threshold): 0.01
>> CNTL(2) (stopping criterion of refinement): 1.49012e-08
>> CNTL(3) (absolute pivoting threshold): 0.
>> CNTL(4) (value of static pivoting): -1.
>> CNTL(5) (fixation for null pivots): 0.
>> RINFO(1) (local estimated flops for the elimination after
>> analysis):
>> [0] 1.84947e+08
>> [1] 2.42065e+08
>> [2] 2.53044e+08
>> [3] 2.18441e+08
>> RINFO(2) (local estimated flops for the assembly after
>> factorization):
>> [0] 945938.
>> [1] 906795.
>> [2] 897815.
>> [3] 998840.
>> RINFO(3) (local estimated flops for the elimination after
>> factorization):
>> [0] 1.59835e+08
>> [1] 1.50867e+08
>> [2] 2.27932e+08
>> [3] 1.52037e+08
>> INFO(15) (estimated size of (in MB) MUMPS internal data for
>> running numerical factorization):
>> [0] 36
>> [1] 37
>> [2] 38
>> [3] 39
>> INFO(16) (size of (in MB) MUMPS internal data used during
>> numerical factorization):
>> [0] 36
>> [1] 37
>> [2] 38
>> [3] 39
>> INFO(23) (num of pivots eliminated on this processor after
>> factorization):
>> [0] 6450
>> [1] 5442
>> [2] 4386
>> [3] 5526
>> RINFOG(1) (global estimated flops for the elimination after
>> analysis): 8.98497e+08
>> RINFOG(2) (global estimated flops for the assembly after
>> factorization): 3.74939e+06
>> RINFOG(3) (global estimated flops for the elimination after
>> factorization): 6.9067e+08
>> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>> (0.,0.)*(2^0)
>> INFOG(3) (estimated real workspace for factors on all
>> processors after analysis): 4082184
>> INFOG(4) (estimated integer workspace for factors on all
>> processors after analysis): 231846
>> INFOG(5) (estimated maximum front size in the complete
>> tree): 678
>> INFOG(6) (number of nodes in the complete tree): 1380
>> INFOG(7) (ordering option effectively use after analysis):
>> 5
>> INFOG(8) (structural symmetry in percent of the permuted
>> matrix after analysis): 100
>> INFOG(9) (total real/complex workspace to store the matrix
>> factors after factorization): 3521904
>> INFOG(10) (total integer space store the matrix factors
>> after factorization): 229416
>> INFOG(11) (order of largest frontal matrix after
>> factorization): 678
>> INFOG(12) (number of off-diagonal pivots): 0
>> INFOG(13) (number of delayed pivots after factorization): 0
>> INFOG(14) (number of memory compress after factorization):
>> 0
>> INFOG(15) (number of steps of iterative refinement after
>> solution): 0
>> INFOG(16) (estimated size (in MB) of all MUMPS internal
>> data for factorization after analysis: value on the most memory consuming
>> processor): 39
>> INFOG(17) (estimated size of all MUMPS internal data for
>> factorization after analysis: sum over all processors): 150
>> INFOG(18) (size of all MUMPS internal data allocated during
>> factorization: value on the most memory consuming processor): 39
>> INFOG(19) (size of all MUMPS internal data allocated during
>> factorization: sum over all processors): 150
>> INFOG(20) (estimated number of entries in the factors):
>> 3361617
>> INFOG(21) (size in MB of memory effectively used during
>> factorization - value on the most memory consuming processor): 35
>> INFOG(22) (size in MB of memory effectively used during
>> factorization - sum over all processors): 136
>> INFOG(23) (after analysis: value of ICNTL(6) effectively
>> used): 0
>> INFOG(24) (after analysis: value of ICNTL(12) effectively
>> used): 1
>> INFOG(25) (after factorization: number of pivots modified
>> by static pivoting): 0
>> INFOG(28) (after factorization: number of null pivots
>> encountered): 0
>> INFOG(29) (after factorization: effective number of entries
>> in the factors (sum over all processors)): 2931438
>> INFOG(30, 31) (after solution: size in Mbytes of memory
>> used during solution phase): 0, 0
>> INFOG(32) (after analysis: type of analysis done): 1
>> INFOG(33) (value used for ICNTL(8)): 7
>> INFOG(34) (exponent of the determinant if determinant is
>> requested): 0
>> linear system matrix = precond matrix:
>> Mat Object: 4 MPI processes
>> type: mpiaij
>> rows=22878, cols=22878
>> total: nonzeros=1219140, allocated nonzeros=1219140
>> total number of mallocs used during MatSetValues calls =0
>> using I-node (on process 0) routines: found 1889 nodes, limit used
>> is 5
>> converged reason: -11
>>
>> ------------------------------------------------------------
>> -----------------------------------------
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160919/7fd3abb5/attachment-0001.html>
More information about the petsc-users
mailing list