[petsc-users] Issue updating MUMPS ictnl after failed solve
Hong
hzhang at mcs.anl.gov
Mon Sep 19 16:47:31 CDT 2016
David :
I'll check it ...
Hong
When I use MUMPS via PETSc, one issue is that it can sometimes fail with
> MUMPS error -9, which means that MUMPS didn't allocate a big enough
> workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g.
> via the command line option -mat_mumps_icntl_14.
>
> However, instead of having to run several times with different command
> line options, I'd like to be able to automatically increment icntl 14 value
> in a loop until the solve succeeds.
>
> I have a saved matrix which fails when I use it for a solve with MUMPS
> with 4 MPI processes and the default ictnl values, so I'm using this to
> check that I can achieve the automatic icntl 14 update, as described above.
> (The matrix is 14MB so I haven't attached it here, but I'd be happy to send
> it to anyone else who wants to try this test case out.)
>
> I've pasted some test code below which provides a simple test of this idea
> using two solves. The first solve uses the default value of icntl 14, which
> fails, and then we update icntl 14 to 30 and solve again. The second solve
> should succeed since icntl 14 of 30 is sufficient for MUMPS to succeed in
> this case, but for some reason the second solve still fails.
>
> Below I've also pasted the output from -ksp_view, and you can see that
> ictnl 14 is being updated correctly (see the ICNTL(14) lines in the
> output), so it's not clear to me why the second solve fails. It seems like
> MUMPS is ignoring the update to the ictnl value?
>
> Thanks,
> David
>
> ------------------------------------------------------------
> -----------------------------------------
> Test code:
>
> Mat A;
> MatCreate(PETSC_COMM_WORLD,&A);
> MatSetType(A,MATMPIAIJ);
>
> PetscViewer petsc_viewer;
> PetscViewerBinaryOpen( PETSC_COMM_WORLD,
> "matrix.dat",
> FILE_MODE_READ,
> &petsc_viewer);
> MatLoad(A, petsc_viewer);
> PetscViewerDestroy(&petsc_viewer);
>
> PetscInt m, n;
> MatGetSize(A, &m, &n);
>
> Vec x;
> VecCreate(PETSC_COMM_WORLD,&x);
> VecSetSizes(x,PETSC_DECIDE,m);
> VecSetFromOptions(x);
> VecSet(x,1.0);
>
> Vec b;
> VecDuplicate(x,&b);
>
> KSP ksp;
> PC pc;
>
> KSPCreate(PETSC_COMM_WORLD,&ksp);
> KSPSetOperators(ksp,A,A);
>
> KSPSetType(ksp,KSPPREONLY);
> KSPGetPC(ksp,&pc);
>
> PCSetType(pc,PCCHOLESKY);
>
> PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
> PCFactorSetUpMatSolverPackage(pc);
>
> KSPSetFromOptions(ksp);
> KSPSetUp(ksp);
>
> KSPSolve(ksp,b,x);
>
> {
> KSPConvergedReason reason;
> KSPGetConvergedReason(ksp, &reason);
> std::cout << "converged reason: " << reason << std::endl;
> }
>
> Mat F;
> PCFactorGetMatrix(pc,&F);
> MatMumpsSetIcntl(F,14,30);
>
> KSPSolve(ksp,b,x);
>
> {
> KSPConvergedReason reason;
> KSPGetConvergedReason(ksp, &reason);
> std::cout << "converged reason: " << reason << std::endl;
> }
>
> ------------------------------------------------------------
> -----------------------------------------
> -ksp_view output (ICNTL(14) changes from 20 to 30, but we get "converged
> reason: -11" for both solves)
>
> KSP Object: 4 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: 4 MPI processes
> type: cholesky
> Cholesky: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 0., needed 0.
> Factored matrix follows:
> Mat Object: 4 MPI processes
> type: mpiaij
> rows=22878, cols=22878
> package used to perform factorization: mumps
> total: nonzeros=3361617, allocated nonzeros=3361617
> total number of mallocs used during MatSetValues calls =0
> MUMPS run parameters:
> SYM (matrix type): 2
> PAR (host participation): 1
> ICNTL(1) (output for error): 6
> ICNTL(2) (output of diagnostic msg): 0
> ICNTL(3) (output for global info): 0
> ICNTL(4) (level of printing): 0
> ICNTL(5) (input mat struct): 0
> ICNTL(6) (matrix prescaling): 7
> ICNTL(7) (sequentia matrix ordering):7
> ICNTL(8) (scalling strategy): 77
> ICNTL(10) (max num of refinements): 0
> ICNTL(11) (error analysis): 0
> ICNTL(12) (efficiency control): 0
> ICNTL(13) (efficiency control): 0
> ICNTL(14) (percentage of estimated workspace increase): 20
> ICNTL(18) (input mat struct): 3
> ICNTL(19) (Shur complement info): 0
> ICNTL(20) (rhs sparse pattern): 0
> ICNTL(21) (solution struct): 1
> ICNTL(22) (in-core/out-of-core facility): 0
> ICNTL(23) (max size of memory can be allocated locally):0
> ICNTL(24) (detection of null pivot rows): 0
> ICNTL(25) (computation of a null space basis): 0
> ICNTL(26) (Schur options for rhs or solution): 0
> ICNTL(27) (experimental parameter): -24
> ICNTL(28) (use parallel or sequential ordering): 1
> ICNTL(29) (parallel ordering): 0
> ICNTL(30) (user-specified set of entries in inv(A)): 0
> ICNTL(31) (factors is discarded in the solve phase): 0
> ICNTL(33) (compute determinant): 0
> CNTL(1) (relative pivoting threshold): 0.01
> CNTL(2) (stopping criterion of refinement): 1.49012e-08
> CNTL(3) (absolute pivoting threshold): 0.
> CNTL(4) (value of static pivoting): -1.
> CNTL(5) (fixation for null pivots): 0.
> RINFO(1) (local estimated flops for the elimination after
> analysis):
> [0] 1.84947e+08
> [1] 2.42065e+08
> [2] 2.53044e+08
> [3] 2.18441e+08
> RINFO(2) (local estimated flops for the assembly after
> factorization):
> [0] 945938.
> [1] 906795.
> [2] 897815.
> [3] 998840.
> RINFO(3) (local estimated flops for the elimination after
> factorization):
> [0] 1.59835e+08
> [1] 1.50867e+08
> [2] 2.27932e+08
> [3] 1.52037e+08
> INFO(15) (estimated size of (in MB) MUMPS internal data for
> running numerical factorization):
> [0] 36
> [1] 37
> [2] 38
> [3] 39
> INFO(16) (size of (in MB) MUMPS internal data used during
> numerical factorization):
> [0] 36
> [1] 37
> [2] 38
> [3] 39
> INFO(23) (num of pivots eliminated on this processor after
> factorization):
> [0] 6450
> [1] 5442
> [2] 4386
> [3] 5526
> RINFOG(1) (global estimated flops for the elimination after
> analysis): 8.98497e+08
> RINFOG(2) (global estimated flops for the assembly after
> factorization): 3.74939e+06
> RINFOG(3) (global estimated flops for the elimination after
> factorization): 6.9067e+08
> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
> (0.,0.)*(2^0)
> INFOG(3) (estimated real workspace for factors on all
> processors after analysis): 4082184
> INFOG(4) (estimated integer workspace for factors on all
> processors after analysis): 231846
> INFOG(5) (estimated maximum front size in the complete
> tree): 678
> INFOG(6) (number of nodes in the complete tree): 1380
> INFOG(7) (ordering option effectively use after analysis): 5
> INFOG(8) (structural symmetry in percent of the permuted
> matrix after analysis): 100
> INFOG(9) (total real/complex workspace to store the matrix
> factors after factorization): 3521904
> INFOG(10) (total integer space store the matrix factors
> after factorization): 229416
> INFOG(11) (order of largest frontal matrix after
> factorization): 678
> INFOG(12) (number of off-diagonal pivots): 0
> INFOG(13) (number of delayed pivots after factorization): 0
> INFOG(14) (number of memory compress after factorization): 0
> INFOG(15) (number of steps of iterative refinement after
> solution): 0
> INFOG(16) (estimated size (in MB) of all MUMPS internal data
> for factorization after analysis: value on the most memory consuming
> processor): 39
> INFOG(17) (estimated size of all MUMPS internal data for
> factorization after analysis: sum over all processors): 150
> INFOG(18) (size of all MUMPS internal data allocated during
> factorization: value on the most memory consuming processor): 39
> INFOG(19) (size of all MUMPS internal data allocated during
> factorization: sum over all processors): 150
> INFOG(20) (estimated number of entries in the factors):
> 3361617
> INFOG(21) (size in MB of memory effectively used during
> factorization - value on the most memory consuming processor): 35
> INFOG(22) (size in MB of memory effectively used during
> factorization - sum over all processors): 136
> INFOG(23) (after analysis: value of ICNTL(6) effectively
> used): 0
> INFOG(24) (after analysis: value of ICNTL(12) effectively
> used): 1
> INFOG(25) (after factorization: number of pivots modified by
> static pivoting): 0
> INFOG(28) (after factorization: number of null pivots
> encountered): 0
> INFOG(29) (after factorization: effective number of entries
> in the factors (sum over all processors)): 2931438
> INFOG(30, 31) (after solution: size in Mbytes of memory used
> during solution phase): 0, 0
> INFOG(32) (after analysis: type of analysis done): 1
> INFOG(33) (value used for ICNTL(8)): 7
> INFOG(34) (exponent of the determinant if determinant is
> requested): 0
> linear system matrix = precond matrix:
> Mat Object: 4 MPI processes
> type: mpiaij
> rows=22878, cols=22878
> total: nonzeros=1219140, allocated nonzeros=1219140
> total number of mallocs used during MatSetValues calls =0
> using I-node (on process 0) routines: found 1889 nodes, limit used
> is 5
> converged reason: -11
> KSP Object: 4 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: 4 MPI processes
> type: cholesky
> Cholesky: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 0., needed 0.
> Factored matrix follows:
> Mat Object: 4 MPI processes
> type: mpiaij
> rows=22878, cols=22878
> package used to perform factorization: mumps
> total: nonzeros=3361617, allocated nonzeros=3361617
> total number of mallocs used during MatSetValues calls =0
> MUMPS run parameters:
> SYM (matrix type): 2
> PAR (host participation): 1
> ICNTL(1) (output for error): 6
> ICNTL(2) (output of diagnostic msg): 0
> ICNTL(3) (output for global info): 0
> ICNTL(4) (level of printing): 0
> ICNTL(5) (input mat struct): 0
> ICNTL(6) (matrix prescaling): 7
> ICNTL(7) (sequentia matrix ordering):7
> ICNTL(8) (scalling strategy): 77
> ICNTL(10) (max num of refinements): 0
> ICNTL(11) (error analysis): 0
> ICNTL(12) (efficiency control): 0
> ICNTL(13) (efficiency control): 0
> ICNTL(14) (percentage of estimated workspace increase): 30
> ICNTL(18) (input mat struct): 3
> ICNTL(19) (Shur complement info): 0
> ICNTL(20) (rhs sparse pattern): 0
> ICNTL(21) (solution struct): 1
> ICNTL(22) (in-core/out-of-core facility): 0
> ICNTL(23) (max size of memory can be allocated locally):0
> ICNTL(24) (detection of null pivot rows): 0
> ICNTL(25) (computation of a null space basis): 0
> ICNTL(26) (Schur options for rhs or solution): 0
> ICNTL(27) (experimental parameter): -24
> ICNTL(28) (use parallel or sequential ordering): 1
> ICNTL(29) (parallel ordering): 0
> ICNTL(30) (user-specified set of entries in inv(A)): 0
> ICNTL(31) (factors is discarded in the solve phase): 0
> ICNTL(33) (compute determinant): 0
> CNTL(1) (relative pivoting threshold): 0.01
> CNTL(2) (stopping criterion of refinement): 1.49012e-08
> CNTL(3) (absolute pivoting threshold): 0.
> CNTL(4) (value of static pivoting): -1.
> CNTL(5) (fixation for null pivots): 0.
> RINFO(1) (local estimated flops for the elimination after
> analysis):
> [0] 1.84947e+08
> [1] 2.42065e+08
> [2] 2.53044e+08
> [3] 2.18441e+08
> RINFO(2) (local estimated flops for the assembly after
> factorization):
> [0] 945938.
> [1] 906795.
> [2] 897815.
> [3] 998840.
> RINFO(3) (local estimated flops for the elimination after
> factorization):
> [0] 1.59835e+08
> [1] 1.50867e+08
> [2] 2.27932e+08
> [3] 1.52037e+08
> INFO(15) (estimated size of (in MB) MUMPS internal data for
> running numerical factorization):
> [0] 36
> [1] 37
> [2] 38
> [3] 39
> INFO(16) (size of (in MB) MUMPS internal data used during
> numerical factorization):
> [0] 36
> [1] 37
> [2] 38
> [3] 39
> INFO(23) (num of pivots eliminated on this processor after
> factorization):
> [0] 6450
> [1] 5442
> [2] 4386
> [3] 5526
> RINFOG(1) (global estimated flops for the elimination after
> analysis): 8.98497e+08
> RINFOG(2) (global estimated flops for the assembly after
> factorization): 3.74939e+06
> RINFOG(3) (global estimated flops for the elimination after
> factorization): 6.9067e+08
> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
> (0.,0.)*(2^0)
> INFOG(3) (estimated real workspace for factors on all
> processors after analysis): 4082184
> INFOG(4) (estimated integer workspace for factors on all
> processors after analysis): 231846
> INFOG(5) (estimated maximum front size in the complete
> tree): 678
> INFOG(6) (number of nodes in the complete tree): 1380
> INFOG(7) (ordering option effectively use after analysis): 5
> INFOG(8) (structural symmetry in percent of the permuted
> matrix after analysis): 100
> INFOG(9) (total real/complex workspace to store the matrix
> factors after factorization): 3521904
> INFOG(10) (total integer space store the matrix factors
> after factorization): 229416
> INFOG(11) (order of largest frontal matrix after
> factorization): 678
> INFOG(12) (number of off-diagonal pivots): 0
> INFOG(13) (number of delayed pivots after factorization): 0
> INFOG(14) (number of memory compress after factorization): 0
> INFOG(15) (number of steps of iterative refinement after
> solution): 0
> INFOG(16) (estimated size (in MB) of all MUMPS internal data
> for factorization after analysis: value on the most memory consuming
> processor): 39
> INFOG(17) (estimated size of all MUMPS internal data for
> factorization after analysis: sum over all processors): 150
> INFOG(18) (size of all MUMPS internal data allocated during
> factorization: value on the most memory consuming processor): 39
> INFOG(19) (size of all MUMPS internal data allocated during
> factorization: sum over all processors): 150
> INFOG(20) (estimated number of entries in the factors):
> 3361617
> INFOG(21) (size in MB of memory effectively used during
> factorization - value on the most memory consuming processor): 35
> INFOG(22) (size in MB of memory effectively used during
> factorization - sum over all processors): 136
> INFOG(23) (after analysis: value of ICNTL(6) effectively
> used): 0
> INFOG(24) (after analysis: value of ICNTL(12) effectively
> used): 1
> INFOG(25) (after factorization: number of pivots modified by
> static pivoting): 0
> INFOG(28) (after factorization: number of null pivots
> encountered): 0
> INFOG(29) (after factorization: effective number of entries
> in the factors (sum over all processors)): 2931438
> INFOG(30, 31) (after solution: size in Mbytes of memory used
> during solution phase): 0, 0
> INFOG(32) (after analysis: type of analysis done): 1
> INFOG(33) (value used for ICNTL(8)): 7
> INFOG(34) (exponent of the determinant if determinant is
> requested): 0
> linear system matrix = precond matrix:
> Mat Object: 4 MPI processes
> type: mpiaij
> rows=22878, cols=22878
> total: nonzeros=1219140, allocated nonzeros=1219140
> total number of mallocs used during MatSetValues calls =0
> using I-node (on process 0) routines: found 1889 nodes, limit used
> is 5
> converged reason: -11
>
> ------------------------------------------------------------
> -----------------------------------------
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20160919/eb76320f/attachment-0001.html>
More information about the petsc-users
mailing list