[petsc-users] Setting up MUMPS in PETSc

Jed Brown jedbrown at mcs.anl.gov
Tue Oct 23 18:38:21 CDT 2012


On Tue, Oct 23, 2012 at 6:34 PM, Jinquan Zhong <jzhong at scsolutions.com>wrote:

>  *
> That is true.  I used*
>
> ** **
>
> -pc_type lu -pc_factor_mat_solver_package mumps -ksp_view
> -mat_mumps_icntl_4 2 -mat_mumps_icntl_5 0 -mat_mumps_icntl_18 3
> -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 -ksp_type gmres
> -ksp_monitor_singular_value -ksp_gmres_restart 1000
>

Read my reply about using GMRES to estimate condition number _while_ using
a preconditioner.


> ****
>
> ** **
>
> ** **
>
> *Here is the info I got from the setting I had.  I didn’t see the
> condition number appeared:*
>
> ** **
>
> ** **
>
> Entering ZMUMPS driver with JOB, N, NZ =   1         894              0***
> *
>
> ** **
>
> ZMUMPS 4.10.0        ****
>
> L U Solver for unsymmetric matrices****
>
> Type of parallelism: Working host****
>
> ** **
>
> ****** ANALYSIS STEP ************
>
> ** **
>
> ** Max-trans not allowed because matrix is distributed****
>
> Using ParMETIS for parallel ordering.****
>
> Structual symmetry is:100%****
>
>   WARNING: Largest root node of size          173  not selected for
> parallel execution****
>
> ** **
>
> Leaving analysis phase with  ...****
>
> INFOG(1)                                       =               0****
>
> INFOG(2)                                       =               0****
>
> -- (20) Number of entries in factors (estim.) =          306174****
>
> --  (3) Storage of factors  (REAL, estimated) =          306174****
>
> --  (4) Storage of factors  (INT , estimated) =            7102****
>
> --  (5) Maximum frontal size      (estimated) =             495****
>
> --  (6) Number of nodes in the tree           =              66****
>
> -- (32) Type of analysis effectively used     =               2****
>
> --  (7) Ordering option effectively used      =               2****
>
> ICNTL(6) Maximum transversal option            =               0****
>
> ICNTL(7) Pivot order option                    =               7****
>
> Percentage of memory relaxation (effective)    =              25****
>
> Number of level 2 nodes                        =               0****
>
> Number of split nodes                          =               0****
>
> RINFOG(1) Operations during elimination (estim)=   8.929D+07****
>
> Distributed matrix entry format (ICNTL(18))    =               3****
>
> ** Rank of proc needing largest memory in IC facto        :         1****
>
> ** Estimated corresponding MBYTES for IC facto            :        43****
>
> ** Estimated avg. MBYTES per work. proc at facto (IC)     :        39****
>
> ** TOTAL     space in MBYTES for IC factorization         :       156****
>
> ** Rank of proc needing largest memory for OOC facto      :         1****
>
> ** Estimated corresponding MBYTES for OOC facto           :        46****
>
> ** Estimated avg. MBYTES per work. proc at facto (OOC)    :        41****
>
> ** TOTAL     space in MBYTES for OOC factorization        :       165****
>
> Entering ZMUMPS driver with JOB, N, NZ =   2         894         263360***
> *
>
> ** **
>
> ****** FACTORIZATION STEP ************
>
> ** **
>
> ** **
>
> GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...****
>
> NUMBER OF WORKING PROCESSES              =           4****
>
> OUT-OF-CORE OPTION (ICNTL(22))           =           0****
>
> REAL SPACE FOR FACTORS                   =      306174****
>
> INTEGER SPACE FOR FACTORS                =        7102****
>
> MAXIMUM FRONTAL SIZE (ESTIMATED)         =         495****
>
> NUMBER OF NODES IN THE TREE              =          66****
>
> Convergence error after scaling for ONE-NORM (option 7/8)   = 0.31D+00****
>
> Maximum effective relaxed size of S              =      306300****
>
> Average effective relaxed size of S              =      185353****
>
> ** **
>
> REDISTRIB: TOTAL DATA LOCAL/SENT         =       68276      195084****
>
> GLOBAL TIME FOR MATRIX DISTRIBUTION       =      0.0145****
>
> ** Memory relaxation parameter ( ICNTL(14)  )            :        25****
>
> ** Rank of processor needing largest memory in facto     :         1****
>
> ** Space in MBYTES used by this processor for facto      :        43****
>
> ** Avg. Space in MBYTES per working proc during facto    :        39****
>
> ** **
>
> ELAPSED TIME FOR FACTORIZATION           =      0.0862****
>
> Maximum effective space used in S   (KEEP8(67)   =      245025****
>
> Average effective space used in S   (KEEP8(67)   =      119077****
>
> ** EFF Min: Rank of processor needing largest memory :         1****
>
> ** EFF Min: Space in MBYTES used by this processor   :        42****
>
> ** EFF Min: Avg. Space in MBYTES per working proc    :        37****
>
> ** **
>
> GLOBAL STATISTICS ****
>
>  RINFOG(2)  OPERATIONS IN NODE ASSEMBLY   = 2.372D+05****
>
> ------(3)  OPERATIONS IN NODE ELIMINATION= 8.929D+07****
>
> INFOG (9)  REAL SPACE FOR FACTORS        =      306184****
>
> INFOG(10)  INTEGER SPACE FOR FACTORS     =        7104****
>
> INFOG(11)  MAXIMUM FRONT SIZE            =         495****
>
> INFOG(29)  NUMBER OF ENTRIES IN FACTORS  =      306184****
>
> INFOG(12) NB OF OFF DIAGONAL PIVOTS      =          29****
>
> INFOG(13)  NUMBER OF DELAYED PIVOTS      =           1****
>
> INFOG(14)  NUMBER OF MEMORY COMPRESS     =           0****
>
> KEEP8(108) Extra copies IP stacking      =           0****
>
> Entering ZMUMPS driver with JOB, N, NZ =   3         894         263360***
> *
>
> ** **
>
> ** **
>
> ****** SOLVE & CHECK STEP ************
>
> ** **
>
> ** **
>
> STATISTICS PRIOR SOLVE PHASE     ...........****
>
> NUMBER OF RIGHT-HAND-SIDES                    =           1****
>
> BLOCKING FACTOR FOR MULTIPLE RHS              =           1****
>
> ICNTL (9)                                     =           1****
>
>   --- (10)                                     =           0****
>
>   --- (11)                                     =           0****
>
>   --- (20)                                     =           0****
>
>   --- (21)                                     =           1****
>
>  --- (30)                                     =           0****
>
> ** Rank of processor needing largest memory in solve     :         1****
>
> ** Space in MBYTES used by this processor for solve      :         6****
>
> ** Avg. Space in MBYTES per working proc during solve    :         3****
>
>   0 KSP Residual norm 1.890433086271e+01 % max 1.000000000000e+00 min
> 1.000000000000e+00 max/min 1.000000000000e+00****
>
> Entering ZMUMPS driver with JOB, N, NZ =   3         894         263360***
> *
>
> ** **
>
> ** **
>
> ****** SOLVE & CHECK STEP ************
>
> ** **
>
> ** **
>
> STATISTICS PRIOR SOLVE PHASE     ...........****
>
> NUMBER OF RIGHT-HAND-SIDES                    =           1****
>
> BLOCKING FACTOR FOR MULTIPLE RHS              =           1****
>
> ICNTL (9)                                     =           1****
>
>   --- (10)                                     =           0****
>
>   --- (11)                                     =           0****
>
>   --- (20)                                     =           0****
>
>   --- (21)                                     =           1****
>
>   --- (30)                                     =           0****
>
> ** Rank of processor needing largest memory in solve     :         1****
>
> ** Space in MBYTES used by this processor for solve      :         6****
>
> ** Avg. Space in MBYTES per working proc during solve    :         3****
>
>   1 KSP Residual norm 1.804170434909e-06 % max 1.000000180789e+00 min
> 1.000000180789e+00 max/min 1.000000000000e+00****
>
> Entering ZMUMPS driver with JOB, N, NZ =   3         894         263360***
> *
>
> ** **
>
> ** **
>
> ****** SOLVE & CHECK STEP ************
>
> ** **
>
> ** **
>
> STATISTICS PRIOR SOLVE PHASE     ...........****
>
> NUMBER OF RIGHT-HAND-SIDES                    =           1****
>
> BLOCKING FACTOR FOR MULTIPLE RHS              =           1****
>
> ICNTL (9)                                     =           1****
>
>   --- (10)                                     =           0****
>
>   --- (11)                                     =           0****
>
>   --- (20)                                     =           0****
>
>   --- (21)                                     =           1****
>
>   --- (30)                                     =           0****
>
> ** Rank of processor needing largest memory in solve     :         1****
>
> ** Space in MBYTES used by this processor for solve      :         6****
>
> ** Avg. Space in MBYTES per working proc during solve    :         3****
>
>   2 KSP Residual norm 4.466758995607e-13 % max 1.000000467998e+00 min
> 9.999992798573e-01 max/min 1.000001188141e+00
>

Here are your estimates of max and min singular values, and their ratio
(the condition number). These are for the *preconditioned operator*. Since
you use a direct solver, it is almost exactly 1.


> ****
>
> KSP Object: 4 MPI processes****
>
>   type: gmres****
>
>     GMRES: restart=1000, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement****
>
>     GMRES: happy breakdown tolerance 1e-30****
>
>   maximum iterations=10000, initial guess is zero****
>
>   tolerances:  relative=1e-12, absolute=1e-12, divergence=10000****
>
>   left preconditioning****
>
>   using PRECONDITIONED norm type for convergence test****
>
> PC Object: 4 MPI processes****
>
>   type: lu****
>
>     LU: out-of-place factorization****
>
>     tolerance for zero pivot 2.22045e-14****
>
>     matrix ordering: natural****
>
>     factor fill ratio given 0, needed 0****
>
>       Factored matrix follows:****
>
>         Matrix Object:         4 MPI processes****
>
>           type: mpiaij****
>
>           rows=894, cols=894****
>
>           package used to perform factorization: mumps****
>
>           total: nonzeros=306174, allocated nonzeros=306174****
>
>           total number of mallocs used during MatSetValues calls =0****
>
>             MUMPS run parameters:****
>
>               SYM (matrix type):                   0 ****
>
>               PAR (host participation):            1 ****
>
>               ICNTL(1) (output for error):         6 ****
>
>               ICNTL(2) (output of diagnostic msg): 0 ****
>
>               ICNTL(3) (output for global info):   6 ****
>
>               ICNTL(4) (level of printing):        2 ****
>
>               ICNTL(5) (input mat struct):         0 ****
>
>               ICNTL(6) (matrix prescaling):        7 ****
>
>               ICNTL(7) (sequentia matrix ordering):7 ****
>
>               ICNTL(8) (scalling strategy):        77 ****
>
>               ICNTL(10) (max num of refinements):  0 ****
>
>               ICNTL(11) (error analysis):          0 ****
>
>               ICNTL(12) (efficiency control):                         1 **
> **
>
>               ICNTL(13) (efficiency control):                         0 **
> **
>
>               ICNTL(14) (percentage of estimated workspace increase): 20 *
> ***
>
>               ICNTL(18) (input mat struct):                           3 **
> **
>
>               ICNTL(19) (Shur complement info):                       0 **
> **
>
>               ICNTL(20) (rhs sparse pattern):                         0 **
> **
>
>               ICNTL(21) (solution struct):                            1 **
> **
>
>               ICNTL(22) (in-core/out-of-core facility):               0 **
> **
>
>               ICNTL(23) (max size of memory can be allocated locally):0 **
> **
>
>               ICNTL(24) (detection of null pivot rows):               0 **
> **
>
>               ICNTL(25) (computation of a null space basis):          0 **
> **
>
>               ICNTL(26) (Schur options for rhs or solution):          0 **
> **
>
>               ICNTL(27) (experimental parameter):                     -8 *
> ***
>
>               ICNTL(28) (use parallel or sequential ordering):        2 **
> **
>
>               ICNTL(29) (parallel ordering):                          2 **
> **
>
>               ICNTL(30) (user-specified set of entries in inv(A)):    0 **
> **
>
>               ICNTL(31) (factors is discarded in the solve phase):    0 **
> **
>
>               ICNTL(33) (compute determinant):                        0 **
> **
>
>               CNTL(1) (relative pivoting threshold):      0.01 ****
>
>               CNTL(2) (stopping criterion of refinement): 1.49012e-08 ****
>
>               CNTL(3) (absolute pivoting threshold):      0 ****
>
>               CNTL(4) (value of static pivoting):         -1 ****
>
>               CNTL(5) (fixation for null pivots):         0 ****
>
>               RINFO(1) (local estimated flops for the elimination after
> analysis): ****
>
>                 [0] 4.25638e+06 ****
>
>                 [1] 7.22129e+07 ****
>
>                 [2] 8.01308e+06 ****
>
>                 [3] 4.81122e+06 ****
>
>               RINFO(2) (local estimated flops for the assembly after
> factorization): ****
>
>                 [0]  68060 ****
>
>                 [1]  5860 ****
>
>                 [2]  83570 ****
>
>                 [3]  79676 ****
>
>               RINFO(3) (local estimated flops for the elimination after
> factorization): ****
>
>                 [0]  4.25638e+06 ****
>
>                 [1]  7.2213e+07 ****
>
>                 [2]  8.01308e+06 ****
>
>                 [3]  4.81122e+06 ****
>
>               INFO(15) (estimated size of (in MB) MUMPS internal data for
> running numerical factorization): ****
>
>               [0] 37 ****
>
>               [1] 43 ****
>
>               [2] 38 ****
>
>               [3] 38 ****
>
>               INFO(16) (size of (in MB) MUMPS internal data used during
> numerical factorization): ****
>
>                 [0] 37 ****
>
>                 [1] 43 ****
>
>                 [2] 38 ****
>
>                 [3] 38 ****
>
>               INFO(23) (num of pivots eliminated on this processor after
> factorization): ****
>
>                 [0] 350 ****
>
>                 [1] 319 ****
>
>                 [2] 134 ****
>
>                 [3] 91 ****
>
>               RINFOG(1) (global estimated flops for the elimination after
> analysis): 8.92936e+07 ****
>
>               RINFOG(2) (global estimated flops for the assembly after
> factorization): 237166 ****
>
>               RINFOG(3) (global estimated flops for the elimination after
> factorization): 8.92937e+07 ****
>
>               (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
> (0,0)*(2^0)****
>
>               INFOG(3) (estimated real workspace for factors on all
> processors after analysis): 306174 ****
>
>               INFOG(4) (estimated integer workspace for factors on all
> processors after analysis): 7102 ****
>
>               INFOG(5) (estimated maximum front size in the complete
> tree): 495 ****
>
>               INFOG(6) (number of nodes in the complete tree): 66 ****
>
>               INFOG(7) (ordering option effectively use after analysis): 2
> ****
>
>               INFOG(8) (structural symmetry in percent of the permuted
> matrix after analysis): 100 ****
>
>               INFOG(9) (total real/complex workspace to store the matrix
> factors after factorization): 306184 ****
>
>               INFOG(10) (total integer space store the matrix factors
> after factorization): 7104 ****
>
>               INFOG(11) (order of largest frontal matrix after
> factorization): 495 ****
>
>               INFOG(12) (number of off-diagonal pivots): 29 ****
>
>               INFOG(13) (number of delayed pivots after factorization): 1
> ****
>
>               INFOG(14) (number of memory compress after factorization): 0
> ****
>
>               INFOG(15) (number of steps of iterative refinement after
> solution): 0 ****
>
>               INFOG(16) (estimated size (in MB) of all MUMPS internal data
> for factorization after analysis: value on the most memory consuming
> processor): 43 ****
>
>               INFOG(17) (estimated size of all MUMPS internal data for
> factorization after analysis: sum over all processors): 156 ****
>
>               INFOG(18) (size of all MUMPS internal data allocated during
> factorization: value on the most memory consuming processor): 43 ****
>
>               INFOG(19) (size of all MUMPS internal data allocated during
> factorization: sum over all processors): 156 ****
>
>               INFOG(20) (estimated number of entries in the factors):
> 306174 ****
>
>               INFOG(21) (size in MB of memory effectively used during
> factorization - value on the most memory consuming processor): 42 ****
>
>               INFOG(22) (size in MB of memory effectively used during
> factorization - sum over all processors): 150 ****
>
>               INFOG(23) (after analysis: value of ICNTL(6) effectively
> used): 0 ****
>
>               INFOG(24) (after analysis: value of ICNTL(12) effectively
> used): 1 ****
>
>               INFOG(25) (after factorization: number of pivots modified by
> static pivoting): 0 ****
>
>   linear system matrix = precond matrix:****
>
>   Matrix Object:   4 MPI processes****
>
>     type: mpiaij****
>
>     rows=894, cols=894****
>
>     total: nonzeros=263360, allocated nonzeros=263360****
>
>     total number of mallocs used during MatSetValues calls =0****
>
>       using I-node (on process 0) routines: found 47 nodes, limit used is 5
> ****
>
> >>  # of iterations: 2****
>
> KSPConvergedReason: 3****
>
> Entering ZMUMPS driver with JOB, N, NZ =  -2         894         263360***
> *
>
> ** **
>
> ** **
>
> Jinquan****
>
> ** **
>
> *From:* petsc-users-bounces at mcs.anl.gov [mailto:
> petsc-users-bounces at mcs.anl.gov] *On Behalf Of *Matthew Knepley
> *Sent:* Tuesday, October 23, 2012 4:30 PM
>
> *To:* PETSc users list
> *Subject:* Re: [petsc-users] Setting up MUMPS in PETSc****
>
> ** **
>
> On Tue, Oct 23, 2012 at 7:17 PM, Jinquan Zhong <jzhong at scsolutions.com>
> wrote:****
>
>  Thanks, Jed.****
>
>  ****
>
> Any way to get the condition number.  I used ****
>
>  ****
>
> -ksp_type gmres -ksp_monitor_singular_value -ksp_gmres_restart 1000****
>
>  ****
>
> It didn’t work.****
>
>  ** **
>
> This is not helpful. Would you understand this if someone mailed you "It
> didn't work"? Send the output.****
>
> ** **
>
>    Matt****
>
>  ****
>
>   ****
>
> Jinquan****
>
>  ****
>
> *From:* petsc-users-bounces at mcs.anl.gov [mailto:
> petsc-users-bounces at mcs.anl.gov] *On Behalf Of *Jed Brown
> *Sent:* Tuesday, October 23, 2012 3:28 PM
> *To:* PETSc users list
> *Subject:* Re: [petsc-users] Setting up MUMPS in PETSc****
>
>  ****
>
> On Tue, Oct 23, 2012 at 5:21 PM, Jinquan Zhong <jzhong at scsolutions.com>
> wrote:****
>
> That is new for me.  What would you suggest, Matt?****
>
>   ****
>
> Were you using LU or Cholesky before? That is the difference between
> -pc_type lu and -pc_type cholesky. Use -pc_factor_mat_solver_package mumps
> to choose MUMPS. You can access the MUMPS options with
> -mat_mumps_icntl_opaquenumber.****
>
>  ****
>
> It looks like PETSc's Clique interface does not work in parallel. (A
> student was working on it recently and it seems to work in serial.) When it
> is fixed to work in parallel, that is almost certainly what you should use.
> Alternatively, there may well be a Fast method, depending on the structure
> of the system and that fat dense block.****
>
>
>
> ****
>
> ** **
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener****
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20121023/15738f87/attachment-0001.html>


More information about the petsc-users mailing list