[petsc-users] Setting up MUMPS in PETSc
Jinquan Zhong
jzhong at scsolutions.com
Tue Oct 23 18:34:40 CDT 2012
That is true. I used
-pc_type lu -pc_factor_mat_solver_package mumps -ksp_view -mat_mumps_icntl_4 2 -mat_mumps_icntl_5 0 -mat_mumps_icntl_18 3 -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 -ksp_type gmres -ksp_monitor_singular_value -ksp_gmres_restart 1000
Here is the info I got from the setting I had. I didn't see the condition number appeared:
Entering ZMUMPS driver with JOB, N, NZ = 1 894 0
ZMUMPS 4.10.0
L U Solver for unsymmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
** Max-trans not allowed because matrix is distributed
Using ParMETIS for parallel ordering.
Structual symmetry is:100%
WARNING: Largest root node of size 173 not selected for parallel execution
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 306174
-- (3) Storage of factors (REAL, estimated) = 306174
-- (4) Storage of factors (INT , estimated) = 7102
-- (5) Maximum frontal size (estimated) = 495
-- (6) Number of nodes in the tree = 66
-- (32) Type of analysis effectively used = 2
-- (7) Ordering option effectively used = 2
ICNTL(6) Maximum transversal option = 0
ICNTL(7) Pivot order option = 7
Percentage of memory relaxation (effective) = 25
Number of level 2 nodes = 0
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 8.929D+07
Distributed matrix entry format (ICNTL(18)) = 3
** Rank of proc needing largest memory in IC facto : 1
** Estimated corresponding MBYTES for IC facto : 43
** Estimated avg. MBYTES per work. proc at facto (IC) : 39
** TOTAL space in MBYTES for IC factorization : 156
** Rank of proc needing largest memory for OOC facto : 1
** Estimated corresponding MBYTES for OOC facto : 46
** Estimated avg. MBYTES per work. proc at facto (OOC) : 41
** TOTAL space in MBYTES for OOC factorization : 165
Entering ZMUMPS driver with JOB, N, NZ = 2 894 263360
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
NUMBER OF WORKING PROCESSES = 4
OUT-OF-CORE OPTION (ICNTL(22)) = 0
REAL SPACE FOR FACTORS = 306174
INTEGER SPACE FOR FACTORS = 7102
MAXIMUM FRONTAL SIZE (ESTIMATED) = 495
NUMBER OF NODES IN THE TREE = 66
Convergence error after scaling for ONE-NORM (option 7/8) = 0.31D+00
Maximum effective relaxed size of S = 306300
Average effective relaxed size of S = 185353
REDISTRIB: TOTAL DATA LOCAL/SENT = 68276 195084
GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.0145
** Memory relaxation parameter ( ICNTL(14) ) : 25
** Rank of processor needing largest memory in facto : 1
** Space in MBYTES used by this processor for facto : 43
** Avg. Space in MBYTES per working proc during facto : 39
ELAPSED TIME FOR FACTORIZATION = 0.0862
Maximum effective space used in S (KEEP8(67) = 245025
Average effective space used in S (KEEP8(67) = 119077
** EFF Min: Rank of processor needing largest memory : 1
** EFF Min: Space in MBYTES used by this processor : 42
** EFF Min: Avg. Space in MBYTES per working proc : 37
GLOBAL STATISTICS
RINFOG(2) OPERATIONS IN NODE ASSEMBLY = 2.372D+05
------(3) OPERATIONS IN NODE ELIMINATION= 8.929D+07
INFOG (9) REAL SPACE FOR FACTORS = 306184
INFOG(10) INTEGER SPACE FOR FACTORS = 7104
INFOG(11) MAXIMUM FRONT SIZE = 495
INFOG(29) NUMBER OF ENTRIES IN FACTORS = 306184
INFOG(12) NB OF OFF DIAGONAL PIVOTS = 29
INFOG(13) NUMBER OF DELAYED PIVOTS = 1
INFOG(14) NUMBER OF MEMORY COMPRESS = 0
KEEP8(108) Extra copies IP stacking = 0
Entering ZMUMPS driver with JOB, N, NZ = 3 894 263360
****** SOLVE & CHECK STEP ********
STATISTICS PRIOR SOLVE PHASE ...........
NUMBER OF RIGHT-HAND-SIDES = 1
BLOCKING FACTOR FOR MULTIPLE RHS = 1
ICNTL (9) = 1
--- (10) = 0
--- (11) = 0
--- (20) = 0
--- (21) = 1
--- (30) = 0
** Rank of processor needing largest memory in solve : 1
** Space in MBYTES used by this processor for solve : 6
** Avg. Space in MBYTES per working proc during solve : 3
0 KSP Residual norm 1.890433086271e+01 % max 1.000000000000e+00 min 1.000000000000e+00 max/min 1.000000000000e+00
Entering ZMUMPS driver with JOB, N, NZ = 3 894 263360
****** SOLVE & CHECK STEP ********
STATISTICS PRIOR SOLVE PHASE ...........
NUMBER OF RIGHT-HAND-SIDES = 1
BLOCKING FACTOR FOR MULTIPLE RHS = 1
ICNTL (9) = 1
--- (10) = 0
--- (11) = 0
--- (20) = 0
--- (21) = 1
--- (30) = 0
** Rank of processor needing largest memory in solve : 1
** Space in MBYTES used by this processor for solve : 6
** Avg. Space in MBYTES per working proc during solve : 3
1 KSP Residual norm 1.804170434909e-06 % max 1.000000180789e+00 min 1.000000180789e+00 max/min 1.000000000000e+00
Entering ZMUMPS driver with JOB, N, NZ = 3 894 263360
****** SOLVE & CHECK STEP ********
STATISTICS PRIOR SOLVE PHASE ...........
NUMBER OF RIGHT-HAND-SIDES = 1
BLOCKING FACTOR FOR MULTIPLE RHS = 1
ICNTL (9) = 1
--- (10) = 0
--- (11) = 0
--- (20) = 0
--- (21) = 1
--- (30) = 0
** Rank of processor needing largest memory in solve : 1
** Space in MBYTES used by this processor for solve : 6
** Avg. Space in MBYTES per working proc during solve : 3
2 KSP Residual norm 4.466758995607e-13 % max 1.000000467998e+00 min 9.999992798573e-01 max/min 1.000001188141e+00
KSP Object: 4 MPI processes
type: gmres
GMRES: restart=1000, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-12, absolute=1e-12, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: lu
LU: out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: natural
factor fill ratio given 0, needed 0
Factored matrix follows:
Matrix Object: 4 MPI processes
type: mpiaij
rows=894, cols=894
package used to perform factorization: mumps
total: nonzeros=306174, allocated nonzeros=306174
total number of mallocs used during MatSetValues calls =0
MUMPS run parameters:
SYM (matrix type): 0
PAR (host participation): 1
ICNTL(1) (output for error): 6
ICNTL(2) (output of diagnostic msg): 0
ICNTL(3) (output for global info): 6
ICNTL(4) (level of printing): 2
ICNTL(5) (input mat struct): 0
ICNTL(6) (matrix prescaling): 7
ICNTL(7) (sequentia matrix ordering):7
ICNTL(8) (scalling strategy): 77
ICNTL(10) (max num of refinements): 0
ICNTL(11) (error analysis): 0
ICNTL(12) (efficiency control): 1
ICNTL(13) (efficiency control): 0
ICNTL(14) (percentage of estimated workspace increase): 20
ICNTL(18) (input mat struct): 3
ICNTL(19) (Shur complement info): 0
ICNTL(20) (rhs sparse pattern): 0
ICNTL(21) (solution struct): 1
ICNTL(22) (in-core/out-of-core facility): 0
ICNTL(23) (max size of memory can be allocated locally):0
ICNTL(24) (detection of null pivot rows): 0
ICNTL(25) (computation of a null space basis): 0
ICNTL(26) (Schur options for rhs or solution): 0
ICNTL(27) (experimental parameter): -8
ICNTL(28) (use parallel or sequential ordering): 2
ICNTL(29) (parallel ordering): 2
ICNTL(30) (user-specified set of entries in inv(A)): 0
ICNTL(31) (factors is discarded in the solve phase): 0
ICNTL(33) (compute determinant): 0
CNTL(1) (relative pivoting threshold): 0.01
CNTL(2) (stopping criterion of refinement): 1.49012e-08
CNTL(3) (absolute pivoting threshold): 0
CNTL(4) (value of static pivoting): -1
CNTL(5) (fixation for null pivots): 0
RINFO(1) (local estimated flops for the elimination after analysis):
[0] 4.25638e+06
[1] 7.22129e+07
[2] 8.01308e+06
[3] 4.81122e+06
RINFO(2) (local estimated flops for the assembly after factorization):
[0] 68060
[1] 5860
[2] 83570
[3] 79676
RINFO(3) (local estimated flops for the elimination after factorization):
[0] 4.25638e+06
[1] 7.2213e+07
[2] 8.01308e+06
[3] 4.81122e+06
INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
[0] 37
[1] 43
[2] 38
[3] 38
INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
[0] 37
[1] 43
[2] 38
[3] 38
INFO(23) (num of pivots eliminated on this processor after factorization):
[0] 350
[1] 319
[2] 134
[3] 91
RINFOG(1) (global estimated flops for the elimination after analysis): 8.92936e+07
RINFOG(2) (global estimated flops for the assembly after factorization): 237166
RINFOG(3) (global estimated flops for the elimination after factorization): 8.92937e+07
(RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0,0)*(2^0)
INFOG(3) (estimated real workspace for factors on all processors after analysis): 306174
INFOG(4) (estimated integer workspace for factors on all processors after analysis): 7102
INFOG(5) (estimated maximum front size in the complete tree): 495
INFOG(6) (number of nodes in the complete tree): 66
INFOG(7) (ordering option effectively use after analysis): 2
INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 306184
INFOG(10) (total integer space store the matrix factors after factorization): 7104
INFOG(11) (order of largest frontal matrix after factorization): 495
INFOG(12) (number of off-diagonal pivots): 29
INFOG(13) (number of delayed pivots after factorization): 1
INFOG(14) (number of memory compress after factorization): 0
INFOG(15) (number of steps of iterative refinement after solution): 0
INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 43
INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 156
INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 43
INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 156
INFOG(20) (estimated number of entries in the factors): 306174
INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 42
INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 150
INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
linear system matrix = precond matrix:
Matrix Object: 4 MPI processes
type: mpiaij
rows=894, cols=894
total: nonzeros=263360, allocated nonzeros=263360
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 47 nodes, limit used is 5
>> # of iterations: 2
KSPConvergedReason: 3
Entering ZMUMPS driver with JOB, N, NZ = -2 894 263360
Jinquan
From: petsc-users-bounces at mcs.anl.gov [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Matthew Knepley
Sent: Tuesday, October 23, 2012 4:30 PM
To: PETSc users list
Subject: Re: [petsc-users] Setting up MUMPS in PETSc
On Tue, Oct 23, 2012 at 7:17 PM, Jinquan Zhong <jzhong at scsolutions.com<mailto:jzhong at scsolutions.com>> wrote:
Thanks, Jed.
Any way to get the condition number. I used
-ksp_type gmres -ksp_monitor_singular_value -ksp_gmres_restart 1000
It didn't work.
This is not helpful. Would you understand this if someone mailed you "It didn't work"? Send the output.
Matt
Jinquan
From: petsc-users-bounces at mcs.anl.gov<mailto:petsc-users-bounces at mcs.anl.gov> [mailto:petsc-users-bounces at mcs.anl.gov<mailto:petsc-users-bounces at mcs.anl.gov>] On Behalf Of Jed Brown
Sent: Tuesday, October 23, 2012 3:28 PM
To: PETSc users list
Subject: Re: [petsc-users] Setting up MUMPS in PETSc
On Tue, Oct 23, 2012 at 5:21 PM, Jinquan Zhong <jzhong at scsolutions.com<mailto:jzhong at scsolutions.com>> wrote:
That is new for me. What would you suggest, Matt?
Were you using LU or Cholesky before? That is the difference between -pc_type lu and -pc_type cholesky. Use -pc_factor_mat_solver_package mumps to choose MUMPS. You can access the MUMPS options with -mat_mumps_icntl_opaquenumber.
It looks like PETSc's Clique interface does not work in parallel. (A student was working on it recently and it seems to work in serial.) When it is fixed to work in parallel, that is almost certainly what you should use. Alternatively, there may well be a Fast method, depending on the structure of the system and that fat dense block.
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20121023/23e6e8f4/attachment-0001.html>
More information about the petsc-users
mailing list