[petsc-dev] Petsc "make test" have more failures for --with-openmp=1

Thu Mar 18 20:46:40 CDT 2021

Hi again,

ok, just saw that some matrices have lines of "0" in case of 3D hermite 
DOFs (ex: du/dz derivatives) when used into a 2D plane mesh...

So, my last problem about hypre smoother is "normal".

However, just to play with one of this matrix, I tried to do a "LU" with 
mumps icntl_24 option activated on the global system: fine it works.

Then I tried to switche to GAMG with mumps for the coarse_sub level, but 
it seems my icntl_24 option is then ignored and I don't know why...

See my KSP:

KSP Object: (Options_ProjectionL2_0) 1 MPI processes
   type: bcgs
   maximum iterations=10000, initial guess is zero
   tolerances:  relative=1e-15, absolute=1e-15, divergence=1e+12
   left preconditioning
   using PRECONDITIONED norm type for convergence test
PC Object: (Options_ProjectionL2_0) 1 MPI processes
   type: gamg
     type is MULTIPLICATIVE, levels=2 cycles=v
       Cycles per PCApply=1
       Using externally compute Galerkin coarse grid matrices
       GAMG specific options
         Threshold for dropping small values in graph on each level =
         Threshold scaling factor for each level not specified = 1.
         AGG specific options
           Symmetric graph false
           Number of levels to square graph 1
           Number smoothing steps 1
         Complexity:    grid = 1.09756
   Coarse grid solver -- level -------------------------------
     KSP Object: (Options_ProjectionL2_0mg_coarse_) 1 MPI processes
       type: preonly
       maximum iterations=10000, initial guess is zero
       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
       left preconditioning
       using NONE norm type for convergence test
     PC Object: (Options_ProjectionL2_0mg_coarse_) 1 MPI processes
       type: bjacobi
         number of blocks = 1
         Local solver is the same for all blocks, as in the following 
KSP and PC objects on rank 0:
       KSP Object: (Options_ProjectionL2_0mg_coarse_sub_) 1 MPI processes
         type: preonly
         maximum iterations=1, initial guess is zero
         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
         left preconditioning
         using NONE norm type for convergence test
       PC Object: (Options_ProjectionL2_0mg_coarse_sub_) 1 MPI processes
         type: lu
           out-of-place factorization
           tolerance for zero pivot 2.22045e-14
           using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
           matrix ordering: nd
           factor fill ratio given 0., needed 0.
             Factored matrix follows:
               Mat Object: 1 MPI processes
                 type: mumps
                 rows=8, cols=8
                 package used to perform factorization: mumps
                 total: nonzeros=64, allocated nonzeros=64
                   MUMPS run parameters:
                     SYM (matrix type):                   0
                     PAR (host participation):            1
                     ICNTL(1) (output for error):         6
                     ICNTL(2) (output of diagnostic msg): 0
                     ICNTL(3) (output for global info):   0
                     ICNTL(4) (level of printing):        0
                     ICNTL(5) (input mat struct):         0
                     ICNTL(6) (matrix prescaling):        7
                     ICNTL(7) (sequential matrix ordering):7
                     ICNTL(8) (scaling strategy):        77
                     ICNTL(10) (max num of refinements):  0
                     ICNTL(11) (error analysis):          0
                     ICNTL(12) (efficiency 
control):                         1
                     ICNTL(13) (sequential factorization of the root 
node):  0
                     ICNTL(14) (percentage of estimated workspace 
increase): 20
                     ICNTL(18) (input mat 
struct):                           0
                     ICNTL(19) (Schur complement 
info):                      0
                     ICNTL(20) (RHS sparse 
pattern):                         0
                     ICNTL(21) (solution 
struct):                            0
                     ICNTL(22) (in-core/out-of-core 
facility):               0
                     ICNTL(23) (max size of memory can be allocated 
locally):0
                     ICNTL(24) (detection of null pivot 
rows):               0
                     ICNTL(25) (computation of a null space 
basis):          0
                     ICNTL(26) (Schur options for RHS or 
solution):          0
                     ICNTL(27) (blocking size for multiple 
RHS):             -32
                     ICNTL(28) (use parallel or sequential 
ordering):        1
                     ICNTL(29) (parallel 
ordering):                          0
                     ICNTL(30) (user-specified set of entries in 
inv(A)):    0
                     ICNTL(31) (factors is discarded in the solve 
phase):    0
                     ICNTL(33) (compute 
determinant):                        0
                     ICNTL(35) (activate BLR based 
factorization):           0
                     ICNTL(36) (choice of BLR factorization 
variant):        0
                     ICNTL(38) (estimated compression rate of LU 
factors):   333
                     CNTL(1) (relative pivoting threshold): 0.01
                     CNTL(2) (stopping criterion of refinement): 
1.49012e-08
                     CNTL(3) (absolute pivoting threshold):      0.
                     CNTL(4) (value of static pivoting): -1.
                     CNTL(5) (fixation for null pivots):         0.
                     CNTL(7) (dropping parameter for BLR):       0.
                     RINFO(1) (local estimated flops for the elimination 
after analysis):
                       [0] 308.
                     RINFO(2) (local estimated flops for the assembly 
after factorization):
                       [0]  0.
                     RINFO(3) (local estimated flops for the elimination 
after factorization):
                       [0]  0.
                     INFO(15) (estimated size of (in MB) MUMPS internal 
data for running numerical factorization):
                     [0] 0
                     INFO(16) (size of (in MB) MUMPS internal data used 
during numerical factorization):
                       [0] 0
                     INFO(23) (num of pivots eliminated on this 
processor after factorization):
                       [0] 6
                     RINFOG(1) (global estimated flops for the 
elimination after analysis): 308.
                     RINFOG(2) (global estimated flops for the assembly 
after factorization): 0.
                     RINFOG(3) (global estimated flops for the 
elimination after factorization): 0.
                     (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): 
(0.,0.)*(2^0)
                     INFOG(3) (estimated real workspace for factors on 
all processors after analysis): 64
                     INFOG(4) (estimated integer workspace for factors 
on all processors after analysis): 35
                     INFOG(5) (estimated maximum front size in the 
complete tree): 8
                     INFOG(6) (number of nodes in the complete tree): 1
                     INFOG(7) (ordering option effectively use after 
analysis): 2
                     INFOG(8) (structural symmetry in percent of the 
permuted matrix after analysis): 100
                     INFOG(9) (total real/complex workspace to store the 
matrix factors after factorization): 64
                     INFOG(10) (total integer space store the matrix 
factors after factorization): 35
                     INFOG(11) (order of largest frontal matrix after 
factorization): 8
                     INFOG(12) (number of off-diagonal pivots): 0
                     INFOG(13) (number of delayed pivots after 
factorization): 0
                     INFOG(14) (number of memory compress after 
factorization): 0
                     INFOG(15) (number of steps of iterative refinement 
after solution): 0
                     INFOG(16) (estimated size (in MB) of all MUMPS 
internal data for factorization after analysis: value on the most memory 
consuming processor): 0
                     INFOG(17) (estimated size of all MUMPS internal 
data for factorization after analysis: sum over all processors): 0
                     INFOG(18) (size of all MUMPS internal data 
allocated during factorization: value on the most memory consuming 
processor): 0
                     INFOG(19) (size of all MUMPS internal data 
allocated during factorization: sum over all processors): 0
                     INFOG(20) (estimated number of entries in the 
factors): 64
                     INFOG(21) (size in MB of memory effectively used 
during factorization - value on the most memory consuming processor): 0
                     INFOG(22) (size in MB of memory effectively used 
during factorization - sum over all processors): 0
                     INFOG(23) (after analysis: value of ICNTL(6) 
effectively used): 0
                     INFOG(24) (after analysis: value of ICNTL(12) 
effectively used): 1
                     INFOG(25) (after factorization: number of pivots 
modified by static pivoting): 0
                     INFOG(28) (after factorization: number of null 
pivots encountered): 0
                     INFOG(29) (after factorization: effective number of 
entries in the factors (sum over all processors)): 0
                     INFOG(30, 31) (after solution: size in Mbytes of 
memory used during solution phase): 0, 0
                     INFOG(32) (after analysis: type of analysis done): 1
                     INFOG(33) (value used for ICNTL(8)): 7
                     INFOG(34) (exponent of the determinant if 
determinant is requested): 0
                     INFOG(35) (after factorization: number of entries 
taking into account BLR factor compression - sum over all processors): 0
                     INFOG(36) (after analysis: estimated size of all 
MUMPS internal data for running BLR in-core - value on the most memory 
consuming processor): 0
                     INFOG(37) (after analysis: estimated size of all 
MUMPS internal data for running BLR in-core - sum over all processors): 0
                     INFOG(38) (after analysis: estimated size of all 
MUMPS internal data for running BLR out-of-core - value on the most 
memory consuming processor): 0
                     INFOG(39) (after analysis: estimated size of all 
MUMPS internal data for running BLR out-of-core - sum over all 
processors): 0
         linear system matrix = precond matrix:
         Mat Object: 1 MPI processes
           type: seqaij
           rows=8, cols=8, bs=4
           total: nonzeros=64, allocated nonzeros=64
           total number of mallocs used during MatSetValues calls=0
             using I-node routines: found 2 nodes, limit used is 5
       linear system matrix = precond matrix:
       Mat Object: 1 MPI processes
         type: seqaij
         rows=8, cols=8, bs=4
         total: nonzeros=64, allocated nonzeros=64
         total number of mallocs used during MatSetValues calls=0
           using I-node routines: found 2 nodes, limit used is 5
   Down solver (pre-smoother) on level 1 -------------------------------
     KSP Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
       type: chebyshev
         eigenvalue estimates used:  min = 0., max = 0.
         eigenvalues estimate via gmres min 0., max 0.
         eigenvalues estimated using gmres with translations  [0. 0.1; 
0. 1.1]
         KSP Object: (Options_ProjectionL2_0mg_levels_1_esteig_) 1 MPI 
processes
           type: gmres
             restart=30, using Classical (unmodified) Gram-Schmidt 
Orthogonalization with no iterative refinement
             happy breakdown tolerance 1e-30
           maximum iterations=10, initial guess is zero
           tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
           left preconditioning
           using PRECONDITIONED norm type for convergence test
         PC Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
           type: sor
             type = local_symmetric, iterations = 1, local iterations = 
1, omega = 1.
           linear system matrix = precond matrix:
           Mat Object: (Options_ProjectionL2_0) 1 MPI processes
             type: seqaij
             rows=36, cols=36, bs=4
             total: nonzeros=656, allocated nonzeros=656
             total number of mallocs used during MatSetValues calls=0
               using I-node routines: found 9 nodes, limit used is 5
         estimating eigenvalues using noisy right hand side
       maximum iterations=2, nonzero initial guess
       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
       left preconditioning
       using NONE norm type for convergence test
     PC Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
       type: sor
         type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1.
       linear system matrix = precond matrix:
       Mat Object: (Options_ProjectionL2_0) 1 MPI processes
         type: seqaij
         rows=36, cols=36, bs=4
         total: nonzeros=656, allocated nonzeros=656
         total number of mallocs used during MatSetValues calls=0
           using I-node routines: found 9 nodes, limit used is 5
   Up solver (post-smoother) same as down solver (pre-smoother)
   linear system matrix = precond matrix:
   Mat Object: (Options_ProjectionL2_0) 1 MPI processes
     type: seqaij
     rows=36, cols=36, bs=4
     total: nonzeros=656, allocated nonzeros=656
     total number of mallocs used during MatSetValues calls=0
       using I-node routines: found 9 nodes, limit used is 5

but I have this option left:

Option left: 
name:-Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24 value: 1

and as you can see above I end with:

                     ICNTL(24) (detection of null pivot 
rows):               0

which is fatal in my case...

Can you see where I did wrong?

Thanks,

Eric