[petsc-dev] Petsc "make test" have more failures for --with-openmp=1

Pierre Jolivet pierre at joliv.et
Fri Mar 19 06:47:54 CDT 2021


There is a fix here: https://gitlab.com/petsc/petsc/-/commit/ea41da7ae9ca74379a24f049c09adf372dd0fe67
It’s a spin-off of https://gitlab.com/petsc/petsc/-/commit/83f9b43b15ff2b6a8a4ff6970cc325feb78dc91b?merge_request_iid=3608
I don’t think it’s OK to have different behaviors between PCASM/PCGASM/PCBJacobi_Multiblock on one side and PCBJacobi_Singleblock on the other.
I’m also not sure what the rationale was for setting the same prefix on the SubMats as on the Mat; it makes it very hard to scope options properly, I think.
I can make a MR if you are OK with this change.

Thanks,
Pierre

$ mpirun -n 1 src/ksp/ksp/tests/ex60 -ksp_view -pc_type bjacobi
[…]
    Mat Object: (sub_) 1 MPI processes
      type: seqaij
      rows=64, cols=64
[…]
$ mpirun -n 4 src/snes/tutorials/ex56 -cells 2,2,1 -max_conv_its 2 -petscspace_degree 2 -snes_max_it 2 -ksp_max_it 100 -ksp_type cg -ksp_rtol 1.e-10 -ksp_norm_type unpreconditioned -snes_rtol 1.e-10 -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -pc_gamg_coarse_eq_limit 10 -pc_gamg_reuse_interpolation true -pc_gamg_square_graph 1 -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale .0 -ksp_converged_reason -snes_monitor_short -ksp_monitor_short -snes_converged_reason -use_mat_nearnullspace true -mg_levels_ksp_max_it 2 -mg_levels_ksp_type chebyshev -mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.1 -mg_levels_pc_type jacobi -petscpartitioner_type simple -matptap_via scalable -ex56_dm_view -snes_view -mg_coarse_sub_pc_factor_mat_solver_type mumps -mg_coarse_sub_mat_mumps_icntl_11 1
[…]
                      ICNTL(11) (error analysis):          1
[…]
          linear system matrix = precond matrix:
          Mat Object: (mg_coarse_sub_) 1 MPI processes
            type: seqaij
            rows=6, cols=6, bs=6
[…]
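
For anyone who wants to poke at this from code rather than through -ksp_view, below is a minimal sketch (my own, not from the test suite, written against a recent PETSc with the PetscCall() macro) that prints the options prefix actually attached to the bjacobi block matrix:

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A, Psub;
  KSP         ksp, *subksp;
  PC          pc;
  PetscInt    i, rstart, rend, n = 8, nlocal, first;
  const char *prefix = NULL;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* small diagonal test matrix, just enough to set up the solver */
  PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 1, NULL, 0, NULL, &A));
  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  for (i = rstart; i < rend; i++) PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCBJACOBI));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSetUp(ksp)); /* creates the (single) local block */
  PetscCall(PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp));
  PetscCall(KSPGetOperators(subksp[0], NULL, &Psub));
  PetscCall(MatGetOptionsPrefix(Psub, &prefix));
  PetscCall(PetscPrintf(PETSC_COMM_SELF, "block Mat prefix: %s\n", prefix ? prefix : "(none)"));
  PetscCall(KSPDestroy(&ksp));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

On one process this should print "(none)" before the fix and "sub_" with it, consistent with the -ksp_view output above.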

> On 19 Mar 2021, at 12:01 PM, Pierre Jolivet <pierre at joliv.et> wrote:
> 
> 
> 
>> On 19 Mar 2021, at 5:00 AM, Barry Smith <bsmith at petsc.dev> wrote:
>> 
>> 
>> Eric,
>> 
>>> -Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24 value: 1
>> 
>> If an option is skipped, it is often because of the exact string used as the option name. I see your KSP option is Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24, but below I see
>> 
>>> Coarse grid solver -- level -------------------------------
>>>   KSP Object: (Options_ProjectionL2_0mg_coarse_) 1 MPI processes
>>>     type: preonly
>> 
>> That is, it seems to be looking for an option without the sub_. This is normal: for the coarsest level of multigrid the prefix uses coarse_, while the finer levels use levels_. The sub_ piece is used for block methods in PETSc, such as block Jacobi, but that does not seem to apply to your run on one process. It is possible that, since the run is on one process without a block method such as block Jacobi, the sub_ is simply not relevant.
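>> 
>> For example, with your outer prefix the composed option names look like this (values for illustration, taken from your -ksp_view output below):
>> 
>>   -Options_ProjectionL2_0mg_coarse_pc_type bjacobi             (coarsest grid)
>>   -Options_ProjectionL2_0mg_levels_1_pc_type sor               (level-1 smoother)
>>   -Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24 1    (block sub-solver, when a block method is in play)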
> 
> There is something wrong with the option, IMHO: Eric is right in thinking that he should prepend sub_.
> The prefix is not being propagated properly, I’ll investigate.
> Here is a simple reproducer:
> $ mpirun -n 1 src/ksp/ksp/tests/ex60 -ksp_view -pc_type bjacobi // KO
> […]
>    linear system matrix = precond matrix:
>    Mat Object: 1 MPI processes
>      type: seqaij
> […]
> $ mpirun -n 1 src/ksp/ksp/tests/ex60 -ksp_view -pc_type asm // OK
> […]
>    linear system matrix = precond matrix:
>    Mat Object: (sub_) 1 MPI processes
>      type: seqaij
> […]
> $ mpirun -n 1 src/ksp/ksp/tests/ex60 -ksp_view -pc_type gasm // OK
> […]
>      linear system matrix = precond matrix:
>      Mat Object: (sub_) 1 MPI processes
>        type: seqaij
> […]
> $ mpirun -n 4 src/ksp/ksp/tests/ex60 -ksp_view -pc_type bjacobi -pc_bjacobi_blocks 1 // OK
> […]
>    linear system matrix = precond matrix:
>    Mat Object: (sub_) 4 MPI processes
>      type: mpiaij
> […]
> 
> Eric, in the meantime, you can just put the MUMPS options in the global scope, i.e., -mat_mumps_icntl_24 1, but this will apply to all unprefixed MUMPS instances.
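> 
> If it is easier to scope from code, the same workaround can be applied programmatically before the options are read (a minimal sketch against a recent PETSc, with the same caveat that it affects every unprefixed MUMPS instance):
> 
> /* equivalent to passing -mat_mumps_icntl_24 1 on the command line;
>    call before KSPSetFromOptions()/KSPSetUp() so the factorization sees it */
> PetscCall(PetscOptionsSetValue(NULL, "-mat_mumps_icntl_24", "1"));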
> 
> Thanks,
> Pierre
> 
>> You can run any code with your current options and with -help  and then grep for particular options that you may wish to add. Or you can run with -ts/snes/ksp_view to see the option prefixes needed for each inner solve. 
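>> 
>> For example, something along the lines of
>> 
>>   $ mpirun -n 1 ./your_app <your current options> -help | grep icntl_24
>> 
>> (substituting your executable and options) should show the exact prefixed name of the option that the solver actually registers.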
>> 
>> I am not sure how to make the code bulletproof in your situation. Ideally it would explain why your options don't work, but I am not sure that is possible.
>> 
>> Barry
>> 
>> 
>> 
>> 
>> 
>>> On Mar 18, 2021, at 8:46 PM, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>>> 
>>> Hi again,
>>> 
>>> OK, I just saw that some matrices have rows of zeros in the case of 3D Hermite DOFs (e.g., du/dz derivatives) when used on a 2D planar mesh...
>>> 
>>> So my previous problem with the hypre smoother is "normal".
>>> 
>>> However, just to play with one of these matrices, I tried an LU with the MUMPS icntl_24 option activated on the global system: fine, it works.
>>> 
>>> Then I tried to switch to GAMG with MUMPS for the coarse_sub level, but it seems my icntl_24 option is ignored and I don't know why...
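>>> 
>>> (For concreteness, the options involved are presumably along the lines of:
>>> 
>>> -Options_ProjectionL2_0pc_type gamg
>>> -Options_ProjectionL2_0mg_coarse_sub_pc_factor_mat_solver_type mumps
>>> -Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24 1
>>> 
>>> with the prefixes matching the KSP view below.)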
>>> 
>>> See my KSP:
>>> 
>>> KSP Object: (Options_ProjectionL2_0) 1 MPI processes
>>> type: bcgs
>>> maximum iterations=10000, initial guess is zero
>>> tolerances:  relative=1e-15, absolute=1e-15, divergence=1e+12
>>> left preconditioning
>>> using PRECONDITIONED norm type for convergence test
>>> PC Object: (Options_ProjectionL2_0) 1 MPI processes
>>> type: gamg
>>>   type is MULTIPLICATIVE, levels=2 cycles=v
>>>     Cycles per PCApply=1
>>>     Using externally compute Galerkin coarse grid matrices
>>>     GAMG specific options
>>>       Threshold for dropping small values in graph on each level =
>>>       Threshold scaling factor for each level not specified = 1.
>>>       AGG specific options
>>>         Symmetric graph false
>>>         Number of levels to square graph 1
>>>         Number smoothing steps 1
>>>       Complexity:    grid = 1.09756
>>> Coarse grid solver -- level -------------------------------
>>>   KSP Object: (Options_ProjectionL2_0mg_coarse_) 1 MPI processes
>>>     type: preonly
>>>     maximum iterations=10000, initial guess is zero
>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>     left preconditioning
>>>     using NONE norm type for convergence test
>>>   PC Object: (Options_ProjectionL2_0mg_coarse_) 1 MPI processes
>>>     type: bjacobi
>>>       number of blocks = 1
>>>       Local solver is the same for all blocks, as in the following KSP and PC objects on rank 0:
>>>     KSP Object: (Options_ProjectionL2_0mg_coarse_sub_) 1 MPI processes
>>>       type: preonly
>>>       maximum iterations=1, initial guess is zero
>>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>       left preconditioning
>>>       using NONE norm type for convergence test
>>>     PC Object: (Options_ProjectionL2_0mg_coarse_sub_) 1 MPI processes
>>>       type: lu
>>>         out-of-place factorization
>>>         tolerance for zero pivot 2.22045e-14
>>>         using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>>         matrix ordering: nd
>>>         factor fill ratio given 0., needed 0.
>>>           Factored matrix follows:
>>>             Mat Object: 1 MPI processes
>>>               type: mumps
>>>               rows=8, cols=8
>>>               package used to perform factorization: mumps
>>>               total: nonzeros=64, allocated nonzeros=64
>>>                 MUMPS run parameters:
>>>                   SYM (matrix type):                   0
>>>                   PAR (host participation):            1
>>>                   ICNTL(1) (output for error):         6
>>>                   ICNTL(2) (output of diagnostic msg): 0
>>>                   ICNTL(3) (output for global info):   0
>>>                   ICNTL(4) (level of printing):        0
>>>                   ICNTL(5) (input mat struct):         0
>>>                   ICNTL(6) (matrix prescaling):        7
>>>                   ICNTL(7) (sequential matrix ordering):7
>>>                   ICNTL(8) (scaling strategy):        77
>>>                   ICNTL(10) (max num of refinements):  0
>>>                   ICNTL(11) (error analysis):          0
>>>                   ICNTL(12) (efficiency control):                         1
>>>                   ICNTL(13) (sequential factorization of the root node):  0
>>>                   ICNTL(14) (percentage of estimated workspace increase): 20
>>>                   ICNTL(18) (input mat struct):                           0
>>>                   ICNTL(19) (Schur complement info):                      0
>>>                   ICNTL(20) (RHS sparse pattern):                         0
>>>                   ICNTL(21) (solution struct):                            0
>>>                   ICNTL(22) (in-core/out-of-core facility):               0
>>>                   ICNTL(23) (max size of memory can be allocated locally):0
>>>                   ICNTL(24) (detection of null pivot rows):               0
>>>                   ICNTL(25) (computation of a null space basis):          0
>>>                   ICNTL(26) (Schur options for RHS or solution):          0
>>>                   ICNTL(27) (blocking size for multiple RHS):             -32
>>>                   ICNTL(28) (use parallel or sequential ordering):        1
>>>                   ICNTL(29) (parallel ordering):                          0
>>>                   ICNTL(30) (user-specified set of entries in inv(A)):    0
>>>                   ICNTL(31) (factors is discarded in the solve phase):    0
>>>                   ICNTL(33) (compute determinant):                        0
>>>                   ICNTL(35) (activate BLR based factorization):           0
>>>                   ICNTL(36) (choice of BLR factorization variant):        0
>>>                   ICNTL(38) (estimated compression rate of LU factors):   333
>>>                   CNTL(1) (relative pivoting threshold): 0.01
>>>                   CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>>                   CNTL(3) (absolute pivoting threshold):      0.
>>>                   CNTL(4) (value of static pivoting): -1.
>>>                   CNTL(5) (fixation for null pivots):         0.
>>>                   CNTL(7) (dropping parameter for BLR):       0.
>>>                   RINFO(1) (local estimated flops for the elimination after analysis):
>>>                     [0] 308.
>>>                   RINFO(2) (local estimated flops for the assembly after factorization):
>>>                     [0]  0.
>>>                   RINFO(3) (local estimated flops for the elimination after factorization):
>>>                     [0]  0.
>>>                   INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
>>>                   [0] 0
>>>                   INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
>>>                     [0] 0
>>>                   INFO(23) (num of pivots eliminated on this processor after factorization):
>>>                     [0] 6
>>>                   RINFOG(1) (global estimated flops for the elimination after analysis): 308.
>>>                   RINFOG(2) (global estimated flops for the assembly after factorization): 0.
>>>                   RINFOG(3) (global estimated flops for the elimination after factorization): 0.
>>>                   (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>>                   INFOG(3) (estimated real workspace for factors on all processors after analysis): 64
>>>                   INFOG(4) (estimated integer workspace for factors on all processors after analysis): 35
>>>                   INFOG(5) (estimated maximum front size in the complete tree): 8
>>>                   INFOG(6) (number of nodes in the complete tree): 1
>>>                   INFOG(7) (ordering option effectively use after analysis): 2
>>>                   INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>>                   INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 64
>>>                   INFOG(10) (total integer space store the matrix factors after factorization): 35
>>>                   INFOG(11) (order of largest frontal matrix after factorization): 8
>>>                   INFOG(12) (number of off-diagonal pivots): 0
>>>                   INFOG(13) (number of delayed pivots after factorization): 0
>>>                   INFOG(14) (number of memory compress after factorization): 0
>>>                   INFOG(15) (number of steps of iterative refinement after solution): 0
>>>                   INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 0
>>>                   INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 0
>>>                   INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 0
>>>                   INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 0
>>>                   INFOG(20) (estimated number of entries in the factors): 64
>>>                   INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 0
>>>                   INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 0
>>>                   INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>>                   INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>>                   INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>>                   INFOG(28) (after factorization: number of null pivots encountered): 0
>>>                   INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 0
>>>                   INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
>>>                   INFOG(32) (after analysis: type of analysis done): 1
>>>                   INFOG(33) (value used for ICNTL(8)): 7
>>>                   INFOG(34) (exponent of the determinant if determinant is requested): 0
>>>                   INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 0
>>>                   INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
>>>                   INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
>>>                   INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
>>>                   INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
>>>       linear system matrix = precond matrix:
>>>       Mat Object: 1 MPI processes
>>>         type: seqaij
>>>         rows=8, cols=8, bs=4
>>>         total: nonzeros=64, allocated nonzeros=64
>>>         total number of mallocs used during MatSetValues calls=0
>>>           using I-node routines: found 2 nodes, limit used is 5
>>>     linear system matrix = precond matrix:
>>>     Mat Object: 1 MPI processes
>>>       type: seqaij
>>>       rows=8, cols=8, bs=4
>>>       total: nonzeros=64, allocated nonzeros=64
>>>       total number of mallocs used during MatSetValues calls=0
>>>         using I-node routines: found 2 nodes, limit used is 5
>>> Down solver (pre-smoother) on level 1 -------------------------------
>>>   KSP Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
>>>     type: chebyshev
>>>       eigenvalue estimates used:  min = 0., max = 0.
>>>       eigenvalues estimate via gmres min 0., max 0.
>>>       eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
>>>       KSP Object: (Options_ProjectionL2_0mg_levels_1_esteig_) 1 MPI processes
>>>         type: gmres
>>>           restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>           happy breakdown tolerance 1e-30
>>>         maximum iterations=10, initial guess is zero
>>>         tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
>>>         left preconditioning
>>>         using PRECONDITIONED norm type for convergence test
>>>       PC Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
>>>         type: sor
>>>           type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>>         linear system matrix = precond matrix:
>>>         Mat Object: (Options_ProjectionL2_0) 1 MPI processes
>>>           type: seqaij
>>>           rows=36, cols=36, bs=4
>>>           total: nonzeros=656, allocated nonzeros=656
>>>           total number of mallocs used during MatSetValues calls=0
>>>             using I-node routines: found 9 nodes, limit used is 5
>>>       estimating eigenvalues using noisy right hand side
>>>     maximum iterations=2, nonzero initial guess
>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>     left preconditioning
>>>     using NONE norm type for convergence test
>>>   PC Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
>>>     type: sor
>>>       type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>>     linear system matrix = precond matrix:
>>>     Mat Object: (Options_ProjectionL2_0) 1 MPI processes
>>>       type: seqaij
>>>       rows=36, cols=36, bs=4
>>>       total: nonzeros=656, allocated nonzeros=656
>>>       total number of mallocs used during MatSetValues calls=0
>>>         using I-node routines: found 9 nodes, limit used is 5
>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>> linear system matrix = precond matrix:
>>> Mat Object: (Options_ProjectionL2_0) 1 MPI processes
>>>   type: seqaij
>>>   rows=36, cols=36, bs=4
>>>   total: nonzeros=656, allocated nonzeros=656
>>>   total number of mallocs used during MatSetValues calls=0
>>>     using I-node routines: found 9 nodes, limit used is 5
>>> 
>>> but I have this option left:
>>> 
>>> Option left: name:-Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24 value: 1
>>> 
>>> and as you can see above, I end up with:
>>> 
>>>                   ICNTL(24) (detection of null pivot rows):               0
>>> 
>>> which is fatal in my case...
>>> 
>>> Can you see what I did wrong?
>>> 
>>> Thanks,
>>> 
>>> Eric
