[petsc-users] MatAssemblyEnd taking too long

Manav Bhatia bhatiamanav at gmail.com
Wed Aug 19 20:06:55 CDT 2020



> On Aug 19, 2020, at 7:56 PM, Jed Brown <jed at jedbrown.org> wrote:
> 
> Manav Bhatia <bhatiamanav at gmail.com> writes:
> 
>> Thanks for the followup, Jed. 
>> 
>>> On Aug 19, 2020, at 7:42 PM, Jed Brown <jed at jedbrown.org> wrote:
>>> 
>>> Can you share a couple example stack traces from that debugging?  
>> 
>> Do you mean similar screenshots at different system sizes? Or a different format? 
> 
> Sorry, I missed the screenshots (they were tucked away in the text/html and I was reading the text/plain version of your message).

Glad you found them. Please let me know if more information would help. 

> 
>>> About how many nonzeros per row?
>> 
>> This is a 3D elasticity run with Hex8 elements, so each node couples to at most 27 nodes and, with 3 DoFs per node, each row should have about 27 x 3 = 81 nonzero entries, although I have not verified that (I will do so now). Is there a command-line argument that will print this for the matrix? Although, on second thought, it will not be printed unless the assembly routine has finished. 
> 
> You could run a smaller problem size with -snes_view, which would show matrix stats.

Here is the information from a case with 2e6 DoFs.

KSP Object: 8 MPI processes
  type: gmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: gamg
    type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using externally compute Galerkin coarse grid matrices
      GAMG specific options
        Threshold for dropping small values in graph on each level =   0.   0.   0.  
        Threshold scaling factor for each level not specified = 1.
        AGG specific options
          Symmetric graph false
          Number of levels to square graph 1
          Number smoothing steps 1
        Complexity:    grid = 1.16005
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 8 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_) 8 MPI processes
      type: bjacobi
        number of blocks = 8
        Local solve is same for all blocks, in the following KSP and PC objects:
      KSP Object: (mg_coarse_sub_) 1 MPI processes
        type: preonly
        maximum iterations=1, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (mg_coarse_sub_) 1 MPI processes
        type: lu
          out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
          matrix ordering: nd
          factor fill ratio given 5., needed 1.
            Factored matrix follows:
              Mat Object: 1 MPI processes
                type: seqaij
                rows=12, cols=12, bs=6
                package used to perform factorization: petsc
                total: nonzeros=144, allocated nonzeros=144
                total number of mallocs used during MatSetValues calls=0
                  using I-node routines: found 3 nodes, limit used is 5
        linear system matrix = precond matrix:
        Mat Object: 1 MPI processes
          type: seqaij
          rows=12, cols=12, bs=6
          total: nonzeros=144, allocated nonzeros=144
          total number of mallocs used during MatSetValues calls=0
            using I-node routines: found 3 nodes, limit used is 5
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=12, cols=12, bs=6
        total: nonzeros=144, allocated nonzeros=144
        total number of mallocs used during MatSetValues calls=0
          using I-node (on process 0) routines: found 3 nodes, limit used is 5
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 8 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.16303, max = 1.79333
        eigenvalues estimate via gmres min 0.0108937, max 1.6303
        eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_1_esteig_) 8 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=4, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 8 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=240, cols=240, bs=6
        total: nonzeros=51912, allocated nonzeros=51912
        total number of mallocs used during MatSetValues calls=0
          using I-node (on process 0) routines: found 13 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object: (mg_levels_2_) 8 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.146755, max = 1.6143
        eigenvalues estimate via gmres min 0.00483441, max 1.46755
        eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_2_esteig_) 8 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=4, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_2_) 8 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=6336, cols=6336, bs=6
        total: nonzeros=3902760, allocated nonzeros=3902760
        total number of mallocs used during MatSetValues calls=0
          using nonscalable MatPtAP() implementation
          using I-node (on process 0) routines: found 228 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object: (mg_levels_3_) 8 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.1525, max = 1.67751
        eigenvalues estimate via gmres min 0.0281517, max 1.525
        eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_3_esteig_) 8 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=4, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_3_) 8 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=87246, cols=87246, bs=6
        total: nonzeros=21279420, allocated nonzeros=21279420
        total number of mallocs used during MatSetValues calls=0
          using nonscalable MatPtAP() implementation
          using I-node (on process 0) routines: found 3552 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object: (mg_levels_4_) 8 MPI processes
      type: chebyshev
        eigenvalue estimates used:  min = 0.160784, max = 1.76862
        eigenvalues estimate via gmres min 0.0293826, max 1.60784
        eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_4_esteig_) 8 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=4, nonzero initial guess
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_4_) 8 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: () 8 MPI processes
        type: mpiaij
        rows=2000103, cols=2000103, bs=3
        total: nonzeros=157666509, allocated nonzeros=160054056
        total number of mallocs used during MatSetValues calls=0
          has attached near null space
          using I-node (on process 0) routines: found 86672 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: () 8 MPI processes
    type: mpiaij
    rows=2000103, cols=2000103, bs=3
    total: nonzeros=157666509, allocated nonzeros=160054056
    total number of mallocs used during MatSetValues calls=0
      has attached near null space
      using I-node (on process 0) routines: found 86672 nodes, limit used is 5
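
Incidentally, here is a minimal sketch of how I could check the actual nonzeros per row directly with MatGetInfo once assembly completes (the function name ReportNonzerosPerRow and the Mat J are placeholders, not my actual code):

    #include <petscmat.h>

    /* Minimal sketch: print the average number of nonzeros per row of an
       assembled matrix.  "J" stands in for the global Jacobian. */
    PetscErrorCode ReportNonzerosPerRow(Mat J)
    {
      PetscErrorCode ierr;
      MatInfo        info;
      PetscInt       M, N;

      PetscFunctionBeginUser;
      ierr = MatGetSize(J, &M, &N);CHKERRQ(ierr);
      ierr = MatGetInfo(J, MAT_GLOBAL_SUM, &info);CHKERRQ(ierr); /* summed over all ranks */
      ierr = PetscPrintf(PETSC_COMM_WORLD, "rows=%D  nonzeros=%g  avg per row=%g\n",
                         M, (double)info.nz_used, (double)info.nz_used/M);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

From the view above, 157666509 nonzeros over 2000103 rows is about 79 nonzeros per row, consistent with the 27 x 3 = 81 estimate (boundary nodes have fewer neighbors).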


> 
> Can you try running with -matstash_legacy?

Will do and report results shortly. 
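
For context, my assembly loop follows the usual PETSc element-loop pattern, roughly as in this schematic (array names and sizes are placeholders, not my actual code); as I understand it, the stash only comes into play for element contributions that land in rows owned by other ranks:

    /* Schematic: contributions added with ADD_VALUES may touch rows owned by
       other ranks; those values are buffered in the stash and exchanged
       during MatAssemblyBegin/End. */
    PetscInt    rows[24];      /* global DoF indices of one Hex8 element (8 nodes x 3 DoFs) */
    PetscScalar Ke[24*24];     /* element stiffness matrix */
    /* ... for each element: fill rows[] and Ke[], then ... */
    ierr = MatSetValues(J, 24, rows, 24, rows, Ke, ADD_VALUES);CHKERRQ(ierr);
    /* after the element loop: */
    ierr = MatAssemblyBegin(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

My understanding is that -matstash_legacy switches the stash exchange in MatAssemblyBegin/End back to the older communication scheme, so it should help isolate whether the newer exchange is where the time is going.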

> 
> What version of Open MPI is this?

This is Open MPI 4.0.1, installed using MacPorts: 

InfiHorizon:opt manav$ mpiexec-openmpi-clang --version
mpiexec-openmpi-clang (OpenRTE) 4.0.1



