[petsc-users] MatAssemblyEnd taking too long
Manav Bhatia
bhatiamanav at gmail.com
Wed Aug 19 20:06:55 CDT 2020
> On Aug 19, 2020, at 7:56 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> Manav Bhatia <bhatiamanav at gmail.com> writes:
>
>> Thanks for the followup, Jed.
>>
>>> On Aug 19, 2020, at 7:42 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>
>>> Can you share a couple example stack traces from that debugging?
>>
>> Do you mean a similar screenshot at different system sizes? Or a different format?
>
> Sorry, I missed the screenshots (they were tucked away in the text/html and I was reading the text/plain version of your message).
Glad you found them. Please let me know if more information would help.
>
>>> About how many nonzeros per row?
>>
>> This is a 3D elasticity run with Hex8 elements. So, each row has 81 non-zero entries, although I have not verified that (I will do so now). Is there a command line argument that will print this for the matrix? Although, on second thought that will not be printed unless the Assembly routine has finished.
>
> You could run a smaller problem size with -snes_view, which would show matrix stats.
Here is the information from a case with 2e6 DoFs.
KSP Object: 8 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using externally compute Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 1
Number smoothing steps 1
Complexity: grid = 1.16005
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 8 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 8 MPI processes
type: bjacobi
number of blocks = 8
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=12, cols=12, bs=6
package used to perform factorization: petsc
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls=0
using I-node routines: found 3 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=12, cols=12, bs=6
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls=0
using I-node routines: found 3 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 8 MPI processes
type: mpiaij
rows=12, cols=12, bs=6
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 3 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 8 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.16303, max = 1.79333
eigenvalues estimate via gmres min 0.0108937, max 1.6303
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 8 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=4, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 8 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 8 MPI processes
type: mpiaij
rows=240, cols=240, bs=6
total: nonzeros=51912, allocated nonzeros=51912
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 13 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 8 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.146755, max = 1.6143
eigenvalues estimate via gmres min 0.00483441, max 1.46755
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 8 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=4, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 8 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 8 MPI processes
type: mpiaij
rows=6336, cols=6336, bs=6
total: nonzeros=3902760, allocated nonzeros=3902760
total number of mallocs used during MatSetValues calls=0
using nonscalable MatPtAP() implementation
using I-node (on process 0) routines: found 228 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 8 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.1525, max = 1.67751
eigenvalues estimate via gmres min 0.0281517, max 1.525
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 8 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=4, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 8 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 8 MPI processes
type: mpiaij
rows=87246, cols=87246, bs=6
total: nonzeros=21279420, allocated nonzeros=21279420
total number of mallocs used during MatSetValues calls=0
using nonscalable MatPtAP() implementation
using I-node (on process 0) routines: found 3552 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 8 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.160784, max = 1.76862
eigenvalues estimate via gmres min 0.0293826, max 1.60784
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_4_esteig_) 8 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=4, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 8 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: () 8 MPI processes
type: mpiaij
rows=2000103, cols=2000103, bs=3
total: nonzeros=157666509, allocated nonzeros=160054056
total number of mallocs used during MatSetValues calls=0
has attached near null space
using I-node (on process 0) routines: found 86672 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: () 8 MPI processes
type: mpiaij
rows=2000103, cols=2000103, bs=3
total: nonzeros=157666509, allocated nonzeros=160054056
total number of mallocs used during MatSetValues calls=0
has attached near null space
using I-node (on process 0) routines: found 86672 nodes, limit used is 5
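As a rough check on the 81-nonzeros-per-row estimate, the figures in the output above can be turned into per-row averages. The sketch below is plain arithmetic on numbers copied from the log, not a PETSc call. The variable names are made up for illustration. Reading the reported "Complexity: grid" as total nonzeros over all levels divided by finest-level nonzeros is an assumption, although it does reproduce the printed value.

```python
# Figures copied from the -snes_view output above (finest-level Mat, 8 MPI processes).
rows = 2_000_103
nonzeros = 157_666_509
allocated = 160_054_056

# Hex8 elements with 3 displacement DoFs per node: an interior node
# couples to 27 nodes (itself plus 26 neighbours), so 27 * 3 entries per row.
interior_row_nnz = 27 * 3

avg_nnz = nonzeros / rows          # average over all rows, boundary rows included
avg_allocated = allocated / rows   # average preallocated entries per row

# Nonzeros of the operator on each level, coarsest to finest, from the log.
level_nonzeros = [144, 51_912, 3_902_760, 21_279_420, 157_666_509]
# Assumption: complexity = sum of nonzeros on all levels / nonzeros on the finest level.
complexity = sum(level_nonzeros) / level_nonzeros[-1]

print(f"interior row estimate : {interior_row_nnz}")
print(f"average nonzeros/row  : {avg_nnz:.2f}")
print(f"average allocated/row : {avg_allocated:.2f}")
print(f"complexity            : {complexity:.5f}")
```

The average comes out a little under 81, as expected, since rows for nodes on the boundary of the mesh couple to fewer than 27 nodes.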
>
> Can you try running with -matstash_legacy?
Will do and report results shortly.
>
> What version of Open MPI is this?
This is Open MPI 4.0.1, installed using MacPorts:
InfiHorizon:opt manav$ mpiexec-openmpi-clang --version
mpiexec-openmpi-clang (OpenRTE) 4.0.1