<div dir="ltr">It is a RAR since this is Windows :)<div><br></div><div>Viktor, your system looks singular. Is it possible that you somehow have zero on the diagonal? That might make the</div><div>SOR a problem. You could replace that with Jacobi using</div><div><br></div><div> -mg_levels_pc_type jacobi</div><div><br></div><div> 0 KSP Residual norm 2.980664994991e+02<br> 0 KSP preconditioned resid norm 2.980664994991e+02 true resid norm 7.983356882620e+11 ||r(i)||/||b|| 1.000000000000e+00<br> 1 KSP Residual norm 1.650358505966e+01<br> 1 KSP preconditioned resid norm 1.650358505966e+01 true resid norm 4.601793132543e+12 ||r(i)||/||b|| 5.764233267037e+00<br> 2 KSP Residual norm 2.086911345353e+01<br> 2 KSP preconditioned resid norm 2.086911345353e+01 true resid norm 1.258153657657e+12 ||r(i)||/||b|| 1.575970705250e+00<br> 3 KSP Residual norm 1.909137523120e+01<br> 3 KSP preconditioned resid norm 1.909137523120e+01 true resid norm 2.179275269000e+12 ||r(i)||/||b|| 2.729773077969e+00<br></div><div><br></div><div>Mark, here is the solver</div><div><br></div><div>KSP Object: 1 MPI processes<br> type: cg<br> maximum iterations=100000, initial guess is zero<br> tolerances: relative=1e-08, absolute=1e-50, divergence=10000.<br> left preconditioning<br> using PRECONDITIONED norm type for convergence test<br>PC Object: 1 MPI processes<br> type: gamg<br> type is MULTIPLICATIVE, levels=4 cycles=v<br> Cycles per PCApply=1<br> Using externally compute Galerkin coarse grid matrices<br> GAMG specific options<br> Threshold for dropping small values in graph on each level = 0. 0. 0. 
0.<br> Threshold scaling factor for each level not specified = 1.<br> AGG specific options<br> Symmetric graph false<br> Number of levels to square graph 1<br> Number smoothing steps 1<br> Complexity: grid = 1.0042<br> Coarse grid solver -- level -------------------------------<br> KSP Object: (mg_coarse_) 1 MPI processes<br> type: preonly<br> maximum iterations=10000, initial guess is zero<br> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br> left preconditioning<br> using NONE norm type for convergence test<br> PC Object: (mg_coarse_) 1 MPI processes<br> type: bjacobi<br> number of blocks = 1<br> Local solver information for first block is in the following KSP and PC objects on rank 0:<br> Use -mg_coarse_ksp_view ::ascii_info_detail to display information for all blocks<br> KSP Object: (mg_coarse_sub_) 1 MPI processes<br> type: preonly<br> maximum iterations=1, initial guess is zero<br> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br> left preconditioning<br> using NONE norm type for convergence test<br> PC Object: (mg_coarse_sub_) 1 MPI processes<br> type: lu<br> out-of-place factorization<br> tolerance for zero pivot 2.22045e-14<br> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]<br> matrix ordering: nd<br> factor fill ratio given 5., needed 1.19444<br> Factored matrix follows:<br> Mat Object: 1 MPI processes<br> type: seqaij<br> rows=36, cols=36<br> package used to perform factorization: petsc<br> total: nonzeros=774, allocated nonzeros=774<br> using I-node routines: found 22 nodes, limit used is 5<br> linear system matrix = precond matrix:<br> Mat Object: (mg_coarse_sub_) 1 MPI processes<br> type: seqaij<br> rows=36, cols=36<br> total: nonzeros=648, allocated nonzeros=648<br> total number of mallocs used during MatSetValues calls=0<br> not using I-node routines<br> linear system matrix = precond matrix:<br> Mat Object: (mg_coarse_sub_) 1 MPI processes<br> type: seqaij<br> rows=36, cols=36<br> total: 
nonzeros=648, allocated nonzeros=648<br> total number of mallocs used during MatSetValues calls=0<br> not using I-node routines<br> Down solver (pre-smoother) on level 1 -------------------------------<br> KSP Object: (mg_levels_1_) 1 MPI processes<br> type: chebyshev<br> eigenvalue estimates used: min = 0.0997354, max = 1.09709<br> eigenvalues estimate via gmres min 0.00372245, max 0.997354<br> eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]<br> KSP Object: (mg_levels_1_esteig_) 1 MPI processes<br> type: gmres<br> restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement<br> happy breakdown tolerance 1e-30<br> maximum iterations=10, initial guess is zero<br> tolerances: relative=1e-12, absolute=1e-50, divergence=10000.<br> left preconditioning<br> using PRECONDITIONED norm type for convergence test<br> estimating eigenvalues using noisy right hand side<br> maximum iterations=2, nonzero initial guess<br> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br> left preconditioning<br> using NONE norm type for convergence test<br> PC Object: (mg_levels_1_) 1 MPI processes<br> type: sor<br> type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.<br> linear system matrix = precond matrix:<br> Mat Object: 1 MPI processes<br> type: seqaij<br> rows=902, cols=902<br> total: nonzeros=66660, allocated nonzeros=66660<br> total number of mallocs used during MatSetValues calls=0<br> not using I-node routines<br> Up solver (post-smoother) same as down solver (pre-smoother)<br> Down solver (pre-smoother) on level 2 -------------------------------<br> KSP Object: (mg_levels_2_) 1 MPI processes<br> type: chebyshev<br> eigenvalue estimates used: min = 0.0994525, max = 1.09398<br> eigenvalues estimate via gmres min 0.0303095, max 0.994525<br> eigenvalues estimated using gmres with translations [0. 0.1; 0. 
1.1]<br> KSP Object: (mg_levels_2_esteig_) 1 MPI processes<br> type: gmres<br> restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement<br> happy breakdown tolerance 1e-30<br> maximum iterations=10, initial guess is zero<br> tolerances: relative=1e-12, absolute=1e-50, divergence=10000.<br> left preconditioning<br> using PRECONDITIONED norm type for convergence test<br> estimating eigenvalues using noisy right hand side<br> maximum iterations=2, nonzero initial guess<br> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br> left preconditioning<br> using NONE norm type for convergence test<br> PC Object: (mg_levels_2_) 1 MPI processes<br> type: sor<br> type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.<br> linear system matrix = precond matrix:<br> Mat Object: 1 MPI processes<br> type: seqaij<br> rows=12043, cols=12043<br> total: nonzeros=455611, allocated nonzeros=455611<br> total number of mallocs used during MatSetValues calls=0<br> not using I-node routines<br> Up solver (post-smoother) same as down solver (pre-smoother)<br> Down solver (pre-smoother) on level 3 -------------------------------<br> KSP Object: (mg_levels_3_) 1 MPI processes<br> type: chebyshev<br> eigenvalue estimates used: min = 0.0992144, max = 1.09136<br> eigenvalues estimate via gmres min 0.0222691, max 0.992144<br> eigenvalues estimated using gmres with translations [0. 0.1; 0. 
1.1]<br> KSP Object: (mg_levels_3_esteig_) 1 MPI processes<br> type: gmres<br> restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement<br> happy breakdown tolerance 1e-30<br> maximum iterations=10, initial guess is zero<br> tolerances: relative=1e-12, absolute=1e-50, divergence=10000.<br> left preconditioning<br> using PRECONDITIONED norm type for convergence test<br> estimating eigenvalues using noisy right hand side<br> maximum iterations=2, nonzero initial guess<br> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br> left preconditioning<br> using NONE norm type for convergence test<br> PC Object: (mg_levels_3_) 1 MPI processes<br> type: sor<br> type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.<br> linear system matrix = precond matrix:<br> Mat Object: 1 MPI processes<br> type: seqaij<br> rows=1600200, cols=1600200<br> total: nonzeros=124439742, allocated nonzeros=129616200<br> total number of mallocs used during MatSetValues calls=0<br> using I-node routines: found 533400 nodes, limit used is 5<br> Up solver (post-smoother) same as down solver (pre-smoother)<br> linear system matrix = precond matrix:<br> Mat Object: 1 MPI processes<br> type: seqaij<br> rows=1600200, cols=1600200<br> total: nonzeros=124439742, allocated nonzeros=129616200<br> total number of mallocs used during MatSetValues calls=0<br> using I-node routines: found 533400 nodes, limit used is 5<br></div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 3, 2021 at 10:56 AM Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">That does not seem to be an ASCII file.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 3, 2021 at 10:48 
AM Viktor Nazdrachev <<a href="mailto:numbersixvs@gmail.com" target="_blank">numbersixvs@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">Hello Mark and Matthew!</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">I attached log files for serial and parallel cases and corresponding information about GAMG preconditioner (using grep).
I should note that assembly of the global stiffness matrix in the code was performed with the MatSetValues subroutine (not MatSetValuesBlocked):<br>
</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">!nnds – number of nodes</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">!dmn=3</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call MatCreate(Petsc_Comm_World,Mat_K,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call MatSetFromOptions(Mat_K,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call MatSetSizes(Mat_K,Petsc_Decide,Petsc_Decide,n,n,ierr_m)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">…</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call MatMPIAIJSetPreallocation(Mat_K,0,dbw,0,obw,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">…</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call MatSetOption(Mat_K,Mat_New_Nonzero_Allocation_Err,Petsc_False,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">…<br>
do i=1,nels</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> call FormLocalK(i,k,indx,"Kp") ! find local stiffness matrix</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> indx=indxmap(indx,2) !find global indices for DOFs</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> call MatSetValues(Mat_K,ef_eldof,indx,ef_eldof,indx,k,Add_Values,ierr) </span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">end do</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">But nullspace vector was created using VecSetBlockSize subroutine.</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call VecCreate(Petsc_Comm_World,Vec_NullSpace,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call VecSetBlockSize(Vec_NullSpace,dmn,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call VecSetSizes(Vec_NullSpace,nnds*dmn,Petsc_Decide,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call VecSetUp(Vec_NullSpace,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt 
35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call VecGetArrayF90(Vec_NullSpace,null_space,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">…</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call VecRestoreArrayF90(Vec_NullSpace,null_space,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call MatNullSpaceCreateRigidBody(Vec_NullSpace,matnull,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call MatSetNearNullSpace(Mat_K,matnull,ierr)</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">I suppose this may be one of the reasons for the slow GAMG convergence.</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">So I also attached log files for a parallel run with a «pure» GAMG preconditioner.</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre>
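<pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">Since the matrix is assembled with plain MatSetValues, GAMG may not see the 3-DOF block structure unless the block size is also set on the matrix itself. A minimal sketch (assuming interlaced DOF ordering u1,v1,w1,u2,…; this is only my guess, not tested yet):</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">call MatSetBlockSize(Mat_K,dmn,ierr) ! dmn=3, called before preallocation/assembly</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre>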
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">Kind regards,</span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">Viktor Nazdrachev</span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">R&D senior researcher</span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">Geosteering Technologies LLC</span></p></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">пт, 3 сент. 2021 г. в 15:11, Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Fri, Sep 3, 2021 at 8:02 AM Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev <<a href="mailto:numbersixvs@gmail.com" target="_blank">numbersixvs@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">Hello, Lawrence!<br>
Thank you for your response!</span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">I attached log files (txt files with convergence
behavior and RAM usage logs in separate txt files) and the resulting table with
convergence investigation data (xls). Data for the main non-regular grid with 500K cells
and heterogeneous properties are in the 500K folder, whereas data for the simple
uniform 125K cells grid with constant properties are in 125K folder. </span></p><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"><br></span></p><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>><i> On 1 Sep 2021, at 09:42, </i></span><i><span style="color:black">Наздрачёв</span></i><i><span style="color:black"> </span><span style="color:black">Виктор</span></i><i><span lang="EN-US" style="color:black"> <</span><span style="color:black"><a href="https://lists.mcs.anl.gov/mailman/listinfo/petsc-users" style="color:blue" target="_blank"><span lang="EN-US" style="color:black">numbersixvs at gmail.com</span></a></span></i><i><span lang="EN-US" style="color:black">> wrote:</span></i></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>><i> </i></span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>><i> I have a 3D elasticity problem with heterogeneous properties.</i></span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">> </span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>What does your coefficient variation look like? How large is the contrast?</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">Young modulus varies from 1 to 10 GPa, Poisson ratio varies
from 0.3 to 0.44 and density – from 1700 to 2600 kg/m^3.</span></p></div></blockquote><div><br></div><div>That is not too bad. Poorly shaped elements are the next thing to worry about. Try to keep the aspect ratio below 10 if possible.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>><i> There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs).</i></span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>><i> </i></span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>><i> The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. 
This is because the number of iterations required to achieve the same tolerance increased significantly.</i></span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">> </span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>How many iterations do you have in serial (and then in parallel)?</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">The serial run required 112 iterations to reach convergence (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt), the parallel run with 4 MPI processes – 680 iterations.</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">I attached log files for all simulations (txt files
with convergence behavior and RAM usage logs in separate txt files) and
the resulting table with convergence/memory usage data (xls). Data for the main
non-regular grid with 500K cells and heterogeneous properties are in the 500K
folder, whereas data for the simple uniform 125K-cell grid with constant
properties are in the 125K folder. </span></p><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>><i> I've also tried the PCGAMG (agg) preconditioner with IC</i></span><i><span style="color:black">C</span></i><i><span lang="EN-US" style="color:black"> (1) sub-preconditioner. For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This reduced the calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is a peak memory usage of 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.</span></i></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">> </span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>Does the number of iterates increase in parallel? Again, how many iterations do you have?</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="text-align:justify;margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">For the case with 4 MPI processes and attached nullspace, 177 iterations are required to reach convergence (see the detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). 
For comparison, 90 iterations are required for the sequential run (log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt).</span></pre></div></blockquote><div><br></div><div>Again, do not use ICC. I am surprised to see such a large jump in iteration count, but get ICC off the table.</div><div><br></div><div>You will see variability in the iteration count with processor count with GAMG. As much as +/-10%, maybe more (random) variability, but usually less.</div><div><br></div><div>You can decrease the memory a little, and the setup time a lot, by aggressively coarsening, at the expense of higher iteration counts. It's a balancing act.</div><div><br></div><div>You can run with the defaults, add '-info', grep on GAMG and send the ~30 lines of output if you want advice on parameters.</div></div></div></blockquote><div><br></div><div>Can you send the output of</div><div><br></div><div> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Thanks,</div><div>Mark</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>><i> Are there ways to avoid the decrease in convergence rate for the bjacobi preconditioner in parallel mode? 
Does it make sense to use hierarchical or nested Krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-preconditioner (for example, sub_pc_type bjacobi)?</i></span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">> </span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning.</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">Thanks for the idea, but, unfortunately, ML cannot be
compiled with 64-bit integers (we need to perform computations
on meshes with more than 10M cells).</span></p><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (</span><span style="color:black"><a href="https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS" style="color:blue" target="_blank"><span lang="EN-US" style="color:black">https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS</span></a></span><span lang="EN-US" style="color:black">) then you could try the geneo scheme offered by PCHPDDM.</span></pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span></pre><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">I found strange convergence behavior for the HPDDM
preconditioner. For 1 MPI process, the BFBCG solver did not converge
(log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes the computation was
successful (1018 iterations to reach convergence, log_hpddm(bfbcg)_pchpddm_4_mpi.txt).</span></p><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">But it should be mentioned that the stiffness matrix was
created in AIJ format (our program's default matrix format).</span></p><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">Conversion of the matrix to MATIS format via the MatConvert </span><span lang="EN-US" style="font-family:"Courier New";color:black">subroutine resulted in loss of convergence for both
the serial and parallel runs.</span><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"></span></p><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-family:"Courier New";color:black"><br></span></p><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>><i> Is this peak memory usage expected for the gamg preconditioner? Is there any way to reduce it?</i></span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">> </span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">>I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret.</span> </pre><pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black"> </span>Thanks, I'll try to use a strong threshold only for the coarse grids.</pre>
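<pre style="margin:0cm 0cm 0.0001pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">For example, something like this (only a sketch; the per-level threshold values are my guess and will need tuning for our problem):</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">-pc_gamg_threshold 0.,0.01,0.01,0.01   (no dropping on the finest level, stronger threshold on coarser ones)</span></pre><pre style="margin:0cm 0cm 0.0001pt 35.4pt;font-size:10pt;font-family:"Courier New""><span lang="EN-US" style="color:black">-pc_gamg_square_graph 2                (square the graph on more levels to coarsen more aggressively)</span></pre>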
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">Kind regards,</span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">Viktor Nazdrachev</span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">R&D senior researcher</span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p>
<p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">Geosteering Technologies LLC</span></p><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black">
</span></p><p class="MsoNormal" style="margin:0cm;line-height:normal;font-size:11pt;font-family:Calibri,sans-serif"><span lang="EN-US" style="font-size:10pt;font-family:"Courier New";color:black"> </span></p></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Wed, 1 Sep 2021 at 12:02, Lawrence Mitchell <<a href="mailto:wence@gmx.li" target="_blank">wence@gmx.li</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
<br>
> On 1 Sep 2021, at 09:42, Наздрачёв Виктор <<a href="mailto:numbersixvs@gmail.com" target="_blank">numbersixvs@gmail.com</a>> wrote:<br>
> <br>
> I have a 3D elasticity problem with heterogeneous properties.<br>
<br>
What does your coefficient variation look like? How large is the contrast?<br>
<br>
> There is an unstructured grid with aspect ratio varying from 4 to 25. Zero Dirichlet BCs are imposed on the bottom face of the mesh, and Neumann (traction) BCs are imposed on the side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (approximately 1.6M DOFs).<br>
> <br>
> The best performance and memory usage for a single MPI process were obtained with the HPDDM (BFBCG) solver and bjacobi + ICC(1) in subdomains as the preconditioner: it took 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 2 m 46 s using 5.6 GB of RAM. This is because the number of iterations required to achieve the same tolerance increases significantly.<br>
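[For readers following along: a solver combination like the one described above would typically be selected with options along these lines. This is a sketch; exact option names may differ slightly between PETSc versions.]

```
-ksp_type hpddm -ksp_hpddm_type bfbcg
-pc_type bjacobi -sub_pc_type icc -sub_pc_factor_levels 1
```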
<br>
How many iterations do you have in serial (and then in parallel)?<br>
<br>
> I've also tried the PCGAMG (agg) preconditioner with an ICC(1) sub-preconditioner. For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the near-nullspace was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This reduced the calculation time to 3 m 58 s using 4.3 GB of RAM. However, there is a peak memory usage of 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.<br>
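[Editorial note for readers unfamiliar with the near-nullspace trick: for 3D elasticity, MatNullSpaceCreateRigidBody builds the six rigid-body modes (three translations, three rotations) from nodal coordinates, which GAMG uses to construct better coarse spaces. The NumPy sketch below is purely illustrative of what those vectors look like; it is not the PETSc implementation.]

```python
import numpy as np

def rigid_body_modes(coords):
    """Six rigid-body modes (3 translations + 3 rotations) for 3D
    nodal coordinates of shape (n, 3), returned as orthonormal columns."""
    n = coords.shape[0]
    modes = np.zeros((3 * n, 6))
    x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]
    # Translations along x, y, z
    modes[0::3, 0] = 1.0
    modes[1::3, 1] = 1.0
    modes[2::3, 2] = 1.0
    # Infinitesimal rotation about z: (-y, x, 0)
    modes[0::3, 3] = -y
    modes[1::3, 3] = x
    # Rotation about x: (0, -z, y)
    modes[1::3, 4] = -z
    modes[2::3, 4] = y
    # Rotation about y: (z, 0, -x)
    modes[0::3, 5] = z
    modes[2::3, 5] = -x
    # Orthonormalize the modes (PETSc orthonormalizes them as well)
    q, _ = np.linalg.qr(modes)
    return q

coords = np.random.default_rng(0).random((100, 3))
B = rigid_body_modes(coords)  # shape (300, 6), orthonormal columns
```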
<br>
Does the number of iterates increase in parallel? Again, how many iterations do you have?<br>
<br>
> Are there ways to avoid the degradation of the convergence rate for the bjacobi preconditioner in parallel mode? Does it make sense to use hierarchical or nested Krylov methods with a local GMRES solver (sub_ksp_type gmres) and some sub-preconditioner (for example, sub_pc_type bjacobi)?<br>
<br>
bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning.<br>
<br>
If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (<a href="https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS" rel="noreferrer" target="_blank">https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS</a>) then you could try the geneo scheme offered by PCHPDDM.<br>
<br>
> Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it?<br>
<br>
I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret.<br>
<br>
Lawrence<br>
<br>
</blockquote></div>
</blockquote></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div></div>
</blockquote></div>
</blockquote></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div>