<div dir="ltr">Hi Stefano, <div>Please see the attached output for the runs with 4 and 8 CPUs for the complex matrix.</div><div>I am solving the Maxwell equations (attached) using 2nd-order Nedelec elements (two dofs per edge and two dofs per face).</div><div>The computational domain consists of different media, e.g., vacuum and substrate (different permittivity).</div><div>A PML is used to truncate the computational domain; it absorbs the outgoing wave and introduces complex numbers into the matrix.</div><div><br></div><div>Thanks a lot for your suggestions. I will try MUMPS. </div><div>For now, I just want to experiment with PETSc's built-in features to learn more about it. </div><div>Yes, 5000 is large. A smaller value, e.g., 30, converges very slowly. </div><div><br></div><div>Thanks a lot. </div><div><br></div><div>Have a good weekend. </div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Aug 17, 2024 at 9:23 AM Stefano Zampini &lt;<a href="mailto:stefano.zampini@gmail.com">stefano.zampini@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Please include the output of -log_view -ksp_view -ksp_monitor to understand what's happening.<div><br></div><div><div>Can you please share the equations you are solving so we can provide suggestions on the solver configuration?</div><div>As I said, solving Nedelec-type discretizations is challenging, and not a job for off-the-shelf, black-box solvers.</div><div><br></div><div>Below are some comments:</div><div><br></div><div><ul><li>You use a redundant SVD approach for the coarse solve, which can be inefficient if your coarse space grows. 
You can use a parallel direct solver like MUMPS (reconfigure with --download-mumps and use -pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps).</li><li>Why use ILU for the Dirichlet problem and GAMG for the Neumann problem? With 8 processes and 300K total dofs, you will have around 40K dofs per process, which is fine for a direct solver like MUMPS (-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann). With Nedelec dofs and the sparsity pattern they induce, I believe you can push to 80K dofs per process with good performance.</li><li>Why a GMRES restart of 5000? It is highly inefficient to re-orthogonalize such a large set of vectors.</li></ul></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Aug 16, 2024 at 00:04 neil liu &lt;<a href="mailto:liufield@gmail.com" target="_blank">liufield@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Dear PETSc developers, <div><br></div><div>Thanks for your previous help. Now PCBDDC converges to 1e-8 with: </div><div><br></div><div><div>petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc -pc_bddc_coarse_redundant_pc_type svd   -ksp_error_if_not_converged -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view -pc_bddc_use_local_mat_graph 0  -pc_bddc_dirichlet_pc_type ilu -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10 -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view</div></div><div><br></div><div>Then I ran a strong scaling test on 2 cases. The first case involves only real numbers in the matrix and rhs (tetra #: 49,152; dof #: 324,224). The second case involves complex numbers due to the PML (tetra #: 95,336; dof #: 611,432). 
</div><div><br></div><div>Case 1: </div><div><table><tr><th>CPU #</th><th>Time for 500 KSP steps (s)</th><th>Parallel efficiency</th><th>PCSetup time (s)</th></tr><tr><td>2</td><td>234.7</td><td></td><td>3.12</td></tr><tr><td>4</td><td>126.6</td><td>0.92</td><td>1.62</td></tr><tr><td>8</td><td>84.97</td><td>0.69</td><td>1.26</td></tr></table></div>
<div>However, for Case 2: </div><div><table><tr><th>CPU #</th><th>Time for 500 KSP steps (s)</th><th>Parallel efficiency</th><th>PCSetup time (s)</th></tr><tr><td>2</td><td>584.5</td><td></td><td>8.61</td></tr><tr><td>4</td><td>376.8</td><td>0.77</td><td>6.56</td></tr><tr><td>8</td><td>459.6</td><td>0.31</td><td>66.47</td></tr></table></div>
<div>For these 2 cases, I checked the PCSetup time as an example. It seems that Case 2 on 8 CPUs spends far too much time in PCSetup.</div><div>Do you have any idea what is going on here? </div><div><br></div><div>Thanks,</div><div>Xiaodong </div><div><br></div><div><br></div></div></div></div></div></div>

</blockquote></div><br clear="all"><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature">Stefano</div>

</blockquote></div>