[petsc-users] Strong scaling concerns for PCBDDC with Vector FEM
Stefano Zampini
stefano.zampini at gmail.com
Sat Aug 17 08:23:22 CDT 2024
Please include the output of -log_view, -ksp_view, and -ksp_monitor so we
can understand what's happening.
Can you please share the equations you are solving so we can provide
suggestions on the solver configuration?
As I said, solving Nedelec-type discretizations is challenging, and not a
job for off-the-shelf, black-box solvers.
Below are some comments:
- You use a redundant SVD approach for the coarse solve, which can become
inefficient as your coarse space grows. You can use a parallel direct
solver like MUMPS instead (reconfigure with --download-mumps and use
-pc_bddc_coarse_pc_type lu -pc_bddc_coarse_pc_factor_mat_solver_type mumps).
- Why use ILU for the Dirichlet problem and GAMG for the Neumann
problem? With 8 processes and 300K total dofs, you will have around 40K
dofs per process, which is ok for a direct solver like MUMPS
(-pc_bddc_dirichlet_pc_factor_mat_solver_type mumps, same for Neumann).
With Nedelec dofs and the sparsity pattern they induce, I believe you can
push to 80K dofs per process with good performance.
- Why use a GMRES restart of 5000? It is highly inefficient to
re-orthogonalize against such a large set of vectors.
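
Taken together, the three comments above could translate into a command
line along these lines. This is only a sketch, not a tuned configuration:
it assumes PETSc was reconfigured with MUMPS, that ./app is the same
executable as in the quoted command, and that the Neumann option mirrors
the Dirichlet one as noted above. The restart value of 100 is an arbitrary
placeholder and should be chosen from the observed convergence.

```shell
# Sketch only: NP and RESTART are placeholder assumptions, not recommendations.
# Coarse, Dirichlet and Neumann problems all use LU factorization via MUMPS,
# so the redundant SVD, ILU, GAMG and -pc_bddc_neumann_approximate options
# from the original command are dropped.
NP=8
RESTART=100
mpirun -n "$NP" ./app \
  -mat_type is -pc_type bddc \
  -pc_bddc_use_local_mat_graph 0 \
  -pc_bddc_coarse_pc_type lu \
  -pc_bddc_coarse_pc_factor_mat_solver_type mumps \
  -pc_bddc_dirichlet_pc_type lu \
  -pc_bddc_dirichlet_pc_factor_mat_solver_type mumps \
  -pc_bddc_neumann_pc_type lu \
  -pc_bddc_neumann_pc_factor_mat_solver_type mumps \
  -ksp_gmres_restart "$RESTART" \
  -ksp_rtol 1e-8 -ksp_max_it 500 -ksp_error_if_not_converged \
  -ksp_monitor -ksp_converged_reason -ksp_view -log_view
```

Redirecting the output of each run to its own file (one per process count)
makes it easier to compare the PCSetUp entries in the -log_view summaries.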
On Fri, Aug 16, 2024 at 00:04, neil liu <liufield at gmail.com> wrote:
> Dear Petsc developers,
>
> Thanks for your previous help. PCBDDC can now converge to 1e-8 with:
>
> petsc-3.21.1/petsc/arch-linux-c-opt/bin/mpirun -n 8 ./app -pc_type bddc
> -pc_bddc_coarse_redundant_pc_type svd -ksp_error_if_not_converged
> -mat_type is -ksp_monitor -ksp_rtol 1e-8 -ksp_gmres_restart 5000 -ksp_view
> -pc_bddc_use_local_mat_graph 0 -pc_bddc_dirichlet_pc_type ilu
> -pc_bddc_neumann_pc_type gamg -pc_bddc_neumann_pc_gamg_esteig_ksp_max_it 10
> -ksp_converged_reason -pc_bddc_neumann_approximate -ksp_max_it 500 -log_view
>
> Then I used 2 cases for a strong scaling test. The first case involves
> only real numbers (tetra #: 49,152; dof #: 324,224) for the matrix and
> rhs. The second case involves complex numbers (tetra #: 95,336; dof #:
> 611,432) due to PML.
>
> Case 1:
> cpu #   Time for 500 ksp steps (s)   Parallel efficiency   PCsetup time (s)
> 2       234.7                        -                     3.12
> 4       126.6                        0.92                  1.62
> 8       84.97                        0.69                  1.26
>
> However, for Case 2:
> cpu #   Time for 500 ksp steps (s)   Parallel efficiency   PCsetup time (s)
> 2       584.5                        -                     8.61
> 4       376.8                        0.77                  6.56
> 8       459.6                        0.31                  66.47
>
> For these 2 cases, I checked the PCsetup time as an example. It seems
> that with 8 cpus, case 2 spent too much time in PCsetup.
> Do you have any ideas about what is going on here?
>
> Thanks,
> Xiaodong
>
>
>
--
Stefano