[petsc-users] Slow convergence in parallel computations.

Viktor Nazdrachev numbersixvs at gmail.com
Fri Sep 3 00:55:59 CDT 2021


Hello, Lawrence!
Thank you for your response!

I have attached the log files (txt files with the convergence behavior and,
in separate txt files, the RAM usage logs) and the resulting table with the
convergence investigation data (xls). Data for the main non-regular grid
with 500K cells and heterogeneous properties are in the 500K folder,
whereas data for the simple uniform 125K-cell grid with constant properties
are in the 125K folder.


>> On 1 Sep 2021, at 09:42, Наздрачёв Виктор <numbersixvs at gmail.com> wrote:

>>

>> I have a 3D elasticity problem with heterogeneous properties.

>

>What does your coefficient variation look like? How large is the contrast?



Young's modulus varies from 1 to 10 GPa, Poisson's ratio from 0.3 to 0.44,
and density from 1700 to 2600 kg/m^3.





>> There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs).

>>

>> The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased.

>

>How many iterations do you have in serial (and then in parallel)?



The serial run required 112 iterations to reach convergence
(log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt), while the parallel run with
4 MPI processes required 680 iterations.
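
For completeness, these runs use roughly the following options (the
executable name is a placeholder, and tolerances are omitted):

    mpiexec -n 1 ./our_solver \
      -ksp_type hpddm -ksp_hpddm_type bfbcg \
      -pc_type bjacobi -sub_pc_type icc -sub_pc_factor_levels 1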








>> I`ve also tried PCGAMG (agg) preconditioner with ICC (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.

>

>Does the number of iterates increase in parallel? Again, how many iterations do you have?



For the case with 4 MPI processes and the attached near-nullspace, 177
iterations are required to reach convergence (see the detailed log in
log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, the
sequential run requires 90 iterations
(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). A sketch of how the
near-nullspace is attached is given below.
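
The rigid-body near-nullspace is attached roughly as in the following
sketch (coords stands for the blocked Vec of nodal coordinates in our
code; the function name is illustrative):

    #include <petscmat.h>

    /* Sketch: attach rigid-body modes as a near-nullspace to the
       stiffness matrix A; coords is a Vec of nodal coordinates with
       block size 3. */
    PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
    {
      MatNullSpace   nsp;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatNullSpaceCreateRigidBody(coords, &nsp);CHKERRQ(ierr);
      ierr = MatSetNearNullSpace(A, nsp);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }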







>> Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)?

>

>bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning.



Thanks for the idea, but unfortunately ML cannot be compiled with 64-bit
integers, which we absolutely need, since computations have to be
performed on meshes with more than 10M cells.
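
For context, our PETSc build is configured with 64-bit indices roughly as
follows (all other configure options are omitted here):

    ./configure --with-64-bit-indices=1 --with-debugging=0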





>If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the geneo scheme offered by PCHPDDM.





I found strange convergence behavior for the HPDDM preconditioner. With
1 MPI process the BFBCG solver did not converge
(log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while with 4 MPI processes the
computation was successful (1018 iterations to reach convergence,
log_hpddm(bfbcg)_pchpddm_4_mpi.txt).

It should be mentioned that the stiffness matrix was created in AIJ format
(the default matrix format in our program).

Converting the matrix to MATIS format via the MatConvert routine resulted
in a loss of convergence for both the serial and the parallel run; the
conversion that was tried is sketched below.
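
The conversion looks roughly like this (A is the assembled AIJ stiffness
matrix and ksp is our solver object; the function name is illustrative):

    #include <petscksp.h>

    /* Sketch: convert the assembled AIJ stiffness matrix to MATIS and
       hand the converted matrix to the solver. */
    PetscErrorCode UseMatISOperator(KSP ksp, Mat A)
    {
      Mat            Ais;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatConvert(A, MATIS, MAT_INITIAL_MATRIX, &Ais);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, Ais, Ais);CHKERRQ(ierr);
      ierr = MatDestroy(&Ais);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }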


>> Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it?

>

>I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret.



Thanks, I'll try using a stronger threshold only for the coarse grids
(a sketch of the options, together with the -info run suggested above,
is below).
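
The plan is roughly the following (the executable name and threshold
values are only placeholders; if I understand the option correctly, the
comma-separated list for -pc_gamg_threshold gives one threshold per level,
finest level first):

    mpiexec -n 4 ./our_solver -pc_type gamg -pc_gamg_type agg \
      -pc_gamg_threshold 0.0,0.0,0.05 \
      -info 2>&1 | grep GAMG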



Kind regards,



Viktor Nazdrachev



R&D senior researcher



Geosteering Technologies LLC









On Wed, 1 Sep 2021 at 12:02, Lawrence Mitchell <wence at gmx.li> wrote:

>
>
> > On 1 Sep 2021, at 09:42, Наздрачёв Виктор <numbersixvs at gmail.com> wrote:
> >
> > I have a 3D elasticity problem with heterogeneous properties.
>
> What does your coefficient variation look like? How large is the contrast?
>
> > There is unstructured grid with aspect ratio varied from 4 to 25. Zero
> Dirichlet BCs  are imposed on bottom face of mesh. Also, Neumann (traction)
> BCs are imposed on side faces. Gravity load is also accounted for. The grid
> I use consists of 500k cells (which is approximately 1.6M of DOFs).
> >
> > The best performance and memory usage for single MPI process was
> obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as
> preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with
> 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of
> number of iterations required to achieve the same tolerance is
> significantly increased.
>
> How many iterations do you have in serial (and then in parallel)?
>
> > I`ve also tried PCGAMG (agg) preconditioner with ICС (1)
> sub-precondtioner. For single MPI process, the calculation took 10 min and
> 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached
> using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines.
> This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM.
> Also, there is peak memory usage with 14.1 GB, which appears just before
> the start of the iterations. Parallel computation with 4 MPI processes took
> 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is
> about 22 GB.
>
> Does the number of iterates increase in parallel? Again, how many
> iterations do you have?
>
> > Are there ways to avoid decreasing of the convergence rate for bjacobi
> precondtioner in parallel mode? Does it make sense to use hierarchical or
> nested krylov methods with a local gmres solver (sub_pc_type gmres) and
> some sub-precondtioner (for example, sub_pc_type bjacobi)?
>
> bjacobi is only a one-level method, so you would not expect
> process-independent convergence rate for this kind of problem. If the
> coefficient variation is not too extreme, then I would expect GAMG (or some
> other smoothed aggregation package, perhaps -pc_type ml (you need
> --download-ml)) would work well with some tuning.
>
> If you have extremely high contrast coefficients you might need something
> with stronger coarse grids. If you can assemble so-called Neumann matrices (
> https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you
> could try the geneo scheme offered by PCHPDDM.
>
> > Is this peak memory usage expected for gamg preconditioner? is there any
> way to reduce it?
>
> I think that peak memory usage comes from building the coarse grids. Can
> you run with `-info` and grep for GAMG, this will provide some output that
> more expert GAMG users can interpret.
>
> Lawrence
>
>

