[petsc-users] Parallel efficiency of the gmres solver with ASM

Matthew Knepley knepley at gmail.com
Thu Jun 25 06:44:17 CDT 2015


On Thu, Jun 25, 2015 at 5:51 AM, Lei Shi <stoneszone at gmail.com> wrote:

> Hello,
>

1) In order to understand this, we have to disentangle the various effects.
First, run the STREAMS benchmark

  make NPMAX=4 streams

This will tell you the maximum speedup you can expect on this machine.
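
As a rough guide (the numbers below are invented, purely to illustrate the
arithmetic): the kernels that dominate your solve (MatMult and the ILU
factor/triangular solves) are memory-bandwidth limited, so

  expected max speedup on p processes ~= (STREAMS bandwidth with p processes) /
                                         (STREAMS bandwidth with 1 process)

For example, if STREAMS reports only about 1.4x more bandwidth with 2 processes
than with 1, you should not expect much more than 1.4x speedup from KSPSolve on
2 cores, no matter which solver options you use.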

2) For these test cases, also send the output of

  -ksp_view -ksp_converged_reason -ksp_monitor_true_residual
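
For example (./your_solver below is only a placeholder for your executable),
rerun the 2-core case with those flags appended:

  mpiexec -n 2 ./your_solver <your existing solver options> \
    -ksp_view -ksp_converged_reason -ksp_monitor_true_residual

This shows exactly which outer and subdomain solvers are actually configured
and how the true residual behaves at each iteration.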

  Thanks,

     Matt


> I'm trying to improve the parallel efficiency of the GMRES solve in my
> CFD solver, where PETSc GMRES is used to solve the linear system generated by
> Newton's method. To test its efficiency, I started with a very simple
> inviscid subsonic 3D flow as the first test case. The parallel efficiency of
> the GMRES solve with ASM as the preconditioner is very bad. The results are
> from our latest cluster. Right now, I'm only looking at the wall-clock time of
> the KSPSolve.
>
>    1. First I tested ASM with GMRES and ILU(0) on each subdomain; the
>    wall-clock time on 2 cores is almost the same as for the serial run. Here
>    are the options for this case:
>
> -ksp_type gmres  -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50
> -ksp_gmres_restart 30 -ksp_pc_side right
> -pc_type asm -sub_ksp_type gmres -sub_ksp_rtol 0.001 -sub_ksp_atol 1e-30
> -sub_ksp_max_it 1000 -sub_pc_type ilu -sub_pc_factor_levels 0
> -sub_pc_factor_fill 1.9
>
> The iteration counts increase a lot for the parallel runs.
>
> cores | iterations | err      | petsc solve wclock time | speedup | efficiency
> 1     | 2          | 1.15E-04 | 11.95                   | 1       |
> 2     | 5          | 2.05E-02 | 10.5                    | 1.01    | 0.50
> 4     | 6          | 2.19E-02 | 7.64                    | 1.39    | 0.34
>
>
>
>
>
>
>    2. Then I tested ASM with ILU(0) as the preconditioner only; the solve
>    time on 2 cores is better than in the first test, but the speedup is still
>    very bad. Here are the options I'm using (a minimal sketch of how such
>    runtime options are consumed follows the table below):
>
> -ksp_type gmres  -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50
> -ksp_gmres_restart 30 -ksp_pc_side right
> -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0  -sub_pc_factor_fill
> 1.9
>
> cores | iterations | err      | petsc solve cpu time | speedup | efficiency
> 1     | 10         | 4.54E-04 | 10.68                | 1       |
> 2     | 11         | 9.55E-04 | 8.2                  | 1.30    | 0.65
> 4     | 12         | 3.59E-04 | 5.26                 | 2.03    | 0.50
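>
> For reference, all of the solver options above are runtime options consumed
> through KSPSetFromOptions(); a minimal, self-contained sketch of that pattern
> (a toy 1D Laplacian standing in for my actual Jacobian, not my real solver) is:
>
>   #include <petscksp.h>
>
>   int main(int argc, char **argv)
>   {
>     PetscErrorCode ierr;
>     Mat            A;
>     Vec            x, b;
>     KSP            ksp;
>     PetscInt       i, Istart, Iend, n = 100, its;
>
>     ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
>
>     /* Toy 1D Laplacian standing in for the Newton linear system */
>     ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
>     ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
>     ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>     ierr = MatSetUp(A);CHKERRQ(ierr);
>     ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
>     for (i = Istart; i < Iend; i++) {
>       if (i > 0)   {ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
>       if (i < n-1) {ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
>       ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
>     }
>     ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>     ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>
>     ierr = VecCreate(PETSC_COMM_WORLD, &b);CHKERRQ(ierr);
>     ierr = VecSetSizes(b, PETSC_DECIDE, n);CHKERRQ(ierr);
>     ierr = VecSetFromOptions(b);CHKERRQ(ierr);
>     ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
>     ierr = VecSet(b, 1.0);CHKERRQ(ierr);
>
>     /* -ksp_type, -pc_type asm, the -sub_* options, etc. are all applied here */
>     ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
>     ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
>     ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
>     ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
>     ierr = KSPGetIterationNumber(ksp, &its);CHKERRQ(ierr);
>     ierr = PetscPrintf(PETSC_COMM_WORLD, "outer iterations: %D\n", its);CHKERRQ(ierr);
>
>     ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
>     ierr = MatDestroy(&A);CHKERRQ(ierr);
>     ierr = VecDestroy(&x);CHKERRQ(ierr);
>     ierr = VecDestroy(&b);CHKERRQ(ierr);
>     ierr = PetscFinalize();
>     return ierr;
>   }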
>
>
>
>
>
>
>    Those results are from a third-order DG scheme on a very coarse 3D
> mesh (480 elements). I believe I should get some speedup for this test
> even on this coarse mesh.
>
>   My question is: why does ASM with a local GMRES solve take much longer
> than ASM used as a preconditioner only? The accuracy is also much worse. I
> have tried changing the ASM overlap to 2, but that makes it even worse.
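>
> (For reference, the standard ways to change the overlap are the runtime option
>
>   -pc_asm_overlap 2
>
> added to the options above, or PCASMSetOverlap() in the code.)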
>
>   If I use a larger mesh (~4000 elements), the second case with ASM as the
> preconditioner gives me a better speedup, but it is still not very good.
>
>
> cores | iterations | err      | petsc solve cpu time | speedup | efficiency
> 1     | 7          | 1.91E-02 | 97.32                | 1       |
> 2     | 7          | 2.07E-02 | 64.94                | 1.5     | 0.74
> 4     | 7          | 2.61E-02 | 36.97                | 2.6     | 0.65
>
>
>
> Attached is the -log_summary output dumped from PETSc. Any suggestions are
> welcome; I really appreciate it.
>
>
> Sincerely Yours,
>
> Lei Shi
> ---------
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener