[petsc-users] Parallel efficiency of the gmres solver with ASM
Lei Shi
stoneszone at gmail.com
Thu Jun 25 15:24:29 CDT 2015
Hi Matt,
Thanks for your suggestions. Here is the output of the STREAM test on one
node, which has 20 cores; I ran it with up to 20 MPI processes. Attached are
the outputs dumped with your suggested options. I really appreciate your help!
Number of MPI processes 1
Function Rate (MB/s)
Copy: 13816.9372
Scale: 8020.1809
Add: 12762.3830
Triad: 11852.5016
Number of MPI processes 2
Function Rate (MB/s)
Copy: 22748.7681
Scale: 14081.4906
Add: 18998.4516
Triad: 18303.2494
Number of MPI processes 3
Function Rate (MB/s)
Copy: 34045.2510
Scale: 23410.9767
Add: 30320.2702
Triad: 30163.7977
Number of MPI processes 4
Function Rate (MB/s)
Copy: 36875.5349
Scale: 29440.1694
Add: 36971.1860
Triad: 37377.0103
Number of MPI processes 5
Function Rate (MB/s)
Copy: 32272.8763
Scale: 30316.3435
Add: 38022.0193
Triad: 38815.4830
Number of MPI processes 6
Function Rate (MB/s)
Copy: 35619.8925
Scale: 34457.5078
Add: 41419.3722
Triad: 35825.3621
Number of MPI processes 7
Function Rate (MB/s)
Copy: 55284.2420
Scale: 47706.8009
Add: 59076.4735
Triad: 61680.5559
Number of MPI processes 8
Function Rate (MB/s)
Copy: 44525.8901
Scale: 48949.9599
Add: 57437.7784
Triad: 56671.0593
Number of MPI processes 9
Function Rate (MB/s)
Copy: 34375.7364
Scale: 29507.5293
Add: 45405.3120
Triad: 39518.7559
Number of MPI processes 10
Function Rate (MB/s)
Copy: 34278.0415
Scale: 41721.7843
Add: 46642.2465
Triad: 45454.7000
Number of MPI processes 11
Function Rate (MB/s)
Copy: 38093.7244
Scale: 35147.2412
Add: 45047.0853
Triad: 44983.2013
Number of MPI processes 12
Function Rate (MB/s)
Copy: 39750.8760
Scale: 52038.0631
Add: 55552.9503
Triad: 54884.3839
Number of MPI processes 13
Function Rate (MB/s)
Copy: 60839.0248
Scale: 74143.7458
Add: 85545.3135
Triad: 85667.6551
Number of MPI processes 14
Function Rate (MB/s)
Copy: 37766.2343
Scale: 40279.1928
Add: 49992.8572
Triad: 50303.4809
Number of MPI processes 15
Function Rate (MB/s)
Copy: 49762.3670
Scale: 59077.8251
Add: 60407.9651
Triad: 61691.9456
Number of MPI processes 16
Function Rate (MB/s)
Copy: 31996.7169
Scale: 36962.4860
Add: 40183.5060
Triad: 41096.0512
Number of MPI processes 17
Function Rate (MB/s)
Copy: 36348.3839
Scale: 39108.6761
Add: 46853.4476
Triad: 47266.1778
Number of MPI processes 18
Function Rate (MB/s)
Copy: 40438.7558
Scale: 43195.5785
Add: 53063.4321
Triad: 53605.0293
Number of MPI processes 19
Function Rate (MB/s)
Copy: 30739.4908
Scale: 34280.8118
Add: 40710.5155
Triad: 43330.9503
Number of MPI processes 20
Function Rate (MB/s)
Copy: 37488.3777
Scale: 41791.8999
Add: 49518.9604
Triad: 48908.2677
------------------------------------------------
np speedup
1 1.0
2 1.54
3 2.54
4 3.15
5 3.27
6 3.02
7 5.2
8 4.78
9 3.33
10 3.84
11 3.8
12 4.63
13 7.23
14 4.24
15 5.2
16 3.47
17 3.99
18 4.52
19 3.66
20 4.13
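A note on the speedup column above: it appears to simply be the ratio of the
Triad rates, speedup(np) = Triad(np) / Triad(1). For example,
18303.2 / 11852.5 ~ 1.54 for 2 processes and 85667.7 / 11852.5 ~ 7.23 for 13
processes. This ratio is the maximum speedup referred to in the reply quoted
below, i.e. the ceiling a memory-bandwidth-bound solve can reach on this node.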
Sincerely Yours,
Lei Shi
---------
On Thu, Jun 25, 2015 at 6:44 AM, Matthew Knepley <knepley at gmail.com> wrote:
> On Thu, Jun 25, 2015 at 5:51 AM, Lei Shi <stoneszone at gmail.com> wrote:
>
>> Hello,
>>
>
> 1) In order to understand this, we have to disentangle the various effects.
> First, run the STREAMS benchmark
>
> make NPMAX=4 streams
>
> This will tell you the maximum speedup you can expect on this machine.
>
> 2) For these test cases, also send the output of
>
> -ksp_view -ksp_converged_reason -ksp_monitor_true_residual
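As a concrete example (a minimal sketch only; the executable name ./my_solver
and the 2-rank launch are placeholders, not from the thread), such a run could
look like:

mpiexec -n 2 ./my_solver -ksp_type gmres -ksp_pc_side right -pc_type asm -sub_pc_type ilu \
    -ksp_view -ksp_converged_reason -ksp_monitor_true_residual -log_summary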
>
> Thanks,
>
> Matt
>
>
>> I'm trying to improve the parallel efficiency of the GMRES solve in my
>> CFD solver, in which PETSc's GMRES is used to solve the linear system
>> generated by Newton's method. To test its efficiency, I started with a very
>> simple inviscid subsonic 3D flow as the first test case. The parallel
>> efficiency of the GMRES solve with ASM as the preconditioner is very bad.
>> The results are from our latest cluster. Right now, I'm only looking at the
>> wall-clock time of the KSP solve.
>>
>> 1. First I tested ASM with GMRES and ILU(0) on the subdomains; the
>> wall-clock time on 2 cores is almost the same as for the serial run. Here
>> are the options for this case:
>>
>> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50
>> -ksp_gmres_restart 30 -ksp_pc_side right
>> -pc_type asm -sub_ksp_type gmres -sub_ksp_rtol 0.001 -sub_ksp_atol 1e-30
>> -sub_ksp_max_it 1000 -sub_pc_type ilu -sub_pc_factor_levels 0
>> -sub_pc_factor_fill 1.9
>>
>> The iteration count increases a lot for the parallel runs.
>>
>> cores  iterations  err       petsc solve wclock time  speedup  efficiency
>> 1      2           1.15E-04  11.95                    1        1
>> 2      5           2.05E-02  10.5                     1.01     0.50
>> 4      6           2.19E-02  7.64                     1.39     0.34
>>
>> 2. Then I tested ASM with ILU(0) as the preconditioner only; the solve
>> time on 2 cores is better than in the first test, but the speedup is still
>> very bad. Here are the options I'm using (a note on the defaults follows
>> the table below):
>>
>> -ksp_type gmres -ksp_max_it 100 -ksp_rtol 1e-5 -ksp_atol 1e-50
>> -ksp_gmres_restart 30 -ksp_pc_side right
>> -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 0
>> -sub_pc_factor_fill 1.9
>>
>> cores  iterations  err       petsc solve cpu time  speedup  efficiency
>> 1      10          4.54E-04  10.68                 1        1
>> 2      11          9.55E-04  8.2                   1.30     0.65
>> 4      12          3.59E-04  5.26                  2.03     0.50
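A note on the defaults in this second option set (my reading of PCASM, not
something stated in the thread): with no -sub_ksp_type given, the subdomain
KSP defaults to preonly, so the setup is effectively

-pc_type asm -sub_ksp_type preonly -sub_pc_type ilu -sub_pc_factor_levels 0 -sub_pc_factor_fill 1.9

i.e. a single ILU(0) application per block for each outer GMRES iteration,
rather than an inner GMRES solve on each block as in the first test.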
>>
>> Those results are from a third-order "DG" scheme with a very coarse 3D
>> mesh (480 elements). I believe I should get some speedup for this test,
>> even on this coarse mesh.
>>
>> My question is: why does ASM with a local subdomain solve take much
>> longer than ASM as a preconditioner only? The accuracy is much worse too.
>> I have tried changing the ASM overlap to 2, but that makes it even worse.
>>
>> If I use a larger mesh (~4000 elements), the second case, with ASM as
>> the preconditioner only, gives me a better speedup, but it is still not
>> very good.
>>
>>
>> cores  iterations  err       petsc solve cpu time  speedup  efficiency
>> 1      7           1.91E-02  97.32                 1        1
>> 2      7           2.07E-02  64.94                 1.5      0.74
>> 4      7           2.61E-02  36.97                 2.6      0.65
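As a consistency check on these columns: speedup = serial solve time /
parallel solve time, and efficiency = speedup / cores; for the 4-core row,
97.32 / 36.97 ~ 2.6 and 2.6 / 4 ~ 0.65, matching the table.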
>>
>> Attached are the -log_summary outputs dumped from PETSc; any suggestions
>> are welcome. I really appreciate it.
>>
>>
>> Sincerely Yours,
>>
>> Lei Shi
>> ---------
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
Attachments:
  proc2_asm_sub_ksp.dat (12839 bytes)
  proc2_asm_pconly.dat (13347 bytes)
  proc1_asm_sub_ksp.dat (12323 bytes)
  proc1_asm_pconly.dat (13066 bytes)