[petsc-dev] [petsc-users] Poor weak scaling when solving successive linear systems

Kong, Fande fande.kong at inl.gov
Tue Jun 26 15:54:21 CDT 2018


First of all, "-pc_type hypre" does not use the "-mg_levels_*" options, so
the parameters you set likely have no effect.

In our experience, hypre with the default parameters set by PETSc does not
scale well. We find the following parameters to be important:

-pc_hypre_boomeramg_strong_threshold

-pc_hypre_boomeramg_max_levels

-pc_hypre_boomeramg_coarsen_type

-pc_hypre_boomeramg_agg_nl

-pc_hypre_boomeramg_agg_num_paths

You can consult the hypre user manual for more details on these
parameters.
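For example, these options could be combined on the command line as below. The
particular values are illustrative assumptions only (not recommendations from
this thread); good choices are problem-dependent, per the hypre manual:

```shell
# Sketch of a BoomerAMG run for a 3D 7-point-stencil problem.
# All numeric values and the coarsening choice are assumptions for
# demonstration; tune them for your problem (see the hypre user manual).
mpiexec -n 216 ./app \
  -pc_type hypre -pc_hypre_type boomeramg \
  -pc_hypre_boomeramg_strong_threshold 0.5 \
  -pc_hypre_boomeramg_max_levels 25 \
  -pc_hypre_boomeramg_coarsen_type HMIS \
  -pc_hypre_boomeramg_agg_nl 2 \
  -pc_hypre_boomeramg_agg_num_paths 2
```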

Fande,

On Tue, Jun 26, 2018 at 2:25 PM, Junchao Zhang <jczhang at mcs.anl.gov> wrote:

> Mark,
>   I re-did the -pc_type hypre experiment without OpenMP.  Now the job
> finishes instead of running out of time. I have results with 216 processors
> (see below). The 1728-processor job is still in the queue, so I don't know
> how it scales. But for the 216-processor run, the execution time is 245
> seconds. With -pc_type gamg, the time is 107 seconds.  My options are
>
> -ksp_norm_type unpreconditioned
> -ksp_rtol 1E-6
> -ksp_type cg
> -log_view
> -mesh_size 1E-4
> -mg_levels_esteig_ksp_max_it 10
> -mg_levels_esteig_ksp_type cg
> -mg_levels_ksp_max_it 1
> -mg_levels_ksp_norm_type none
> -mg_levels_ksp_type richardson
> -mg_levels_pc_sor_its 1
> -mg_levels_pc_type sor
> -nodes_per_proc 30
> -pc_type hypre
>
>
> It is a 7-point stencil code. Do you know other hypre options that I can
> try to improve it?  Thanks.
>
> --- Event Stage 2: Remaining Solves
>
> KSPSolve            1000 1.0 2.4574e+02 1.0 4.48e+09 1.0 7.6e+06 7.2e+03
> 2.0e+04 97100100100100 100100100100100  3928
> VecTDot            12000 1.0 6.5646e+00 2.2 6.48e+08 1.0 0.0e+00 0.0e+00
> 1.2e+04  2 14  0  0 60   2 14  0  0 60 21321
> VecNorm             8000 1.0 9.7144e-01 1.2 4.32e+08 1.0 0.0e+00 0.0e+00
> 8.0e+03  0 10  0  0 40   0 10  0  0 40 96055
> VecCopy             1000 1.0 7.9706e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              6000 1.0 1.7941e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY            12000 1.0 7.5738e-01 1.2 6.48e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0 14  0  0  0   0 14  0  0  0 184806
> VecAYPX             6000 1.0 4.6802e-01 1.3 2.97e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  7  0  0  0   0  7  0  0  0 137071
> VecScatterBegin     7000 1.0 4.7924e-01 2.3 0.00e+00 0.0 7.6e+06 7.2e+03
> 0.0e+00  0  0100100  0   0  0100100  0     0
> VecScatterEnd       7000 1.0 7.9303e-01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatMult             7000 1.0 6.0762e+00 1.1 2.46e+09 1.0 7.6e+06 7.2e+03
> 0.0e+00  2 55100100  0   2 55100100  0 86894
> PCApply             6000 1.0 2.3429e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 92  0  0  0  0  95  0  0  0  0     0
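A quick way to read the table above: nearly all of the solve time is spent in
PCApply (the hypre V-cycles), while MatMult is comparatively cheap. A small
sketch using the timings from the -log_view output:

```python
# Times taken from the -log_view output above (216 processes, 1000 solves).
ksp_solve_time = 2.4574e+02  # total KSPSolve time (seconds)
pc_apply_time  = 2.3429e+02  # total PCApply time (hypre BoomerAMG cycles)
mat_mult_time  = 6.0762e+00  # total MatMult time

# Fraction of the solve spent in the preconditioner application.
frac = pc_apply_time / ksp_solve_time
print(f"PCApply is {frac:.1%} of KSPSolve")  # -> PCApply is 95.3% of KSPSolve
```

So any improvement must come from the preconditioner setup/apply itself, not
from the Krylov vector operations.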
>
>
> --Junchao Zhang
>
> On Thu, Jun 14, 2018 at 5:45 PM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
>
>> I tested -pc_gamg_repartition with 216 processors again. First I tested
>> with these options
>>
>> -log_view \
>> -ksp_rtol 1E-6 \
>> -ksp_type cg \
>> -ksp_norm_type unpreconditioned \
>> -mg_levels_ksp_type richardson \
>> -mg_levels_ksp_norm_type none \
>> -mg_levels_pc_type sor \
>> -mg_levels_ksp_max_it 1 \
>> -mg_levels_pc_sor_its 1 \
>> -mg_levels_esteig_ksp_type cg \
>> -mg_levels_esteig_ksp_max_it 10 \
>> -pc_type gamg \
>> -pc_gamg_type agg \
>> -pc_gamg_threshold 0.05 \
>> -pc_gamg_type classical \
>> -gamg_est_ksp_type cg \
>> -pc_gamg_square_graph 10 \
>> -pc_gamg_threshold 0.0
>>
>>
>> then I tested with an extra -pc_gamg_repartition. With repartitioning, the
>> time increased from 120s to 140s.  The code measures the first KSPSolve and
>> the remaining solves in separate stages, so the repartitioning time was not
>> counted in the stage of interest. Actually, -log_view says the GAMG
>> repartitioning time (in the first event stage) is about 1.5 seconds, so it
>> is not a big deal. I also tested -pc_gamg_square_graph 4; it did not change
>> the time.
>> I tested hypre with the options "-log_view -ksp_rtol 1E-6 -ksp_type cg
>> -ksp_norm_type unpreconditioned -pc_type hypre" and nothing else. The code
>> ran out of time. In old tests, a job (1000 KSPSolves with 7 KSP iterations
>> each) took 4 minutes. With hypre, a single KSPSolve with 6 KSP iterations
>> took 6 minutes.
>> I will test and profile the code on a single node, and apply some
>> vecscatter optimizations I recently did to see what happens.
>>
>>
>> --Junchao Zhang
>>
>> On Thu, Jun 14, 2018 at 11:03 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> And with 7-point stencils and no large material discontinuities, you
>>> probably want -pc_gamg_square_graph 10 -pc_gamg_threshold 0.0, and you
>>> could test the square-graph parameter (e.g., 1, 2, 3, 4).
>>>
>>> And I would definitely test hypre.
>>>
>>> On Thu, Jun 14, 2018 at 8:54 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>>
>>>>> Just -pc_type hypre instead of -pc_type gamg.
>>>>>
>>>>>
>>>> And you need to have configured PETSc with hypre.
>>>>
>>>>
>>>
>>
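To follow up on the last point above, enabling hypre is done when configuring
PETSc. A minimal sketch (other configure options depend on your installation):

```shell
# Reconfigure PETSc so that it downloads and builds hypre automatically.
# Run from PETSC_DIR; additional options (compilers, MPI, etc.) are
# installation-specific and omitted here.
./configure --download-hypre
make all
# Afterwards, -pc_type hypre is available at run time.
```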
>

