[petsc-users] GAMG scaling

Hong hzhang at mcs.anl.gov
Thu May 4 10:33:51 CDT 2017


Mark:
>
> I am not seeing these options with -help ...
>
Hmm, this might be a bug - I'll check it.
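A quick way to check whether the option is being registered is to grep the
-help output of a small run, e.g. (the executable and arguments here are just
illustrative):

  ./ex56 -ne 13 -ksp_type cg -pc_type gamg -help | grep matptap_via
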
Hong


>
> On Wed, May 3, 2017 at 10:05 PM, Hong <hzhang at mcs.anl.gov> wrote:
>
>> I basically used 'runex56' and set '-ne' to be compatible with np.
>> Then I used the options
>> '-matptap_via scalable'
>> '-matptap_via hypre'
>> '-matptap_via nonscalable'
>>
>> I attached a job script below.
>>
>> In the master branch, I set the default to 'nonscalable' for small to
>> medium size matrices, and automatically switch to 'scalable' when the
>> matrix size gets larger.
>>
>> The PETSc solver uses MatPtAP, which does a local RAP to reduce
>> communication and accelerate computation.
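>>
>> For reference, the coarse operators are built through the MatPtAP
>> interface; a minimal sketch of the call sequence (inside a function
>> returning PetscErrorCode; the fill estimate 2.0 is just a guess) is:
>>
>>   Mat            C;
>>   PetscErrorCode ierr;
>>   /* A: fine-grid operator, P: prolongator, both assembled AIJ matrices */
>>   ierr = MatPtAP(A,P,MAT_INITIAL_MATRIX,2.0,&C);CHKERRQ(ierr);
>>   /* later rebuilds can reuse the symbolic product */
>>   ierr = MatPtAP(A,P,MAT_REUSE_MATRIX,2.0,&C);CHKERRQ(ierr);
>>   ierr = MatDestroy(&C);CHKERRQ(ierr);
>>
>> The -matptap_via option selects which implementation that call uses.
>>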
>> I suggest you simply use the default setting. Let me know if you encounter
>> any trouble.
>>
>> Hong
>>
>> job.ne174.n8.np125.sh:
>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>> -pc_gamg_repartition false -pc_mg_cycle_type v
>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via scalable >
>> log.ne174.n8.np125.scalable
>>
>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>> -pc_gamg_repartition false -pc_mg_cycle_type v
>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via hypre >
>> log.ne174.n8.np125.hypre
>>
>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>> -pc_gamg_repartition false -pc_mg_cycle_type v
>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via nonscalable >
>> log.ne174.n8.np125.nonscalable
>>
>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>> -pc_gamg_repartition false -pc_mg_cycle_type v
>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>> -mg_coarse_ksp_type cg -ksp_monitor -log_view > log.ne174.n8.np125
>>
>> On Wed, May 3, 2017 at 2:08 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> Hong, the input files do not seem to be accessible. What are the command
>>> line options? (I don't see a "rap" or "scale" in the source.)
>>>
>>>
>>>
>>> On Wed, May 3, 2017 at 12:17 PM, Hong <hzhang at mcs.anl.gov> wrote:
>>>
>>>> Mark,
>>>> Below is a copy of my email sent to you on Feb 27:
>>>>
>>>> I implemented a scalable MatPtAP and compared three implementations
>>>> using ex56.c on the ALCF Cetus machine (this machine has small memory,
>>>> 1 GB/core):
>>>> - nonscalable PtAP: uses an array of length PN to do a dense axpy
>>>> - scalable PtAP: does a sparse axpy without using a PN-length array
>>>>   (the two accumulation schemes are sketched below)
>>>> - hypre PtAP
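>>>>
>>>> As a toy sketch in C (not the PETSc source) of the two accumulation
>>>> schemes for one row of C: PN stands for the global number of columns of
>>>> P, so the dense work array grows with the global problem size, while the
>>>> sparse variant only stores the columns that actually occur:
>>>>
>>>>   #include <string.h>
>>>>   #define PN 1000                      /* pretend global column count */
>>>>   void accumulate(const int *cols, const double *vals, int nterms)
>>>>   {
>>>>     double dense[PN]; int k, j, nnz = 0;
>>>>     /* nonscalable: dense axpy into a length-PN work array */
>>>>     memset(dense, 0, sizeof(dense));
>>>>     for (k = 0; k < nterms; k++) dense[cols[k]] += vals[k];
>>>>
>>>>     /* scalable: sparse axpy, keeping only the columns seen so far */
>>>>     int scols[64]; double svals[64];   /* assumes nterms <= 64 here */
>>>>     for (k = 0; k < nterms; k++) {
>>>>       for (j = 0; j < nnz; j++) if (scols[j] == cols[k]) break;
>>>>       if (j == nnz) { scols[nnz] = cols[k]; svals[nnz] = 0.0; nnz++; }
>>>>       svals[j] += vals[k];
>>>>     }
>>>>   }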
>>>>
>>>> The results are attached. Summary:
>>>> - nonscalable PtAP is 2x faster than scalable PtAP and 8x faster than hypre PtAP
>>>> - scalable PtAP is 4x faster than hypre PtAP
>>>> - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>>>
>>>> Based on the above observations, I set the default PtAP algorithm to
>>>> 'nonscalable'. When PN exceeds the local estimated number of nonzeros
>>>> of C = PtAP, the default switches to 'scalable'.
>>>> The user can override the default.
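>>>>
>>>> Roughly, the selection looks like this (a sketch with illustrative
>>>> names, not the exact PETSc code):
>>>>
>>>>   if (PN > local_nonzero_estimate_of_C) alg = "scalable";
>>>>   else                                  alg = "nonscalable";
>>>>
>>>> and either choice can still be forced with
>>>> '-matptap_via scalable|nonscalable|hypre'.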
>>>>
>>>> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>>>> (times in seconds):
>>>> MatPtAP             3.6224e+01   (nonscalable for small mats, scalable for larger ones)
>>>> scalable MatPtAP    4.6129e+01
>>>> hypre               1.9389e+02
>>>>
>>>> This work is in petsc-master. Give it a try. If you encounter any
>>>> problems, let me know.
>>>>
>>>> Hong
>>>>
>>>> On Wed, May 3, 2017 at 10:01 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>>> (Hong), what is the current state of optimizing RAP for scaling?
>>>>>
>>>>> Nate is driving 3D elasticity problems at scale with GAMG, and we are
>>>>> working out performance problems. He is hitting problems at ~1.5B dof
>>>>> on a basic Cray (an XC30, I think).
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>
>>>>
>>>
>>
>