[petsc-users] GAMG scaling

Hong hzhang at mcs.anl.gov
Wed May 3 21:05:57 CDT 2017

I basically used 'runex56' and set '-ne' to be compatible with np.
Then I ran with each of the options
'-matptap_via scalable'
'-matptap_via hypre'
'-matptap_via nonscalable'
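
A note on "compatible": judging only from the pairs that appear in this thread (np=125 with ne=174, np=1000 with ne=399, np=8000 with ne=599), it looks as if np is a perfect cube and ne+1 is divisible by its cube root. That is an inference from these numbers, not a statement of what ex56 actually checks, and the function name below is hypothetical. A minimal sketch of that check:

```python
def ne_compatible_with_np(np_ranks, ne):
    """Inferred rule (assumption): np is a perfect cube and
    (ne + 1) is divisible by the cube root of np."""
    side = round(np_ranks ** (1.0 / 3.0))  # candidate cube root
    if side ** 3 != np_ranks:              # np is not a perfect cube
        return False
    return (ne + 1) % side == 0


# The three (np, ne) pairs mentioned in this thread.
for np_ranks, ne in [(125, 174), (1000, 399), (8000, 599)]:
    print(np_ranks, ne, ne_compatible_with_np(np_ranks, ne))
```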

I attached a job script below.

In the master branch, I set the default to 'nonscalable' for small to medium size
matrices, and automatically switch to 'scalable' when the matrix size gets
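
The switching rule is spelled out in the Feb 27 message quoted further down: use 'scalable' once PN exceeds the estimated local nonzeros of C=PtAP, otherwise 'nonscalable'. As an illustration only (this is not the PETSc source; the function and argument names are hypothetical), the rule amounts to:

```python
def default_ptap_algorithm(pn, est_local_nnz_c):
    """Pick the default PtAP variant from the rule quoted in this thread.

    pn              -- length of the dense work array (PN in the thread)
    est_local_nnz_c -- estimated local nonzeros of C = PtAP
    Both names are illustrative, not PETSc identifiers.
    """
    if pn > est_local_nnz_c:
        return "scalable"      # dense array of length PN would dominate
    return "nonscalable"       # dense array is affordable and faster


print(default_ptap_algorithm(pn=10_000, est_local_nnz_c=250_000))
print(default_ptap_algorithm(pn=5_000_000, est_local_nnz_c=250_000))
```

Passing '-matptap_via' on the command line, as in the job script below, overrides whatever the default would be.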

The PETSc solver uses MatPtAP, which does a local RAP to reduce communication
and accelerate computation.
I suggest you simply use the default setting. Let me know if you encounter
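
To make the difference between the nonscalable and scalable variants concrete (dense axpy into an array of length PN versus sparse axpy without that array, as described in the quoted message below), here is a rough sketch of accumulating one row of a sparse product in both ways. It is an illustration under simplifying assumptions, not the PETSc implementation: rows are plain dicts, PN is taken to be the number of columns of the result, and the function names are made up.

```python
def row_dense_axpy(a_row, b_rows, pn):
    """Accumulate sum_k a_row[k] * b_rows[k] using a dense work array.

    Memory is O(pn) per process regardless of how sparse the result is.
    """
    work = [0.0] * pn          # dense work array of length PN
    seen = [False] * pn        # marks columns that received a contribution
    touched = []
    for k, a_val in a_row.items():
        for j, b_val in b_rows.get(k, {}).items():
            if not seen[j]:
                seen[j] = True
                touched.append(j)
            work[j] += a_val * b_val   # direct indexed update
    return {j: work[j] for j in sorted(touched)}


def row_sparse_axpy(a_row, b_rows):
    """Same accumulation, but storing only the nonzero columns."""
    acc = {}                   # memory proportional to the row's nonzeros
    for k, a_val in a_row.items():
        for j, b_val in b_rows.get(k, {}).items():
            acc[j] = acc.get(j, 0.0) + a_val * b_val
    return dict(sorted(acc.items()))


a_row = {0: 2.0, 2: 1.0}                      # one sparse row: col -> value
b_rows = {0: {1: 3.0}, 2: {1: 1.0, 4: 5.0}}   # sparse matrix: row -> col -> value
print(row_dense_axpy(a_row, b_rows, pn=6))
print(row_sparse_axpy(a_row, b_rows))
```

Both functions return the same row; the trade-off is the cheap indexed update of the dense array against its O(PN) memory footprint.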


runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
-pc_gamg_reuse_interpolation true -ksp_converged_reason
-use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
-mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
-mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
-mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
-gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
-mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
-pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
-pc_gamg_repartition false -pc_mg_cycle_type v
-pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
-mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via scalable >

runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
-pc_gamg_reuse_interpolation true -ksp_converged_reason
-use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
-mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
-mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
-mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
-gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
-mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
-pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
-pc_gamg_repartition false -pc_mg_cycle_type v
-pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
-mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via hypre >

runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
-pc_gamg_reuse_interpolation true -ksp_converged_reason
-use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
-mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
-mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
-mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
-gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
-mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
-pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
-pc_gamg_repartition false -pc_mg_cycle_type v
-pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
-mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via nonscalable >

runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
-pc_gamg_reuse_interpolation true -ksp_converged_reason
-use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
-mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
-mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
-mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
-gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
-mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
-pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
-pc_gamg_repartition false -pc_mg_cycle_type v
-pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
-mg_coarse_ksp_type cg -ksp_monitor -log_view > log.ne174.n8.np125

On Wed, May 3, 2017 at 2:08 PM, Mark Adams <mfadams at lbl.gov> wrote:

> Hong, the input files do not seem to be accessible. What are the command
> line options? (I don't see a "rap" or "scale" in the source).
> On Wed, May 3, 2017 at 12:17 PM, Hong <hzhang at mcs.anl.gov> wrote:
>> Mark,
>> Below is a copy of the email I sent you on Feb 27:
>> I implemented scalable MatPtAP and did comparisons of three
>> implementations using ex56.c on the ALCF Cetus machine (this machine has
>> small memory, 1GB/core):
>> - nonscalable PtAP: use an array of length PN to do dense axpy
>> - scalable PtAP: do sparse axpy without use of the PN array
>> - hypre PtAP.
>> The results are attached. Summary:
>> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
>> - scalable PtAP is 4x faster than hypre PtAP
>> - hypre uses less memory (see job.ne399.n63.np1000.sh)
>> Based on the above observations, I set the default PtAP algorithm to
>> 'nonscalable'.
>> When PN > the estimated local nonzeros of C=PtAP, the default switches to
>> 'scalable'.
>> Users can override the default.
>> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>> MatPtAP            3.6224e+01 (nonscalable for small mats, scalable for larger ones)
>> scalable MatPtAP   4.6129e+01
>> hypre              1.9389e+02
>> This work is on petsc-master. Give it a try. If you encounter any
>> problems, let me know.
>> Hong
>> On Wed, May 3, 2017 at 10:01 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>> (Hong), what is the current state of optimizing RAP for scaling?
>>> Nate is driving 3D elasticity problems at scale with GAMG and we are
>>> working out performance problems. They are hitting problems at ~1.5B dof
>>> on a basic Cray (XC30 I think).
>>> Thanks,
>>> Mark