[petsc-users] GAMG scaling

Thu May 4 14:33:11 CDT 2017

Mark,
Fixed
https://bitbucket.org/petsc/petsc/commits/68eacb73b84ae7f3fd7363217d47f23a8f967155

Running ex56 now gives:
mpiexec -n 8 ./ex56 -ne 13 ... -h |grep via
  -mattransposematmult_via <scalable> Algorithmic approach (choose one of) scalable nonscalable matmatmult (MatTransposeMatMult)
  -matmatmult_via <nonscalable> Algorithmic approach (choose one of) scalable nonscalable hypre (MatMatMult)
  -matptap_via <nonscalable> Algorithmic approach (choose one of) scalable nonscalable hypre (MatPtAP)
...

I'll merge it into master after the regression tests.

Hong

On Thu, May 4, 2017 at 10:33 AM, Hong <hzhang at mcs.anl.gov> wrote:

> Mark:
>>
>> I am not seeing these options with -help ...
>>
> Hmm, this might be a bug - I'll check it.
> Hong
>
>
>>
>> On Wed, May 3, 2017 at 10:05 PM, Hong <hzhang at mcs.anl.gov> wrote:
>>
>>> I basically used 'runex56' and set '-ne' to be compatible with np.
>>> Then I tried each of the options
>>> '-matptap_via scalable'
>>> '-matptap_via hypre'
>>> '-matptap_via nonscalable'
>>>
>>> I attached a job script below.
>>>
>>> In the master branch, I set the default to 'nonscalable' for small to
>>> medium-size matrices; it automatically switches to 'scalable' when the
>>> matrix size gets larger.
>>>
>>> The PETSc solver uses MatPtAP, which does a local RAP to reduce
>>> communication and accelerate computation.
>>> I suggest you simply use the default setting. Let me know if you run into
>>> trouble.
>>>
>>> Hong
>>>
>>> job.ne174.n8.np125.sh:
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via scalable >
>>> log.ne174.n8.np125.scalable
>>>
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via hypre >
>>> log.ne174.n8.np125.hypre
>>>
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via nonscalable >
>>> log.ne174.n8.np125.nonscalable
>>>
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view > log.ne174.n8.np125
>>>
>>> On Wed, May 3, 2017 at 2:08 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> Hong, the input files do not seem to be accessible. What are the command
>>>> line options? (I don't see a "rap" or "scale" in the source).
>>>>
>>>>
>>>>
>>>> On Wed, May 3, 2017 at 12:17 PM, Hong <hzhang at mcs.anl.gov> wrote:
>>>>
>>>>> Mark,
>>>>> Below is a copy of the email I sent you on Feb 27:
>>>>>
>>>>> I implemented a scalable MatPtAP and compared three
>>>>> implementations using ex56.c on the ALCF Cetus machine (this machine has
>>>>> little memory, 1GB/core):
>>>>> - nonscalable PtAP: uses an array of length PN to do dense axpy
>>>>> - scalable PtAP: does sparse axpy without using the PN array
>>>>> - hypre PtAP.
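
To illustrate the difference between the first two variants, here is a minimal Python sketch of accumulating one row of a product as a sum of scaled sparse rows. It is not the PETSc implementation: the function names, the `terms` layout (a list of `(alpha, cols, vals)` triples) and the use of a dict as the sparse accumulator are all assumptions made for illustration. The dense variant allocates a work array as long as the full column count (PN in the email), while the sparse variant only stores entries that actually become nonzero.

```python
def accumulate_dense(terms, ncols):
    """Dense-axpy style: work array of length ncols (hypothetical stand-in for PN)."""
    work = [0.0] * ncols        # memory is O(ncols), independent of sparsity
    marked = [False] * ncols    # tracks which columns were touched
    touched = []
    for alpha, cols, vals in terms:
        for j, v in zip(cols, vals):
            if not marked[j]:
                marked[j] = True
                touched.append(j)
            work[j] += alpha * v
    touched.sort()
    return [(j, work[j]) for j in touched]


def accumulate_sparse(terms):
    """Sparse-axpy style: memory grows only with the nonzeros of the result row."""
    acc = {}                    # assumption: a dict stands in for the sparse structure
    for alpha, cols, vals in terms:
        for j, v in zip(cols, vals):
            acc[j] = acc.get(j, 0.0) + alpha * v
    return sorted(acc.items())


# Two scaled sparse rows: 2.0 * {0: 1.0, 3: 1.0} + 1.0 * {3: 4.0, 5: 2.0}
terms = [(2.0, [0, 3], [1.0, 1.0]), (1.0, [3, 5], [4.0, 2.0])]
dense_row = accumulate_dense(terms, ncols=8)
sparse_row = accumulate_sparse(terms)
```

Both functions return the same row; the trade-off is that the dense version has cheap, direct indexing but needs `ncols` entries of workspace per process, which is what becomes a problem on a machine with little memory per core.
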
>>>>>
>>>>> The results are attached. Summary:
>>>>> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
>>>>> PtAP
>>>>> - scalable PtAP is 4x faster than hypre PtAP
>>>>> - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>>>>
>>>>> Based on the above observations, I set the default PtAP algorithm to
>>>>> 'nonscalable'.
>>>>> When PN > the local estimated number of nonzeros of C=PtAP, the default
>>>>> switches to 'scalable'.
>>>>> Users can override the default.
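
The selection rule just described can be sketched as follows. This is only an illustration of the rule as stated in the email, not PETSc source code: `choose_ptap_algorithm`, `pn`, `est_local_nnz_c` and `user_choice` are hypothetical names, and how PETSc actually estimates the local nonzero count is not shown here.

```python
def choose_ptap_algorithm(pn, est_local_nnz_c, user_choice=None):
    """Pick a PtAP variant following the rule stated in the email.

    pn              -- length of the dense work array (PN in the email)
    est_local_nnz_c -- estimated local nonzeros of C = PtAP (assumed given)
    user_choice     -- value passed via -matptap_via, if any
    """
    if user_choice is not None:
        return user_choice      # an explicit option overrides the default
    if pn > est_local_nnz_c:
        return "scalable"       # dense array would exceed the local result size
    return "nonscalable"        # dense axpy is the faster choice for small PN


small = choose_ptap_algorithm(pn=1_000, est_local_nnz_c=50_000)
large = choose_ptap_algorithm(pn=10_000_000, est_local_nnz_c=50_000)
forced = choose_ptap_algorithm(pn=1_000, est_local_nnz_c=50_000, user_choice="hypre")
```

The numbers in the usage lines are made up; they only show that a small PN keeps 'nonscalable', a large PN switches to 'scalable', and an explicit choice always wins.
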
>>>>>
>>>>> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>>>>> MatPtAP             3.6224e+01  (nonscalable for small mats, scalable for larger ones)
>>>>> scalable MatPtAP    4.6129e+01
>>>>> hypre               1.9389e+02
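
For a quick comparison, the three figures above can be turned into ratios relative to the default MatPtAP. The snippet below uses only the numbers quoted in the email (units are not stated there); the dictionary keys are labels chosen here for readability.

```python
# Figures for np=8000, ne=599 as quoted in the email (units not stated there)
timings = {
    "default MatPtAP": 3.6224e+01,
    "scalable MatPtAP": 4.6129e+01,
    "hypre": 1.9389e+02,
}

baseline = timings["default MatPtAP"]
ratios = {name: t / baseline for name, t in timings.items()}

for name, r in ratios.items():
    print(f"{name}: {r:.2f}x the default")
```

At this size the always-scalable variant is roughly 1.3x the default and hypre roughly 5.4x, a smaller gap than the 2x and 8x in the summary, presumably because the default already switches to 'scalable' for the larger matrices.
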
>>>>>
>>>>> This work is on petsc-master. Give it a try. If you encounter any
>>>>> problems, let me know.
>>>>>
>>>>> Hong
>>>>>
>>>>> On Wed, May 3, 2017 at 10:01 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>> (Hong), what is the current state of optimizing RAP for scaling?
>>>>>>
>>>>>> Nate is driving 3D elasticity problems at scale with GAMG and we
>>>>>> are working out performance problems. They are hitting problems at
>>>>>> ~1.5B dof on a basic Cray (XC30, I think).
>>>>>>
>>>>>> Thanks,
>>>>>> Mark
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>