[petsc-dev] PETSc amg solver with gpu seems run slowly

Barry Smith bsmith at petsc.dev
Tue Mar 22 11:29:51 CDT 2022


Indeed, PCSetUp is taking most of the time (79%). In the version of PETSc you are running, a great deal of the setup work is done on the CPU. You can see there is a lot of data movement between the CPU and GPU (in both directions) during the setup: 64 1.91e+03   54 1.21e+03 90
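
My reading of those trailing columns of the PCSetUp line (worth checking against the column headers printed above the event table in your -log_view output):

    64         CpuToGpu copy count
    1.91e+03   CpuToGpu total size (Mbytes)
    54         GpuToCpu copy count
    1.21e+03   GpuToCpu total size (Mbytes)
    90         percent of the PCSetUp flops performed on the GPU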

Clearly, we need help porting to the GPU all the parts of the GAMG setup that still run on the CPU.

 Barry




> On Mar 22, 2022, at 12:07 PM, Qi Yang <qiyang at oakland.edu> wrote:
> 
> Dear Barry,
> 
> Your advice is helpful: the total time is reduced from 30s to 20s (now all matrix operations run on the GPU). I have also tried other settings for the AMG preconditioner, such as -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale 0.5, but they do not seem to help much.
> It seems the key point is the PCSetUp process: from the log it takes the most time, and in the new Nsight Systems analysis there is a big gap before the KSP solver starts, which looks like the PCSetUp process. I am not sure, am I right?
> <3.png>
> 
> PCSetUp                2 1.0 1.5594e+01 1.0 3.06e+09 1.0 0.0e+00 0.0e+00 0.0e+00 79 78  0  0  0  79 78  0  0  0   196    8433     64 1.91e+03   54 1.21e+03 90
> 
> 
> Regards,
> Qi
> 
> On Tue, Mar 22, 2022 at 10:44 PM Barry Smith <bsmith at petsc.dev> wrote:
> 
>   It is using 
> 
> MatSOR               369 1.0 9.1214e+00 1.0 7.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 27  0  0  0  29 27  0  0  0   803       0      0 0.00e+00  565 1.35e+03  0
> 
> which runs on the CPU, not the GPU (note that the GPU %F column at the end of that line is 0), hence the large amount of time spent in memory copies and the poor performance. We are switching the default smoother to Chebyshev/Jacobi, which runs completely on the GPU (it may already be switched in the main branch).
> 
> You can run with -mg_levels_pc_type jacobi. You should then see almost the entire solver running on the GPU.
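> 
> For example, starting from the command in your first message, that would look something like this (untested sketch, single GPU; all other options unchanged):
> 
>   mpiexec -n 1 ./ex50 -da_grid_x 3000 -da_grid_y 3000 -ksp_type cg \
>       -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 \
>       -mg_levels_pc_type jacobi \
>       -vec_type cuda -mat_type aijcusparse -ksp_monitor -ksp_view -log_view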
> 
> You may need to tune the number of smoothing steps or other parameters of GAMG to get the fastest solution time.
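> 
> For example (a sketch only; you can check the values actually being used with -ksp_view):
> 
>   -mg_levels_ksp_max_it 2      # smoother iterations on each level
>   -pc_gamg_agg_nsmooths 1      # prolongator smoothing steps (already in your command)
>   -pc_gamg_threshold 0.01      # relative threshold for dropping edges during coarsening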
> 
>   Barry
> 
> 
>> On Mar 22, 2022, at 10:30 AM, Qi Yang <qiyang at oakland.edu> wrote:
>> 
>> To whom it may concern,
>> 
>> I have tried PETSc ex50 (Poisson) with CUDA, the KSP CG solver, and the GAMG preconditioner; however, it ran for about 30s. I also tried NVIDIA AMGX with the same solver and the same grid (3000*3000), and it only took 2s. I used the Nsight Systems software to analyze the two cases and found that PETSc spent much of its time in memory operations (63% of the total time, whereas AMGX only spent 19%). Attached are screenshots of both.
>> 
>> The PETSc command is: mpiexec -n 1 ./ex50 -da_grid_x 3000 -da_grid_y 3000 -ksp_type cg -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -vec_type cuda -mat_type aijcusparse -ksp_monitor -ksp_view -log_view
>> 
>> The log file is also attached.
>> 
>> Regards,
>> Qi
>> 
>> <1.png>
>> <2.png>
>> <log.PETSc_cg_amg_ex50_gpu_cuda>
> 
> <log.PETSc_cg_amg_jacobi_ex50_gpu_cuda>
