[petsc-dev] PETSc amg solver with gpu seems run slowly
Barry Smith
bsmith at petsc.dev
Tue Mar 22 11:29:51 CDT 2022
Indeed, PCSetUp is taking most of the time (79%). In the version of PETSc you are running, it does a great deal of the setup work on the CPU. You can also see there is a lot of data movement between the CPU and GPU (in both directions) during the setup: the trailing fields of the PCSetUp log line, 64 1.91e+03 54 1.21e+03 90, are the CPU-to-GPU copy count and size, the GPU-to-CPU copy count and size, and the percentage of flops done on the GPU.
Clearly, we need help porting the parts of the GAMG setup that still run on the CPU over to the GPU.
Barry
> On Mar 22, 2022, at 12:07 PM, Qi Yang <qiyang at oakland.edu> wrote:
>
> Dear Barry,
>
> Your advice was helpful: the total time dropped from 30 s to 20 s (all the matrix operations now run on the GPU). I have also tried other settings for the AMG preconditioner, such as -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale 0.5, but they did not help much.
> The key point seems to be PCSetUp: the log shows it takes most of the time, and in the new Nsight Systems trace there is a big gap before the KSP solve starts, which looks like the PCSetUp work. Am I right?
> <3.png>
>
> PCSetUp 2 1.0 1.5594e+01 1.0 3.06e+09 1.0 0.0e+00 0.0e+00 0.0e+00 79 78 0 0 0 79 78 0 0 0 196 8433 64 1.91e+03 54 1.21e+03 90
>
>
> Regards,
> Qi
>
> On Tue, Mar 22, 2022 at 10:44 PM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>
> It is using
>
> MatSOR 369 1.0 9.1214e+00 1.0 7.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 27 0 0 0 29 27 0 0 0 803 0 0 0.00e+00 565 1.35e+03 0
>
> which runs on the CPU, not the GPU, hence the large amount of time spent in memory copies and the poor performance. We are switching the default smoother to Chebyshev/Jacobi, which runs completely on the GPU (this may already be switched in the main branch).
>
> You can run with -mg_levels_pc_type jacobi. You should then see almost the entire solver running on the GPU.
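> For example, combined with the ex50 invocation from your earlier message (same grid and solver options; this is just a sketch of that run with the smoother option appended):
>
> ```shell
> # Same ex50 Poisson run as before, but with Jacobi smoothing on each
> # multigrid level so the smoother stays entirely on the GPU.
> mpiexec -n 1 ./ex50 -da_grid_x 3000 -da_grid_y 3000 \
>     -ksp_type cg -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 \
>     -mg_levels_pc_type jacobi \
>     -vec_type cuda -mat_type aijcusparse -ksp_monitor -log_view
> ```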
>
> You may need to tune the number of smoothing steps or other GAMG parameters to get the fastest solution time.
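> As a rough sketch, these are the standard knobs to experiment with (the particular values below are only illustrative starting points, not recommendations):
>
> ```shell
> -mg_levels_ksp_max_it 2        # smoother iterations per multigrid level
> -mg_levels_ksp_type chebyshev  # smoother KSP; Chebyshev pairs well with Jacobi
> -pc_gamg_threshold 0.02        # drop threshold used when coarsening
> -pc_gamg_square_graph 1        # times to square the graph before coarsening
> ```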
>
> Barry
>
>
>> On Mar 22, 2022, at 10:30 AM, Qi Yang <qiyang at oakland.edu <mailto:qiyang at oakland.edu>> wrote:
>>
>> To whom it may concern,
>>
>> I have tried PETSc ex50 (Poisson) with CUDA, using the CG KSP solver and the GAMG preconditioner; however, it ran for about 30 s. I also tried NVIDIA AmgX with the same solver and the same grid (3000x3000), and it took only 2 s. I analyzed both cases with Nsight Systems and found that PETSc spent far more time in memory operations (63% of the total time, versus 19% for AmgX). Attached are screenshots of both runs.
>>
>> The PETSc command is: mpiexec -n 1 ./ex50 -da_grid_x 3000 -da_grid_y 3000 -ksp_type cg -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -vec_type cuda -mat_type aijcusparse -ksp_monitor -ksp_view -log_view
>>
>> The log file is also attached.
>>
>> Regards,
>> Qi
>>
>> <1.png>
>> <2.png>
>> <log.PETSc_cg_amg_ex50_gpu_cuda>
>
> <log.PETSc_cg_amg_jacobi_ex50_gpu_cuda>