[petsc-users] GPU speedup in Poisson solvers

Dominic Meiser dmeiser at txcorp.com
Mon Sep 22 14:38:29 CDT 2014


On 09/22/2014 12:57 PM, Chung Shen wrote:
> Dear PETSc Users,
>
> I am new to PETSc and trying to determine if GPU speedup is possible with the 3D Poisson solvers. I configured 2 copies of 'petsc-master' on a standalone machine, one with CUDA toolkit 5.0 and one without (both without MPI):
> Machine: HP Z820 Workstation, Redhat Enterprise Linux 5.0
> CPU: (x2) 8-core Xeon E5-2650 2.0GHz, 128GB Memory
> GPU: (x2) Tesla K20c (706MHz, 5.12GB Memory, Cuda Compatibility: 3.5, Driver: 313.09)
>
> I used 'src/ksp/ksp/examples/tests/ex32.c' as a test and was getting about 20% speedup with GPU. Is this reasonable or did I miss something?
>
> Attached is a comparison chart with two sample logs. The y-axis is the elapsed time in seconds and the x-axis corresponds to the size of the problem. In particular, I wonder if the numbers of calls to 'VecCUSPCopyTo' and 'VecCUSPCopyFrom' shown in the GPU log are excessive?
>
> Thanks in advance for your reply.
>
> Best Regards,
>
> Chung Shen

A few comments:

- To get reliable timings you should configure PETSc without debugging
(i.e. --with-debugging=no).
- The ILU preconditioning in your GPU benchmark is done on the CPU. The
host-device data transfers are killing performance. Can you try to run
with the additional option -pc_factor_mat_solver_package cusparse? This
will perform the preconditioning on the GPU.
- If you're interested in running benchmarks in parallel you will need a 
few patches that are not yet in petsc/master. I can put together a 
branch that has the needed fixes.
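
To check whether the extra option actually reduces the host-device traffic, it may help to compare the 'VecCUSPCopyTo' and 'VecCUSPCopyFrom' rows of the two logs. Below is a minimal Python sketch, not part of PETSc. It assumes each event row in the log starts with the event name followed by the call count, a ratio, and the time in seconds; the function name copy_event_stats is made up for illustration.

```python
# Hypothetical helper (not part of PETSc): sum up call counts and times
# for the CUSP copy events found in the text of a PETSc performance log.
COPY_EVENTS = ("VecCUSPCopyTo", "VecCUSPCopyFrom")


def copy_event_stats(log_text, events=COPY_EVENTS):
    """Return {event_name: (total_calls, total_seconds)}."""
    stats = {}
    for line in log_text.splitlines():
        fields = line.split()
        # Assumed row layout: name, max count, count ratio, max time, ...
        if len(fields) < 4 or fields[0] not in events:
            continue
        try:
            calls = int(fields[1])
            seconds = float(fields[3])
        except ValueError:
            # Not an event row after all (e.g. the name appears in prose).
            continue
        prev_calls, prev_seconds = stats.get(fields[0], (0, 0.0))
        # An event can appear once per logging stage, so accumulate.
        stats[fields[0]] = (prev_calls + calls, prev_seconds + seconds)
    return stats
```

Reading each log file into a string and passing it to copy_event_stats gives one pair of totals per event. If the preconditioner really runs on the GPU, I would expect both the call counts and the copy times to drop compared to the run without the option.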

Cheers,
Dominic

-- 
Dominic Meiser
Tech-X Corporation
5621 Arapahoe Avenue
Boulder, CO 80303
USA
Telephone: 303-996-2036
Fax: 303-448-7756
www.txcorp.com


