[petsc-users] GPU speedup in Poisson solvers

Karl Rupp rupp at iue.tuwien.ac.at
Mon Sep 22 14:25:15 CDT 2014


Hi,

> I am new to PETSc and trying to determine if GPU speedup is possible with the 3D Poisson solvers. I configured 2 copies of 'petsc-master' on a standalone machine, one with CUDA toolkit 5.0 and one without (both without MPI):
> Machine: HP Z820 Workstation, Redhat Enterprise Linux 5.0
> CPU: (x2) 8-core Xeon E5-2650 2.0GHz, 128GB Memory
> GPU: (x2) Tesla K20c (706MHz, 5.12GB Memory, Cuda Compatibility: 3.5, Driver: 313.09)
>
> I used 'src/ksp/ksp/examples/tests/ex32.c' as a test and was getting about 20% speedup with GPU. Is this reasonable or did I miss something?

That is fairly reasonable for your setting, but the setup is not ideal: 
with the default ILU preconditioner, the residual gets copied between 
host and device in each iteration. It is better to use a preconditioner 
suitable for the GPU. For a Poisson problem you should get good numbers 
with the algebraic multigrid preconditioner in CUSP (-pc_type sacusp).

For Poisson you may also try CG instead of GMRES to save all the 
orthogonalization costs - assuming that you use a symmetric preconditioner.
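
To illustrate why CG avoids the orthogonalization work, here is a minimal sketch of unpreconditioned CG in plain Python, applied to a 1D Poisson matrix tridiag(-1, 2, -1). This is not PETSc code; the helper names (apply_poisson_1d, dot, cg) and the problem size are made up for the illustration. The point is that each iteration only touches the current x, r and p, whereas GMRES has to orthogonalize every new Krylov vector against all previous ones.

```python
def apply_poisson_1d(x):
    """Matrix-free product y = A x for the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def cg(apply_a, b, tol=1e-10, max_it=1000):
    """Unpreconditioned CG for a symmetric positive definite operator.

    Only three work vectors (x, r, p) plus the product A p are kept;
    no basis of earlier Krylov vectors is stored or orthogonalized.
    """
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rr = dot(r, r)
    b_norm = rr ** 0.5
    for it in range(1, max_it + 1):
        ap = apply_a(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        if rr_new ** 0.5 <= tol * b_norm:
            return x, it
        beta = rr_new / rr     # short recurrence: uses only the previous direction
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_it


n = 50                      # hypothetical problem size
b = [1.0] * n
x, iterations = cg(apply_poisson_1d, b)
residual = [bi - yi for bi, yi in zip(b, apply_poisson_1d(x))]
```

The short recurrence is only valid for a symmetric positive definite system, which is why a preconditioner used with CG has to be symmetric as well.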

> Attached is a comparison chart with two sample logs. The y-axis is the elapsed time in seconds and the x-axis corresponds to the size of the problem. In particular, I wonder if the numbers of calls to 'VecCUSPCopyTo' and 'VecCUSPCopyFrom' shown in the GPU log are excessive?

They simply reflect the fact that the residual gets copied between host 
and device in each iteration, because ILU is only run sequentially on 
the host.
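
As a rough illustration of how the copy counts grow with the iteration count, here is a toy model in plain Python. It is only a sketch: ToyVector and count_transfers are hypothetical names, not PETSc's API, and the model assumes a single vector that is moved lazily, i.e. only when it is requested on the side where the current data does not live.

```python
class ToyVector:
    """Toy vector that remembers where its up-to-date data lives.

    Hypothetical model for illustration only, not PETSc's implementation.
    """

    def __init__(self):
        self.location = "device"
        self.copies_to_host = 0
        self.copies_to_device = 0

    def use_on(self, where):
        """Request the data on 'host' or 'device'; copy only if it lives elsewhere."""
        if self.location != where:
            if where == "host":
                self.copies_to_host += 1
            else:
                self.copies_to_device += 1
            self.location = where


def count_transfers(iterations, preconditioner_on):
    """Count copies when the solver runs on the device and the
    preconditioner runs on 'host' or 'device'."""
    r = ToyVector()
    for _ in range(iterations):
        r.use_on("device")            # matrix-vector product and vector updates
        r.use_on(preconditioner_on)   # preconditioner application
    return r.copies_to_host, r.copies_to_device


host_pc = count_transfers(100, "host")      # preconditioner on the CPU
device_pc = count_transfers(100, "device")  # preconditioner on the GPU
```

In this model a host-side preconditioner costs about one copy in each direction per iteration, while a device-side preconditioner causes none, which matches the qualitative picture in the log.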

Best regards,
Karli


