[petsc-users] GPU speedup in Poisson solvers
Karl Rupp
rupp at iue.tuwien.ac.at
Mon Sep 22 14:25:15 CDT 2014
Hi,
> I am new to PETSc and trying to determine if GPU speedup is possible
> with the 3D Poisson solvers. I configured 2 copies of 'petsc-master' on
> a standalone machine, one with CUDA toolkit 5.0 and one without (both
> without MPI):
> Machine: HP Z820 Workstation, Redhat Enterprise Linux 5.0
> CPU: (x2) 8-core Xeon E5-2650 2.0GHz, 128GB Memory
> GPU: (x2) Tesla K20c (706MHz, 5.12GB Memory, Cuda Compatibility: 3.5, Driver: 313.09)
>
> I used 'src/ksp/ksp/examples/tests/ex32.c' as a test and was getting about 20% speedup with GPU. Is this reasonable or did I miss something?
That is fairly reasonable for your configuration, but the setup is not ideal:
with the default ILU preconditioner, the residual gets copied between
host and device in each iteration. It is better to use a preconditioner
suited to the GPU. For a Poisson problem you should get good numbers with
the algebraic multigrid preconditioner in CUSP (-pc_type sacusp).
For Poisson you may also try CG instead of GMRES (-ksp_type cg) to save all
the orthogonalization costs, assuming that you use a symmetric preconditioner.
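
To illustrate why CG avoids the orthogonalization work, here is a minimal sketch of unpreconditioned CG in plain Python. It is not PETSc code: the 1D Poisson matrix tridiag(-1, 2, -1) and the helper names (apply_poisson_1d, dot, cg) are assumptions chosen only for illustration. The point is that each iteration updates a fixed set of vectors (x, r, p) with short recurrences, whereas GMRES has to orthogonalize every new vector against all previous Krylov vectors.

```python
# Illustrative sketch only (not PETSc code): unpreconditioned CG applied to
# the 1D Poisson matrix A = tridiag(-1, 2, -1). All names are hypothetical.

def apply_poisson_1d(x):
    """Return A*x for A = tridiag(-1, 2, -1) (homogeneous Dirichlet ends)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def cg(apply_a, b, tol=1e-10, max_it=1000):
    """Solve A*x = b for symmetric positive definite A.

    Returns (x, iterations). Stops once ||r|| <= tol * ||b||.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - A*x for the initial guess x = 0
    p = list(r)              # first search direction
    rr = dot(r, r)
    b_norm = dot(b, b) ** 0.5
    for it in range(max_it):
        if rr ** 0.5 <= tol * b_norm:
            return x, it
        ap = apply_a(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr   # short recurrence: only the previous p is needed
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_it


if __name__ == "__main__":
    n = 50
    b = [1.0] * n
    x, its = cg(apply_poisson_1d, b)
    ax = apply_poisson_1d(x)
    print("iterations:", its)
    print("max residual entry:", max(abs(a - bi) for a, bi in zip(ax, b)))
```

With a preconditioner, the same recurrences only remain valid if the preconditioner is symmetric (and positive definite), which is the reason for the caveat above.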
> Attached is a comparison chart with two sample logs. The y-axis is the elapsed time in seconds and the x-axis corresponds to the size of the problem. In particular, I wonder if the numbers of calls to 'VecCUSPCopyTo' and 'VecCUSPCopyFrom' shown in the GPU log are excessive?
They simply reflect the fact that the residual gets copied between host
and device in each iteration, because ILU is only run sequentially.
Best regards,
Karli