[petsc-users] GPU local direct solve of penta-diagonal
Karl Rupp
rupp at mcs.anl.gov
Thu Dec 12 15:59:41 CST 2013
Hi,
> Yes, each MPI process is responsible for solving a system of
> nonlinear equations on a number of grid cells.
>
>
> Just to elaborate, and Ed can correct me, each MPI process has a few 100
> to a few 1000 (spatial) cells. We solve a (Fokker-Planck) system in
> velocity space at each grid cell.
Thanks, Mark, this helps. Is there any chance you can collect a couple
of spatial cells together and solve a bigger system consisting of
decoupled subsystems?
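To illustrate the idea (this is a sketch in NumPy/SciPy, not PETSc code): if every cell's velocity-space system is penta-diagonal and the same size, the systems of many cells can be stacked into one block-diagonal matrix that is itself penta-diagonal, so a single large banded solve replaces many tiny ones. The sizes `n_cells` and `n` below are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_banded

rng = np.random.default_rng(0)
n_cells, n = 100, 50  # assumed: 100 cells, one 50x50 penta-diagonal system each

def random_penta_ab(n):
    """One diagonally dominant penta-diagonal system in SciPy's banded
    'ab' layout: 5 rows holding the diagonals at offsets +2, +1, 0, -1, -2."""
    ab = rng.random((5, n))
    ab[2] += 10.0        # strong main diagonal -> well conditioned
    ab[0, :2] = 0.0      # +2 diagonal starts at column 2
    ab[1, :1] = 0.0      # +1 diagonal starts at column 1
    ab[3, -1:] = 0.0     # -1 diagonal ends one column early
    ab[4, -2:] = 0.0     # -2 diagonal ends two columns early
    return ab

blocks = [random_penta_ab(n) for _ in range(n_cells)]

# Concatenating the banded blocks column-wise yields the banded storage of
# the block-diagonal matrix: the zeroed band edges ensure no coupling
# between neighboring cells' blocks.
ab = np.concatenate(blocks, axis=1)
b = rng.random(n_cells * n)

# One big banded solve instead of n_cells small ones.
x = solve_banded((2, 2), ab, b)

# Sanity check against per-cell solves.
x_ref = np.concatenate([solve_banded((2, 2), blk, b[i * n:(i + 1) * n])
                        for i, blk in enumerate(blocks)])
assert np.allclose(x, x_ref)
```

The same aggregation applies on the GPU side: one kernel launch (or one factorization) over the combined system amortizes launch and PCI-Express overhead that would otherwise be paid once per cell.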
Ideally you want more than 100k dofs for GPUs to perform well. Have a
look at this figure (cross-over at about 10k dofs for CUDA):
http://viennacl.sourceforge.net/uploads/pics/cg-timings.png
to get an idea of how GPU solves saturate at smaller system sizes.
PCI-Express latency is the limiting factor here.
Best regards,
Karli