[petsc-users] GPU local direct solve of penta-diagonal

Karl Rupp rupp at mcs.anl.gov
Thu Dec 12 15:50:30 CST 2013


Hi Ed,


> Yes, each MPI process is responsible for solving a system of nonlinear
> equations on a number of grid cells.
> The nonlinear equations are solved by Picard iteration, and the
> time-consuming part is the formation and solution of the nonsymmetric sparse
> linear system arising from a rectangular grid with a regular finite
> difference stencil.  All the linear systems have the same sparsity
> pattern but may have different numerical values.
>
> Since there are 16 cores on each node on Titan, up to 16 separate,
> independent linear systems can be solved concurrently.
> One may not want to batch or synchronize the solvers since different
> grid cells may require a different number of Picard iterations.

Hmm, this does not sound like something I would consider a good fit for
GPUs. With 16 MPI processes you have additional contention for the one or
two GPUs per node, so you would have to rethink the solution procedure
as a whole. I can think of a procedure where each of these systems is
solved on a separate streaming multiprocessor (or work group in OpenCL
language), where synchronization is cheaper. However, this is not
covered by standard functionality in PETSc. Either way, you would
certainly trade robustness of the implementation and a substantial
amount of development time for probably a 2x speedup (if you're lucky).

If you want to give it a try nonetheless, try
  -vec_type cusp -mat_type aijcusp
and a simple preconditioner such as Jacobi (-pc_type jacobi) in order to
avoid host<->device communication.
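
To illustrate why Jacobi is attractive here: applying it only needs the
matrix diagonal and an elementwise scaling of the residual, so presumably
nothing has to leave the device between iterations. Below is a minimal
pure-Python sketch, not PETSc code. The function names and the stencil
coefficients are made up for illustration (chosen nonsymmetric and
diagonally dominant so that the plain Jacobi iteration converges); your
actual operators will differ.

```python
def assemble_5pt(nx, ny, diag=4.5, west=-1.2, east=-0.8, south=-1.1, north=-0.9):
    """Assemble a nonsymmetric 5-point stencil matrix in CSR form.

    Unknowns are numbered row by row: index = j * nx + i.
    The coefficients are hypothetical illustration values.
    """
    indptr, indices, data = [0], [], []
    for j in range(ny):
        for i in range(nx):
            row = j * nx + i
            if j > 0:
                indices.append(row - nx)
                data.append(south)
            if i > 0:
                indices.append(row - 1)
                data.append(west)
            indices.append(row)
            data.append(diag)
            if i < nx - 1:
                indices.append(row + 1)
                data.append(east)
            if j < ny - 1:
                indices.append(row + nx)
                data.append(north)
            indptr.append(len(indices))
    return indptr, indices, data


def matvec(A, x):
    """Compute y = A x for a CSR matrix A = (indptr, indices, data)."""
    indptr, indices, data = A
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[row], indptr[row + 1]))
        for row in range(len(indptr) - 1)
    ]


def jacobi_solve(A, b, tol=1e-8, max_it=500):
    """Jacobi iteration x <- x + D^{-1} (b - A x), starting from x = 0.

    Returns the iterate and the number of updates performed.
    """
    indptr, indices, data = A
    n = len(b)
    # The preconditioner is just the inverse of the diagonal.
    inv_diag = [0.0] * n
    for row in range(n):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                inv_diag[row] = 1.0 / data[k]
    x = [0.0] * n
    b_norm = max(abs(v) for v in b) or 1.0
    for it in range(max_it):
        Ax = matvec(A, x)
        res = [bi - ai for bi, ai in zip(b, Ax)]
        if max(abs(v) for v in res) <= tol * b_norm:
            return x, it
        # Preconditioner application: purely elementwise.
        x = [xi + d * ri for xi, d, ri in zip(x, inv_diag, res)]
    return x, max_it


# System size for the 50x100 grid discussed in this thread.
A = assemble_5pt(50, 100)
print("dofs:", len(A[0]) - 1, "nonzeros:", len(A[2]))

# Solve a smaller instance with known solution x = 1 to keep the run short.
A_small = assemble_5pt(10, 20)
b_small = matvec(A_small, [1.0] * 200)
x_small, its = jacobi_solve(A_small, b_small)
print("iterations:", its, "max error:", max(abs(v - 1.0) for v in x_small))
```

In PETSc itself you would of course combine Jacobi with a Krylov method
suitable for nonsymmetric systems rather than run the bare iteration as
above.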

Best regards,
Karli


>
> Ed
>
>
> On 12/12/2013 04:15 PM, Karl Rupp wrote:
>> Hi Mark,
>>
>>> We have a lot of 5-point stencil operators on ~50x100 grids to solve.
>>> These are not symmetric and we have been using LU.  We want to move
>>> this onto GPUs (Titan).  What resources are there to do this?
>> Do you have lots of problems to solve simultaneously? Or any other
>> feature that makes this problem expensive? 50x100 would mean a system
>> size of about 5000 dofs, which is too small to really benefit from GPUs.
>>
>> Best regards,
>> Karli
>>
>


