[petsc-users] GPU local direct solve of penta-diagonal
Karl Rupp
rupp at mcs.anl.gov
Thu Dec 12 15:50:30 CST 2013
Hi Ed,
> Yes, each MPI process is responsible for solving a system of nonlinear
> equations on a number of grid cells.
> The nonlinear equations are solved by Picard iteration and the time
> consuming part is the formation and solution of the nonsymmetric sparse
> linear system arising from a rectangular grid with a regular finite
> difference stencil. All the linear systems have the same sparsity
> pattern but may have different numerical values.
>
> Since there are 16 cores on each node on Titan, there can be
> concurrently 16 separate independent linear systems to be solved.
> One may not want to batch or synchronize the solvers since different
> grid cells may require different numbers of Picard iterations.
Hmm, this does not sound like something I would consider a good fit for
GPUs. With 16 MPI processes you have additional congestion of the one or
two GPUs per node, so you would have to rethink the solution procedure
as a whole. I can think of a procedure where each of these systems is
solved on a separate streaming processor (or work group in OpenCL
language), where synchronization is cheaper - however, this is not
covered by standard functionality in PETSc. Either way, you would
certainly trade robustness of the implementation and a substantial
amount of development time for probably a 2x speedup (if you're lucky).
If you want to give it a try nonetheless, try
-vec_type cusp -mat_type aijcusp
and a simple preconditioner such as Jacobi (-pc_type jacobi) in order to
avoid host<->device communication.
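
As a rough illustration of why Jacobi is cheap in this setting: applying
it is only a pointwise scaling by the inverse diagonal, with no
factorization and no triangular solves. The sketch below is plain Python,
not PETSc code. The function names and the stencil coefficients are
hypothetical, chosen so that the operator is nonsymmetric and strictly
diagonally dominant (which is what makes this simple iteration converge).

```python
# Hypothetical illustration only: not PETSc code, coefficients are made up.

def stencil_apply(u, nx, ny, c, w, e, s, n):
    """Return A*u for a constant-coefficient 5-point stencil (row-major grid)."""
    out = [0.0] * (nx * ny)
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            v = c * u[k]
            if i > 0:
                v += w * u[k - 1]      # west neighbour
            if i < nx - 1:
                v += e * u[k + 1]      # east neighbour
            if j > 0:
                v += s * u[k - nx]     # south neighbour
            if j < ny - 1:
                v += n * u[k + nx]     # north neighbour
            out[k] = v
    return out


def jacobi_solve(b, nx, ny, c, w, e, s, n, tol=1e-10, max_it=10000):
    """Jacobi-preconditioned Richardson iteration: u += D^{-1} (b - A u)."""
    u = [0.0] * (nx * ny)
    inv_diag = 1.0 / c  # the whole preconditioner: one scalar per unknown
    for it in range(max_it):
        Au = stencil_apply(u, nx, ny, c, w, e, s, n)
        r = [bk - ak for bk, ak in zip(b, Au)]
        if max(abs(rk) for rk in r) < tol:
            return u, it  # 'it' updates were performed
        u = [uk + inv_diag * rk for uk, rk in zip(u, r)]
    return u, max_it


if __name__ == "__main__":
    nx, ny = 50, 100  # grid size mentioned in this thread
    coef = dict(c=5.0, w=-1.5, e=-0.5, s=-1.0, n=-1.0)  # w != e: nonsymmetric
    b = stencil_apply([1.0] * (nx * ny), nx, ny, **coef)
    u, its = jacobi_solve(b, nx, ny, **coef)
    print("iterations:", its, "max error:", max(abs(x - 1.0) for x in u))
```

In practice one would pair Jacobi with a Krylov method rather than plain
Richardson; the point here is only that every step is a stencil
application or a pointwise vector operation.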
Best regards,
Karli
>
> Ed
>
>
> On 12/12/2013 04:15 PM, Karl Rupp wrote:
>> Hi Mark,
>>
>>> We have a lot of 5-point stencil operators on ~50x100 grids to solve.
>>> These are not symmetric and we have been using LU. We want to move
>>> this onto GPUs (Titan). What resources are there to do this?
>> Do you have lots of problems to solve simultaneously? Or any other
>> feature that makes this problem expensive? 50x100 would mean a system
>> size of about 5000 dofs, which is too small to really benefit from GPUs.
>>
>> Best regards,
>> Karli
>>
>
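
For a sense of scale, the size estimate quoted above can be worked out
directly. The sketch below counts the nonzeros of a 5-point stencil matrix
on a 50x100 grid and estimates its storage in compressed sparse row (CSR)
form. The function names are hypothetical, and the 8-byte values and
4-byte indices are assumptions (double precision, 32-bit integers), not
something stated in the thread.

```python
# Hypothetical sizing helper; value/index widths are assumptions.

def five_point_nnz(nx, ny):
    """Nonzeros of a 5-point stencil matrix on an nx-by-ny grid."""
    n = nx * ny
    # One diagonal entry per cell, plus two off-diagonal entries for each
    # pair of neighbouring cells (horizontal pairs and vertical pairs).
    pairs = (nx - 1) * ny + nx * (ny - 1)
    return n + 2 * pairs


def csr_bytes(nx, ny, value_bytes=8, index_bytes=4):
    """Approximate CSR storage: values, column indices, row pointers."""
    n = nx * ny
    nnz = five_point_nnz(nx, ny)
    return nnz * value_bytes + nnz * index_bytes + (n + 1) * index_bytes


if __name__ == "__main__":
    nx, ny = 50, 100
    print("unknowns:", nx * ny)                   # 5000
    print("nonzeros:", five_point_nnz(nx, ny))    # 24700
    print("CSR bytes:", csr_bytes(nx, ny))        # 316404
```

Under these assumptions a single system occupies roughly 0.3 MB, and 16
of them about 5 MB, which is consistent with the remark that one such
system is too small to keep a GPU busy on its own.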