[petsc-dev] PETSc GPU capabilities

John Fettig john.fettig at gmail.com
Mon Feb 27 15:48:26 CST 2012


Hi Paul,

This is very interesting.  I tried building the code with
--download-txpetscgpu and it doesn't work for me.  It runs out of memory
no matter how small the problem is (this is ex2 from
src/ksp/ksp/examples/tutorials):

mpirun -np 1 ./ex2 -n 10 -m 10 -ksp_type cg -pc_type sacusp \
    -mat_type aijcusp -vec_type cusp -cusp_storage_format csr -use_cusparse 0

terminate called after throwing an instance of
'thrust::system::detail::bad_alloc'
  what():  std::bad_alloc: out of memory
MPI Application rank 0 killed before MPI_Finalize() with signal 6

This example works fine when I build without your GPU additions (and for
much larger problems, too).  Am I doing something wrong?

For reference, I'm using CUDA 4.1, CUSP 0.3, and Thrust 1.5.1.
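
In case it helps narrow this down, here is a minimal standalone check I can
run to see how much device memory is actually free before PETSc allocates
anything. This is just a sketch against the CUDA runtime (cudaSetDevice,
cudaMemGetInfo); the device index 0 is an assumption for this single-rank run:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
  size_t free_bytes, total_bytes;
  cudaError_t err;

  /* Assume device 0; a multi-rank job may map ranks to other devices. */
  err = cudaSetDevice(0);
  if (err != cudaSuccess) {
    fprintf(stderr, "cudaSetDevice: %s\n", cudaGetErrorString(err));
    return 1;
  }

  /* Query free vs. total memory on the selected device. */
  err = cudaMemGetInfo(&free_bytes, &total_bytes);
  if (err != cudaSuccess) {
    fprintf(stderr, "cudaMemGetInfo: %s\n", cudaGetErrorString(err));
    return 1;
  }

  printf("GPU 0: %zu MB free of %zu MB total\n",
         free_bytes / (1024 * 1024), total_bytes / (1024 * 1024));
  return 0;
}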

John

On Fri, Feb 10, 2012 at 5:04 PM, Paul Mullowney <paulm at txcorp.com> wrote:

> Hi All,
>
> I've been developing GPU capabilities for PETSc. The development has
> focused mostly on
> (1) An efficient multi-GPU SpMV, i.e. MatMult. This is working well.
> (2) The triangular solve used in ILU preconditioners, i.e. MatSolve. The
> performance of this ... is what it is :|
> This code is in beta mode; keep that in mind if you decide to use it. It
> supports single and double precision, real numbers only! Complex will be
> supported at some point in the future, but not any time soon.
>
> To build with these capabilities, add the following to your configure line.
> --download-txpetscgpu=yes
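>
> For example, a full configure line might look something like the one below.
> This is only a sketch: the CUDA-related flags are an illustration of a
> typical CUDA-enabled build and should be adjusted to match your own
> configure options.
>
> ./configure --with-cuda=1 --with-cusp=1 --with-thrust=1 \
>     --download-txpetscgpu=yes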
>
> The capabilities of the SpMV code are accessed with the following two
> command-line flags:
> -cusp_storage_format csr (other options are coo (coordinate), ell
> (ELLPACK), and dia (diagonal); hyb (hybrid) is not yet supported)
> -use_cusparse (this is a boolean and at the moment is only supported with
> csr format matrices; in the future, cusparse will work with the ell, coo,
> and hyb formats)
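>
> As an illustration (the solver options here are arbitrary, and the example
> is the same ex2 tutorial mentioned above), a run using the ELLPACK storage
> format on the GPU could look like:
>
> mpirun -np 1 ./ex2 -ksp_type cg -pc_type jacobi -mat_type aijcusp \
>     -vec_type cusp -cusp_storage_format ell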
>
> Regarding the number of GPUs to run on:
> Imagine a system with P nodes, N cores per node, and M GPUs per node.
> Then, to use only the GPUs, I would run with M ranks per node over P nodes.
> As an example, I have a system with 2 nodes, each with 8 cores and 4 GPUs
> (P=2, N=8, M=4). In a PBS queue script, one would request 2 nodes at 4
> processors per node. Each MPI rank (CPU process) is then attached to a GPU.
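>
> As a rough sketch of that layout (assuming an Open MPI-style mpirun; the
> -npernode flag and the walltime are illustrative, not required), the PBS
> script for the 2-node, 4-GPU-per-node case could look like:
>
> #!/bin/bash
> #PBS -l nodes=2:ppn=4
> #PBS -l walltime=00:10:00
>
> cd $PBS_O_WORKDIR
> # 2 nodes x 4 ranks per node = 8 MPI ranks, one per GPU
> mpirun -np 8 -npernode 4 ./ex2 -mat_type aijcusp -vec_type cusp \
>     -cusp_storage_format csr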
>
> You do not need to manage the GPUs explicitly, apart from understanding
> what type of system you are running on. To learn how many devices are
> available per node, use the command-line flag:
> -cuda_show_devices
>
> -Paul
>