[petsc-dev] Improving and stabilizing GPU support
Karl Rupp
rupp at mcs.anl.gov
Fri Jul 19 16:54:40 CDT 2013
Hi Dave,
> Your list sounds great to me. Glad that you and Paul are working on
> this together.
>
> My main interests are in better preconditioner support and better multi-GPU/MPI
> scalability.
This is follow-up work, then. There are a couple of 'simple'
preconditioners (polynomial preconditioning, maybe some point-block
Jacobi) which are also useful as smoothers and which we can add in the
near future. We should get the 'infrastructure' work done first, though,
so that we don't have to adjust too much code unnecessarily later on.
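Just to illustrate the user-facing side: once such a smoother is
available, selecting it should follow the usual PETSc idiom. A minimal
sketch, assuming the GPU path simply plugs into the existing PCPBJACOBI
interface (that interface exists today, the GPU backend does not yet):

    /* sketch only: run-time selection of point-block Jacobi;
       assumes PetscInitialize() has already been called and that a
       GPU-capable implementation backs PCPBJACOBI (not yet the case) */
    #include <petscksp.h>

    KSP ksp;
    PC  pc;
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCPBJACOBI);  /* or -pc_type pbjacobi on the command line */
    KSPSetFromOptions(ksp);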
> Is there any progress on Steve Dalton's work on the cusp algebraic multigrid
> preconditioner with PETSc? I believe Jed said in a previous email that Steve
> was going to be working on adding MPI support for that as well as other
> enhancements.
Yes, Steve is working on this right here in our division. Jed can give
a more detailed answer.
> Will there be any improvements for GPU preconditioners in ViennaCL 1.5.0?
> When do you expect ViennaCL 1.5.0 to be available in PETSc?
Jed gave me a good hint with respect to D-ILU0, which I'll also add to
PETSc. As with other GPU-accelerated ILU variants, it will require a
proper matrix ordering to give good performance. I'm somewhat tempted to
port the SA-AMG implementation in CUSP to OpenCL as well, but this
certainly won't be in 1.5.0.
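To make the ordering remark concrete: with the existing CPU-side ILU in
PETSc one would request, say, an RCM ordering for the factorization as
below; I'd expect the GPU variants to be driven the same way, but that
part is my assumption:

    /* sketch: RCM ordering for the (incomplete) factorization, using
       the existing PETSc interface; the D-ILU0 hookup is still to come */
    PCSetType(pc, PCILU);
    PCFactorSetMatOrderingType(pc, MATORDERINGRCM);
    /* or on the command line: -pc_type ilu -pc_factor_mat_ordering_type rcm */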
> I'm also interested in trying the PETSc ViennaCL support on the Xeon Phi.
> Do you have a schedule for when that might be ready for friendly testing?
With OpenCL you can already test this now: install the Intel OpenCL SDK
on your Xeon Phi machine, configure PETSc with --download-viennacl,
--with-opencl-include=..., and --with-opencl-lib=..., and then pass the
-viennacl_device_accelerator flag in addition to -vec_type viennacl
-mat_type aijviennacl when executing.
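If you prefer to fix the types in code rather than through command-line
options, the usual type-setting calls work as well; a minimal sketch,
assuming a PETSc build configured with ViennaCL as above:

    /* sketch: requesting ViennaCL-backed objects programmatically,
       equivalent to -vec_type viennacl -mat_type aijviennacl */
    Vec x;
    Mat A;
    VecCreate(PETSC_COMM_WORLD, &x);
    VecSetType(x, VECVIENNACL);
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetType(A, MATAIJVIENNACL);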
Unfortunately, the application memory bandwidth we get on the Xeon Phi
is too limited to be useful for the off-loaded execution one gets with
OpenCL: even the folks at Intel couldn't obtain more than ~95 GB/sec
when filling up the whole MIC with just two vectors for benchmarking a
simple copy operation. Thus, I don't think our efforts are currently
well spent on a fully native execution of PETSc on the MIC either,
because the trend is going towards tighter CPU/accelerator integration
on the same die rather than piggy-backing via PCI-Express. Anyway, I'll
let you know if there are any updates on this front.
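For reference, the bandwidth figure above uses the usual copy-benchmark
accounting of one read plus one write per element. A minimal host-side
sketch of that arithmetic (plain C on the host, not the actual Intel
measurement setup on the MIC):

    /* sketch: copy-benchmark bandwidth, counting bytes read + written */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void)
    {
      size_t N = 100000000;                        /* two large vectors */
      double *x = malloc(N * sizeof(double));
      double *y = malloc(N * sizeof(double));
      memset(x, 0, N * sizeof(double));

      struct timespec t0, t1;
      clock_gettime(CLOCK_MONOTONIC, &t0);
      memcpy(y, x, N * sizeof(double));            /* the simple copy */
      clock_gettime(CLOCK_MONOTONIC, &t1);

      double sec = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
      double gb  = 2.0 * N * sizeof(double) / 1e9; /* read + write */
      printf("copy bandwidth: %.1f GB/sec\n", gb / sec);
      free(x); free(y);
      return 0;
    }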
Best regards,
Karli