[petsc-dev] GPU preconditioners

Karl Rupp rupp at mcs.anl.gov
Fri Jan 17 15:02:04 CST 2014


Hi Andrea,

> In fact, I have another major problem: when running on multi-GPU with
> PETSc my results are totally inconsistent compared to a single GPU.

This was a bug that was fixed a couple of days ago. The fix is in branch 
'next' but has not yet been merged to 'master', since there is another 
valgrind issue I haven't nailed down yet.
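
If you want to try the fix right away, you can switch your PETSc 
checkout to the 'next' branch and rebuild. A minimal sketch, assuming 
you build PETSc from a git clone (substitute your usual configure 
options):

   cd $PETSC_DIR
   git fetch origin
   git checkout next
   ./configure <your usual configure options>
   make all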


> In my code, for now, I'm assuming a 1-1 correspondence between CPU and
> GPU: I run on 8 cores and 8 GPUs (4 K10).  How can I enforce this in the
> PETSc solver? Is it automatically done or do I have to specify some options?

One MPI rank maps to one logical GPU. In your case, please run with 8 
MPI ranks and distribute them equally over the nodes equipped with the GPUs.
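
A launch could then look roughly like this (a sketch; 'your_app' and 
the solver options are placeholders, the exact mpiexec/hostfile syntax 
depends on your MPI installation, and the vector/matrix types assume 
the CUSP backend - use the ViennaCL types instead if that is what you 
configured PETSc with):

   mpiexec -n 8 ./your_app -vec_type cusp -mat_type aijcusp \
       -ksp_type cg -pc_type bjacobi

With two ranks placed on each node, the eight ranks then map to the 
eight logical GPUs of your four K10 cards.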

As for the preconditioners: we haven't added any new preconditioners 
recently. Preconditioning on GPUs is a very problem-specific matter due 
to the burden of PCI-Express latency. Massively parallel approaches such 
as Sparse Approximate Inverses look good in terms of theoretical FLOP 
counts, but converge poorly and get pretty expensive in terms of memory 
when running many simultaneous factorizations. ILU on the GPU can be 
fast if you order the unknowns properly and have only a few nonzeros 
per row, but its convergence rate is not great either. PCI-Express 
bandwidth and latency are really a problem here...

How large are your blocks when using a block-Jacobi preconditioner for 
your problem? On the order of 3x3, or (much) larger?
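
For reference, the two variants are selected like this (a sketch; 
'your_app' is a placeholder, and the point-block variant assumes the 
block size has been set on the matrix, e.g. via MatSetBlockSize()):

   ./your_app -pc_type bjacobi -sub_pc_type ilu   # one block per MPI rank
   ./your_app -pc_type pbjacobi                   # point-block Jacobi, e.g. 3x3 blocks

The size matters because small point blocks can be inverted cheaply on 
the GPU, whereas large per-process blocks run into the factorization 
issues described above.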

Best regards,
Karli
