[petsc-dev] GPU preconditioners
Karl Rupp
rupp at mcs.anl.gov
Fri Jan 17 15:02:04 CST 2014
Hi Andrea,
> In fact, I have another major problem: when running on multi-GPU with
> PETSc my results are totally inconsistent compared to a single GPU.
This was a bug that was fixed a couple of days ago. The fix is in branch
'next', but not yet merged to 'master', since the branch has another
valgrind issue I haven't nailed down yet.
> In my code, for now, I'm assuming a 1-1 correspondence between CPU and
> GPU: I run on 8 cores and 8 GPUs (4 K10). How can I enforce this in the
> PETSc solver? Is it automatically done or do I have to specify some options?
One MPI rank maps to one logical GPU. In your case, please run with 8
MPI ranks and distribute them equally over the nodes equipped with the GPUs.
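For example, a typical launch would look like the sketch below (the
executable name and solver options are placeholders; VECCUSP and
MATAIJCUSP are the CUDA-backed vector and matrix types):

   mpiexec -n 8 ./your_app -vec_type cusp -mat_type aijcusp \
           -ksp_type cg -ksp_monitor

With one rank per GPU, spread the eight ranks evenly across the nodes so
that each rank gets a distinct device.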
As for the preconditioners: we haven't added any new preconditioners
recently. Preconditioning on GPUs is highly problem-specific, largely
because of the burden of PCI-Express latency. Massively parallel
approaches such as Sparse Approximate Inverses look good in terms of
theoretical FLOP counts, but converge poorly and are fairly expensive in
memory when running many simultaneous factorizations. ILU on the GPU can
be fast if you order the unknowns properly and have only a few nonzeros
per row, but its convergence rate is not great either. PCI-Express
bandwidth and latency are really a problem here...
How large are the blocks when you use a block-Jacobi preconditioner for
your problem? On the order of 3x3, or (much) larger?
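For reference, both regimes are reachable from the options database (the
executable name is a placeholder; the options themselves are standard
PETSc). For small, fixed-size point blocks such as 3x3, PCPBJACOBI
inverts the blocks directly (the matrix block size must be set, e.g. via
MatSetBlockSize()); for larger blocks, PCBJACOBI runs an inner solver on
each block:

   # small point blocks, inverted explicitly
   ./your_app -pc_type pbjacobi

   # one large block per rank, each factored with ILU(0)
   ./your_app -pc_type bjacobi -sub_pc_type ilu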
Best regards,
Karli