[petsc-dev] GPU preconditioners

Karl Rupp rupp at mcs.anl.gov
Fri Jan 17 15:33:39 CST 2014


Hi Andrea,

 > Well, I have 9 equations, so 9x9 I guess...

Ok, 9x9 blocks are just in the range where handling them per block is still 
technically meaningful (they still fit within register/shared-memory limits), 
but the implementation gets challenging: explicit inversion formulas are no 
longer practical, so one has to fall back to Gaussian elimination with pivoting.
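
To make that concrete, here is a minimal sketch (plain C, neither PETSc nor 
GPU code; the function name and signature are just for illustration) of what 
the Gauss-with-pivoting route looks like for a single small block:

#include <math.h>

#define N 9   /* block size: one 9x9 block per grid point */

/* Solve A x = b for one dense N x N block, in place, using Gaussian
   elimination with partial pivoting.  Returns 0 on success, -1 if a
   zero pivot is encountered. */
static int solve_block(double A[N][N], double b[N], double x[N])
{
  for (int k = 0; k < N; ++k) {
    /* partial pivoting: pick the largest entry in column k */
    int p = k;
    for (int i = k + 1; i < N; ++i)
      if (fabs(A[i][k]) > fabs(A[p][k])) p = i;
    if (A[p][k] == 0.0) return -1;
    if (p != k) {
      for (int j = 0; j < N; ++j) { double t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t; }
      double tb = b[k]; b[k] = b[p]; b[p] = tb;
    }
    /* eliminate entries below the pivot */
    for (int i = k + 1; i < N; ++i) {
      double m = A[i][k] / A[k][k];
      for (int j = k; j < N; ++j) A[i][j] -= m * A[k][j];
      b[i] -= m * b[k];
    }
  }
  /* back substitution */
  for (int i = N - 1; i >= 0; --i) {
    double s = b[i];
    for (int j = i + 1; j < N; ++j) s -= A[i][j] * x[j];
    x[i] = s / A[i][i];
  }
  return 0;
}

The pivot search and row swaps are exactly the data-dependent branching that 
is cheap on a CPU but awkward inside a GPU kernel, which is presumably why 
explicit inversion formulas are only attractive for very small blocks.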


> I hope the one you are mentioning was a major bug, because what I get is
> seriously wrong: while on single GPU (KSPGMRES+PCASM) I get a residual
> of +0.72, on 8-cores/GPU I get -1.00 at the first time step, just to
> make an example. Can this be due to the bug you are saying or you can
> suspect something more?

Yes, this was a major bug, breaking the matrix-vector product when using 
multiple MPI ranks with GPUs.


> What should I do then? wait for the valgrind fix which is underway and
> then update? Can you please notify me when this is fixed? I'm writing a
> final report for a project and I would like to include this feature
> fully fixed if possible.

I will merge the fix to master tomorrow when I'm back on my main GPU 
machine (there do not seem to be any problems in 'next' with the patch) 
and fix the valgrind complaints separately. The second issue is not 
directly related to the first; it just happens to show up in the same module.

> Another question, what do you exactly mean by "order the unknowns
> properly" in this case?

If you build the elimination graph for the triangular factors of ILU 
preconditioners, then the ordering of the unknowns (i.e. the way you 
assign the degrees of freedom (DOFs) on your mesh) can have a 
considerable influence on the amount of parallelism. The Cuthill-McKee 
algorithm, for example, is quite good at reducing the bandwidth of a 
sparse matrix, but it may also reduce the amount of parallelism available 
in the ILU0 factors compared to, e.g., a red-black ordering of the DOFs. 
I can send you a preprint if you're interested.
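
In case you want to experiment with this from the PETSc side, the ordering 
used when computing the ILU factors can be selected through 
PCFactorSetMatOrderingType; a small sketch, assuming ksp is your already 
set-up KSP object:

PC pc;
KSPGetPC(ksp, &pc);
PCSetType(pc, PCILU);
/* reverse Cuthill-McKee ordering of the DOFs for the factorization;
   other built-in choices include MATORDERINGNATURAL and MATORDERINGND */
PCFactorSetMatOrderingType(pc, MATORDERINGRCM);

The same can be done at run time with -pc_factor_mat_ordering_type rcm. As 
far as I know, a red-black ordering is not among the built-in orderings, so 
for that you would have to permute the matrix yourself.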

Best regards,
Karli



