[petsc-dev] A closer look at the Xeon Phi

Tue Feb 12 18:54:22 CST 2013

Hi Matt,

> Karl, I am assuming that the places in the article where the Phi beats
> the K20 are for denser matrices
> where they have explicitly vectorized?

a quick check with the matrices in the paper showed that it is indeed 
the matrices with a higher number of nonzeros per row for which the Xeon 
Phi offers higher performance than the K20 (correlation, not causality). 
Reordering the dofs still has a noticeable impact (and I think one 
can also modify reordering algorithms to better suit accelerators/GPUs), 
but overall I support your observation.

The CSR format used in the paper is not necessarily optimal for MIC and 
GPUs, but that's a different story...
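
To make the nonzeros-per-row point concrete, here is a minimal sketch of a 
row-wise CSR matrix-vector product in plain Python. It is purely 
illustrative: it is not the kernel from the paper nor PETSc's 
implementation, and the names csr_row_lengths and csr_spmv as well as the 
3x3 example matrix are made up for this sketch. The inner loop runs once 
per nonzero in a row, so with only a few nonzeros per row there is little 
work per row to spread across wide vector units.

```python
def csr_row_lengths(row_ptr):
    """Number of stored nonzeros in each row of a CSR matrix."""
    return [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]


def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A * x for A stored in CSR (hypothetical helper)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Trip count of this loop equals the number of nonzeros in row i.
        # Note the indirect access x[col_idx[k]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Made-up 3x3 example:
#   [4 0 1]
#   [0 3 0]
#   [2 0 5]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
nnz_per_row = csr_row_lengths(row_ptr)
```

The average of nnz_per_row is the quantity I compared against the 
reported performance numbers.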

Best regards,
Karli