[petsc-dev] A closer look at the Xeon Phi

Karl Rupp rupp at mcs.anl.gov
Tue Feb 12 19:09:26 CST 2013


Hi,

On 02/12/2013 06:40 PM, Jed Brown wrote:
> On Tue, Feb 12, 2013 at 6:06 PM, Tim Tautges <tautges at mcs.anl.gov
> <mailto:tautges at mcs.anl.gov>> wrote:
>
>     I'm kind of surprised at the > 10k element crossover myself.  For
>     the strong scaling cases, at high core counts, that's not terribly
>     far from the number of DOFS per processor, is it?  I guess CPUs will
>     be slower than the Xeon in most cases (BGx), or fewer (Titan), but
>     still.
>
>
> Which crossover are you referring to? The CPU versus GTX285 at about 20k
> dofs, but with only very small gains for another order of magnitude?

I assume it's the cross-over of Xeon Phi vs. Xeon, but almost all 
cross-overs happen in the 10k-100k region and are due to PCI-Express 
latency. To scale this to large clusters, replace the Xeon Phi by a
compute node and PCI-Express by GB-Ethernet/InfiniBand, change the
timescale a bit, and you're probably not too far off...
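
To make the latency argument concrete, here is a minimal sketch of a
latency-plus-throughput model. The model itself and every number in it
are assumptions chosen for illustration, not measurements from the
benchmarks discussed in this thread: the host needs n / host_rate
seconds, the accelerator needs a fixed latency plus n / accel_rate.

```python
def t_host(n, host_rate):
    """Assumed host time: pure throughput, no fixed overhead."""
    return n / host_rate


def t_accel(n, latency_s, accel_rate):
    """Assumed accelerator time: fixed transfer latency plus throughput."""
    return latency_s + n / accel_rate


def crossover_size(latency_s, host_rate, accel_rate):
    """Problem size n at which t_host(n) == t_accel(n) in this model."""
    if accel_rate <= host_rate:
        # A slower (or equally fast) accelerator never catches up.
        return float("inf")
    return latency_s / (1.0 / host_rate - 1.0 / accel_rate)


# Hypothetical values: 10 microseconds latency, host processes 1e9
# entries/s, accelerator 2e9 entries/s.
LATENCY_S = 10e-6
HOST_RATE = 1e9
ACCEL_RATE = 2e9

n_star = crossover_size(LATENCY_S, HOST_RATE, ACCEL_RATE)
print(f"cross-over at roughly {n_star:.0f} entries")
```

With these made-up numbers the cross-over lands at about 20k entries.
The same function covers the cluster analogy: pass a network latency
instead of the PCI-Express latency and per-node rates instead of
per-device rates.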


> For 2D Laplace, we expect to see strong scaling peter out around a
> couple thousand dofs per core. It can go a little further on Blue Gene
> because the network is much faster and the cores are a bit slower.
>
> Titan has a lot of (premium price) GPUs that you have to use to utilize
> the machine well, but it's unclear whether the architecture is
> delivering a science/dollars advantage, even if you ignore development
> costs to port and re-tune codes and the environment costs (it's more
> complicated to build and run, so it takes people longer to get running).
> I think the main justification is speculation about what future hardware
> will look like, not cost-effectiveness today.

One should also keep the influence of TOP500 in mind here. I don't
think this machine was built primarily for domain decomposition and the
like, but rather for "approaching Exascale via Linpack".

Best regards,
Karli



