[petsc-dev] http://www.hpcwire.com/hpcwire/2012-11-12/intel_brings_manycore_x86_to_market_with_knights_corner.html

Mon Nov 12 22:16:10 CST 2012

Hi John,

> (...)
>     I fully second Jed. Computational scientists are already fighting
>     with getting scalable performance on a 'standard' multi-core
>     architecture, so I doubt that one can really obtain a gain on an
>     accelerator-architecture for any real-world application just by
>     recompilation of existing code. Also, add the extra issue of
>     PCI-Express latency.
>
>
> Two key points here:
>
> 1) the application will have to be threaded to get good performance on
> the Xeon Phi.  I know that PETSc is moving in this direction. My thought
> was that you would have 1 MPI process on the card and 1 on each CPU and
> use threads.

I'd be happy if it were that simple, but I doubt it. Even Intel is
saying that the Xeon Phi is an accelerator architecture rather than a 
multi-core architecture.

> 2)  The recompilation is needed to run in "Native mode".  This is not an
> offloaded computation in the GPU sense.  The entire program runs on the
> card.  All the memory is local.  You run one binary on the card, a
> different binary on the CPU.  The only thing that has to cross the bus
> is MPI communication, which should be faster than even the fastest
> network cards because it only has to cross the bus.

Hmm, that could indeed get past the latency issue to a large extent.
Some OS functionality is probably not available on the Xeon Phi, so
some redesign would still be required. Let's see...

Best regards,
Karli