[petsc-dev] http://www.hpcwire.com/hpcwire/2012-11-12/intel_brings_manycore_x86_to_market_with_knights_corner.html

John Fettig john.fettig at gmail.com
Mon Nov 12 22:00:14 CST 2012


On Mon, Nov 12, 2012 at 10:46 PM, Karl Rupp <rupp at mcs.anl.gov> wrote:

> Hey,
>
>
>
>>         Then add on top of this the fact that you could simply recompile
>>         PETSc, run it natively on the card, and still run it on your
>>         CPUs as MPMD.
>>
>>
>>     This is a good way to get terrible performance.
>>
>>
>> Why?  Decompose your domain to take into account the imbalance in
>> computational power (a toy example of such a weighted split is below).
>> The link between the card and the CPU is going to be faster than going
>> to another node.
>>
>
> I fully second Jed. Computational scientists are already struggling to
> get scalable performance on a 'standard' multi-core architecture, so I
> doubt that one can really obtain a gain on an accelerator architecture
> for any real-world application just by recompiling existing code. Also,
> add the extra issue of PCI-Express latency.
>
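For concreteness, the weighted split I have in mind would look something
like this toy calculation (the 3:1 card-to-socket flop-rate ratio is made
up purely for illustration):

/* toy_split.c -- illustrative only; the weights are hypothetical */
#include <stdio.h>

int main(void)
{
  const double weight[3] = {3.0, 1.0, 1.0}; /* card, CPU socket 0, CPU socket 1 */
  const long   n = 1000000;                 /* total unknowns in the domain */
  double       wsum = 0.0;
  for (int i = 0; i < 3; i++) wsum += weight[i];
  /* each part gets unknowns in proportion to its compute weight */
  for (int i = 0; i < 3; i++)
    printf("part %d gets %ld unknowns\n", i, (long)(n * weight[i] / wsum));
  return 0;
}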

Two key points here:

1) The application will have to be threaded to get good performance on the
Xeon Phi.  I know that PETSc is moving in this direction.  My thought was
that you would run one MPI process on the card and one on each CPU, with
threads inside each process, as sketched below.
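
A minimal sketch of that layout, assuming MPI plus OpenMP (the FUNNELED
threading level and the thread counts are my assumptions, not PETSc's
actual threading design):

/* hybrid.c -- one MPI rank per card or CPU socket, threads inside */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int provided, rank;
  /* FUNNELED: only the main thread makes MPI calls; the worker
     threads do the flop-heavy local work */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  #pragma omp parallel
  {
    /* e.g. OMP_NUM_THREADS=240 on the card, 8 per host socket */
    #pragma omp single
    printf("rank %d running %d threads\n", rank, omp_get_num_threads());
  }
  MPI_Finalize();
  return 0;
}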

2) The recompilation is needed to run in "native mode".  This is not an
offloaded computation in the GPU sense: the entire program runs on the
card, and all of its memory is local.  You run one binary on the card and
a different binary on the CPU.  The only traffic that has to cross the
PCI-Express bus is MPI communication, which should be faster than going
through even the fastest network card.
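
To make the MPMD picture concrete, a hedged sketch: the same source is
built twice (-mmic is the Intel compiler's native-build flag for the card;
the hostnames and rank counts below are made up) and the two binaries join
a single MPI job:

/* mpmd_hello.c -- compile twice, then launch as one MPMD job:
 *   icc       mpmd_hello.c -o hello.host
 *   icc -mmic mpmd_hello.c -o hello.mic
 *   mpiexec -n 2 -host node0 ./hello.host : -n 1 -host node0-mic0 ./hello.mic
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int  rank, len;
  char name[MPI_MAX_PROCESSOR_NAME];
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(name, &len);
  /* card ranks and host ranks share one MPI_COMM_WORLD; messages
     between them cross only the PCIe bus, never a NIC */
  printf("rank %d on %s\n", rank, name);
  MPI_Finalize();
  return 0;
}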


John

