[petsc-dev] http://www.hpcwire.com/hpcwire/2012-11-12/intel_brings_manycore_x86_to_market_with_knights_corner.html
Anton Popov
popov at uni-mainz.de
Tue Nov 13 16:30:27 CST 2012
It would really be a mystery if you got any speedup on any kind of processor
for MKL sparse triangular solves, because according to this:
http://software.intel.com/en-us/articles/parallelism-in-the-intel-math-kernel-library
they are not threaded at all. Citation: "For sparse matrices, all level
2 operations except for the sparse triangular solvers are threaded".
Anton
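
For reference, a minimal sketch (not from this thread) of the call pattern
under discussion: requesting threads with mkl_set_num_threads() and then
calling MKL's one-based CSR triangular solve mkl_dcsrtrsv(). Both routine
names are standard MKL; the 3x3 lower-triangular matrix and the thread count
are purely illustrative. A second sketch of the level-scheduling idea Paul
mentions follows after the quoted thread.

/* Minimal sketch: request threads, then call MKL's one-based CSR
   triangular solve.  The matrix and thread count are illustrative only. */
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    MKL_INT m = 3;                        /* matrix dimension            */
    /* L = [2 0 0; 1 3 0; 0 1 4] in CSR, one-based indexing              */
    double  a[]  = {2.0, 1.0, 3.0, 1.0, 4.0};
    MKL_INT ia[] = {1, 2, 4, 6};          /* row pointers                */
    MKL_INT ja[] = {1, 1, 2, 2, 3};       /* column indices              */
    double  x[]  = {2.0, 5.0, 7.0};       /* right-hand side             */
    double  y[3];                         /* solution of L*y = x         */

    mkl_set_num_threads(4);  /* requested, though per the Intel article the
                                sparse triangular solve itself is not threaded */
    mkl_dcsrtrsv("L", "N", "N", &m, a, ia, ja, x, y);

    printf("y = %g %g %g\n", y[0], y[1], y[2]);
    return 0;
}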
On 11/13/12 7:11 PM, Paul Mullowney wrote:
> Sparse solves. MKL has an option for using multiple CPU cores for
> its sparse triangular solve via:
>
> mkl_set_num_threads()
>
> Under the hood, the MKL implementation uses the level-scheduler
> algorithm to extract some amount of parallelism. We've tested this
> on many matrices and have never seen scalability on Sandy Bridge. I don't
> know the reason for this. For some matrices, the level-scheduler
> algorithm exposes a modest amount of parallelism, so I would expect some
> benefit from going to multiple cores.
>
> -Paul
>
>
>> On 11/13/12 2:54 AM, Paul Mullowney wrote:
>>> Every test we've done shows that the MKL triangular solve doesn't
>>> scale at all on a Sandy Bridge multi-core. I doubt it will be any
>>> different on the Xeon Phi.
>>>
>>> -Paul
>> Do you mean sparse or dense solves? Sparse triangular solves are
>> sequential in MKL. PARDISO also does them sequentially.
>>
>> Anton
>>
>>>>>
>>>>>>
>>>>>> In terms of raw numbers, $2,649 for 320 GB/sec and 8 GB of memory
>>>>>> is quite a lot compared to the $500 of a Radeon HD 7970 GHz
>>>>>> Edition at 288 GB/sec and 3 GB memory. My hope is that Xeon Phi
>>>>>> can do better than GPUs in kernels requiring frequent global
>>>>>> synchronizations, e.g. ILU-substitutions.
>>>>>
>>>>> But, but, but it runs the Intel instruction set, which is
>>>>> clearly worth 5+ times the price :-)
>>>>
>>>> I'm tempted to say 'yes', but on second thought I'm not so sure
>>>> whether any of us is actually programming in x86 assembly (again).
>>>> Part of the GPU/accelerator hype is arguably due to a rediscovery
>>>> of programming close to the hardware, even though it was/is non-x86.
>>>> With Xeon Phi we might now observe some sort of compiler war
>>>> instead of low-level kernel tuning - is this what we want?
>>>>
>>>> Best regards,
>>>> Karli
>>>>
>>>
>>>
>
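
As a companion to the sketch above, here is a hypothetical illustration (not
MKL's code) of the level-scheduling idea Paul describes: each row of a sparse
lower-triangular matrix is assigned a level one greater than the deepest level
of the earlier rows it depends on, and rows sharing a level are independent,
so they could be solved concurrently.

/* Hypothetical level-scheduling sketch for a sparse lower-triangular
   matrix in zero-based CSR; illustrative only. */
#include <stdio.h>

int main(void)
{
    /* Same illustrative 3x3 lower-triangular matrix as above, zero-based CSR */
    int m = 3;
    int ia[] = {0, 1, 3, 5};      /* row pointers   */
    int ja[] = {0, 0, 1, 1, 2};   /* column indices */

    int level[3];
    int nlevels = 0;

    for (int i = 0; i < m; ++i) {
        int lev = 0;
        for (int k = ia[i]; k < ia[i + 1]; ++k) {
            int j = ja[k];
            if (j < i && level[j] + 1 > lev)
                lev = level[j] + 1;   /* depends on an earlier row */
        }
        level[i] = lev;
        if (lev + 1 > nlevels)
            nlevels = lev + 1;
    }

    for (int i = 0; i < m; ++i)
        printf("row %d -> level %d\n", i, level[i]);
    printf("%d levels for %d rows\n", nlevels, m);
    return 0;
}

For this small bidiagonal-like pattern every row lands in its own level (3
levels for 3 rows), so nothing can run in parallel; only matrices whose rows
spread over comparatively few levels can benefit from extra cores, which is
consistent with the modest gains discussed above.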