[petsc-dev] MIC available for testing
Paul Mullowney
paulm at txcorp.com
Fri Dec 14 22:09:06 CST 2012
As a point of comparison, I've been running a PETSc CG algorithm on an
NVIDIA K20. The simulation has 1.4e7 elements.
The PETSc AXPY takes 0.001 seconds in single precision. That's 26 GFlops.
In another simulation using a double complex BiCG algorithm with 1e6
unknowns, the PETSc MatMult on the K20 runs at 55 GFlops!
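
A rough sanity check of the AXPY figure, as a sketch only. It assumes the usual count of 2 flops per element for y = a*x + y (one multiply, one add); the function names are made up for illustration. With the rounded 0.001 s timing this gives about 28 GFlops, so the quoted 26 GFlops is consistent with a measured time slightly above 1 ms.

```python
# Sketch: flop-rate arithmetic for an AXPY (y = a*x + y).
# Assumption: 2 flops per element (one multiply + one add).

def axpy_gflops(n_elements, seconds, flops_per_element=2):
    """Return the flop rate in GFlop/s for an AXPY over n_elements."""
    return n_elements * flops_per_element / seconds / 1e9

def axpy_seconds(n_elements, gflops, flops_per_element=2):
    """Return the run time in seconds implied by a given GFlop/s rate."""
    return n_elements * flops_per_element / (gflops * 1e9)

n = 1.4e7                            # elements, from the message
rate = axpy_gflops(n, 1e-3)          # rate implied by the rounded 0.001 s
implied_time = axpy_seconds(n, 26)   # time implied by the quoted 26 GFlops

print(f"rate at 0.001 s: {rate:.1f} GFlop/s")
print(f"time at 26 GFlops: {implied_time * 1e3:.2f} ms")
```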
-Paul
> Hi guys,
>
> today I got a gentle introduction to our test machine equipped
> with two Intel MICs. They are still beta, yet I could run some simple
> kernels in native mode. As an example, running existing OpenMP code
> for double-precision vector addition of 3e6 elements without any
> modification, I got the following timings:
>
> -- Native mode, i.e. all code executed on MIC --
> Single core time: 0.642 sec
> All-core time: 0.011 sec
>
> For offloaded execution (CPU <-> MIC, just like with GPUs), additional
> pragmas are required; I haven't tried that yet.
>
> For comparison, the same code on the CPU (Sandy Bridge, 8x2 cores, 2.6
> GHz) takes 0.060 sec without OpenMP and 0.030 sec with OpenMP. The
> conclusion is that one *really* needs to keep all cores on the MIC
> busy in order to get the full memory bandwidth. A plain 'just
> recompile for MIC and you get good performance' approach therefore
> won't work for most applications in practice, simply because the
> serial performance is so limited.
>
> @Shri: It would be interesting to give pthreads a try, particularly
> how it compares with OpenMP. I'll be out of the lab until the
> beginning of January, but I can help you with getting an account and
> getting started.
>
> Btw: I just got a call regarding Altera hardware; we might have a
> chance to get our hands on their OpenCL-enabled hardware.
>
> Best regards,
> Karli
>
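The ratios behind the quoted conclusion can be worked out directly from the four timings. This is a small sketch using only the numbers reported above; the dictionary keys and the helper name are made up for illustration.

```python
# Sketch: speedup ratios from the vector-addition timings quoted above
# (3e6 double-precision elements). Keys are illustrative labels only.

timings = {
    "mic_single_core": 0.642,  # seconds, native mode on the MIC, one core
    "mic_all_cores": 0.011,    # seconds, native mode on the MIC, all cores
    "cpu_serial": 0.060,       # seconds, Sandy Bridge without OpenMP
    "cpu_openmp": 0.030,       # seconds, Sandy Bridge with OpenMP
}

def ratio(slower, faster):
    """Return how many times longer `slower` takes than `faster`."""
    return timings[slower] / timings[faster]

mic_scaling = ratio("mic_single_core", "mic_all_cores")    # MIC: 1 core vs all
cpu_scaling = ratio("cpu_serial", "cpu_openmp")            # CPU: serial vs OpenMP
mic_vs_cpu_parallel = ratio("cpu_openmp", "mic_all_cores") # best CPU vs best MIC
mic_serial_penalty = ratio("mic_single_core", "cpu_serial")# MIC core vs CPU core

print(f"MIC all-core speedup over one MIC core: {mic_scaling:.1f}x")
print(f"CPU OpenMP speedup over serial CPU:     {cpu_scaling:.1f}x")
print(f"MIC all-core vs CPU OpenMP:             {mic_vs_cpu_parallel:.1f}x")
print(f"One MIC core vs serial CPU (slowdown):  {mic_serial_penalty:.1f}x")
```

With these numbers, a single MIC core is roughly ten times slower than the serial CPU run, while the fully loaded MIC is a bit under three times faster than the CPU with OpenMP, which is the gap the conclusion refers to.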