[petsc-users] Building with MKL 10.3

Natarajan CS csnataraj at gmail.com
Wed Mar 16 21:24:19 CDT 2011


Hello Rob,
     Thanks for the update; this could be very valuable for other developers
out there!
I am still a little surprised by the MKL numbers. I use MKL for a variety of
problems and haven't yet come across a situation where it penalized
performance. Maybe more experienced developers out here have seen otherwise.
Have you tried ATLAS, by any chance?
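
In case it helps anyone reproducing the comparison, the BLAS/LAPACK back end
is chosen at PETSc configure time. The lines below are only a sketch: the
install paths are placeholders and the exact library layout depends on your
MKL/ATLAS installation.

    # reference BLAS/LAPACK built by PETSc (what Rob compared against)
    ./configure --download-f-blas-lapack=1

    # MKL (replace the path with your MKL 10.3 install prefix)
    ./configure --with-blas-lapack-dir=/opt/intel/mkl

    # ATLAS (replace the path with wherever ATLAS is installed)
    ./configure --with-blas-lapack-dir=/usr/local/atlas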

As far as MPI performance is concerned, I wouldn't be very surprised if Intel
MPI turned out to be faster than MPICH2. All of this is of course fabric
dependent, but that has been my experience across a variety of problems.
Also, I believe Intel MPI is a customized version of MPICH2, so it may not be
a big surprise there.
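
If you want to try Intel MPI with PETSc, one way (just a sketch, assuming the
Intel MPI compiler wrappers mpiicc/mpiicpc/mpiifort are on your PATH, and with
the MKL path again a placeholder) is to point configure at the wrappers:

    ./configure --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort \
                --with-blas-lapack-dir=/opt/intel/mkl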

There are very experienced people on this list who may well say otherwise and
give more accurate answers.
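
One more note on the sequential-MKL baseline discussed further down the
thread: with a threaded MKL you can force single-threaded behaviour through
the standard MKL/OpenMP environment controls (just a sketch of the baseline
settings, to be exported before launching the job):

    export MKL_NUM_THREADS=1
    export MKL_DYNAMIC=FALSE
    export OMP_NUM_THREADS=1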

Cheers,

C.S.N


On Wed, Mar 16, 2011 at 4:22 PM, Robert Ellis <Robert.Ellis at geosoft.com> wrote:

>  Hi All,
>
>
>
> For those still interested in this thread, timing tests with MKL indicate
> that sequential MKL performs approximately the same as parallel MKL with
> NUM_THREADS=1, which isn't too surprising. What is a bit surprising is that
> MKL always, at least for this application, gives significantly slower
> performance than direct compilation of the code from
>  --download-f-blas-lapack=1. My conclusion is that if your code is written
> with explicit parallelization, in this case using PETSc, and fully utilizes
> your hardware, using sophisticated libraries may actually harm performance.
> Keep it simple!
>
>
>
> Now a question: all my tests used MPICH2. Does anyone think using Intel MPI
> would significantly improve the performance of MKL with PETSc?
>
>
>
> Cheers,
>
> Rob
>
>
>
> From: petsc-users-bounces at mcs.anl.gov [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Rob Ellis
> Sent: Tuesday, March 15, 2011 3:33 PM
> To: 'PETSc users list'
>
> Subject: Re: [petsc-users] Building with MKL 10.3
>
>
>
> Yes, MKL_DYNAMIC was set to true. No, I haven't tested on Nehalem. I'm
> currently comparing sequential MKL with --download-f-blas-lapack=1.
>
> Rob
>
>
>
> From: petsc-users-bounces at mcs.anl.gov [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Natarajan CS
>
> Sent: Tuesday, March 15, 2011 3:20 PM
> To: PETSc users list
> Cc: Robert Ellis
> Subject: Re: [petsc-users] Building with MKL 10.3
>
>
>
> Thanks Eric and Rob.
>
>
> Indeed! Was MKL_DYNAMIC set to its default (true)? It looks like using 1
> thread per core (sequential MKL) is the right thing to do as a baseline.
> With the total core count held constant, I would expect the performance of
> the #cores = num_mpi_processes * num_mkl_threads case to be at best equal to
> the #cores = num_mpi_processes case, unless some cache effects come into
> play (not sure which; I would think the MKL installation should weed those
> issues out).
>
>
>
> P.S.:
> Out of curiosity, have you also tested your app on Nehalem? Any difference
> between Nehalem and Westmere at similar bandwidth?
>
> On Tue, Mar 15, 2011 at 4:35 PM, Jed Brown <jed at 59a2.org> wrote:
>
> On Tue, Mar 15, 2011 at 22:30, Robert Ellis <Robert.Ellis at geosoft.com>
> wrote:
>
> Regardless of setting the number of threads for MKL or OMP, the MKL
> performance was worse than simply using --download-f-blas-lapack=1.
>
>
>
> Interesting. Does this statement include using just one thread, perhaps
> with a non-threaded MKL? Also, when you used threading, were you putting an
> MPI process on every core or were you making sure that you had enough cores
> for num_mpi_processes * num_mkl_threads?
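
For concreteness, the two placements Jed describes might look like the
following (just a sketch; 8 cores per node and the executable name ./app are
assumptions, and MPICH2's mpiexec is used without any explicit binding
options):

    # one MPI process per core, MKL kept sequential
    MKL_NUM_THREADS=1 mpiexec -n 8 ./app

    # fewer MPI processes, MKL threads filling the remaining cores
    # (2 processes x 4 MKL threads = 8 cores)
    MKL_NUM_THREADS=4 mpiexec -n 2 ./app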
>
>
>

