[petsc-users] Building with MKL 10.3
Rob Ellis
Robert.G.Ellis at Shaw.ca
Tue Mar 15 17:32:44 CDT 2011
Yes, MKL_DYNAMIC was set to true. No, I haven't tested on Nehalem. I'm
currently comparing sequential MKL with --download-f-blas-lapack=1.
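For reference, the sequential-MKL runs force one MKL thread via the usual environment variables, and the baseline build uses the downloaded reference BLAS/LAPACK. Roughly like the following (illustrative, not the exact commands I'm running):

  # run MKL serially within each MPI rank
  export MKL_NUM_THREADS=1
  export MKL_DYNAMIC=FALSE

  # baseline PETSc build against the reference BLAS/LAPACK instead of MKL
  ./configure --download-f-blas-lapack=1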
Rob
From: petsc-users-bounces at mcs.anl.gov
[mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Natarajan CS
Sent: Tuesday, March 15, 2011 3:20 PM
To: PETSc users list
Cc: Robert Ellis
Subject: Re: [petsc-users] Building with MKL 10.3
Thanks Eric and Rob.
Indeed! Was MKL_DYNAMIC set to its default (true)? It looks like using 1 thread
per core (sequential MKL) is the right thing to do as a baseline.
I would think that, for a constant number of cores, the hybrid case (#cores =
num_mpi_processes * num_mkl_threads) would perform no better than the pure-MPI
case (#cores = num_mpi_processes), unless some cache effects come into play
(I'm not sure which; I would think the MKL installation should weed those
issues out).
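As a concrete example, on a 12-core node the two cases I have in mind would be launched roughly like this (./app and the mpiexec options are placeholders; process pinning details depend on the MPI implementation):

  # pure MPI: 12 ranks x 1 MKL thread = 12 cores
  MKL_NUM_THREADS=1 mpiexec -n 12 ./app

  # hybrid: 6 ranks x 2 MKL threads = 12 cores
  MKL_NUM_THREADS=2 mpiexec -n 6 ./app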
P.S.: Out of curiosity, have you also tested your app on Nehalem? Any
difference between Nehalem and Westmere for similar bandwidth?
On Tue, Mar 15, 2011 at 4:35 PM, Jed Brown <jed at 59a2.org> wrote:
On Tue, Mar 15, 2011 at 22:30, Robert Ellis <Robert.Ellis at geosoft.com>
wrote:
Regardless of setting the number of threads for MKL or OMP, the MKL
performance was worse than simply using --download-f-blas-lapack=1.
Interesting. Does this statement include using just one thread, perhaps with
a non-threaded MKL? Also, when you used threading, were you putting an MPI
process on every core or were you making sure that you had enough cores for
num_mpi_processes * num_mkl_threads?
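By a non-threaded MKL I mean linking against MKL's sequential layer rather than the threaded one. A sketch of such a configure line, assuming an intel64 layout (the exact library list depends on the MKL version and install):

  ./configure --with-blas-lapack-lib="-L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread"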