[petsc-users] PETSc doesn't allow use of multithreaded MKL with MUMPS + fblaslapack?

Satish Balay balay at mcs.anl.gov
Sun Aug 12 15:27:45 CDT 2018


On Sun, 12 Aug 2018, Appel, Thibaut wrote:

> Hi Satish, 
> 
> > On 12 Aug 2018, at 20:20, Satish Balay <balay at mcs.anl.gov> wrote:
> > 
> > Hm - it's just a default - so you can always change the default value
> > to a more suitable one for your usage. [i.e. use the --with-blaslapack-lib
> > option instead of the --with-blaslapack-dir option]
> > 
> 
> Ok thanks, I'm going to try that.
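
For reference, a rough sketch of what the --with-blaslapack-lib route could
look like for threaded MKL (untested - the exact library list depends on your
MKL version and compiler; the Intel MKL Link Line Advisor gives the
authoritative set):

  ./configure […] --with-openmp=1 \
    --with-blaslapack-include=${MKLROOT}/include \
    --with-blaslapack-lib="-L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl"
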
> 
> > For a regular PETSc build - we think that sequential MKL is the best match.
> > 
> > For a build with C/Pardiso - I believe it's best to use threaded MKL.
> 
> That makes sense. I think the threaded MKL defaults to the sequential MKL with 1 thread anyway.

I don't think this is true. A Google search brings up the following:

https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-intel-mkl-100-threading

"when the environment variable OMP_NUM_THREADS is undefined, Intel MKL may create multiple threads depending on problem size and the value of the MKL_DYNAMIC or other threading environment variables."

OpenBLAS does something similar. We have to explicitly set the 'OPENBLAS_NUM_THREADS=1' variable to force it to run single-threaded.
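
i.e. to be safe one has to pin the thread counts explicitly, e.g. [the exact
precedence of these variables is documented by Intel and OpenBLAS]:

  export OMP_NUM_THREADS=1
  export MKL_NUM_THREADS=1       # MKL-specific; takes precedence over OMP_NUM_THREADS for MKL calls
  export MKL_DYNAMIC=FALSE       # keeps MKL from adjusting the thread count on its own
  export OPENBLAS_NUM_THREADS=1  # the OpenBLAS equivalent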

Satish


> 
> > 
> > Wrt --with-openmp=1 - if threaded MKL is preferable - I guess we could
> > change the default. But a default does not prevent one from using a
> > preferred BLAS [whatever that is].
> > 
> > Wrt fblaslapack - yes, it's not multi-threaded. But I believe OpenBLAS is
> > multi-threaded [so you could use --download-openblas as an alternative]
> > 
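
A minimal sketch of that alternative (untested; I've added --download-scalapack
since MUMPS needs ScaLAPACK, which MKL otherwise provides):

  ./configure […] --with-openmp=1 --download-openblas --download-scalapack \
    --download-metis --download-parmetis --download-mumps

OpenBLAS thread counts are then controlled with OPENBLAS_NUM_THREADS, as noted above.
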
> 
> > The usual issue is - one cannot use threaded MKL [or any threaded
> > library] as a black box. They would have to always be aware of how
> > many MPI procs and OpenMP threads are being used - and tweak these
> > parameters constantly. The default for OpenMP is to use the whole
> > machine - i.e. it expects 1 MPI task per node. If one uses more MPI
> > tasks per node - and does not reduce threads per node - they get bad
> > performance. Hence we avoid using threaded MKL as a default.
> > 
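
To illustrate with made-up numbers (hypothetical 32-core node, hypothetical
binary ./my_app): ranks-per-node times threads-per-rank has to stay at or
below the core count, e.g.

  export OMP_NUM_THREADS=4
  mpirun -n 8 ./my_app      # 8 ranks x 4 threads = 32 cores

Running 32 ranks per node while each rank's BLAS also spawns many threads
oversubscribes the node, and performance collapses.
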
> 
> I'm not sure I fully understand - why not?
> If I'm on a batch system, I specify a number of MPI tasks and a number of threads per MPI task that remain constant during the computation?
> If I'm not on a batch system, I just run mpirun with the number of tasks and set OMP_NUM_THREADS to the number of threads per MPI task, and that's it?
> (I use the Intel MPI library on one cluster and MPICH on the other)
> 
> 
> Thibaut
> 
> > Satish
> > 
> > On Sun, 12 Aug 2018, Appel, Thibaut wrote:
> > 
> >> Good afternoon,
> >> 
> >> I have an application code written in pure MPI but wanted to exploit multithreading in MUMPS (contained in calls to BLAS routines).
> >> On a high-end parallel cluster I'm using, I'm linking with the Intel MKL library, but it seems that PETSc won't configure the way I want:
> >> 
> >> ./configure […] --with-openmp=1 --with-pic=1 --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-blaslapack-dir=${MKLROOT} --with-scalapack-lib="-L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" --with-scalapack-include=${MKLROOT}/include --download-metis --download-parmetis --download-mumps
> >> 
> >> yields BLAS/LAPACK: -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> >> 
> >> while if I configure with cpardiso on top of the same flags
> >> 
> >> ./configure […] --with-openmp=1 --with-pic=1 --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-blaslapack-dir=${MKLROOT} --with-scalapack-lib="-L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" --with-scalapack-include=${MKLROOT}/include --with-mkl_cpardiso-dir=${MKLROOT} --download-metis --download-parmetis --download-mumps
> >> 
> >> the configure script says
> >> ===============================================
> >> BLASLAPACK: Looking for Multithreaded MKL for C/Pardiso
> >> ===============================================
> >> 
> >> and yields BLAS/LAPACK: -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_lp64 -liomp5 -ldl -lpthread
> >> 
> >> In other words, there is currently no way to get multithreaded BLAS with MUMPS in spite of the option --with-openmp=1, as libmkl_sequential is linked. Is it not possible to fix that and use libmkl_intel_thread by default?
> >> 
> >> On another smaller cluster, I do not have MKL and configure PETSc with BLAS downloaded via --download-fblaslapack, which is not multithreaded.
> >> Could you confirm I would need to link with a multithreaded BLAS library I downloaded myself and use --with-openmp=1? Would it be `recognized` by the MUMPS installed by PETSc?
> >> 
> >> Thanks for your support,
> >> 
> >> 
> >> Thibaut
> >> 
> 
> 

