[petsc-users] MPI+OpenMP+MKL

Astor Piaz appiazzolla at gmail.com
Fri Apr 7 19:17:15 CDT 2023


Thanks for your reply, Matt.

I just realized that the problem seems to be with the MKL threads.

Inside the MatShell I call:

call omp_set_nested(.true.)
call omp_set_dynamic(.false.)
call mkl_set_dynamic(0)

Then, inside the OpenMP single region I use:

nMkl0 = mkl_set_num_threads_local(nMkl)

where nMkl is set to 24
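
Put together, the structure looks roughly like the sketch below (a minimal,
self-contained version; the matrix size, the values, and the timing print are
placeholders, not the actual operator):

! Minimal sketch of the nesting described above; size and values are
! placeholders, not the actual operator data.
program nested_mkl_sketch
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 2048              ! placeholder size
  integer :: nMkl, nMkl0
  integer, external :: mkl_set_num_threads_local
  complex(dp), allocatable :: A(:,:), x(:), y(:)
  complex(dp) :: alpha, beta
  real(dp) :: t0, t1

  nMkl = 24
  allocate(A(n,n), x(n), y(n))
  A = (1.0_dp, 0.0_dp); x = (1.0_dp, 0.0_dp); y = (0.0_dp, 0.0_dp)
  alpha = (1.0_dp, 0.0_dp); beta = (0.0_dp, 0.0_dp)

  ! Same runtime settings as in the MatShell.
  call omp_set_nested(.true.)   ! deprecated since OpenMP 5.0; omp_set_max_active_levels(2) is the newer spelling
  call omp_set_dynamic(.false.)
  call mkl_set_dynamic(0)

  !$omp parallel
  !$omp single
  ! Give MKL its own thread budget inside the single region and time one apply.
  nMkl0 = mkl_set_num_threads_local(nMkl)
  t0 = omp_get_wtime()
  call zgemv('N', n, n, alpha, A, n, x, 1, beta, y, 1)
  t1 = omp_get_wtime()
  print '(a,i0,a,f8.3,a)', 'ZGEMV with ', nMkl, ' MKL threads: ', (t1 - t0)*1.0e3_dp, ' ms'
  nMkl0 = mkl_set_num_threads_local(nMkl0)    ! restore the previous local setting
  !$omp end single
  !$omp end parallel
end program nested_mkl_sketch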

MKL_VERBOSE shows that the calls have access to 24 threads, but the timings
are the same as with 1 thread:

MKL_VERBOSE ZGEMV(N,12544,12544,0x7ffde9edc800,0x14e4662d2010,12544,0x14985e610,1,0x7ffde9edc7f0,0x189faaa90,1) 117.09ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:24
MKL_VERBOSE ZGEMV(N,12544,12544,0x7ffe00355700,0x14c8ec1e4010,12544,0x16959c830,1,0x7ffe003556f0,0x17dd7da70,1) 117.37ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:1
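
A small sketch like the following can be used to double check what the
runtimes claim inside that region (it only queries the libraries; it does not
prove that the extra threads are doing useful work):

! Reports what OpenMP and MKL claim inside the nested single region.
program thread_report_sketch
  use omp_lib
  implicit none
  integer, external :: mkl_get_max_threads, mkl_get_dynamic, mkl_set_num_threads_local
  integer :: nMkl0

  call omp_set_nested(.true.)
  call omp_set_dynamic(.false.)
  call mkl_set_dynamic(0)

  !$omp parallel
  !$omp single
  nMkl0 = mkl_set_num_threads_local(24)
  print '(a,i0)', 'omp_get_level()       = ', omp_get_level()
  print '(a,i0)', 'omp_get_num_threads() = ', omp_get_num_threads()
  print '(a,i0)', 'mkl_get_max_threads() = ', mkl_get_max_threads()
  print '(a,i0)', 'mkl_get_dynamic()     = ', mkl_get_dynamic()
  !$omp end single
  !$omp end parallel
end program thread_report_sketch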

The OpenMP configuration that launches these MKL threads is as follows:

OPENMP DISPLAY ENVIRONMENT BEGIN
  _OPENMP = '201511'
  OMP_DYNAMIC = 'FALSE'
  OMP_NESTED = 'TRUE'
  OMP_NUM_THREADS = '24'
  OMP_SCHEDULE = 'DYNAMIC'
  OMP_PROC_BIND = 'TRUE'
  OMP_PLACES = '{0:24}'
  OMP_STACKSIZE = '0'
  OMP_WAIT_POLICY = 'PASSIVE'
  OMP_THREAD_LIMIT = '4294967295'
  OMP_MAX_ACTIVE_LEVELS = '255'
  OMP_CANCELLATION = 'FALSE'
  OMP_DEFAULT_DEVICE = '0'
  OMP_MAX_TASK_PRIORITY = '0'
  OMP_DISPLAY_AFFINITY = 'FALSE'
  OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'
  OMP_ALLOCATOR = 'omp_default_mem_alloc'
  OMP_TARGET_OFFLOAD = 'DEFAULT'
  GOMP_CPU_AFFINITY = ''
  GOMP_STACKSIZE = '0'
  GOMP_SPINCOUNT = '300000'
OPENMP DISPLAY ENVIRONMENT END
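
For reference, the MatShell + FGMRES wiring described in the quoted message
below looks roughly like this minimal sketch (placeholder sizes, a dummy
context, and MyShellMult simply copying x to y where the real code applies
the OpenMP+MKL operator):

! Minimal sketch of a MatShell used inside FGMRES; everything specific to the
! actual operator is replaced by placeholders.
subroutine MyShellMult(A, x, y, ierr)
#include <petsc/finclude/petscmat.h>
  use petscmat
  implicit none
  Mat            :: A
  Vec            :: x, y
  PetscErrorCode :: ierr
  ! The real code would apply the OpenMP+MKL block operator here.
  call VecCopy(x, y, ierr)          ! placeholder: identity operator
end subroutine MyShellMult

program shell_fgmres_sketch
#include <petsc/finclude/petscksp.h>
  use petscksp
  implicit none
  Mat            :: A
  KSP            :: ksp
  Vec            :: b, u
  PetscErrorCode :: ierr
  PetscInt       :: nloc
  PetscInt       :: dummyctx        ! placeholder; the real context would carry the operator data
  PetscScalar    :: one
  external MyShellMult

  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)

  nloc = 1000                       ! placeholder local block size
  call MatCreateShell(PETSC_COMM_WORLD, nloc, nloc, PETSC_DETERMINE, &
                      PETSC_DETERMINE, dummyctx, A, ierr)
  call MatShellSetOperation(A, MATOP_MULT, MyShellMult, ierr)

  call MatCreateVecs(A, u, b, ierr)
  one = 1.0d0
  call VecSet(b, one, ierr)

  call KSPCreate(PETSC_COMM_WORLD, ksp, ierr)
  call KSPSetOperators(ksp, A, A, ierr)
  call KSPSetType(ksp, KSPFGMRES, ierr)
  call KSPSetFromOptions(ksp, ierr)
  call KSPSolve(ksp, b, u, ierr)

  call KSPDestroy(ksp, ierr)
  call MatDestroy(A, ierr)
  call VecDestroy(b, ierr)
  call VecDestroy(u, ierr)
  call PetscFinalize(ierr)
end program shell_fgmres_sketch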



On Fri, Apr 7, 2023 at 1:25 PM Matthew Knepley <knepley at gmail.com> wrote:

> On Fri, Apr 7, 2023 at 2:26 PM Astor Piaz <appiazzolla at gmail.com> wrote:
>
>> Hi Matthew, Jungchau,
>> Thank you for your advice. The code still does not work; I give more
>> details about it below and can provide further specifics as needed.
>>
>> I am implementing a spectral method that results in a block matrix where
>> the off-diagonal blocks are Poincare-Steklov operators of
>> impedance-to-impedance type.
>> Those Poincare-Steklov operators have been built hierarchically by
>> merging subdomain operators (the HPS method), and I have a well-tuned
>> (but rather complex) OpenMP+MKL code that can apply this operator very
>> fast.
>> I would like to use PETSc's MPI-parallel GMRES solver with a MatShell
>> that calls my OpenMP+MKL code, with each block on a different MPI process.
>>
>> At the moment the code runs correctly, except that PETSc is not letting
>> my OpenMP+MKL code schedule its threads the way I choose.
>>
>
> PETSc does not say anything about OpenMP threads. However, maybe you need
> to launch the executable with the correct OMP env variables?
>
>   Thanks,
>
>      Matt
>
>
>> I am configuring with
>> ./configure --with-scalar-type=complex --prefix=../install/fast/
>> --with-debugging=0 --with-openmp=1 --with-blaslapack-dir=${MKLROOT}
>> --with-mkl_cpardiso-dir=${MKLROOT} --with-threadsafety --with-log=0
>> COPTFLAGS="-g -Ofast" CXXOPTFLAGS="-g -Ofast" FOPTFLAGS="-g -Ofast"
>>
>> Attached is an image of htop showing that the MKL threads are indeed
>> being spawned, but they remain unused by the code. Previous calculations
>> with the code show that it is capable of using OpenMP and MKL; only when
>> the PETSc KSPSolve is called does MKL seem to be turned off.
>>
>> On Fri, Apr 7, 2023 at 8:10 AM Matthew Knepley <knepley at gmail.com> wrote:
>>
>>> On Fri, Apr 7, 2023 at 10:06 AM Astor Piaz <appiazzolla at gmail.com>
>>> wrote:
>>>
>>>> Hello petsc-users,
>>>> I am trying to use a code that is parallelized with a combination of
>>>> OpenMP and MKL parallelism, where OpenMP threads are able to spawn MPI
>>>> processes.
>>>> I have carefully scheduled the processes so that the right number is
>>>> launched at the right time.
>>>> When trying to use my code inside a MatShell (for later use in an
>>>> FGMRES KSP solve), the MKL threads are not being used.
>>>>
>>>> I am sorry if this has been asked before.
>>>> What configuration should I use in order to profit from MPI+OpenMP+MKL
>>>> parallelism?
>>>>
>>>
>>> You should configure using --with-threadsafety
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>
>>>> Thank you!
>>>> --
>>>> Astor
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>

