[petsc-users] MPI+OpenMP+MKL

Junchao Zhang junchao.zhang at gmail.com
Fri Apr 7 22:29:41 CDT 2023


I don't know OpenMP well, but I saw these in your OpenMP configuration:

  OMP_PROC_BIND = 'TRUE'
  OMP_PLACES = '{0:24}'

Try not binding at all and letting the OS schedule the threads freely.
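
For example (assuming a bash-like shell; adjust to however you launch your job), something along these lines before the run:

  export OMP_PROC_BIND=false
  unset OMP_PLACES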

--Junchao Zhang


On Fri, Apr 7, 2023 at 7:17 PM Astor Piaz <appiazzolla at gmail.com> wrote:

> Thanks for your reply, Matt.
>
> I just realized that the problem seems to be the MKL threads.
>
> Inside the MatShell I call:
>
> call omp_set_nested(.true.)
> call omp_set_dynamic(.false.)
> call mkl_set_dynamic(0)
>
> Then, inside an omp single region, I use:
>
> nMkl0 = mkl_set_num_threads_local(nMkl)
>
> where nMkl is set to 24
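>
> For reference, the whole pattern as a self-contained C sketch would look
> roughly like this (illustrative only: the matrix size, the outer thread
> count of 2, and the plain ZGEMV stand in for my actual code):
>
>   #include <mkl.h>
>   #include <omp.h>
>   #include <stdlib.h>
>
>   int main(void)
>   {
>     const MKL_INT  n   = 1024;
>     MKL_Complex16 *A   = calloc((size_t)n * n, sizeof(*A));
>     MKL_Complex16 *x   = calloc((size_t)n, sizeof(*x));
>     MKL_Complex16 *y   = calloc((size_t)n, sizeof(*y));
>     MKL_Complex16  one = {1.0, 0.0}, zero = {0.0, 0.0};
>
>     omp_set_max_active_levels(2); /* allow nesting (omp_set_nested is deprecated) */
>     omp_set_dynamic(0);
>     mkl_set_dynamic(0);
>
>     #pragma omp parallel num_threads(2)
>     {
>       #pragma omp single
>       {
>         /* let MKL use 24 threads inside this outer OpenMP region */
>         int nMkl0 = mkl_set_num_threads_local(24);
>         cblas_zgemv(CblasColMajor, CblasNoTrans, n, n, &one, A, n,
>                     x, 1, &zero, y, 1);
>         mkl_set_num_threads_local(nMkl0); /* restore previous value */
>       }
>     }
>     free(A); free(x); free(y);
>     return 0;
>   }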
>
> MKL_VERBOSE shows that the calls have access to 24 threads, but the
> timings are the same as with 1 thread:
>
> MKL_VERBOSE
> ZGEMV(N,12544,12544,0x7ffde9edc800,0x14e4662d2010,12544,0x14985e610,1,0x7ffde9edc7f0,0x189faaa90,1)
> 117.09ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:24
> MKL_VERBOSE
> ZGEMV(N,12544,12544,0x7ffe00355700,0x14c8ec1e4010,12544,0x16959c830,1,0x7ffe003556f0,0x17dd7da70,1)
> 117.37ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:1
>
> The configuration of the OpenMP runtime that launches these MKL calls is
> as follows:
>
> OPENMP DISPLAY ENVIRONMENT BEGIN
>   _OPENMP = '201511'
>   OMP_DYNAMIC = 'FALSE'
>   OMP_NESTED = 'TRUE'
>   OMP_NUM_THREADS = '24'
>   OMP_SCHEDULE = 'DYNAMIC'
>   OMP_PROC_BIND = 'TRUE'
>   OMP_PLACES = '{0:24}'
>   OMP_STACKSIZE = '0'
>   OMP_WAIT_POLICY = 'PASSIVE'
>   OMP_THREAD_LIMIT = '4294967295'
>   OMP_MAX_ACTIVE_LEVELS = '255'
>   OMP_CANCELLATION = 'FALSE'
>   OMP_DEFAULT_DEVICE = '0'
>   OMP_MAX_TASK_PRIORITY = '0'
>   OMP_DISPLAY_AFFINITY = 'FALSE'
>   OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'
>   OMP_ALLOCATOR = 'omp_default_mem_alloc'
>   OMP_TARGET_OFFLOAD = 'DEFAULT'
>   GOMP_CPU_AFFINITY = ''
>   GOMP_STACKSIZE = '0'
>   GOMP_SPINCOUNT = '300000'
> OPENMP DISPLAY ENVIRONMENT END
>
>
>
> On Fri, Apr 7, 2023 at 1:25 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Fri, Apr 7, 2023 at 2:26 PM Astor Piaz <appiazzolla at gmail.com> wrote:
>>
>>> Hi Matthew, Junchao,
>>> Thank you for your advice. The code still does not work; I give more
>>> details about it below and can provide more as needed.
>>>
>>> I am implementing a spectral method that results in a block matrix where
>>> the off-diagonal blocks are Poincare-Steklov operators of
>>> impedance-to-impedance type.
>>> Those Poincare-Steklov operators have been built by hierarchically
>>> merging subdomain operators (the HPS method), and I have a well-tuned
>>> (but rather complex) OpenMP+MKL code that can apply these operators very
>>> fast.
>>> I would like to use PETSc's MPI-parallel GMRES solver with a MatShell
>>> that calls my OpenMP+MKL code, with each block on a different MPI
>>> process.
>>>
>>> At the moment the code runs correctly, except that PETSc is not letting
>>> my OpenMP+MKL code schedule its threads as I choose.
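>>>
>>> For concreteness, the way I hook this up is roughly the following
>>> (heavily simplified C sketch; VecCopy and PCNONE stand in for my real
>>> OpenMP+MKL apply and for whatever preconditioning I end up using):
>>>
>>>   #include <petscksp.h>
>>>
>>>   /* Stand-in for the OpenMP+MKL HPS apply; here it just copies x to y. */
>>>   static PetscErrorCode ShellMult(Mat A, Vec x, Vec y)
>>>   {
>>>     PetscFunctionBeginUser;
>>>     PetscCall(VecCopy(x, y));
>>>     PetscFunctionReturn(0);
>>>   }
>>>
>>>   int main(int argc, char **argv)
>>>   {
>>>     Mat      A;
>>>     Vec      x, b;
>>>     KSP      ksp;
>>>     PC       pc;
>>>     PetscInt nloc = 100; /* local block size owned by this MPI rank */
>>>
>>>     PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
>>>     PetscCall(MatCreateShell(PETSC_COMM_WORLD, nloc, nloc, PETSC_DETERMINE,
>>>                              PETSC_DETERMINE, NULL, &A));
>>>     PetscCall(MatShellSetOperation(A, MATOP_MULT, (void (*)(void))ShellMult));
>>>     PetscCall(MatCreateVecs(A, &x, &b));
>>>     PetscCall(VecSet(b, 1.0));
>>>     PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
>>>     PetscCall(KSPSetType(ksp, KSPFGMRES));
>>>     PetscCall(KSPSetOperators(ksp, A, A));
>>>     PetscCall(KSPGetPC(ksp, &pc));
>>>     PetscCall(PCSetType(pc, PCNONE)); /* no preconditioner for the shell matrix */
>>>     PetscCall(KSPSetFromOptions(ksp));
>>>     PetscCall(KSPSolve(ksp, b, x));
>>>     PetscCall(KSPDestroy(&ksp));
>>>     PetscCall(VecDestroy(&x));
>>>     PetscCall(VecDestroy(&b));
>>>     PetscCall(MatDestroy(&A));
>>>     PetscCall(PetscFinalize());
>>>     return 0;
>>>   }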
>>>
>>
>> PETSc does not say anything about OpenMP threads. However, maybe you need
>> to launch the executable with the correct OMP env variables?
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> I am using
>>> ./configure --with-scalar-type=complex --prefix=../install/fast/
>>> --with-debugging=0 --with-openmp=1 --with-blaslapack-dir=${MKLROOT}
>>> --with-mkl_cpardiso-dir=${MKLROOT} --with-threadsafety --with-log=0
>>> COPTFLAGS="-g -Ofast" CXXOPTFLAGS="-g -Ofast" FOPTFLAGS="-g -Ofast"
>>>
>>> Attached is an image of htop showing that the MKL threads are indeed
>>> being spawned, but they remain unused by the code. Earlier calculations
>>> in the code show that it is capable of using OpenMP and MKL; only when
>>> PETSc's KSPSolve is called does MKL seem to be turned off.
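>>>
>>> As a sanity check I can print the threading state at the top of the
>>> MatShell apply, with a small helper like this (sketch; these calls only
>>> report what each runtime currently thinks it may use):
>>>
>>>   #include <mkl.h>
>>>   #include <omp.h>
>>>   #include <stdio.h>
>>>
>>>   /* Call at the start of the MatShell MULT to see what KSPSolve gives us. */
>>>   static void report_threading(void)
>>>   {
>>>     printf("omp_max=%d in_parallel=%d mkl_max=%d mkl_dynamic=%d\n",
>>>            omp_get_max_threads(), omp_in_parallel(),
>>>            mkl_get_max_threads(), mkl_get_dynamic());
>>>   }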
>>>
>>> On Fri, Apr 7, 2023 at 8:10 AM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Fri, Apr 7, 2023 at 10:06 AM Astor Piaz <appiazzolla at gmail.com>
>>>> wrote:
>>>>
>>>>> Hello petsc-users,
>>>>> I am trying to use a code that is parallelized with a combination of
>>>>> OpenMP and MKL, where OpenMP threads are able to spawn MPI processes.
>>>>> I have carefully scheduled the processes so that the right number is
>>>>> launched at the right time.
>>>>> When trying to use my code inside a MatShell (for later use in an
>>>>> FGMRES KSPSolve), the MKL threads are not being used.
>>>>>
>>>>> I am sorry if this has been asked before.
>>>>> What configuration should I use in order to profit from MPI+OpenMP+MKL
>>>>> parallelism?
>>>>>
>>>>
>>>> You should configure using --with-threadsafety
>>>>
>>>>   Thanks,
>>>>
>>>>      Matt
>>>>
>>>>
>>>>> Thank you!
>>>>> --
>>>>> Astor
>>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
>