[petsc-users] Strange efficiency in PETSc-dev using OpenMP
Danyang Su
danyang.su at gmail.com
Mon Sep 23 13:33:45 CDT 2013
Hi Shri,
It seems that the problem does not result from the thread affinity
settings. I have tried several settings that bind the threads to
different cores, but there is no improvement.
Here is the package, core, and thread map information:
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf
11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected:
{0,1,2,3,4,5,6,7,8,9,10,11}
OMP: Info #156: KMP_AFFINITY: 12 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 6 cores/pkg x 2 threads/core
(6 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 4 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 4 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 core 5 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 5 thread 1
OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost
levels of machine
And here is the internal thread binding with different KMP_AFFINITY
settings:
1. KMP_AFFINITY=verbose,granularity=thread,compact
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3}
2. KMP_AFFINITY=verbose,granularity=fine,compact
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3}
3. KMP_AFFINITY=verbose,granularity=fine,compact,1,0
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {6}
4. KMP_AFFINITY=verbose,scatter
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2,3}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4,5}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {6,7}
5. KMP_AFFINITY=verbose,compact (For this setting, two threads are
assigned to the same core)
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,3}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,3}
6. KMP_AFFINITY=verbose,granularity=core,compact (For this setting, two
threads are assigned to the same core)
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,3}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,3}
The first four settings each bind the threads to distinct cores, but the
problem is not solved. A typical launch with one of these settings is
shown below.
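For reference, this is roughly how I launch the OpenMP runs (bash; the
executable name and options are taken from my earlier tests, so adjust as
needed):

   export KMP_AFFINITY=verbose,granularity=fine,compact,1,0
   mpiexec -n 1 ksp_inhm_d -threadcomm_type openmp -threadcomm_nthreads 4 \
      -log_summary log_openmp_petsc_dev.log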
Thanks,
Danyang
On 22/09/2013 8:00 PM, Shri wrote:
> I think this is definitely an issue with setting the affinities for
> threads, i.e., the assignment of threads to cores. Ideally each thread
> should be assigned to a distinct core, but in your case all four
> threads are getting pinned to the same core, resulting in such a
> massive slowdown. Unfortunately, the thread affinities for OpenMP are
> set through environment variables. For Intel's OpenMP one needs to
> define the thread affinities through the environment variable
> KMP_AFFINITY. See this document here
> http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm.
> Try setting the affinities via KMP_AFFINITY and let us know if it works.
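> As a rough example (the exact setting depends on your machine), putting
> something like this in the environment before the run should spread the
> threads across distinct cores:
>
>    export KMP_AFFINITY=verbose,granularity=fine,compact,1,0
>
> The verbose keyword makes the runtime print which OS proc each thread is
> bound to, so you can check the placement.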
>
> Shri
> On Sep 21, 2013, at 11:06 PM, Danyang Su wrote:
>
>> Hi Shri,
>>
>> Thanks for your info. It works with the option -threadcomm_type
>> openmp, but another problem arises, as described below.
>>
>> The sparse matrix is 53760 x 53760 with 1067392 non-zero entries. If
>> the code is compiled with PETSc-3.4.2, it works fine: the equations
>> are solved quickly and I can see the speedup. But if the code is
>> compiled with PETSc-dev with the OpenMP option, solving the equations
>> takes a long time and I cannot see any speedup when more
>> processors are used.
>>
>> For PETSc-3.4.2, run with "mpiexec -n 4 ksp_inhm_d -log_summary
>> log_mpi4_petsc3.4.2.log", the iteration count and runtimes are:
>> Iterations 6  time_assembly 0.4137E-01  time_ksp 0.9296E-01
>>
>> For PETSc-dev, run with "mpiexec -n 1 ksp_inhm_d -threadcomm_type
>> openmp -threadcomm_nthreads 4 -log_summary log_openmp_petsc_dev.log",
>> the iteration count and runtimes are:
>> Iterations 6  time_assembly 0.3595E+03  time_ksp 0.2907E+00
>>
>> Most of that time (time_assembly 0.3595E+03) is spent in the
>> following loop:
>>
>>    do i = istart, iend - 1
>>       ! row i is stored in the CSR arrays ia_in/ja_in/a_in (1-based);
>>       ! PETSc expects 0-based column indices, hence the -1 on ja_in
>>       ii = ia_in(i+1)
>>       jj = ia_in(i+2)
>>       call MatSetValues(a, ione, i, jj-ii, ja_in(ii:jj-1)-1, a_in(ii:jj-1), Insert_Values, ierr)
>>    end do
>>
>> The log files for both PETSc-3.4.2 and PETSc-dev are attached.
>>
>> Is there anything wrong with my code or with the run options? The
>> above code works fine when using MPICH.
>>
>> Thanks and regards,
>>
>> Danyang
>>
>> On 21/09/2013 2:09 PM, Shri wrote:
>>> There are three thread communicator types in PETSc. The default is
>>> "no thread", which is basically a non-threaded version. The other two
>>> types are "openmp" and "pthread". If you want to use OpenMP, use
>>> the option -threadcomm_type openmp.
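>>> For example, a run would look something like this (substitute your own
>>> executable name and thread count):
>>>
>>>    mpiexec -n 1 ./your_app -threadcomm_type openmp -threadcomm_nthreads 4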
>>>
>>> Shri
>>>
>>> On Sep 21, 2013, at 3:46 PM, Danyang Su <danyang.su at gmail.com> wrote:
>>>
>>>> Hi Barry,
>>>>
>>>> Thanks for the quick reply.
>>>>
>>>> After changing
>>>>    #if defined(PETSC_HAVE_PTHREADCLASSES) || defined(PETSC_HAVE_OPENMP)
>>>> to
>>>>    #if defined(PETSC_HAVE_PTHREADCLASSES)
>>>> and commenting out
>>>>    #elif defined(PETSC_HAVE_OPENMP)
>>>>    PETSC_EXTERN PetscStack *petscstack;
>>>>
>>>> it can be compiled and validated with "make test".
>>>>
>>>> But I still have questions about running the examples. After rebuilding
>>>> the example code (e.g., ksp_ex2f.f), I can run it with "mpiexec -n 1
>>>> ksp_ex2f", "mpiexec -n 4 ksp_ex2f", or "mpiexec -n 1 ksp_ex2f
>>>> -threadcomm_nthreads 1", but if I run it with "mpiexec -n 1
>>>> ksp_ex2f -threadcomm_nthreads 4", a lot of error messages are
>>>> produced (attached).
>>>>
>>>> The example code is not modified and there are no OpenMP routines in it.
>>>> For the current development in my project, I want to keep my own
>>>> OpenMP code for calculating the matrix values, but solve the system
>>>> with PETSc (OpenMP). Is that possible?
>>>>
>>>> Thanks and regards,
>>>>
>>>> Danyang
>>>>
>>>>
>>>>
>>>> On 21/09/2013 7:26 AM, Barry Smith wrote:
>>>>> Danyang,
>>>>>
>>>>> I don't think the || defined (PETSC_HAVE_OPENMP) belongs in the code below.
>>>>>
>>>>> /* Linux functions CPU_SET and others don't work if sched.h is not included before
>>>>> including pthread.h. Also, these functions are active only if either _GNU_SOURCE
>>>>> or __USE_GNU is not set (see /usr/include/sched.h and /usr/include/features.h), hence
>>>>> set these first.
>>>>> */
>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>>>
>>>>> Edit include/petscerror.h, locate these lines, remove that part, and then rerun make all. Let us know if it works or not.
>>>>>
>>>>> Barry
>>>>>
>>>>> i.e. replace
>>>>>
>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES) || defined (PETSC_HAVE_OPENMP)
>>>>>
>>>>> with
>>>>>
>>>>> #if defined(PETSC_HAVE_PTHREADCLASSES)
>>>>>
>>>>> On Sep 21, 2013, at 6:53 AM, Matthew Knepley <petsc-maint at mcs.anl.gov> wrote:
>>>>>
>>>>>> On Sat, Sep 21, 2013 at 12:18 AM, Danyang Su <danyang.su at gmail.com> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I got errors when compiling petsc-dev with OpenMP under Cygwin. Previously, I successfully compiled petsc-3.4.2 and it works fine.
>>>>>> The log files have been attached.
>>>>>>
>>>>>> The OpenMP configure test is wrong. It clearly fails to find pthread.h, but the test passes. Then in petscerror.h
>>>>>> we guard pthread.h using PETSC_HAVE_OPENMP. Can someone who knows OpenMP fix this?
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Danyang
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>> -- Norbert Wiener
>>>>
>>>> <error.txt>
>>
>> <log_mpi4_petsc3.4.2.log><log_openmp_petsc_dev.log>
>