[petsc-users] petsc4py help with parallel execution

Ivan Voznyuk ivan.voznyuk.work at gmail.com
Fri Nov 16 09:43:50 CST 2018


Hi,
You were totally right: no miracle, the parallelization does come from
multithreading. We checked Option 1/: playing with OMP_NUM_THREADS=1
changed the computational time.

So, I reinstalled everything (starting with Ubuntu and ending with PETSc) and
configured the following things:

- installed the system's openmpi
- installed Intel MKL BLAS/LAPACK
- configured PETSc with ./configure --with-cc=mpicc --with-fc=mpif90
--with-cxx=mpicxx --with-blas-lapack-dir=/opt/intel/mkl/lib/intel64
--download-scalapack --download-mumps --with-hwloc --with-shared
--with-openmp=1 --with-pthread=1 --with-scalar-type=complex
hoping that it would pick up BLAS multithreading
- installed petsc4py
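
As a quick sanity check after the install, something like the following (a
small sketch; it assumes the optional mkl-service Python package is
installed) can report how many threads MKL will actually use at runtime:

    import os
    try:
        import mkl  # provided by the optional mkl-service package
        print("MKL max threads:", mkl.get_max_threads())
    except ImportError:
        print("mkl-service not available")
    print("OMP_NUM_THREADS =", os.environ.get("OMP_NUM_THREADS"))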

However, I do not get any parallelization...
What I have tried so far, unsuccessfully:
- playing with OMP_NUM_THREADS
- reinstalling the system
- ldd PETSc.cpython-35m-x86_64-linux-gnu.so yields lld_result.txt (attached;
see also the short snippet after this list). I noticed that the
libmkl_sequential.so library is linked there. Do you think this is normal?
- I found a similar problem reported here:
https://lists.mcs.anl.gov/pipermail/petsc-users/2016-March/028803.html To
solve it, the developers recommended replacing -lmkl_sequential with
-lmkl_intel_thread in PETSC_ARCH/lib/conf/petscvariables. However, I did not
find a file with that name (perhaps it changed between versions).
- In any case, I replaced lmkl_sequential with lmkl_intel_thread in every file
of PETSc, but it changed nothing.
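
For reference, that check can also be scripted; a minimal sketch that prints
the MKL libraries the compiled petsc4py extension is linked against:

    import subprocess
    from petsc4py import PETSc

    # list the MKL-related lines in the ldd output of the petsc4py extension
    out = subprocess.check_output(["ldd", PETSc.__file__]).decode()
    print("\n".join(l.strip() for l in out.splitlines() if "mkl" in l))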

As a result, the new make.log (attached) still contains the parameter
#define PETSC_HAVE_LIBMKL_SEQUENTIAL 1 and the option -lmkl_sequential.

Do you have any idea what I should change in the initial configure options in
order to obtain BLAS multithreading parallelization?

Thanks a lot for your help!

Ivan






On Fri, Nov 16, 2018 at 1:25 AM Dave May <dave.mayhem23 at gmail.com> wrote:

>
>
> On Thu, 15 Nov 2018 at 17:44, Ivan via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> Hi Stefano,
>>
>> In fact, yes, we look at the htop output (and the resulting computational
>> time, of course).
>>
>> In our code we use MUMPS, which indeed depends on BLAS/LAPACK. So I
>> think this might be it!
>>
>> I will definitely check it (I mean the differences between our MUMPS,
>> BLAS, and LAPACK versions).
>>
>> If you have an idea of how we can verify on his PC that the source of his
>> parallelization does come from BLAS, please do not hesitate to tell me!
>>
>
> Option 1/
> * Set this environment variable
>   export OMP_NUM_THREADS=1
> * Re-run your "parallel" test.
> * If the performance differs (job runs slower) compared with your previous
> run where you inferred parallelism was being employed, you can safely
> assume that the parallelism observed comes from threads
>
> Option 2/
> * Re-configure PETSc to use a known BLAS implementation which does not
> support threads
> * Re-compile PETSc
> * Re-run your parallel test
> * If the performance differs (job runs slower) compared with your previous
> run where you inferred parallelism was being employed, you can safely
> assume that the parallelism observed comes from threads
>
> Option 3/
> * Use a PC which does not depend on BLAS at all,
> e.g. -pc_type jacobi or -pc_type bjacobi
> * If the performance differs (job runs slower) compared with your previous
> run where you inferred parallelism was being employed, you can safely
> assume that the parallelism observed comes from BLAS + threads
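>
> If you prefer to set the PC from petsc4py rather than the command line, a
> minimal sketch using the petsc4py Options interface would be:
>
>     from petsc4py import PETSc
>     opts = PETSc.Options()
>     opts.setValue("pc_type", "bjacobi")  # a PC that does not depend on BLAS
>     ksp = PETSc.KSP().create()
>     ksp.setFromOptions()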
>
>
>
>> Thanks!
>>
>> Ivan
>> On 15/11/2018 18:24, Stefano Zampini wrote:
>>
>> If you say your program is parallel by just looking at the output from
>> the top command, you are probably linking against a multithreaded blas
>> library
>>
>> On Thu 15 Nov 2018, 20:09 Matthew Knepley via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>>
>>> On Thu, Nov 15, 2018 at 11:59 AM Ivan Voznyuk <
>>> ivan.voznyuk.work at gmail.com> wrote:
>>>
>>>> Hi Matthew,
>>>>
>>>> Does it mean that by using just the command python3 simple_code.py (without
>>>> mpiexec) you *cannot* obtain a parallel execution?
>>>>
>>>
>>> As I wrote before, it's not impossible. You could be directly calling
>>> PMI, but I do not think you are doing that.
>>>
>>>
>>>> My colleague and I have been trying for 5 days to understand how he
>>>> managed to do so.
>>>> It means that by simply using python3 simple_code.py he gets 8
>>>> processors working.
>>>> By the way, we added a few lines to his code:
>>>> rank = PETSc.COMM_WORLD.Get_rank()
>>>> size = PETSc.COMM_WORLD.Get_size()
>>>> and we got rank = 0, size = 1
>>>>
>>>
>>> This is MPI telling you that you are only running on 1 process.
>>>
>>>
>>>> However, when the code arrives at KSP.solve(), it somehow turns on 8
>>>> processors.
>>>>
>>>
>>> Why do you think it's running on 8 processes?
>>>
>>>
>>>> This problem is solved on his PC in 5-8 sec (in parallel, using *python3
>>>> simple_code.py*), while on mine it takes 70-90 sec (sequentially, but with
>>>> the same command *python3 simple_code.py*)
>>>>
>>>
>>> I think it's much more likely that there are differences in the solver
>>> (use -ksp_view to see exactly what solver was used) than that it is
>>> parallelism. Moreover, you would never ever see that much speedup on a
>>> laptop, since all these computations are bandwidth limited.
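>>>
>>> From petsc4py, a rough equivalent of -ksp_view is to call ksp.view() after
>>> the solve (assuming ksp, b and x are the objects used in simple_code.py):
>>>
>>>     ksp.solve(b, x)
>>>     ksp.view()  # prints the KSP/PC configuration that was actually used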
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>
>>>> So, the conclusion is that on his computer this code works in the same way
>>>> as scipy: all the code is executed in sequential mode, but when it comes to
>>>> the solution of the system of linear equations, it runs on all available
>>>> processors. All this with just running python3 my_code.py (without any
>>>> MPI launcher)
>>>>
>>>> Is it an exception / abnormal behavior? I mean, is it something
>>>> irregular that you, developers, have never seen?
>>>>
>>>> Thanks and have a good evening!
>>>> Ivan
>>>>
>>>> P.S. I don't think I know the answer regarding Scipy...
>>>>
>>>>
>>>> On Thu, Nov 15, 2018 at 2:39 PM Matthew Knepley <knepley at gmail.com>
>>>> wrote:
>>>>
>>>>> On Thu, Nov 15, 2018 at 8:07 AM Ivan Voznyuk <
>>>>> ivan.voznyuk.work at gmail.com> wrote:
>>>>>
>>>>>> Hi Matthew,
>>>>>> Thanks for your reply!
>>>>>>
>>>>>> Let me clarify what I mean with a few questions:
>>>>>>
>>>>>> 1. In order to obtain a parallel execution of simple_code.py, do I
>>>>>> need to go with mpiexec python3 simple_code.py, or can I just launch
>>>>>> python3 simple_code.py?
>>>>>>
>>>>>
>>>>> mpiexec -n 2 python3 simple_code.py
>>>>>
>>>>>
>>>>>> 2. This simple_code.py consists of 2 parts: a) preparation of matrix
>>>>>> b) solving the system of linear equations with PETSc. If I launch mpirun
>>>>>> (or mpiexec) -np 8 python3 simple_code.py, I suppose that I will basically
>>>>>> obtain 8 matrices and 8 systems to solve. However, I need to prepare only
>>>>>> one matrix, but launch this code in parallel on 8 processors.
>>>>>>
>>>>>
>>>>> When you create the Mat object, you give it a communicator (here
>>>>> PETSC_COMM_WORLD). That allows us to distribute the data. This is all
>>>>> covered extensively in the manual and the online tutorials, as well as the
>>>>> example code.
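>>>>>
>>>>> As a rough sketch (the global size 100 is just a placeholder), creating a
>>>>> matrix distributed over all processes looks like:
>>>>>
>>>>>     from petsc4py import PETSc
>>>>>     A = PETSc.Mat().createAIJ([100, 100], comm=PETSc.COMM_WORLD)
>>>>>     A.setUp()
>>>>>     rstart, rend = A.getOwnershipRange()  # rows owned by this process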
>>>>>
>>>>>
>>>>>> In fact, here attached you will find a similar code (scipy_code.py)
>>>>>> with only one difference: the system of linear equations is solved with
>>>>>> scipy. So when I solve it, I can clearly see that the solution is obtained
>>>>>> in a parallel way. However, I do not use the command mpirun (or mpiexec). I
>>>>>> just go with python3 scipy_code.py.
>>>>>>
>>>>>
>>>>> Why do you think it's running in parallel?
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>      Matt
>>>>>
>>>>>
>>>>>> In this case, the first part (creation of the sparse matrix) is not
>>>>>> parallel, whereas the solution of the system is found in a parallel way.
>>>>>> So my question is: do you think it is possible to have the same
>>>>>> behavior with PETSc? And what do I need for this?
>>>>>>
>>>>>> I am asking this because for my colleague it worked! That is, he
>>>>>> launches simple_code.py on his computer using the command python3
>>>>>> simple_code.py (and not mpiexec python3 simple_code.py) and he obtains a
>>>>>> parallel execution of the same code.
>>>>>>
>>>>>> Thanks for your help!
>>>>>> Ivan
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 15, 2018 at 11:54 AM Matthew Knepley <knepley at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Thu, Nov 15, 2018 at 4:53 AM Ivan Voznyuk via petsc-users <
>>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>>
>>>>>>>> Dear PETSC community,
>>>>>>>>
>>>>>>>> I have a question regarding the parallel execution of petsc4py.
>>>>>>>>
>>>>>>>> I have a simple code (attached as simple_code.py) which solves a
>>>>>>>> system of linear equations Ax=b using petsc4py. To execute it, I use the
>>>>>>>> command python3 simple_code.py, which yields sequential performance. With
>>>>>>>> a colleague of mine, we launched this code on his computer, and this time
>>>>>>>> the execution was in parallel, although he used the same command python3
>>>>>>>> simple_code.py (without mpirun or mpiexec).
>>>>>>>>
>>>>>>> I am not sure what you mean. To run MPI programs in parallel, you
>>>>>>> need a launcher like mpiexec or mpirun. There are Python programs (like
>>>>>>> nemesis) that use the launcher API directly (called PMI), but that is not
>>>>>>> part of petsc4py.
>>>>>>>
>>>>>>>   Thanks,
>>>>>>>
>>>>>>>      Matt
>>>>>>>
>>>>>>>> My configuration: x86_64 Ubuntu 16.04, Intel Core i7, PETSc
>>>>>>>> 3.10.2, PETSC_ARCH=arch-linux2-c-debug, petsc4py 3.10.0 in a virtualenv
>>>>>>>>
>>>>>>>> In order to parallelize it, I have already tried:
>>>>>>>> - use 2 different PCs
>>>>>>>> - use Ubuntu 16.04, 18.04
>>>>>>>> - use different architectures (arch-linux2-c-debug,
>>>>>>>> linux-gnu-c-debug, etc)
>>>>>>>> - of course, use different configurations (my present config can be
>>>>>>>> found in the attached make.log)
>>>>>>>> - mpi from mpich, openmpi
>>>>>>>>
>>>>>>>> Nothing worked.
>>>>>>>>
>>>>>>>> Do you have any ideas?
>>>>>>>>
>>>>>>>> Thanks and have a good day,
>>>>>>>> Ivan
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ivan VOZNYUK
>>>>>>>> PhD in Computational Electromagnetics
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>> experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ivan VOZNYUK
>>>>>> PhD in Computational Electromagnetics
>>>>>> +33 (0)6.95.87.04.55
>>>>>> My webpage <https://ivanvoznyukwork.wixsite.com/webpage>
>>>>>> My LinkedIn <http://linkedin.com/in/ivan-voznyuk-b869b8106>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>
>>>>
>>>>
>>>> --
>>>> Ivan VOZNYUK
>>>> PhD in Computational Electromagnetics
>>>> +33 (0)6.95.87.04.55
>>>> My webpage <https://ivanvoznyukwork.wixsite.com/webpage>
>>>> My LinkedIn <http://linkedin.com/in/ivan-voznyuk-b869b8106>
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>

-- 
Ivan VOZNYUK
PhD in Computational Electromagnetics
+33 (0)6.95.87.04.55
My webpage <https://ivanvoznyukwork.wixsite.com/webpage>
My LinkedIn <http://linkedin.com/in/ivan-voznyuk-b869b8106>
-------------- next part --------------
	linux-vdso.so.1 =>  (0x00007ffd5d7c5000)
	/opt/intel/mkl/lib/intel64/libmkl_core.so (0x00007fee66886000)
	/opt/intel/mkl/lib/intel64/libmkl_sequential.so (0x00007fee652da000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fee650bd000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fee64cf3000)
	libpetsc.so.3.10 => /opt/petsc/petsc-3.10.2/arch-linux2-c-debug/lib/libpetsc.so.3.10 (0x00007fee6292d000)
	libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007fee62657000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fee62453000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fee6af61000)
	libmkl_intel_lp64.so => /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007fee61905000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fee615fc000)
	libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007fee613c2000)
	libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x00007fee61169000)
	libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007fee60e3e000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fee60c28000)
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fee60a06000)
	libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007fee607f7000)
	libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007fee6057d000)
	libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007fee602e0000)
	libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007fee600d5000)
	libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007fee5fecb000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fee5fc8c000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fee5fa84000)
	libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fee5f881000)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.log
Type: text/x-log
Size: 97813 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20181116/a4ff5521/attachment-0001.bin>

