[petsc-users] petsc4py help with parallel execution

Ivan Voznyuk ivan.voznyuk.work at gmail.com
Mon Nov 19 06:24:21 CST 2018


Hi guys,

So, I tried to configure PETSc with the advised options; make all and make
tests went well.

However, at the stage of running python3 ./simple_test.py I obtained error
after error related to the import of the MKL libraries. Solving them one
after another, I ended up with a critical violation error and abandoned that
route.

Instead, I fully reinstalled the system and did the following:
1. Installed the system's OpenMPI
2. Installed the system's OpenBLAS
3. Configured PETSc with the MPI compilers, downloaded MUMPS and ScaLAPACK,
and built with complex values
4. Made sure that nothing in the options is sequential
And it worked!! (A quick check that the run is really parallel is sketched
below.)
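
Here is the quick check mentioned above (a minimal sketch; check_parallel.py
is just a hypothetical file name):

# run with: mpiexec -n 4 python3 check_parallel.py
from petsc4py import PETSc

comm = PETSc.COMM_WORLD
# with 4 MPI ranks, every rank should report size = 4
print("rank %d of %d" % (comm.getRank(), comm.getSize()))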

So, thank you, colleagues, for
1. Helping me pin down the problem
2. Your advice on solving it.

In the near future I will try to do the same but with Intel MKL, because
apparently it may be much faster, and thus it is of real interest!

So I'll try it and share the results with you!

Thanks,
Ivan

On Fri, Nov 16, 2018, 19:18 Dave May <dave.mayhem23 at gmail.com> wrote:

>
>
> On Fri, 16 Nov 2018 at 19:02, Ivan Voznyuk <ivan.voznyuk.work at gmail.com>
> wrote:
>
>> Hi Satish,
>> Thanks for your reply.
>>
>> Bad news... I tested the 2 solutions that you proposed; neither worked.
>>
>
> You don't still have
> OMP_NUM_THREADS=1
> set in your environment, do you?
>
> Can you print the value of this env variable from within your Python code
> and confirm it's not 1?
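>
> For instance, a quick check from inside your Python script (just a sketch):
>
> import os
> print("OMP_NUM_THREADS =", os.environ.get("OMP_NUM_THREADS"))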
>
>
>
>>
>> 1. --with-blaslapack-dir=/opt/intel/mkl
>> --with-mkl_pardiso-dir=/opt/intel/mkl installed well, without any problems.
>> However, the code is still running sequentially.
>> 2. When I changed -lmkl_sequential to -lmkl_intel_thread -liomp, at first
>> it did not find liomp, so I had to create a symbolic link of libiomp5.so
>> in /lib.
>> When launching the .py code I had to use:
>> export
>> LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_core.so:/opt/intel/mkl/lib/intel64/libmkl_sequential.so
>> and
>> export LD_LIBRARY_PATH=/opt/petsc/petsc1/arch-linux2-c-debug/lib/
>>
>> But still, this does not solve the problem and the code is still running
>> sequentially...
>>
>> Maybe you have some other ideas?
>>
>> Thanks,
>> Ivan
>>
>>
>>
>>
>> On Fri, Nov 16, 2018 at 6:11 PM Balay, Satish <balay at mcs.anl.gov> wrote:
>>
>>> Yes, PETSc prefers sequential MKL, as MPI handles the parallelism.
>>>
>>> One way to trick the PETSc configure into using threaded MKL is to enable
>>> pardiso, i.e.:
>>>
>>> --with-blaslapack-dir=/opt/intel/mkl
>>> --with-mkl_pardiso-dir=/opt/intel/mkl
>>>
>>>
>>> http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/11/15/configure_master_arch-pardiso_grind.log
>>>
>>> BLAS/LAPACK: -Wl,-rpath,/soft/com/packages/intel/16/u3/mkl/lib/intel64
>>> -L/soft/com/packages/intel/16/u3/mkl/lib/intel64 -lmkl_intel_lp64
>>> -lmkl_core -lmkl_intel_thread -liomp5 -ldl -lpthread
>>>
>>> Or you can manually specify the correct MKL library list [with
>>> threading] via --with-blaslapack-lib option.
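>>>
>>> For example, something along these lines (just a sketch; adjust the path
>>> to your MKL installation, this simply mirrors the threaded link line above):
>>>
>>> --with-blaslapack-lib="-L/opt/intel/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -liomp5 -ldl -lpthread"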
>>>
>>> Satish
>>>
>>> On Fri, 16 Nov 2018, Ivan Voznyuk via petsc-users wrote:
>>>
>>> > Hi,
>>> > You were totally right: no miracle, the parallelization does come from
>>> > multithreading. We checked Option 1/: playing with OMP_NUM_THREADS=1
>>> > changed the computational time.
>>> >
>>> > So, I reinstalled everything (starting with Ubuntu and ending with PETSc)
>>> > and configured the following things:
>>> >
>>> > - installed the system's openmpi
>>> > - installed Intel MKL BLAS / LAPACK
>>> > - configured PETSc as ./configure --with-cc=mpicc --with-fc=mpif90
>>> > --with-cxx=mpicxx --with-blas-lapack-dir=/opt/intel/mkl/lib/intel64
>>> > --download-scalapack --download-mumps --with-hwloc --with-shared
>>> > --with-openmp=1 --with-pthread=1 --with-scalar-type=complex
>>> > hoping that it would take BLAS multithreading into account
>>> > - installed petsc4py
>>> >
>>> > However, I do not get any parallelization...
>>> > What I tried to do so far, unsuccessfully:
>>> > - play with OMP_NUM_THREADS
>>> > - reinstall the system
>>> > - ldd PETSc.cpython-35m-x86_64-linux-gnu.so yields lld_result.txt (here
>>> > attached). I noted that the libmkl_sequential.so library is there. Do you
>>> > think this is normal?
>>> > - I found a similar problem reported here:
>>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2016-March/028803.html
>>> > To solve this problem, the developers recommended replacing the
>>> > -lmkl_sequential option with -lmkl_intel_thread in
>>> > PETSC_ARCH/lib/conf/petscvariables. However, I did not find anything named
>>> > like this (it might be a version change).
>>> > - Anyway, I replaced lmkl_sequential with lmkl_intel_thread in every file
>>> > of PETSc, but it changed nothing.
>>> >
>>> > As a result, in the new make.log (here attached) I have the parameter
>>> > #define PETSC_HAVE_LIBMKL_SEQUENTIAL 1 and the option -lmkl_sequential
>>> >
>>> > Do you have any idea of what I should change in the initial options in
>>> > order to obtain the BLAS multithreading parallelization?
>>> >
>>> > Thanks a lot for your help!
>>> >
>>> > Ivan
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Fri, Nov 16, 2018 at 1:25 AM Dave May <dave.mayhem23 at gmail.com>
>>> wrote:
>>> >
>>> > >
>>> > >
>>> > > On Thu, 15 Nov 2018 at 17:44, Ivan via petsc-users <
>>> > > petsc-users at mcs.anl.gov> wrote:
>>> > >
>>> > >> Hi Stefano,
>>> > >>
>>> > >> In fact, yes, we look at the htop output (and the resulting
>>> > >> computational time, of course).
>>> > >>
>>> > >> In our code we use MUMPS, which indeed depends on BLAS / LAPACK. So I
>>> > >> think this might be it!
>>> > >>
>>> > >> I will definitely check it (I mean the difference between our MUMPS,
>>> > >> BLAS, LAPACK).
>>> > >>
>>> > >> If you have an idea of how we can verify on his PC that the source of
>>> > >> his parallelization does come from BLAS, please do not hesitate to
>>> > >> tell me!
>>> > >>
>>> > >
>>> > > Option 1/
>>> > > * Set this environment variable
>>> > >   export OMP_NUM_THREADS=1
>>> > > * Re-run your "parallel" test.
>>> > > * If the performance differs (job runs slower) compared with your
>>> > > previous run where you inferred parallelism was being employed, you can
>>> > > safely assume that the parallelism observed comes from threads
>>> > >
>>> > > Option 2/
>>> > > * Re-configure PETSc to use a known BLAS implementation which does not
>>> > > support threads
>>> > > * Re-compile PETSc
>>> > > * Re-run your parallel test
>>> > > * If the performance differs (job runs slower) compared with your
>>> > > previous run where you inferred parallelism was being employed, you can
>>> > > safely assume that the parallelism observed comes from threads
>>> > >
>>> > > Option 3/
>>> > > * Use a PC which does not depend on BLAS at all,
>>> > > e.g. -pc_type jacobi or -pc_type bjacobi
>>> > > * If the performance differs (job runs slower) compared with your
>>> > > previous run where you inferred parallelism was being employed, you can
>>> > > safely assume that the parallelism observed comes from BLAS + threads
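>>> > >
>>> > > For the timing comparison in the options above, a minimal sketch from
>>> > > the Python side (ksp, b, x stand for the objects already set up in your
>>> > > simple_code.py):
>>> > >
>>> > > import time
>>> > > t0 = time.perf_counter()
>>> > > ksp.solve(b, x)  # the call whose runtime you compare
>>> > > print("KSPSolve time: %.2f s" % (time.perf_counter() - t0))
>>> > >
>>> > > Run it once with OMP_NUM_THREADS=1 and once without, and compare the
>>> > > two numbers.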
>>> > >
>>> > >
>>> > >
>>> > >> Thanks!
>>> > >>
>>> > >> Ivan
>>> > >> On 15/11/2018 18:24, Stefano Zampini wrote:
>>> > >>
>>> > >> If you say your program is parallel by just looking at the output
>>> from
>>> > >> the top command, you are probably linking against a multithreaded
>>> blas
>>> > >> library
>>> > >>
>>> > >> On Thu, 15 Nov 2018 at 20:09, Matthew Knepley via petsc-users <
>>> > >> petsc-users at mcs.anl.gov> wrote:
>>> > >>
>>> > >>> On Thu, Nov 15, 2018 at 11:59 AM Ivan Voznyuk <
>>> > >>> ivan.voznyuk.work at gmail.com> wrote:
>>> > >>>
>>> > >>>> Hi Matthew,
>>> > >>>>
>>> > >>>> Does it mean that by using just the command python3 simple_code.py
>>> > >>>> (without mpiexec) you *cannot* obtain a parallel execution?
>>> > >>>>
>>> > >>>
>>> > >>> As I wrote before, it's not impossible. You could be directly calling
>>> > >>> PMI, but I do not think you are doing that.
>>> > >>>
>>> > >>>
>>> > >>>> It's been 5 days that my colleague and I have been trying to
>>> > >>>> understand how he managed to do so.
>>> > >>>> It means that by simply using python3 simple_code.py he gets 8
>>> > >>>> processors working.
>>> > >>>> By the way, we wrote a few lines in his code:
>>> > >>>> rank = PETSc.COMM_WORLD.Get_rank()
>>> > >>>> size = PETSc.COMM_WORLD.Get_size()
>>> > >>>> and we got rank = 0, size = 1
>>> > >>>>
>>> > >>>
>>> > >>> This is MPI telling you that you are only running on 1 process.
>>> > >>>
>>> > >>>
>>> > >>>> However, when the code arrives at KSP.solve(), somehow it turns on 8
>>> > >>>> processors.
>>> > >>>>
>>> > >>>
>>> > >>> Why do you think it's running on 8 processes?
>>> > >>>
>>> > >>>
>>> > >>>> This problem is solved on his PC in 5-8 sec (in parallel, using
>>> > >>>> *python3 simple_code.py*), while on mine it takes 70-90 sec
>>> > >>>> (sequentially, but with the same command *python3 simple_code.py*)
>>> > >>>>
>>> > >>>
>>> > >>> I think it's much more likely that there are differences in the solver
>>> > >>> (use -ksp_view to see exactly what solver was used) than that it is
>>> > >>> parallelism. Moreover, you would never ever ever see that much speedup
>>> > >>> on a laptop since all these computations are bandwidth limited.
>>> > >>>
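>>> > >>> One standard way to make -ksp_view visible to the script is to pass
>>> > >>> sys.argv to petsc4py before importing PETSc (a sketch, not taken from
>>> > >>> your code):
>>> > >>>
>>> > >>> import sys
>>> > >>> import petsc4py
>>> > >>> petsc4py.init(sys.argv)   # forwards options such as -ksp_view to PETSc
>>> > >>> from petsc4py import PETSc
>>> > >>>
>>> > >>> and then run: python3 simple_code.py -ksp_view
>>> > >>>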
>>> > >>>   Thanks,
>>> > >>>
>>> > >>>      Matt
>>> > >>>
>>> > >>>
>>> > >>>> So, the conclusion is that on his computer this code works in the same
>>> > >>>> way as scipy: all the code is executed in sequential mode, but when it
>>> > >>>> comes to the solution of the system of linear equations, it runs on all
>>> > >>>> available processors. All this with just running python3 my_code.py
>>> > >>>> (without any mpi-smth)
>>> > >>>>
>>> > >>>> Is it an exception / abnormal behavior? I mean, is it something
>>> > >>>> irregular that you, developers, have never seen?
>>> > >>>>
>>> > >>>> Thanks and have a good evening!
>>> > >>>> Ivan
>>> > >>>>
>>> > >>>> P.S. I don't think I know the answer regarding Scipy...
>>> > >>>>
>>> > >>>>
>>> > >>>> On Thu, Nov 15, 2018 at 2:39 PM Matthew Knepley <
>>> knepley at gmail.com>
>>> > >>>> wrote:
>>> > >>>>
>>> > >>>>> On Thu, Nov 15, 2018 at 8:07 AM Ivan Voznyuk <
>>> > >>>>> ivan.voznyuk.work at gmail.com> wrote:
>>> > >>>>>
>>> > >>>>>> Hi Matthew,
>>> > >>>>>> Thanks for your reply!
>>> > >>>>>>
>>> > >>>>>> Let me clarify what I mean with a few questions:
>>> > >>>>>>
>>> > >>>>>> 1. In order to obtain a parallel execution of simple_code.py, do I
>>> > >>>>>> need to go with mpiexec python3 simple_code.py, or can I just launch
>>> > >>>>>> python3 simple_code.py?
>>> > >>>>>>
>>> > >>>>>
>>> > >>>>> mpiexec -n 2 python3 simple_code.py
>>> > >>>>>
>>> > >>>>>
>>> > >>>>>> 2. This simple_code.py consists of 2 parts: a) preparation of the
>>> > >>>>>> matrix, b) solving the system of linear equations with PETSc. If I
>>> > >>>>>> launch mpirun (or mpiexec) -np 8 python3 simple_code.py, I suppose that
>>> > >>>>>> I will basically obtain 8 matrices and 8 systems to solve. However, I
>>> > >>>>>> need to prepare only one matrix, but launch this code in parallel on 8
>>> > >>>>>> processors.
>>> > >>>>>>
>>> > >>>>>
>>> > >>>>> When you create the Mat object, you give it a communicator (here
>>> > >>>>> PETSC_COMM_WORLD). That allows us to distribute the data. This is all
>>> > >>>>> covered extensively in the manual and the online tutorials, as well as
>>> > >>>>> the example code.
>>> > >>>>>
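>>> > >>>>> A minimal sketch of what that looks like in petsc4py (n and the values
>>> > >>>>> are placeholders, not taken from your code):
>>> > >>>>>
>>> > >>>>> from petsc4py import PETSc
>>> > >>>>>
>>> > >>>>> n = 1000
>>> > >>>>> # the communicator makes this a distributed matrix across all ranks
>>> > >>>>> A = PETSc.Mat().createAIJ([n, n], nnz=1, comm=PETSc.COMM_WORLD)
>>> > >>>>> rstart, rend = A.getOwnershipRange()   # rows owned by this rank
>>> > >>>>> for i in range(rstart, rend):
>>> > >>>>>     A.setValue(i, i, 2.0)              # each rank fills only its rows
>>> > >>>>> A.assemblyBegin()
>>> > >>>>> A.assemblyEnd()
>>> > >>>>>
>>> > >>>>> Launched with mpiexec -n 8 python3 ..., each rank then holds only its
>>> > >>>>> own block of rows.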
>>> > >>>>>
>>> > >>>>>> In fact, here attached you will find a similar code (scipy_code.py)
>>> > >>>>>> with only one difference: the system of linear equations is solved with
>>> > >>>>>> scipy. So when I solve it, I can clearly see that the solution is
>>> > >>>>>> obtained in a parallel way. However, I do not use the command mpirun
>>> > >>>>>> (or mpiexec). I just go with python3 scipy_code.py.
>>> > >>>>>>
>>> > >>>>>
>>> > >>>>> Why do you think it's running in parallel?
>>> > >>>>>
>>> > >>>>>   Thanks,
>>> > >>>>>
>>> > >>>>>      Matt
>>> > >>>>>
>>> > >>>>>
>>> > >>>>>> In this case, the first part (creation of the sparse matrix) is not
>>> > >>>>>> parallel, whereas the solution of the system is found in a parallel way.
>>> > >>>>>> So my question is: do you think that it's possible to have the same
>>> > >>>>>> behavior with PETSc? And what do I need for this?
>>> > >>>>>>
>>> > >>>>>> I am asking this because for my colleague it worked! It means that he
>>> > >>>>>> launches simple_code.py on his computer using the command python3
>>> > >>>>>> simple_code.py (and not mpi-smth python3 simple_code.py) and he obtains
>>> > >>>>>> a parallel execution of the same code.
>>> > >>>>>>
>>> > >>>>>> Thanks for your help!
>>> > >>>>>> Ivan
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>> On Thu, Nov 15, 2018 at 11:54 AM Matthew Knepley <
>>> knepley at gmail.com>
>>> > >>>>>> wrote:
>>> > >>>>>>
>>> > >>>>>>> On Thu, Nov 15, 2018 at 4:53 AM Ivan Voznyuk via petsc-users <
>>> > >>>>>>> petsc-users at mcs.anl.gov> wrote:
>>> > >>>>>>>
>>> > >>>>>>>> Dear PETSc community,
>>> > >>>>>>>>
>>> > >>>>>>>> I have a question regarding the parallel execution of
>>> petsc4py.
>>> > >>>>>>>>
>>> > >>>>>>>> I have a simple code (here attached: simple_code.py) which solves a
>>> > >>>>>>>> system of linear equations Ax=b using petsc4py. To execute it, I use
>>> > >>>>>>>> the command python3 simple_code.py, which yields sequential
>>> > >>>>>>>> performance. With a colleague of mine, we launched this code on his
>>> > >>>>>>>> computer, and this time the execution was in parallel, even though he
>>> > >>>>>>>> used the same command python3 simple_code.py (without mpirun or
>>> > >>>>>>>> mpiexec).
>>> > >>>>>>>>
>>> > >>>>>>> I am not sure what you mean. To run MPI programs in parallel, you
>>> > >>>>>>> need a launcher like mpiexec or mpirun. There are Python programs
>>> > >>>>>>> (like nemesis) that use the launcher API directly (called PMI), but
>>> > >>>>>>> that is not part of petsc4py.
>>> > >>>>>>>
>>> > >>>>>>>   Thanks,
>>> > >>>>>>>
>>> > >>>>>>>      Matt
>>> > >>>>>>>
>>> > >>>>>>>> My configuration: x86_64 Ubuntu 16.04, Intel Core i7, PETSc 3.10.2,
>>> > >>>>>>>> PETSC_ARCH=arch-linux2-c-debug, petsc4py 3.10.0 in a virtualenv
>>> > >>>>>>>>
>>> > >>>>>>>> In order to parallelize it, I have already tried:
>>> > >>>>>>>> - using 2 different PCs
>>> > >>>>>>>> - Ubuntu 16.04 and 18.04
>>> > >>>>>>>> - different architectures (arch-linux2-c-debug,
>>> > >>>>>>>> linux-gnu-c-debug, etc.)
>>> > >>>>>>>> - of course, different configurations (my present config can be found
>>> > >>>>>>>> in the make.log that I attached here)
>>> > >>>>>>>> - MPI from mpich and openmpi
>>> > >>>>>>>>
>>> > >>>>>>>> Nothing worked.
>>> > >>>>>>>>
>>> > >>>>>>>> Do you have any ideas?
>>> > >>>>>>>>
>>> > >>>>>>>> Thanks and have a good day,
>>> > >>>>>>>> Ivan
>>> > >>>>>>>>
>>> > >>>>>>>> --
>>> > >>>>>>>> Ivan VOZNYUK
>>> > >>>>>>>> PhD in Computational Electromagnetics
>>> > >>>>>>>>
>>> > >>>>>>>
>>> > >>>>>>>
>>> > >>>>>>> --
>>> > >>>>>>> What most experimenters take for granted before they begin
>>> their
>>> > >>>>>>> experiments is infinitely more interesting than any results to
>>> which their
>>> > >>>>>>> experiments lead.
>>> > >>>>>>> -- Norbert Wiener
>>> > >>>>>>>
>>> > >>>>>>> https://www.cse.buffalo.edu/~knepley/
>>> > >>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>> > >>>>>>>
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>> --
>>> > >>>>>> Ivan VOZNYUK
>>> > >>>>>> PhD in Computational Electromagnetics
>>> > >>>>>> +33 (0)6.95.87.04.55
>>> > >>>>>> My webpage <https://ivanvoznyukwork.wixsite.com/webpage>
>>> > >>>>>> My LinkedIn <http://linkedin.com/in/ivan-voznyuk-b869b8106>
>>> > >>>>>>
>>> > >>>>>
>>> > >>>>>
>>> > >>>>> --
>>> > >>>>> What most experimenters take for granted before they begin their
>>> > >>>>> experiments is infinitely more interesting than any results to
>>> which their
>>> > >>>>> experiments lead.
>>> > >>>>> -- Norbert Wiener
>>> > >>>>>
>>> > >>>>> https://www.cse.buffalo.edu/~knepley/
>>> > >>>>> <http://www.cse.buffalo.edu/~knepley/>
>>> > >>>>>
>>> > >>>>
>>> > >>>>
>>> > >>>> --
>>> > >>>> Ivan VOZNYUK
>>> > >>>> PhD in Computational Electromagnetics
>>> > >>>> +33 (0)6.95.87.04.55
>>> > >>>> My webpage <https://ivanvoznyukwork.wixsite.com/webpage>
>>> > >>>> My LinkedIn <http://linkedin.com/in/ivan-voznyuk-b869b8106>
>>> > >>>>
>>> > >>>
>>> > >>>
>>> > >>> --
>>> > >>> What most experimenters take for granted before they begin their
>>> > >>> experiments is infinitely more interesting than any results to
>>> which their
>>> > >>> experiments lead.
>>> > >>> -- Norbert Wiener
>>> > >>>
>>> > >>> https://www.cse.buffalo.edu/~knepley/
>>> > >>> <http://www.cse.buffalo.edu/~knepley/>
>>> > >>>
>>> > >>
>>> >
>>> >
>>>
>>>
>>
>> --
>> Ivan VOZNYUK
>> PhD in Computational Electromagnetics
>> +33 (0)6.95.87.04.55
>> My webpage <https://ivanvoznyukwork.wixsite.com/webpage>
>> My LinkedIn <http://linkedin.com/in/ivan-voznyuk-b869b8106>
>>
>

