[petsc-users] petsc4py help with parallel execution

Matthew Knepley knepley at gmail.com
Mon Nov 19 06:30:19 CST 2018


On Mon, Nov 19, 2018 at 7:25 AM Ivan Voznyuk via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hi guys,
>
> So, I tried to configure PETSc with the advised options; make all and make
> tests went well.
>
> However, at the python3 ./simple_test.py stage I ran into one error after
> another related to the import of the MKL libraries. Solving them one after
> another, I ended up with a critical violation error and gave up.
>
> Instead, I fully reinstalled the system and did the following:
> 1. Installed the system's OpenMPI
> 2. Installed the system's OpenBLAS
> 3. Configured PETSc with the MPI compilers, downloaded MUMPS and ScaLAPACK,
> with complex values
> 4. Made sure that there is nothing sequential in the parameters
> And it worked!!
>
> So, thank you, colleagues, for
> 1. Helping me pin down the problem
> 2. Your advice on solving this problem.
>
> In the near future I will try to do the same but with Intel MKL, because
> apparently it may be much faster! And thus it is of real interest!
>

I would be careful with this conclusion. All our tests, and everything I
have ever read, indicate that it is not faster than
just actually running in parallel with mpiexec -n <p>.
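(Concretely, with the attached script that would be, e.g., mpiexec -n 4
python3 simple_code.py, with the process count chosen to match the cores.)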

   Matt


> So I'll try to do it and share it with you!
>
> Thanks,
> Ivan
>
> On Fri, Nov 16, 2018, 19:18 Dave May <dave.mayhem23 at gmail.com> wrote:
>
>>
>>
>> On Fri, 16 Nov 2018 at 19:02, Ivan Voznyuk <ivan.voznyuk.work at gmail.com>
>> wrote:
>>
>>> Hi Satish,
>>> Thanks for your reply.
>>>
>>> Bad news... I tested the 2 solutions that you proposed; neither has worked.
>>>
>>
>> You don't still have
>> OMP_NUM_THREADS=1
>> set in your environment, do you?
>>
>> Can you print the value of this env variable from within your python code
>> and confirm it's not 1?
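>>
>> (For reference, a minimal sketch of such a check - just a suggestion, not
>> code from the attached script - could be:
>>
>> import os
>> print("OMP_NUM_THREADS =", os.environ.get("OMP_NUM_THREADS"))
>> print("MKL_NUM_THREADS =", os.environ.get("MKL_NUM_THREADS"))
>>
>> MKL_NUM_THREADS is printed as well since MKL also honours that variable.)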
>>
>>
>>
>>>
>>> 1. --with-blaslapack-dir=/opt/intel/mkl
>>> --with-mkl_pardiso-dir=/opt/intel/mkl installed well, without any problems.
>>> However, the code still runs in a sequential way.
>>> 2. When I changed -lmkl_sequential to -lmkl_intel_thread -liomp, the linker
>>> at first did not find the iomp library, so I had to create a symbolic link
>>> of libiomp5.so in /lib.
>>> When launching the .py code I had to go with:
>>> export
>>> LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_core.so:/opt/intel/mkl/lib/intel64/libmkl_sequential.so
>>> and
>>> export LD_LIBRARY_PATH=/opt/petsc/petsc1/arch-linux2-c-debug/lib/
>>>
>>> But this still does not solve the problem, and the code is still running
>>> sequentially...
>>>
>>> Maybe you have some other ideas?
>>>
>>> Thanks,
>>> Ivan
>>>
>>>
>>>
>>>
>>> On Fri, Nov 16, 2018 at 6:11 PM Balay, Satish <balay at mcs.anl.gov> wrote:
>>>
>>>> Yes PETSc prefers sequential MKL - as MPI handles parallelism.
>>>>
>>>> One way to trick PETSc configure into using threaded MKL is to enable
>>>> Pardiso, i.e.:
>>>>
>>>> --with-blaslapack-dir=/opt/intel/mkl
>>>> --with-mkl_pardiso-dir=/opt/intel/mkl
>>>>
>>>>
>>>> http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/11/15/configure_master_arch-pardiso_grind.log
>>>>
>>>> BLAS/LAPACK: -Wl,-rpath,/soft/com/packages/intel/16/u3/mkl/lib/intel64
>>>> -L/soft/com/packages/intel/16/u3/mkl/lib/intel64 -lmkl_intel_lp64
>>>> -lmkl_core -lmkl_intel_thread -liomp5 -ldl -lpthread
>>>>
>>>> Or you can manually specify the correct MKL library list [with
>>>> threading] via the --with-blaslapack-lib option.
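>>>>
>>>> (For illustration only - the path below is an assumption and must match
>>>> the local MKL install - such an option could look like:
>>>>
>>>> --with-blaslapack-lib="-L/opt/intel/mkl/lib/intel64 -lmkl_intel_lp64
>>>> -lmkl_core -lmkl_intel_thread -liomp5 -ldl -lpthread"
>>>>
>>>> i.e. the same threaded library list as in the log above.)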
>>>>
>>>> Satish
>>>>
>>>> On Fri, 16 Nov 2018, Ivan Voznyuk via petsc-users wrote:
>>>>
>>>> > Hi,
>>>> > You were totally right: no miracle, the parallelization does come from
>>>> > multithreading. We checked Option 1/: playing with OMP_NUM_THREADS=1
>>>> > changed the computational time.
>>>> >
>>>> > So, I reinstalled everything (starting with Ubuntu and ending with
>>>> > PETSc) and configured the following things:
>>>> >
>>>> > - installed the system's OpenMPI
>>>> > - installed Intel MKL BLAS / LAPACK
>>>> > - configured PETSc as ./configure --with-cc=mpicc --with-fc=mpif90
>>>> > --with-cxx=mpicxx --with-blas-lapack-dir=/opt/intel/mkl/lib/intel64
>>>> > --download-scalapack --download-mumps --with-hwloc --with-shared
>>>> > --with-openmp=1 --with-pthread=1 --with-scalar-type=complex
>>>> > hoping that it would take BLAS multithreading into account
>>>> > - installed petsc4py
>>>> >
>>>> > However, I do not get any parallelization...
>>>> > What I have tried to do so far, unsuccessfully:
>>>> > - play with OMP_NUM_THREADS
>>>> > - reinstall the system
>>>> > - ldd PETSc.cpython-35m-x86_64-linux-gnu.so yields lld_result.txt (here
>>>> > attached). I noted the libmkl_sequential.so library there. Do you think
>>>> > this is normal?
>>>> > - I found a similar problem reported here:
>>>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2016-March/028803.html
>>>> > To solve this problem, the developers recommended replacing
>>>> > -lmkl_sequential with -lmkl_intel_thread in
>>>> > PETSC_ARCH/lib/conf/petscvariables. However, I did not find anything
>>>> > named like this (it might be a change of version).
>>>> > - Anyway, I replaced lmkl_sequential with lmkl_intel_thread in every
>>>> > file of PETSc, but it changed nothing.
>>>> >
>>>> > As a result, in the new make.log (here attached) I have the parameter
>>>> > #define PETSC_HAVE_LIBMKL_SEQUENTIAL 1 and the option -lmkl_sequential.
>>>> >
>>>> > Do you have any idea of what I should change in the initial options in
>>>> > order to obtain the BLAS multithreading parallelization?
>>>> >
>>>> > Thanks a lot for your help!
>>>> >
>>>> > Ivan
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Nov 16, 2018 at 1:25 AM Dave May <dave.mayhem23 at gmail.com>
>>>> wrote:
>>>> >
>>>> > >
>>>> > >
>>>> > > On Thu, 15 Nov 2018 at 17:44, Ivan via petsc-users <
>>>> > > petsc-users at mcs.anl.gov> wrote:
>>>> > >
>>>> > >> Hi Stefano,
>>>> > >>
>>>> > >> In fact, yes, we look at the htop output (and the resulting
>>>> > >> computational time, of course).
>>>> > >>
>>>> > >> In our code we use MUMPS, which indeed depends on BLAS / LAPACK.
>>>> > >> So I think this might be it!
>>>> > >>
>>>> > >> I will definitely check it (I mean the difference between our
>>>> > >> MUMPS, BLAS, and LAPACK).
>>>> > >>
>>>> > >> If you have an idea of how we can verify on his PC that the source
>>>> > >> of his parallelization does come from BLAS, please do not hesitate
>>>> > >> to tell me!
>>>> > >>
>>>> > >
>>>> > > Option 1/
>>>> > > * Set this environment variable
>>>> > >   export OMP_NUM_THREADS=1
>>>> > > * Re-run your "parallel" test.
>>>> > > * If the performance differs (job runs slower) compared with your
>>>> > > previous run where you inferred parallelism was being employed, you
>>>> > > can safely assume that the parallelism observed comes from threads
>>>> > >
>>>> > > Option 2/
>>>> > > * Re-configure PETSc to use a known BLAS implementation which does
>>>> > > not support threads
>>>> > > * Re-compile PETSc
>>>> > > * Re-run your parallel test
>>>> > > * If the performance differs (job runs slower) compared with your
>>>> > > previous run where you inferred parallelism was being employed, you
>>>> > > can safely assume that the parallelism observed comes from threads
>>>> > >
>>>> > > Option 3/
>>>> > > * Use a PC which does not depend on BLAS at all,
>>>> > > e.g. -pc_type jacobi or -pc_type bjacobi
>>>> > > * If the performance differs (job runs slower) compared with your
>>>> > > previous run where you inferred parallelism was being employed, you
>>>> > > can safely assume that the parallelism observed comes from BLAS +
>>>> > > threads
>>>> > >
>>>> > >
>>>> > >
>>>> > >> Thanks!
>>>> > >>
>>>> > >> Ivan
>>>> > >> On 15/11/2018 18:24, Stefano Zampini wrote:
>>>> > >>
>>>> > >> If you say your program is parallel by just looking at the output
>>>> > >> from the top command, you are probably linking against a
>>>> > >> multithreaded BLAS library.
>>>> > >>
>>>> > >> On Thu, 15 Nov 2018 at 20:09, Matthew Knepley via petsc-users <
>>>> > >> petsc-users at mcs.anl.gov> wrote:
>>>> > >>
>>>> > >>> On Thu, Nov 15, 2018 at 11:59 AM Ivan Voznyuk <
>>>> > >>> ivan.voznyuk.work at gmail.com> wrote:
>>>> > >>>
>>>> > >>>> Hi Matthew,
>>>> > >>>>
>>>> > >>>> Does it mean that by using just the command python3 simple_code.py
>>>> > >>>> (without mpiexec) you *cannot* obtain a parallel execution?
>>>> > >>>>
>>>> > >>>
>>>> > >>> As I wrote before, it's not impossible. You could be directly
>>>> > >>> calling PMI, but I do not think you are doing that.
>>>> > >>>
>>>> > >>>
>>>> > >>>> For 5 days now my colleague and I have been trying to understand
>>>> > >>>> how he managed to do so.
>>>> > >>>> It means that by simply using python3 simple_code.py he gets 8
>>>> > >>>> processors working.
>>>> > >>>> By the way, we added a few lines to his code:
>>>> > >>>> rank = PETSc.COMM_WORLD.Get_rank()
>>>> > >>>> size = PETSc.COMM_WORLD.Get_size()
>>>> > >>>> and we got rank = 0, size = 1
>>>> > >>>>
>>>> > >>>
>>>> > >>> This is MPI telling you that you are only running on 1 process.
>>>> > >>>
>>>> > >>>
>>>> > >>>> However, when the execution arrives at KSP.solve(), somehow it
>>>> > >>>> turns on 8 processors.
>>>> > >>>>
>>>> > >>>
>>>> > >>> Why do you think it's running on 8 processes?
>>>> > >>>
>>>> > >>>
>>>> > >>>> This problem is solved on his PC in 5-8 sec (in parallel, using
>>>> > >>>> *python3 simple_code.py*); on mine it takes 70-90 secs (sequentially,
>>>> > >>>> but with the same command *python3 simple_code.py*).
>>>> > >>>>
>>>> > >>>
>>>> > >>> I think it is much more likely that there are differences in the
>>>> > >>> solver (use -ksp_view to see exactly what solver was used) than
>>>> > >>> that it is parallelism. Moreover, you would never ever ever see
>>>> > >>> that much speedup on a laptop since all these computations are
>>>> > >>> bandwidth limited.
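>>>> > >>>
>>>> > >>> (A minimal sketch of how to pass that option through a petsc4py
>>>> > >>> script - assuming a matrix A and vectors b, x have already been
>>>> > >>> assembled:
>>>> > >>>
>>>> > >>> import sys
>>>> > >>> import petsc4py
>>>> > >>> petsc4py.init(sys.argv)   # let PETSc see options such as -ksp_view
>>>> > >>> from petsc4py import PETSc
>>>> > >>>
>>>> > >>> ksp = PETSc.KSP().create()
>>>> > >>> ksp.setOperators(A)
>>>> > >>> ksp.setFromOptions()      # picks up -ksp_view, -ksp_type, -pc_type, ...
>>>> > >>> ksp.solve(b, x)
>>>> > >>>
>>>> > >>> Running python3 simple_code.py -ksp_view then prints the solver that
>>>> > >>> was actually used.)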
>>>> > >>>
>>>> > >>>   Thanks,
>>>> > >>>
>>>> > >>>      Matt
>>>> > >>>
>>>> > >>>
>>>> > >>>> So, the conclusion is that on his computer this code works in the
>>>> > >>>> same way as scipy: all the code is executed in sequential mode, but
>>>> > >>>> when it comes to the solution of the system of linear equations, it
>>>> > >>>> runs on all available processors. All this with just running
>>>> > >>>> python3 my_code.py (without any mpi-anything).
>>>> > >>>>
>>>> > >>>> Is it an exception / abnormal behavior? I mean, is it something
>>>> > >>>> irregular that you, developers, have never seen?
>>>> > >>>>
>>>> > >>>> Thanks and have a good evening!
>>>> > >>>> Ivan
>>>> > >>>>
>>>> > >>>> P.S. I don't think I know the answer regarding Scipy...
>>>> > >>>>
>>>> > >>>>
>>>> > >>>> On Thu, Nov 15, 2018 at 2:39 PM Matthew Knepley <
>>>> knepley at gmail.com>
>>>> > >>>> wrote:
>>>> > >>>>
>>>> > >>>>> On Thu, Nov 15, 2018 at 8:07 AM Ivan Voznyuk <
>>>> > >>>>> ivan.voznyuk.work at gmail.com> wrote:
>>>> > >>>>>
>>>> > >>>>>> Hi Matthew,
>>>> > >>>>>> Thanks for your reply!
>>>> > >>>>>>
>>>> > >>>>>> Let me clarify what I mean by asking a few questions:
>>>> > >>>>>>
>>>> > >>>>>> 1. In order to obtain a parallel execution of simple_code.py, do I
>>>> > >>>>>> need to go with mpiexec python3 simple_code.py, or can I just
>>>> > >>>>>> launch python3 simple_code.py?
>>>> > >>>>>>
>>>> > >>>>>
>>>> > >>>>> mpiexec -n 2 python3 simple_code.py
>>>> > >>>>>
>>>> > >>>>>
>>>> > >>>>>> 2. This simple_code.py consists of 2 parts: a) preparation of the
>>>> > >>>>>> matrix, b) solving the system of linear equations with PETSc. If I
>>>> > >>>>>> launch mpirun (or mpiexec) -np 8 python3 simple_code.py, I suppose
>>>> > >>>>>> that I will basically obtain 8 matrices and 8 systems to solve.
>>>> > >>>>>> However, I need to prepare only one matrix, but launch this code
>>>> > >>>>>> in parallel on 8 processors.
>>>> > >>>>>>
>>>> > >>>>>
>>>> > >>>>> When you create the Mat object, you give it a communicator (here
>>>> > >>>>> PETSC_COMM_WORLD). That allows us to distribute the data. This is
>>>> > >>>>> all covered extensively in the manual and the online tutorials, as
>>>> > >>>>> well as the example code.
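>>>> > >>>>>
>>>> > >>>>> (A minimal sketch of that pattern - illustrative only, with a 1D
>>>> > >>>>> Laplacian standing in for whatever simple_code.py actually builds:
>>>> > >>>>>
>>>> > >>>>> from petsc4py import PETSc
>>>> > >>>>>
>>>> > >>>>> comm = PETSc.COMM_WORLD
>>>> > >>>>> n = 100                              # global problem size
>>>> > >>>>> A = PETSc.Mat().create(comm=comm)
>>>> > >>>>> A.setSizes([n, n])                   # global size; rows split over ranks
>>>> > >>>>> A.setFromOptions()
>>>> > >>>>> A.setUp()
>>>> > >>>>> rstart, rend = A.getOwnershipRange() # this rank's block of rows
>>>> > >>>>> for i in range(rstart, rend):        # fill only locally owned rows
>>>> > >>>>>     A[i, i] = 2.0
>>>> > >>>>>     if i > 0: A[i, i - 1] = -1.0
>>>> > >>>>>     if i < n - 1: A[i, i + 1] = -1.0
>>>> > >>>>> A.assemble()
>>>> > >>>>>
>>>> > >>>>> b = A.createVecLeft(); b.set(1.0)
>>>> > >>>>> x = A.createVecRight()
>>>> > >>>>>
>>>> > >>>>> ksp = PETSc.KSP().create(comm=comm)
>>>> > >>>>> ksp.setOperators(A)
>>>> > >>>>> ksp.setFromOptions()
>>>> > >>>>> ksp.solve(b, x)
>>>> > >>>>>
>>>> > >>>>> Run with mpiexec -n 8 python3 simple_code.py and each of the 8
>>>> > >>>>> processes owns, fills, and solves only its own share of the rows.)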
>>>> > >>>>>
>>>> > >>>>>
>>>> > >>>>>> In fact, here attached you will find a similar code (scipy_code.py)
>>>> > >>>>>> with only one difference: the system of linear equations is solved
>>>> > >>>>>> with scipy. So when I solve it, I can clearly see that the solution
>>>> > >>>>>> is obtained in a parallel way. However, I do not use the command
>>>> > >>>>>> mpirun (or mpiexec). I just go with python3 scipy_code.py.
>>>> > >>>>>>
>>>> > >>>>>
>>>> > >>>>> Why do you think it's running in parallel?
>>>> > >>>>>
>>>> > >>>>>   Thanks,
>>>> > >>>>>
>>>> > >>>>>      Matt
>>>> > >>>>>
>>>> > >>>>>
>>>> > >>>>>> In this case, the first part (creation of the sparse matrix) is
>>>> > >>>>>> not parallel, whereas the solution of the system is found in a
>>>> > >>>>>> parallel way.
>>>> > >>>>>> So my question is: do you think that it is possible to have the
>>>> > >>>>>> same behavior with PETSc? And what do I need for this?
>>>> > >>>>>>
>>>> > >>>>>> I am asking this because for my colleague it worked! It means
>>>> > >>>>>> that he launches simple_code.py on his computer using the command
>>>> > >>>>>> python3 simple_code.py (and not mpi-something python3
>>>> > >>>>>> simple_code.py) and he obtains a parallel execution of the same
>>>> > >>>>>> code.
>>>> > >>>>>>
>>>> > >>>>>> Thanks for your help!
>>>> > >>>>>> Ivan
>>>> > >>>>>>
>>>> > >>>>>>
>>>> > >>>>>> On Thu, Nov 15, 2018 at 11:54 AM Matthew Knepley <
>>>> knepley at gmail.com>
>>>> > >>>>>> wrote:
>>>> > >>>>>>
>>>> > >>>>>>> On Thu, Nov 15, 2018 at 4:53 AM Ivan Voznyuk via petsc-users <
>>>> > >>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>> > >>>>>>>
>>>> > >>>>>>>> Dear PETSc community,
>>>> > >>>>>>>>
>>>> > >>>>>>>> I have a question regarding the parallel execution of petsc4py.
>>>> > >>>>>>>>
>>>> > >>>>>>>> I have a simple code (here attached as simple_code.py) which
>>>> > >>>>>>>> solves a system of linear equations Ax=b using petsc4py. To
>>>> > >>>>>>>> execute it, I use the command python3 simple_code.py, which
>>>> > >>>>>>>> yields sequential performance. With a colleague of mine, we
>>>> > >>>>>>>> launched this code on his computer, and this time the execution
>>>> > >>>>>>>> was in parallel, although he used the same command python3
>>>> > >>>>>>>> simple_code.py (without mpirun or mpiexec).
>>>> > >>>>>>>>
>>>> > >>>>>>> I am not sure what you mean. To run MPI programs in parallel,
>>>> > >>>>>>> you need a launcher like mpiexec or mpirun. There are Python
>>>> > >>>>>>> programs (like nemesis) that use the launcher API directly
>>>> > >>>>>>> (called PMI), but that is not part of petsc4py.
>>>> > >>>>>>>
>>>> > >>>>>>>   Thanks,
>>>> > >>>>>>>
>>>> > >>>>>>>      Matt
>>>> > >>>>>>>
>>>> > >>>>>>>> My configuration: x86_64 Ubuntu 16.04, Intel Core i7, PETSc
>>>> > >>>>>>>> 3.10.2, PETSC_ARCH=arch-linux2-c-debug, petsc4py 3.10.0 in a
>>>> > >>>>>>>> virtualenv
>>>> > >>>>>>>>
>>>> > >>>>>>>> In order to parallelize it, I have already tried:
>>>> > >>>>>>>> - using 2 different PCs
>>>> > >>>>>>>> - using Ubuntu 16.04 and 18.04
>>>> > >>>>>>>> - using different architectures (arch-linux2-c-debug,
>>>> > >>>>>>>> linux-gnu-c-debug, etc.)
>>>> > >>>>>>>> - of course, using different configurations (my present config
>>>> > >>>>>>>> can be found in the make.log that I attached here)
>>>> > >>>>>>>> - MPI from MPICH and OpenMPI
>>>> > >>>>>>>>
>>>> > >>>>>>>> Nothing worked.
>>>> > >>>>>>>>
>>>> > >>>>>>>> Do you have any ideas?
>>>> > >>>>>>>>
>>>> > >>>>>>>> Thanks and have a good day,
>>>> > >>>>>>>> Ivan
>>>> > >>>>>>>>
>>>> > >>>>>>>> --
>>>> > >>>>>>>> Ivan VOZNYUK
>>>> > >>>>>>>> PhD in Computational Electromagnetics
>>>> > >>>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>>
>>>> > >>>>>>> --
>>>> > >>>>>>> What most experimenters take for granted before they begin
>>>> their
>>>> > >>>>>>> experiments is infinitely more interesting than any results
>>>> to which their
>>>> > >>>>>>> experiments lead.
>>>> > >>>>>>> -- Norbert Wiener
>>>> > >>>>>>>
>>>> > >>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>> > >>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>> > >>>>>>>
>>>> > >>>>>>
>>>> > >>>>>>
>>>> > >>>>>> --
>>>> > >>>>>> Ivan VOZNYUK
>>>> > >>>>>> PhD in Computational Electromagnetics
>>>> > >>>>>> +33 (0)6.95.87.04.55
>>>> > >>>>>> My webpage <https://ivanvoznyukwork.wixsite.com/webpage>
>>>> > >>>>>> My LinkedIn <http://linkedin.com/in/ivan-voznyuk-b869b8106>
>>>> > >>>>>>
>>>> > >>>>>
>>>> > >>>>>
>>>> > >>>>> --
>>>> > >>>>> What most experimenters take for granted before they begin their
>>>> > >>>>> experiments is infinitely more interesting than any results to
>>>> which their
>>>> > >>>>> experiments lead.
>>>> > >>>>> -- Norbert Wiener
>>>> > >>>>>
>>>> > >>>>> https://www.cse.buffalo.edu/~knepley/
>>>> > >>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>> > >>>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>> --
>>>> > >>>> Ivan VOZNYUK
>>>> > >>>> PhD in Computational Electromagnetics
>>>> > >>>> +33 (0)6.95.87.04.55
>>>> > >>>> My webpage <https://ivanvoznyukwork.wixsite.com/webpage>
>>>> > >>>> My LinkedIn <http://linkedin.com/in/ivan-voznyuk-b869b8106>
>>>> > >>>>
>>>> > >>>
>>>> > >>>
>>>> > >>> --
>>>> > >>> What most experimenters take for granted before they begin their
>>>> > >>> experiments is infinitely more interesting than any results to
>>>> which their
>>>> > >>> experiments lead.
>>>> > >>> -- Norbert Wiener
>>>> > >>>
>>>> > >>> https://www.cse.buffalo.edu/~knepley/
>>>> > >>> <http://www.cse.buffalo.edu/~knepley/>
>>>> > >>>
>>>> > >>
>>>> >
>>>> >
>>>>
>>>>
>>>
>>> --
>>> Ivan VOZNYUK
>>> PhD in Computational Electromagnetics
>>> +33 (0)6.95.87.04.55
>>> My webpage <https://ivanvoznyukwork.wixsite.com/webpage>
>>> My LinkedIn <http://linkedin.com/in/ivan-voznyuk-b869b8106>
>>>
>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>