[petsc-users] petsc4py help with parallel execution

Dave May dave.mayhem23 at gmail.com
Thu Nov 15 18:25:09 CST 2018


On Thu, 15 Nov 2018 at 17:44, Ivan via petsc-users <petsc-users at mcs.anl.gov>
wrote:

> Hi Stefano,
>
> In fact, yes, we look at the htop output (and the resulting computational
> time, of course).
>
> In our code we use MUMPS, which indeed depends on blas / lapack. So I
> think this might be it!
>
> I will definitely check it (I mean the difference between our MUMPS, BLAS,
> and LAPACK).
>
> If you have an idea of how we can verify on his PC that the parallelism
> really does come from BLAS, please do not hesitate to tell me!
>

Option 1/
* Set this environment variable
  export OMP_NUM_THREADS=1
* Re-run your "parallel" test.
* If the performance differs (job runs slower) compared with your previous
run where you inferred parallelism was being employed, you can safely
assume that the parallelism observed comes from threads
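
A minimal sketch of this comparison (assuming your script is the
simple_code.py from earlier in this thread):

  # baseline run, BLAS/OpenMP threads unrestricted
  python3 simple_code.py

  # same run, pinned to a single thread
  export OMP_NUM_THREADS=1
  python3 simple_code.py

If the second run is noticeably slower and htop shows only one busy core,
the speedup came from threads inside BLAS (and hence inside MUMPS), not
from MPI.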

Option 2/
* Re-configure PETSc to use a known BLAS implementation which does not
support threads
* Re-compile PETSc
* Re-run your parallel test
* If the performance differs (job runs slower) compared with your previous
run where you inferred parallelism was being employed, you can safely
assume that the parallelism observed comes from threads
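
For example, a sketch of such a reconfiguration using the reference
Fortran BLAS/LAPACK (which is not threaded); the remaining options are
placeholders that you should adapt to your current configuration in
make.log:

  ./configure PETSC_ARCH=arch-ref-blas \
      --download-fblaslapack \
      --download-mumps --download-scalapack \
      --with-mpi-dir=/path/to/your/mpi
  make PETSC_ARCH=arch-ref-blas all

  # rebuild petsc4py against this PETSC_ARCH, then re-run the parallel test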

Option 3/
* Use a PC which does not depend on BLAS at all,
e.g. -pc_type jacobi or -pc_type bjacobi
* If the performance differs (job runs slower) compared with your previous
run where you inferred parallelism was being employed, you can safely
assume that the parallelism observed comes from BLAS + threads
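
A petsc4py sketch of Option 3 (assuming the matrix A and vectors x, b
already exist in simple_code.py, as in your attached script):

  from petsc4py import PETSc

  ksp = PETSc.KSP().create(comm=PETSc.COMM_WORLD)
  ksp.setOperators(A)
  ksp.getPC().setType('jacobi')  # point Jacobi: no BLAS calls at all
  ksp.setFromOptions()           # -ksp_type / -pc_type can still override this
  ksp.solve(b, x)

If htop no longer shows 8 busy cores during the solve, the "parallelism"
you saw earlier was BLAS threads inside the MUMPS factorization.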



> Thanks!
>
> Ivan
> On 15/11/2018 18:24, Stefano Zampini wrote:
>
> If you say your program is parallel by just looking at the output from the
> top command, you are probably linking against a multithreaded blas library
>
> On Thu, 15 Nov 2018 at 20:09, Matthew Knepley via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> On Thu, Nov 15, 2018 at 11:59 AM Ivan Voznyuk <
>> ivan.voznyuk.work at gmail.com> wrote:
>>
>>> Hi Matthew,
>>>
>>> Does it mean that by using just the command python3 simple_code.py (without
>>> mpiexec) you *cannot* obtain parallel execution?
>>>
>>
>> As I wrote before, it's not impossible. You could be directly calling PMI,
>> but I do not think you are doing that.
>>
>>
>>> For 5 days now my colleague and I have been trying to understand how he
>>> managed to do so.
>>> It means that by simply using python3 simple_code.py he gets 8
>>> processors working.
>>> By the way, we wrote a few lines in his code:
>>> rank = PETSc.COMM_WORLD.Get_rank()
>>> size = PETSc.COMM_WORLD.Get_size()
>>> and we got rank = 0, size = 1
>>>
>>
>> This is MPI telling you that you are only running on 1 process.
>>
>>
>>> However, when the execution arrives at KSP.solve(), it somehow turns on 8
>>> processors.
>>>
>>
>> Why do you think it's running on 8 processes?
>>
>>
>>> This problem is solved on his PC in 5-8 sec (in parallel, using *python3
>>> simple_code.py*), while on mine it takes 70-90 sec (sequentially, but with
>>> the same command *python3 simple_code.py*).
>>>
>>
>> I think it is much more likely that there are differences in the solver
>> (use -ksp_view to see exactly what solver was used) than
>> that it is parallelism. Moreover, you would never ever ever see that
>> much speedup on a laptop since all these computations
>> are bandwidth limited.
>>
>>   Thanks,
>>
>>      Matt
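
(A quick way to run the -ksp_view check Matt suggests above, assuming
simple_code.py calls KSP.setFromOptions() so that command-line options
are honoured:

  python3 simple_code.py -ksp_view

then compare the reported KSP/PC types, and the MUMPS settings, between
the two machines.)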
>>
>>
>>> So, the conclusion is that on his computer this code works in the same way
>>> as scipy: all the code is executed sequentially, but when it comes to
>>> the solution of the system of linear equations, it runs on all available
>>> processors. All this with just running python3 my_code.py (without any
>>> mpirun/mpiexec).
>>>
>>> Is it an exception / abnormal behavior? I mean, is it something
>>> irregular that you, developers, have never seen?
>>>
>>> Thanks and have a good evening!
>>> Ivan
>>>
>>> P.S. I don't think I know the answer regarding Scipy...
>>>
>>>
>>> On Thu, Nov 15, 2018 at 2:39 PM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Thu, Nov 15, 2018 at 8:07 AM Ivan Voznyuk <
>>>> ivan.voznyuk.work at gmail.com> wrote:
>>>>
>>>>> Hi Matthew,
>>>>> Thanks for your reply!
>>>>>
>>>>> Let me clarify what I mean with a few questions:
>>>>>
>>>>> 1. In order to obtain a parallel execution of simple_code.py, do I
>>>>> need to go with mpiexec python3 simple_code.py, or can I just launch
>>>>> python3 simple_code.py?
>>>>>
>>>>
>>>> mpiexec -n 2 python3 simple_code.py
>>>>
>>>>
>>>>> 2. This simple_code.py consists of 2 parts: a) preparation of the matrix,
>>>>> b) solving the system of linear equations with PETSc. If I launch mpirun
>>>>> (or mpiexec) -np 8 python3 simple_code.py, I suppose that I will basically
>>>>> obtain 8 matrices and 8 systems to solve. However, I need to prepare only
>>>>> one matrix, but launch this code in parallel on 8 processors.
>>>>>
>>>>
>>>> When you create the Mat object, you give it a communicator (here
>>>> PETSC_COMM_WORLD). That allows us to distribute the data. This is all
>>>> covered extensively in the manual and the online tutorials, as well as the
>>>> example code.
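
(A minimal petsc4py sketch of what Matt describes here; the sizes and
values are only illustrative:

  from petsc4py import PETSc

  comm = PETSc.COMM_WORLD
  A = PETSc.Mat().createAIJ([100, 100], comm=comm)  # rows distributed across the ranks in comm
  rstart, rend = A.getOwnershipRange()              # this rank owns rows rstart..rend-1
  for i in range(rstart, rend):
      A.setValue(i, i, 1.0)
  A.assemblyBegin()
  A.assemblyEnd()

Run it with mpiexec -n 8 python3 simple_code.py and each rank fills only
its own rows; the subsequent KSP.solve() then runs on all 8 processes.)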
>>>>
>>>>
>>>>> In fact, attached you will find a similar code (scipy_code.py)
>>>>> with only one difference: the system of linear equations is solved with
>>>>> scipy. So when I solve it, I can clearly see that the solution is obtained
>>>>> in a parallel way. However, I do not use the command mpirun (or mpiexec). I
>>>>> just go with python3 scipy_code.py.
>>>>>
>>>>
>>>> Why do you think it's running in parallel?
>>>>
>>>>   Thanks,
>>>>
>>>>      Matt
>>>>
>>>>
>>>>> In this case, the first part (creation of the sparse matrix) is not
>>>>> parallel, whereas the solution of the system is found in a parallel way.
>>>>> So my question is: do you think that it is possible to have the same
>>>>> behavior with PETSc? And what do I need for this?
>>>>>
>>>>> I am asking this because for my colleague it worked! It means that he
>>>>> launches simple_code.py on his computer using the command python3
>>>>> simple_code.py (and not mpirun/mpiexec python3 simple_code.py) and he obtains
>>>>> a parallel execution of the same code.
>>>>>
>>>>> Thanks for your help!
>>>>> Ivan
>>>>>
>>>>>
>>>>> On Thu, Nov 15, 2018 at 11:54 AM Matthew Knepley <knepley at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Thu, Nov 15, 2018 at 4:53 AM Ivan Voznyuk via petsc-users <
>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>
>>>>>>> Dear PETSC community,
>>>>>>>
>>>>>>> I have a question regarding the parallel execution of petsc4py.
>>>>>>>
>>>>>>> I have a simple code (attached here as simple_code.py) which solves a
>>>>>>> system of linear equations Ax=b using petsc4py. To execute it, I use the
>>>>>>> command python3 simple_code.py, which yields sequential performance. With
>>>>>>> a colleague of mine, we launched this code on his computer, and this time
>>>>>>> the execution was in parallel, although he used the same command python3
>>>>>>> simple_code.py (without mpirun or mpiexec).
>>>>>>>
>>>>>> I am not sure what you mean. To run MPI programs in parallel, you
>>>>>> need a launcher like mpiexec or mpirun. There are Python programs (like
>>>>>> nemesis) that use the launcher API directly (called PMI), but that is not
>>>>>> part of petsc4py.
>>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>      Matt
>>>>>>
>>>>>>> My configuration: x86_64 Ubuntu 16.04, Intel Core i7, PETSc
>>>>>>> 3.10.2, PETSC_ARCH=arch-linux2-c-debug, petsc4py 3.10.0 in a virtualenv
>>>>>>>
>>>>>>> In order to parallelize it, I have already tried:
>>>>>>> - using 2 different PCs
>>>>>>> - using Ubuntu 16.04 and 18.04
>>>>>>> - using different architectures (arch-linux2-c-debug,
>>>>>>> linux-gnu-c-debug, etc.)
>>>>>>> - of course, using different configurations (my present config can be
>>>>>>> found in the make.log that I attached here)
>>>>>>> - MPI from MPICH and OpenMPI
>>>>>>>
>>>>>>> Nothing worked.
>>>>>>>
>>>>>>> Do you have any ideas?
>>>>>>>
>>>>>>> Thanks and have a good day,
>>>>>>> Ivan
>>>>>>>
>>>>>>> --
>>>>>>> Ivan VOZNYUK
>>>>>>> PhD in Computational Electromagnetics
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>> experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ivan VOZNYUK
>>>>> PhD in Computational Electromagnetics
>>>>> +33 (0)6.95.87.04.55
>>>>> My webpage <https://ivanvoznyukwork.wixsite.com/webpage>
>>>>> My LinkedIn <http://linkedin.com/in/ivan-voznyuk-b869b8106>
>>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>
>>>
>>>
>>> --
>>> Ivan VOZNYUK
>>> PhD in Computational Electromagnetics
>>> +33 (0)6.95.87.04.55
>>> My webpage <https://ivanvoznyukwork.wixsite.com/webpage>
>>> My LinkedIn <http://linkedin.com/in/ivan-voznyuk-b869b8106>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>