<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>Hi,</div><div>You were totally right: no miracle, parallelization does come from multithreading. We checked Option 1/: played with OMP_NUM_THREADS=1 it changed computational time.</div><div><br></div><div>So, I reinstalled everything (starting with Ubuntu ending with petsc) and configured the following things:</div><div><br></div><div>- installed system's ompenmpi</div><div>- installed Intel MKL Blas / Lapack</div><div>- configured PETSC as ./configure --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-blas-lapack-dir=/opt/intel/mkl/lib/intel64 --download-scalapack --download-mumps --with-hwloc --with-shared --with-openmp=1 --with-pthread=1 --with-scalar-type=complex</div><div>hoping that it would take into account blas multithreading</div><div>- installed petsc4py</div><div><br></div><div>However, I do not get any parallelization...</div><div>What I tried to do so far unsuccessfully :</div><div>- play with OMP_NUM_THREADS</div><div>- reinstall the system</div><div>- ldd <a href="http://PETSc.cpython-35m-x86_64-linux-gnu.so">PETSc.cpython-35m-x86_64-linux-gnu.so</a> yields lld_result.txt (here attached)</div><div>I noted that libmkl_sequential.so library there. Do you think this is normal?</div><div>- I found a similar problem reported here: <a href="https://lists.mcs.anl.gov/pipermail/petsc-users/2016-March/028803.html">https://lists.mcs.anl.gov/pipermail/petsc-users/2016-March/028803.html</a> To solve this problem, developers recommended to replace -lmkl_sequential to -lmkl_intel_thread options in PETSC_ARCH/lib/conf/petscvariables. However, I did not find something that would be named like this (it might be a change of version)</div><div>- Anyway, I replaced lmkl_sequential to lmkl_intel_thread in every file of PETSC, but it changed nothing.<br></div><div><br></div><div>As a result, in the new make.log (here attached ) I have a parameter #define PETSC_HAVE_LIBMKL_SEQUENTIAL 1 and option -lmkl_sequential<br></div><div><br></div><div>Do you have any idea of what I should change in the initial options in order to obtain the blas multithreding parallelization?</div><div><br></div><div>Thanks a lot for your help!</div><div><br></div><div>Ivan<br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div></div></div></div></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, Nov 16, 2018 at 1:25 AM Dave May <<a href="mailto:dave.mayhem23@gmail.com">dave.mayhem23@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, 15 Nov 2018 at 17:44, Ivan via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
On Fri, Nov 16, 2018 at 1:25 AM Dave May <dave.mayhem23@gmail.com> wrote:

On Thu, 15 Nov 2018 at 17:44, Ivan via petsc-users <petsc-users@mcs.anl.gov> wrote:

> Hi Stefano,
>
> In fact, yes, we looked at the htop output (and at the resulting computational time, of course).
>
> In our code we use MUMPS, which indeed depends on BLAS / LAPACK, so I think this might be it!
>
> I will definitely check it (I mean the difference between our MUMPS, BLAS, and LAPACK).
>
> If you have an idea of how we can verify on his PC that the source of his parallelization really does come from BLAS, please do not hesitate to tell me!

Option 1/
* Set this environment variable:
      export OMP_NUM_THREADS=1
* Re-run your "parallel" test.
* If the performance differs (the job runs slower) compared with your previous run, where you inferred parallelism was being employed, you can safely assume that the parallelism observed comes from threads. (See the timing sketch after Option 3/ below.)

Option 2/
* Re-configure PETSc to use a known BLAS implementation which does not support threads.
* Re-compile PETSc.
* Re-run your parallel test.
* If the performance differs (the job runs slower) compared with your previous run, where you inferred parallelism was being employed, you can safely assume that the parallelism observed comes from threads.

Option 3/
* Use a PC which does not depend on BLAS at all, e.g. -pc_type jacobi or -pc_type bjacobi.
* If the performance differs (the job runs slower) compared with your previous run, where you inferred parallelism was being employed, you can safely assume that the parallelism observed comes from BLAS + threads.
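To make Option 1/ concrete, a small timing harness might look like this (a sketch only: it assumes simple_code.py from this thread sits in the current directory and that the linked BLAS honours OMP_NUM_THREADS / MKL_NUM_THREADS):

    # time_threads.py -- a sketch; "simple_code.py" is the script attached in
    # this thread and is not reproduced here. It runs the same script with the
    # thread count pinned to 1 and then to 8, and prints the wall times.
    import os
    import subprocess
    import time

    for nthreads in ("1", "8"):
        env = dict(os.environ)
        env["OMP_NUM_THREADS"] = nthreads
        env["MKL_NUM_THREADS"] = nthreads   # MKL also honours this variable
        t0 = time.time()
        subprocess.run(["python3", "simple_code.py"], env=env, check=True)
        print("OMP_NUM_THREADS=%s -> %.1f s" % (nthreads, time.time() - t0))
    # If both runs take about the same time, the speedup you observed did not
    # come from a threaded BLAS.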
> Thanks!
> Ivan

On 15/11/2018 18:24, Stefano Zampini wrote:
If you say your program is parallel just by looking at the output of the top command, you are probably linking against a multithreaded BLAS library.
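One way to tell from inside the script whether those extra busy cores are threads of a single process rather than MPI processes is to watch the OS thread count (a sketch; Linux-only because it reads /proc, and it uses numpy only as a convenient way to trigger a BLAS call, which may be a different BLAS than the one PETSc links against; the same check can be printed right after KSP.solve() in simple_code.py):

    # thread_check.py -- a sketch, Linux-only (it reads /proc/self/status).
    # Print how many OS threads this process owns before and after a BLAS-heavy
    # call; a threaded BLAS will typically raise the count well above 1.
    import numpy as np

    def num_threads():
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("Threads:"):
                    return int(line.split()[1])

    print("threads before:", num_threads())
    a = np.random.rand(2000, 2000)
    b = a @ a                      # a BLAS (dgemm) call through numpy
    print("threads after :", num_threads())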
<div class="gmail_quote">
<div dir="ltr">Il giorno Gio 15 Nov 2018, 20:09 Matthew Knepley
via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> ha
scritto:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">On Thu, Nov 15, 2018 at 11:59 AM Ivan
Voznyuk <<a href="mailto:ivan.voznyuk.work@gmail.com" rel="noreferrer" target="_blank">ivan.voznyuk.work@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div><span class="m_6813965073151497321gmail-m_6112983868766395610m_1400684703581952285m_-4222405733861688740gmail-im">Hi
Matthew,</span></div>
<div><span class="m_6813965073151497321gmail-m_6112983868766395610m_1400684703581952285m_-4222405733861688740gmail-im"><br>
</span></div>
<div><span class="m_6813965073151497321gmail-m_6112983868766395610m_1400684703581952285m_-4222405733861688740gmail-im">Does
it mean that by using just command python3
simple_code.py (without mpiexec) you <u>cannot</u>
obtain a parallel execution? <br>
</span></div>
</div>
</blockquote>
As I wrote before, it's not impossible. You could be calling PMI directly, but I do not think you are doing that.
> It's been 5 days that my colleague and I have been trying to understand how he managed to do so.
> It means that by simply running python3 simple_code.py he gets 8 processors working.
> By the way, we added a few lines to his code:
>
>     rank = PETSc.COMM_WORLD.Get_rank()
>     size = PETSc.COMM_WORLD.Get_size()
>
> and we got rank = 0, size = 1.
This is MPI telling you that you are running on only 1 process.
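A minimal way to see what MPI layout a run actually has (a sketch; rank_check.py is a hypothetical file name, and the methods used are the petsc4py-native getRank/getSize):

    # rank_check.py -- a minimal sketch.
    #   python3 rank_check.py              -> prints "rank 0 of 1"
    #   mpiexec -n 2 python3 rank_check.py -> prints "rank 0 of 2" and "rank 1 of 2"
    from petsc4py import PETSc

    comm = PETSc.COMM_WORLD
    PETSc.Sys.Print("world size seen by rank 0:", comm.getSize())  # printed once
    print("rank %d of %d" % (comm.getRank(), comm.getSize()))      # printed by every rank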
> However, when the execution arrives at KSP.solve(), it somehow turns on 8 processors.
Why do you think it's running on 8 processes?
> This problem is solved on his PC in 5-8 s (in parallel, using python3 simple_code.py); on mine it takes 70-90 s (sequentially, but with the same command python3 simple_code.py).
I think it is much more likely that there are differences in the solver (use -ksp_view to see exactly which solver was used) than that it is parallelism. Moreover, you would never ever ever see that much speedup on a laptop, since all these computations are bandwidth limited.

  Thanks,

    Matt
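As a side note, for -ksp_view (or any other PETSc command-line option) to reach a petsc4py script, the arguments have to be handed to PETSc at initialization and the solver must call setFromOptions(). A sketch of that boilerplate (the 1-D Laplacian is only a placeholder, since simple_code.py itself is not reproduced here):

    # options_demo.py -- a sketch of the boilerplate needed so that options like
    #   python3 options_demo.py -ksp_view
    #   mpiexec -n 2 python3 options_demo.py -ksp_monitor
    # actually reach PETSc. The 1-D Laplacian below is only a stand-in for
    # whatever simple_code.py really assembles.
    import sys
    import petsc4py
    petsc4py.init(sys.argv)          # must run before "from petsc4py import PETSc"
    from petsc4py import PETSc

    n = 100
    A = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_WORLD)
    A.setUp()
    A.setOption(PETSc.Mat.Option.NEW_NONZERO_ALLOCATION_ERR, False)  # toy code: no preallocation
    rstart, rend = A.getOwnershipRange()
    for i in range(rstart, rend):
        A.setValue(i, i, 2.0)
        if i > 0:
            A.setValue(i, i - 1, -1.0)
        if i < n - 1:
            A.setValue(i, i + 1, -1.0)
    A.assemble()

    x, b = A.createVecs()
    b.set(1.0)

    ksp = PETSc.KSP().create(comm=PETSc.COMM_WORLD)
    ksp.setOperators(A)
    ksp.setFromOptions()             # honours -ksp_type, -pc_type, -ksp_view, ...
    ksp.solve(b, x)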
> So, the conclusion is that on his computer this code works in the same way as scipy: all of the code is executed sequentially, but when it comes to the solution of the system of linear equations, it runs on all available processors. All this with just running python3 my_code.py (without any mpi-something).
>
> Is it an exception / abnormal behavior? I mean, is it something irregular that you, the developers, have never seen?
>
> Thanks and have a good evening!
> Ivan
>
> P.S. I don't think I know the answer regarding scipy...
On Thu, Nov 15, 2018 at 2:39 PM Matthew Knepley <knepley@gmail.com> wrote:
On Thu, Nov 15, 2018 at 8:07 AM Ivan Voznyuk <ivan.voznyuk.work@gmail.com> wrote:
> Hi Matthew,
> Thanks for your reply!
>
> Let me be more precise by asking a few questions:
>
> 1. In order to obtain a parallel execution of simple_code.py, do I need to run mpiexec python3 simple_code.py, or can I just launch python3 simple_code.py?
mpiexec -n 2 python3 simple_code.py
> 2. This simple_code.py consists of two parts: a) preparation of the matrix and b) solving the system of linear equations with PETSc. If I launch mpirun (or mpiexec) -np 8 python3 simple_code.py, I suppose that I will basically obtain 8 matrices and 8 systems to solve. However, I need to prepare only one matrix, but launch this code in parallel on 8 processors.
When you create the Mat object, you give it a communicator (here PETSC_COMM_WORLD). That allows us to distribute the data. This is all covered extensively in the manual and the online tutorials, as well as in the example code.
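As an illustration of the communicator argument (a sketch; the matrix size is arbitrary), each rank owns a contiguous block of rows of the Mat, which is what makes assembly and solve genuinely distributed when the script is started with mpiexec:

    # ownership_demo.py -- a sketch illustrating how the communicator given to
    # Mat determines the data distribution. Run with, e.g.:
    #   mpiexec -n 4 python3 ownership_demo.py
    from petsc4py import PETSc

    n = 1000
    A = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_WORLD)
    A.setUp()

    rstart, rend = A.getOwnershipRange()
    PETSc.Sys.syncPrint("rank %d owns rows [%d, %d)"
                        % (PETSc.COMM_WORLD.getRank(), rstart, rend))
    PETSc.Sys.syncFlush()
    # With comm=PETSc.COMM_SELF instead, every rank would build its own full
    # n x n matrix and solve it redundantly.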
> In fact, attached you will find a similar code (scipy_code.py) with only one difference: the system of linear equations is solved with scipy. When I solve it, I can clearly see that the solution is obtained in a parallel way. However, I do not use the command mpirun (or mpiexec); I just run python3 scipy_code.py.
Why do you think it's running in parallel?

  Thanks,

    Matt
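One quick way to see which BLAS/LAPACK numpy and scipy were built against, and hence whether their solvers can use a threaded MKL, is their built-in configuration reports (a sketch; the printed layout differs between versions):

    # blas_report.py -- a sketch; the output format varies between numpy/scipy
    # versions, but an MKL-backed install will list "mkl" libraries here.
    import numpy as np
    import scipy

    np.show_config()     # BLAS/LAPACK that numpy was built against
    scipy.show_config()  # the same information for scipy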
> In this case, the first part (creation of the sparse matrix) is not parallel, whereas the solution of the system is found in a parallel way.
> So my question is: do you think it is possible to have the same behavior with PETSc? And what do I need for this?
>
> I am asking this because for my colleague it worked! It means that he launches simple_code.py on his computer using the command python3 simple_code.py (and not mpi-something python3 simple_code.py) and he obtains a parallel execution of the same code.
>
> Thanks for your help!
> Ivan
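On the question above (sequential assembly but a multithreaded solve with PETSc): with a direct solver, the multithreading, if any, comes from the BLAS inside the factorization, not from MPI. A sketch of how MUMPS can be selected through petsc4py (assuming the --download-mumps build discussed in this thread; the diagonal matrix is only there to make the snippet self-contained):

    # mumps_direct.py -- a sketch: select MUMPS as the direct solver through
    # petsc4py. Whether KSP.solve() then uses several cores depends on the BLAS
    # that PETSc/MUMPS were linked against (threaded MKL vs. libmkl_sequential),
    # not on how the Python script itself is launched.
    from petsc4py import PETSc

    n = 500
    A = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_WORLD)
    A.setUp()
    rstart, rend = A.getOwnershipRange()
    for i in range(rstart, rend):
        A.setValue(i, i, float(i + 1))   # trivial diagonal system, wiring only
    A.assemble()

    x, b = A.createVecs()
    b.set(1.0)

    ksp = PETSc.KSP().create(comm=PETSc.COMM_WORLD)
    ksp.setOperators(A)
    ksp.setType('preonly')               # no Krylov iterations, direct solve only
    pc = ksp.getPC()
    pc.setType('lu')
    pc.setFactorSolverType('mumps')      # same as -pc_factor_mat_solver_type mumps
    ksp.solve(b, x)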
On Thu, Nov 15, 2018 at 11:54 AM Matthew Knepley <knepley@gmail.com> wrote:
On Thu, Nov 15, 2018 at 4:53 AM Ivan Voznyuk via petsc-users <petsc-users@mcs.anl.gov> wrote:
> Dear PETSc community,
>
> I have a question regarding the parallel execution of petsc4py.
>
> I have a simple code (attached as simple_code.py) which solves a system of linear equations Ax=b using petsc4py. To execute it, I use the command python3 simple_code.py, which yields a sequential performance. With a colleague of mine, we launched this code on his computer, and this time the execution was in parallel, even though he used the same command python3 simple_code.py (without mpirun or mpiexec).
I am not sure what you mean. To run MPI programs in parallel, you need a launcher like mpiexec or mpirun. There are Python programs (like nemesis) that use the launcher API directly (called PMI), but that is not part of petsc4py.

  Thanks,

    Matt
> My configuration: Ubuntu 16.04 x86_64, Intel Core i7, PETSc 3.10.2, PETSC_ARCH=arch-linux2-c-debug, petsc4py 3.10.0 in a virtualenv.
>
> In order to parallelize it, I have already tried:
> - using 2 different PCs
> - using Ubuntu 16.04 and 18.04
> - using different architectures (arch-linux2-c-debug, linux-gnu-c-debug, etc.)
> - of course, using different configurations (my present config can be found in the attached make.log)
> - MPI from both MPICH and OpenMPI
>
> Nothing worked.
>
> Do you have any ideas?
>
> Thanks and have a good day,
> Ivan
>
> --
> Ivan VOZNYUK
> PhD in Computational Electromagnetics
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
--
Ivan VOZNYUK
PhD in Computational Electromagnetics
+33 (0)6.95.87.04.55
My webpage: https://ivanvoznyukwork.wixsite.com/webpage
My LinkedIn: http://linkedin.com/in/ivan-voznyuk-b869b8106