[petsc-users] Parallel processes run significantly slower
Junchao Zhang
junchao.zhang at gmail.com
Fri Jan 12 13:35:56 CST 2024
Hi, Steffen,
It is probably because your laptop CPU is "weak" in terms of memory
bandwidth. I have a local machine with one Intel Core i7 processor, which
has 8 cores (16 hardware threads), and I got a similar STREAM speedup. It
just means that 1~2 MPI ranks can use up all the memory bandwidth. That is
why, in your (weak scaling) test, more MPI ranks just gave a longer time:
the MPI processes had to share the memory bandwidth.
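As a rough illustration of the bandwidth argument, here is a minimal Python sketch. It assumes the test is purely memory-bandwidth bound and that each rank does the same amount of work; the function name weak_scaling_slowdown is made up for this example, and the speedup values are the STREAM numbers you posted.

```python
def weak_scaling_slowdown(n_ranks: int, stream_speedup: float) -> float:
    """Estimate T(n_ranks) / T(1) for a bandwidth-bound weak-scaling test.

    Assumption: every rank moves the same amount of data, and the total
    bandwidth available to n_ranks ranks is stream_speedup times the
    bandwidth available to a single rank.
    """
    if n_ranks < 1 or stream_speedup <= 0:
        raise ValueError("n_ranks must be >= 1 and stream_speedup > 0")
    return n_ranks / stream_speedup


# STREAM speedups reported for the laptop earlier in this thread.
laptop_speedup = {1: 1.0, 2: 1.1442, 4: 1.45442, 8: 1.41618}

for n, s in laptop_speedup.items():
    print(f"{n} ranks: about {weak_scaling_slowdown(n, s):.2f}x the 1-rank time")
```

For 4 ranks this gives a factor of about 2.75, while you measured 34 s against 9 s (roughly 3.8), so treat it only as a first-order estimate; other effects are presumably also involved.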
On another machine with two AMD EPYC 7452 32-Core processors, there are
8 NUMA domains. I got
$ mpirun -n 1 --bind-to numa --map-by numa ./MPIVersion
1 22594.4873 Rate (MB/s)
$ mpirun -n 8 --bind-to numa --map-by numa ./MPIVersion
8 173565.3584 Rate (MB/s) 7.68175
On this kind of machine, you can expect the run time of your test to stay
roughly constant up to 8 MPI ranks.
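To make the comparison concrete, the following sketch computes the parallel efficiency from two lines of the benchmark output shown above. The helper names parse_stream_line and parallel_efficiency are hypothetical, and the parsing assumes the "ranks rate Rate (MB/s) [speedup]" layout printed by MPIVersion here.

```python
from typing import Tuple


def parse_stream_line(line: str) -> Tuple[int, float]:
    """Return (ranks, rate in MB/s) from a line such as
    '8 173565.3584 Rate (MB/s) 7.68175'."""
    fields = line.split()
    return int(fields[0]), float(fields[1])


def parallel_efficiency(base_line: str, scaled_line: str) -> float:
    """Bandwidth speedup divided by the increase in rank count."""
    base_ranks, base_rate = parse_stream_line(base_line)
    ranks, rate = parse_stream_line(scaled_line)
    speedup = rate / base_rate
    return speedup / (ranks / base_ranks)


# Output lines from the two-socket AMD EPYC 7452 machine above.
base = "1 22594.4873 Rate (MB/s)"
scaled = "8 173565.3584 Rate (MB/s) 7.68175"

print(f"efficiency at 8 ranks: {parallel_efficiency(base, scaled):.3f}")
```

An efficiency close to 1 means each added rank gets nearly its own share of memory bandwidth, which is the situation in which a weak-scaling test can keep a roughly constant run time.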
--Junchao Zhang
On Fri, Jan 12, 2024 at 11:13 AM Steffen Wilksen | Universitaet Bremen <
swilksen at itp.uni-bremen.de> wrote:
> Hi Junchao,
>
> I tried it out, but unfortunately, this does not seem to give any
> improvement; the code is still much slower when starting more processes.
>
>
> ----- Message from Junchao Zhang <junchao.zhang at gmail.com> ---------
> Date: Fri, 12 Jan 2024 09:41:39 -0600
> From: Junchao Zhang <junchao.zhang at gmail.com>
> Subject: Re: [petsc-users] Parallel processes run significantly slower
> To: Steffen Wilksen | Universitaet Bremen <swilksen at itp.uni-bremen.de>
> Cc: Barry Smith <bsmith at petsc.dev>,
>     PETSc users list <petsc-users at mcs.anl.gov>
>
> Hi, Steffen,
> Would it be an MPI process binding issue? Could you try running with
>
> mpiexec --bind-to core -n N python parallel_example.py
>
>
> --Junchao Zhang
>
> On Fri, Jan 12, 2024 at 8:52 AM Steffen Wilksen | Universitaet Bremen <
> swilksen at itp.uni-bremen.de> wrote:
>
>> Thank you for your feedback.
>>
>> @Stefano: the use of my communicator was intentional, since I later
>> intend to distribute M independent calculations to N processes, each
>> process then only needing to do M/N calculations. Of course I don't
>> expect speed up in my example since the number of calculations is
>> constant and not dependent on N, but I would hope that the time each
>> process takes does not increase too drastically with N.
>>
>> @Barry: I tried to do the STREAMS benchmark, these are my results:
>>
>>  1  23467.9961   Rate (MB/s) 1
>>  2  26852.0536   Rate (MB/s) 1.1442
>>  3  29715.4762   Rate (MB/s) 1.26621
>>  4  34132.2490   Rate (MB/s) 1.45442
>>  5  34924.3020   Rate (MB/s) 1.48817
>>  6  34315.5290   Rate (MB/s) 1.46223
>>  7  33134.9545   Rate (MB/s) 1.41192
>>  8  33234.9141   Rate (MB/s) 1.41618
>>  9  32584.3349   Rate (MB/s) 1.38846
>> 10  32582.3962   Rate (MB/s) 1.38838
>> 11  32098.2903   Rate (MB/s) 1.36775
>> 12  32064.8779   Rate (MB/s) 1.36632
>> 13  31692.0541   Rate (MB/s) 1.35044
>> 14  31274.2421   Rate (MB/s) 1.33263
>> 15  31574.0196   Rate (MB/s) 1.34541
>> 16  30906.7773   Rate (MB/s) 1.31698
>>
>> I also attached the resulting plot. As it seems, I get very bad MPI
>> speedup (red curve, right?), even decreasing if I use too many threads.
>> I don't fully understand the reasons given in the discussion you linked
>> since this is all very new to me, but I take that this is a problem with
>> my computer which I can't easily fix, right?
>>
>> ----- Message from Barry Smith <bsmith at petsc.dev> ---------
>> Date: Thu, 11 Jan 2024 11:56:24 -0500
>> From: Barry Smith <bsmith at petsc.dev>
>> Subject: Re: [petsc-users] Parallel processes run significantly slower
>> To: Steffen Wilksen | Universitaet Bremen <swilksen at itp.uni-bremen.de>
>> Cc: PETSc users list <petsc-users at mcs.anl.gov>
>>
>>
>> Take a look at the discussion in
>> https://petsc.gitlab.io/-/petsc/-/jobs/5814862879/artifacts/public/html/manual/streams.html
>> and I suggest you run the streams benchmark from the branch
>> barry/2023-09-15/fix-log-pcmpi on your machine to get a baseline for
>> what kind of speedup you can expect.
>>
>> Then let us know your thoughts.
>>
>> Barry
>>
>>
>>
>> On Jan 11, 2024, at 11:37 AM, Stefano Zampini
>> <stefano.zampini at gmail.com> wrote:
>>
>> You are creating the matrix on the wrong communicator if you want it
>> parallel. You are using PETSc.COMM_SELF
>>
>> On Thu, Jan 11, 2024, 19:28 Steffen Wilksen | Universitaet Bremen
>> <swilksen at itp.uni-bremen.de> wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to do repeated matrix-vector multiplication of large sparse
>>> matrices in Python using petsc4py. Even the simplest method of
>>> parallelization, dividing up the calculation to run on multiple
>>> processes independently, does not seem to give a significant speed up
>>> for large matrices. I constructed a minimal working example, which I
>>> run using
>>>
>>>     mpiexec -n N python parallel_example.py
>>>
>>> where N is the number of processes. Instead of taking approximately the
>>> same time irrespective of the number of processes used, the calculation
>>> is much slower when starting more MPI processes. This translates to
>>> little to no speed up when splitting up a fixed number of calculations
>>> over N processes. As an example, running with N=1 takes 9s, while
>>> running with N=4 takes 34s. When running with smaller matrices, the
>>> problem is not as severe (only slower by a factor of 1.5 when setting
>>> MATSIZE=1e+5 instead of MATSIZE=1e+6). I get the same problems when just
>>> starting the script four times manually without using MPI.
>>>
>>> I attached both the script and the log file for running the script with
>>> N=4. Any help would be greatly appreciated. Calculations are done on my
>>> laptop, Arch Linux version 6.6.8 and PETSc version 3.20.2.
>>>
>>> Kind Regards
>>> Steffen
>>>
>>
>>
>>
>> ----- End message from Barry Smith <bsmith at petsc.dev> -----
>>
>>
>>
>
>
>
> ----- End message from Junchao Zhang <junchao.zhang at gmail.com> -----
>
>
>