[petsc-users] Parallel processes run significantly slower
Steffen Wilksen | Universitaet Bremen
swilksen at itp.uni-bremen.de
Fri Jan 12 11:13:35 CST 2024
Hi Junchao,
I tried it out, but unfortunately this does not seem to give any
improvement; the code is still much slower when starting more
processes.
----- Message from Junchao Zhang <junchao.zhang at gmail.com> ---------
Date: Fri, 12 Jan 2024 09:41:39 -0600
From: Junchao Zhang <junchao.zhang at gmail.com>
Subject: Re: [petsc-users] Parallel processes run significantly slower
To: Steffen Wilksen | Universitaet Bremen <swilksen at itp.uni-bremen.de>
Cc: Barry Smith <bsmith at petsc.dev>, PETSc users list
<petsc-users at mcs.anl.gov>
> Hi Steffen, would it be an MPI process binding issue? Could you try
> running with
>
>> mpiexec --bind-to core -n N python parallel_example.py
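>
> (If your mpiexec comes from Open MPI, adding --report-bindings will
> also print where each rank is actually pinned, e.g.
>
>> mpiexec --bind-to core --report-bindings -n N python parallel_example.py
>
> which is a quick way to confirm the binding took effect.)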
>
>
> --Junchao Zhang
>
> On Fri, Jan 12, 2024 at 8:52 AM Steffen Wilksen | Universitaet
> Bremen <swilksen at itp.uni-bremen.de> wrote:
>
>> Thank you for your feedback.
>> @Stefano: the use of my communicator was intentional, since I later
>> intend to distribute M independent calculations to N processes, each
>> process then only needing to do M/N calculations. Of course I don't
>> expect a speed-up in my example, since the number of calculations is
>> constant and does not depend on N, but I would hope that the time
>> each process takes does not increase too drastically with N.
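>>
>> (For concreteness, the splitting I have in mind is roughly the
>> following sketch; M and run_single_calculation are placeholders here,
>> not names from my actual code:
>>
>>   from mpi4py import MPI
>>
>>   comm = MPI.COMM_WORLD
>>   rank, size = comm.Get_rank(), comm.Get_size()
>>
>>   tasks = range(M)                 # M independent calculations
>>   my_tasks = tasks[rank::size]     # roughly M/N tasks per process
>>   for t in my_tasks:
>>       run_single_calculation(t)    # placeholder for one calculation
>>
>> so each process works through its own share without communicating.)
>>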
>> @Barry: I ran the STREAMS benchmark; these are my results:
>> np   Rate (MB/s)   Speedup
>>  1   23467.9961    1
>>  2   26852.0536    1.1442
>>  3   29715.4762    1.26621
>>  4   34132.2490    1.45442
>>  5   34924.3020    1.48817
>>  6   34315.5290    1.46223
>>  7   33134.9545    1.41192
>>  8   33234.9141    1.41618
>>  9   32584.3349    1.38846
>> 10   32582.3962    1.38838
>> 11   32098.2903    1.36775
>> 12   32064.8779    1.36632
>> 13   31692.0541    1.35044
>> 14   31274.2421    1.33263
>> 15   31574.0196    1.34541
>> 16   30906.7773    1.31698
>>
>> I also attached the resulting plot. It seems I get very bad MPI
>> speedup (the red curve, right?), which even decreases when I use too
>> many processes. I don't fully understand the reasons given in the
>> discussion you linked, since this is all very new to me, but I take
>> it that this is a limitation of my computer which I can't easily fix,
>> right?
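>>
>> (For reference, the Speedup column above is just the measured rate
>> divided by the single-process rate; for 4 processes that is
>> 34132.2490 / 23467.9961 ≈ 1.45, and it never exceeds about 1.49, so
>> the total memory bandwidth saturates at roughly 1.5x the
>> single-process bandwidth no matter how many processes I start.)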
>>
>> ----- Message from Barry Smith <bsmith at petsc.dev> ---------
>> Date: Thu, 11 Jan 2024 11:56:24 -0500
>> From: Barry Smith <bsmith at petsc.dev>
>> Subject: Re: [petsc-users] Parallel processes run significantly slower
>> To: Steffen Wilksen | Universitaet Bremen <swilksen at itp.uni-bremen.de>
>> Cc: PETSc users list <petsc-users at mcs.anl.gov>
>>
>>>
>>> Take a look at the discussion in
>>> https://petsc.gitlab.io/-/petsc/-/jobs/5814862879/artifacts/public/html/manual/streams.html
>>> and I suggest you run the streams benchmark from the branch
>>> barry/2023-09-15/fix-log-pcmpi on your machine to get a baseline for
>>> what kind of speedup you can expect.
>>>
>>> Then let us know your thoughts.
>>>
>>> Barry
>>>
>>>
>>>
>>>
>>>> On Jan 11, 2024, at 11:37 AM, Stefano Zampini
>>>> <stefano.zampini at gmail.com> wrote:
>>>>
>>>> You are creating the matrix on the wrong communicator if you want
>>>> it parallel. You are using PETSc.COMM_SELF.
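>>>>
>>>> For illustration, a rough sketch of the difference (n here stands
>>>> for the global matrix size and is not a name from your script):
>>>>
>>>>   from petsc4py import PETSc
>>>>   # parallel: one global n x n matrix, rows split across the ranks,
>>>>   # so each rank fills and multiplies only its own row block
>>>>   A = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_WORLD)
>>>>   # sequential: every rank builds and multiplies its own full copy
>>>>   # A = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_SELF)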
>>>>
>>>> On Thu, Jan 11, 2024, 19:28 Steffen
>>>> Wilksen | Universitaet Bremen <swilksen at itp.uni-bremen.de> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm trying to do repeated matrix-vector multiplication of large
>>>>> sparse matrices in Python using petsc4py. Even the simplest method
>>>>> of parallelization, dividing up the calculation to run on multiple
>>>>> processes independently, does not seem to give a significant
>>>>> speed-up for large matrices. I constructed a minimal working
>>>>> example, which I run using
>>>>>
>>>>> mpiexec -n N python parallel_example.py,
>>>>>
>>>>> where N is the number of processes. Instead of taking
>>>>> approximately the same time irrespective of the number of
>>>>> processes used, the calculation is much slower when starting
>>>>> more MPI processes. This translates to little to no speed up
>>>>> when splitting up a fixed number of calculations over N
>>>>> processes. As an example, running with N=1 takes 9s, while
>>>>> running with N=4 takes 34s. When running with smaller matrices,
>>>>> the problem is not as severe (only slower by a factor of 1.5
>>>>> when setting MATSIZE=1e+5 instead of MATSIZE=1e+6). I get the
>>>>> same problems when just starting the script four times manually
>>>>> without using MPI.
>>>>> I attached both the script and the log file for running the
>>>>> script with N=4. Any help would be greatly appreciated.
>>>>> Calculations are done on my laptop, running Arch Linux (kernel
>>>>> 6.6.8) and PETSc version 3.20.2.
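>>>>>
>>>>> For reference, the core of the script is essentially the following
>>>>> (a simplified sketch with an illustrative tridiagonal fill and
>>>>> illustrative sizes, not the exact attached file):
>>>>>
>>>>>   from petsc4py import PETSc
>>>>>
>>>>>   n = 1000000                  # MATSIZE
>>>>>   A = PETSc.Mat().createAIJ([n, n], nnz=3, comm=PETSc.COMM_SELF)
>>>>>   for i in range(n):           # simple sparse (tridiagonal) fill
>>>>>       A.setValue(i, i, 2.0)
>>>>>       if i > 0:
>>>>>           A.setValue(i, i - 1, -1.0)
>>>>>       if i < n - 1:
>>>>>           A.setValue(i, i + 1, -1.0)
>>>>>   A.assemble()
>>>>>
>>>>>   x = A.createVecRight()       # input vector
>>>>>   y = A.createVecLeft()        # output vector
>>>>>   x.set(1.0)
>>>>>   for _ in range(100):         # repeated matrix-vector products
>>>>>       A.mult(x, y)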
>>>>>
>>>>> Kind Regards
>>>>> Steffen
>>
>> ----- End message from Barry Smith <bsmith at petsc.dev> -----
>>
>>
----- End message from Junchao Zhang <junchao.zhang at gmail.com> -----