[petsc-users] Parallel processes run significantly slower

Steffen Wilksen | Universitaet Bremen swilksen at itp.uni-bremen.de
Fri Jan 12 11:13:35 CST 2024


  Hi Junchao,

I tried it out, but unfortunately this does not seem to give any
improvement; the code is still much slower when starting more
processes.
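
As a quick sanity check that the binding actually takes effect, each rank
can print its CPU affinity and the reported core sets can be compared.
A minimal sketch, assuming mpi4py is available and a Linux host
(os.sched_getaffinity is Linux-only); the script name check_affinity.py
is only illustrative:

    # check_affinity.py - print which cores each MPI rank may run on
    import os
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # With a working "--bind-to core", every rank should report a small,
    # distinct set of cores rather than all cores of the machine.
    cores = sorted(os.sched_getaffinity(0))
    print(f"rank {rank} of {comm.Get_size()}: allowed cores {cores}")

run e.g. as: mpiexec --bind-to core -n 4 python check_affinity.py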

----- Message from Junchao Zhang <junchao.zhang at gmail.com> ---------
   Date: Fri, 12 Jan 2024 09:41:39 -0600
   From: Junchao Zhang <junchao.zhang at gmail.com>
Subject: Re: [petsc-users] Parallel processes run significantly slower
     To: Steffen Wilksen | Universitaet Bremen <swilksen at itp.uni-bremen.de>
     Cc: Barry Smith <bsmith at petsc.dev>, PETSc users list  
<petsc-users at mcs.anl.gov>

> Hi, Steffen,
>
> Could it be an MPI process binding issue? Could you try running with
>
>> mpiexec --bind-to core -n N python parallel_example.py
>
>
>                --Junchao Zhang
>
>      On Fri, Jan 12, 2024 at 8:52 AM Steffen Wilksen | Universitaet  
> Bremen <swilksen at itp.uni-bremen.de> wrote:
>
>> Thank you for your feedback.
>> @Stefano: the use of my communicator was intentional, since I later
>> intend to distribute M independent calculations over N processes, so
>> that each process only needs to do M/N of the calculations (a minimal
>> sketch of this splitting is included after the quoted messages below).
>> Of course I don't expect a speedup in my example, since the number of
>> calculations is constant and does not depend on N, but I would hope
>> that the time each process takes does not increase too drastically
>> with N.
>> @Barry: I tried the STREAMS benchmark; these are my results:
>> np   Rate (MB/s)   Speedup
>>  1   23467.9961    1
>>  2   26852.0536    1.1442
>>  3   29715.4762    1.26621
>>  4   34132.2490    1.45442
>>  5   34924.3020    1.48817
>>  6   34315.5290    1.46223
>>  7   33134.9545    1.41192
>>  8   33234.9141    1.41618
>>  9   32584.3349    1.38846
>> 10   32582.3962    1.38838
>> 11   32098.2903    1.36775
>> 12   32064.8779    1.36632
>> 13   31692.0541    1.35044
>> 14   31274.2421    1.33263
>> 15   31574.0196    1.34541
>> 16   30906.7773    1.31698
>>
>> I also attached the resulting plot. It seems I get very poor MPI
>> speedup (the red curve, right?), saturating around 1.5 and even
>> decreasing when I use too many processes. I don't fully understand
>> the reasons given in the discussion you linked, since this is all
>> very new to me, but I take it that this is a limitation of my
>> machine which I can't easily fix, right?
>>
>> ----- Message from Barry Smith <bsmith at petsc.dev> ---------
>>    Date: Thu, 11 Jan 2024 11:56:24 -0500
>>    From: Barry Smith <bsmith at petsc.dev>
>> Subject: Re: [petsc-users] Parallel processes run significantly slower
>>      To: Steffen Wilksen | Universitaet Bremen <swilksen at itp.uni-bremen.de>
>>      Cc: PETSc users list <petsc-users at mcs.anl.gov>
>>
>>>
>>>    Take a look at the discussion in
>>> https://petsc.gitlab.io/-/petsc/-/jobs/5814862879/artifacts/public/html/manual/streams.html
>>> and I suggest you run the streams benchmark from the branch
>>> barry/2023-09-15/fix-log-pcmpi on your machine to get a baseline for
>>> what kind of speedup you can expect.
>>>
>>>    Then let us know your thoughts.
>>>
>>>    Barry
>>>
>>>
>>>
>>>
>>>> On Jan 11, 2024, at 11:37 AM, Stefano Zampini
>>>> <stefano.zampini at gmail.com> wrote:
>>>>
>>>> You are creating the matrix on the wrong communicator if you want
>>>> it parallel. You are using PETSc.COMM_SELF.
>>>>
>>>> On Thu, Jan 11, 2024, 19:28 Steffen Wilksen | Universitaet Bremen
>>>> <swilksen at itp.uni-bremen.de> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm trying to do repeated matrix-vector multiplications of large
>>>>> sparse matrices in Python using petsc4py. Even the simplest
>>>>> method of parallelization, dividing the calculation up to run on
>>>>> multiple processes independently, does not seem to give a
>>>>> significant speedup for large matrices. I constructed a
>>>>> minimal working example, which I run using
>>>>>
>>>>> mpiexec -n N python parallel_example.py,
>>>>>
>>>>> where N is the number of processes. Instead of taking
>>>>> approximately the same time irrespective of the number of
>>>>> processes used, the calculation is much slower when starting
>>>>> more MPI processes. This translates to little to no speedup
>>>>> when splitting a fixed number of calculations over N
>>>>> processes. As an example, running with N=1 takes 9s, while
>>>>> running with N=4 takes 34s. With smaller matrices the problem
>>>>> is not as severe (only slower by a factor of 1.5 when setting
>>>>> MATSIZE=1e+5 instead of MATSIZE=1e+6). I get the same problem
>>>>> when just starting the script four times manually without
>>>>> using MPI.
>>>>> I attached both the script and the log file for running the
>>>>> script with N=4. Any help would be greatly appreciated.
>>>>> Calculations are done on my laptop, Arch Linux (kernel 6.6.8)
>>>>> and PETSc version 3.20.2.
>>>>>
>>>>> Kind Regards
>>>>> Steffen
>>
>> ----- End message from Barry Smith <bsmith at petsc.dev> -----
>>
>>  

----- End message from Junchao Zhang <junchao.zhang at gmail.com> -----
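
For reference, below is a minimal sketch of the kind of per-process
workload described in the original message quoted above: a sequential AIJ
matrix created on PETSc.COMM_SELF and multiplied repeatedly against a
vector. The size and the tridiagonal fill are made up for illustration;
this is not the attached parallel_example.py:

    import sys
    import petsc4py
    petsc4py.init(sys.argv)
    from petsc4py import PETSc

    n = 100000                     # illustrative size, not the real MATSIZE
    A = PETSc.Mat().createAIJ([n, n], nnz=3, comm=PETSc.COMM_SELF)
    rstart, rend = A.getOwnershipRange()
    for i in range(rstart, rend):  # simple tridiagonal fill as a stand-in
        cols = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
        A.setValues(i, cols, [1.0] * len(cols))
    A.assemble()

    x = A.createVecRight()         # input vector
    y = A.createVecLeft()          # output vector
    x.set(1.0)
    for _ in range(100):           # repeated sparse matrix-vector products
        A.mult(x, y)

Each MatMult streams the whole matrix from memory, so several such
independent processes compete for the same memory bandwidth, which is
consistent with the saturating STREAMS speedup reported above.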
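
And here is a sketch of the round-robin splitting of M independent
calculations over N processes mentioned above (rank r handles tasks
r, r+N, r+2N, ...). run_one_calculation() is a hypothetical placeholder
for whatever one independent calculation does, e.g. the COMM_SELF
MatMult loop in the previous sketch:

    from mpi4py import MPI

    def run_one_calculation(task):
        # hypothetical placeholder for one independent calculation
        return task

    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    M = 64                               # illustrative number of tasks
    local = [run_one_calculation(t) for t in range(rank, M, nprocs)]

    # collect the per-rank results on rank 0 if they are needed in one place
    all_results = comm.gather(local, root=0)
    if rank == 0:
        print(sum(len(r) for r in all_results), "tasks completed")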