[petsc-users] MPI speedup

Thu Apr 23 06:02:09 CDT 2015

Matt

So that means 10 processes run merely 1.8 times faster than 1 process?
This is quite difficult to digest! Okay, so if memory bandwidth is the
controlling factor here, how will forming a cluster of the same machines
solve this problem?
My CPU has a max memory bandwidth of 59 GB/s.

Apologies if the questions are too silly!

*Siddhesh M Godbole*

5th year Dual Degree,
Civil Eng & Applied Mech.
IIT Madras

On Thu, Apr 23, 2015 at 4:22 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Thu, Apr 23, 2015 at 5:47 AM, siddhesh godbole <
> siddhesh4godbole at gmail.com> wrote:
>
>> Hello,
>>
>> I want to know about the test that is run just after PETSc is
>> configured on the system to assess the possible speedup from MPI processes. I
>> have saved the result file, which says:
>> *Number of MPI processes 10*
>> *Process 0 iitm*
>> *Process 1 iitm*
>> *Process 2 iitm*
>> *Process 3 iitm*
>> *Process 4 iitm*
>> *Process 5 iitm*
>> *Process 6 iitm*
>> *Process 7 iitm*
>> *Process 8 iitm*
>> *Process 9 iitm*
>> *Function      Rate (MB/s) *
>> *Copy:       24186.8271*
>> *Scale:      23914.0401*
>> *Add:        27271.7149*
>> *Triad:      27787.1630*
>> *------------------------------------------------*
>> *np  speedup*
>> *1 1.0*
>> *2 1.75*
>> *3 1.86*
>> *4 1.84*
>> *5 1.85*
>> *6 1.83*
>> *7 1.76*
>> *8 1.79*
>> *9 1.8*
>> *10 1.8*
>> *Estimation of possible speedup of MPI programs based on Streams
>> benchmark.*
>>
>> 1) What parameters does the speedup depend on?
>>
>
> I am not sure what you are asking here. Speedup is defined as the time T
> on 1 process divided
> by the time T_p on p processes:
>
>   S = T/T_p
>
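To make the definition concrete, here is a minimal sketch that applies S = T/T_p and converts the speedup column reported above into parallel efficiency, E = S/p. The speedup values are copied from the benchmark output quoted in this thread; the function names `speedup` and `parallel_efficiency` are my own and not part of PETSc.

```python
# Speedup values copied from the streams output quoted in this thread.
reported_speedup = {1: 1.0, 2: 1.75, 3: 1.86, 4: 1.84, 5: 1.85,
                    6: 1.83, 7: 1.76, 8: 1.79, 9: 1.8, 10: 1.8}


def speedup(t1, tp):
    """Return S = T / T_p for wall-clock times t1 (1 process) and tp (p processes)."""
    if t1 <= 0 or tp <= 0:
        raise ValueError("times must be positive")
    return t1 / tp


def parallel_efficiency(s, p):
    """Return E = S / p, the fraction of ideal (linear) speedup achieved."""
    if p < 1:
        raise ValueError("process count must be at least 1")
    return s / p


if __name__ == "__main__":
    for p, s in reported_speedup.items():
        eff = parallel_efficiency(s, p)
        print(f"np={p:2d}  speedup={s:4.2f}  efficiency={eff:6.1%}")
```

Read this way, the table shows efficiency falling from 100% on 1 process to roughly 18% on 10 processes, which is the pattern expected when a shared resource, rather than the core count, limits the rate.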
>
>> 2) What are the hardware requirements for higher speedup? (I was
>> expecting at least a 5 times speedup with 10 processes.)
>>
>
> STREAMS measures the speedup of vector operations, which are very similar
> to sparse matrix operations. Both
> are limited by memory bandwidth.
>
>
>> 3) What could possibly be done to improve this?
>>
>
> 1) You could buy more nodes, since each node has its own path to memory
>
> 2) You could change algorithms, but this has proven very difficult
>
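A rough sketch of why more nodes can help when more processes on one node do not, assuming a very simple saturation model: on each node the bandwidth-bound speedup grows with the process count until it hits a cap, and every additional node contributes its own memory path. The cap of 1.8 is read off the table in this thread; linear scaling across nodes with no communication cost is an optimistic assumption, and `estimated_speedup` is a hypothetical helper, not a PETSc function.

```python
def estimated_speedup(nodes, procs_per_node, per_node_cap=1.8):
    """Idealized bandwidth-bound speedup relative to 1 process on 1 node.

    Assumptions (not measured):
      * per-node speedup is min(procs_per_node, per_node_cap)
      * nodes scale linearly, i.e. network communication is free
    """
    if nodes < 1 or procs_per_node < 1:
        raise ValueError("nodes and procs_per_node must be at least 1")
    return nodes * min(procs_per_node, per_node_cap)


if __name__ == "__main__":
    # Same 10 processes, arranged differently.
    for nodes, ppn in [(1, 10), (2, 5), (5, 2), (10, 1)]:
        s = estimated_speedup(nodes, ppn)
        print(f"{nodes:2d} node(s) x {ppn:2d} proc(s)/node -> estimated speedup {s:.1f}")
```

Under these assumptions, 10 processes on one node stay at the cap, while the same 10 processes spread over several nodes scale with the node count. A real cluster would land below these figures because of communication between nodes.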
>    Thanks,
>
>       Matt
>
>
>> I have an Intel® Core™ i7-4930K CPU @ 3.40GHz × 12 with 32 GB of RAM and 1
>> TB disk space.
>>
>>
>> Thanks
>> *Siddhesh M Godbole*
>>
>> 5th year Dual Degree,
>> Civil Eng & Applied Mech.
>> IIT Madras
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>