[petsc-users] MPI speedup

Thu Apr 23 12:42:33 CDT 2015

  Please see http://www.mcs.anl.gov/petsc/documentation/faq.html#computers and note the information about "binding" options for MPICH and OpenMPI, which can sometimes improve the streams performance (and hence the performance of other algorithms) by a good amount.

  Barry
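On Linux, "binding" a process to cores roughly amounts to setting its CPU affinity mask, which MPI launchers do for each rank when their binding options are used. The sketch below is an illustration added here, not something from the FAQ: it shows only that underlying OS call via Python's os.sched_setaffinity. The core chosen is arbitrary, and the rank-to-core placement policy that a launcher would apply is not modelled.

```python
import os

def pin_to_core(core: int) -> set:
    """Restrict the calling process to one CPU core and return the new mask.

    Linux-only: os.sched_setaffinity does not exist on macOS or Windows.
    """
    os.sched_setaffinity(0, {core})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)

if hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)  # cores this process may run on now
    core = min(allowed)                # arbitrary choice for the demo
    print("pinned to", pin_to_core(core))
    os.sched_setaffinity(0, allowed)   # restore the original mask
```

Pinning keeps a rank from migrating between cores (and, on multi-socket machines, away from the memory it touched first), which is why it can change the measured streams bandwidth.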

> On Apr 23, 2015, at 6:02 AM, siddhesh godbole <siddhesh4godbole at gmail.com> wrote:
> 
> Matt
> 
> So that means the time on 1 process is merely 1.8 times the time on 10 processes?? This is quite difficult to digest! Okay, so if memory bandwidth is a controlling factor here, how will forming a cluster of identical machines solve this problem?
> My CPU has a maximum memory bandwidth of 59 GB/s.
> 
> 
> Apologies if the questions are too silly!
> 
> Siddhesh M Godbole
> 
> 5th year Dual Degree,
> Civil Eng & Applied Mech.
> IIT Madras
> 
> On Thu, Apr 23, 2015 at 4:22 PM, Matthew Knepley <knepley at gmail.com> wrote:
> On Thu, Apr 23, 2015 at 5:47 AM, siddhesh godbole <siddhesh4godbole at gmail.com> wrote:
> Hello,
> 
> I want to know about the test that is run just after PETSc is configured on the system to assess the possible speedup from MPI processes. I have saved the result file, which says:
> Number of MPI processes 10
> Process 0 iitm
> Process 1 iitm
> Process 2 iitm
> Process 3 iitm
> Process 4 iitm
> Process 5 iitm
> Process 6 iitm
> Process 7 iitm
> Process 8 iitm
> Process 9 iitm
> Function      Rate (MB/s) 
> Copy:       24186.8271
> Scale:      23914.0401
> Add:        27271.7149
> Triad:      27787.1630
> ------------------------------------------------
> np  speedup
> 1 1.0
> 2 1.75
> 3 1.86
> 4 1.84
> 5 1.85
> 6 1.83
> 7 1.76
> 8 1.79
> 9 1.8
> 10 1.8
> Estimation of possible speedup of MPI programs based on Streams benchmark.
> 
> 1) What parameters does the speedup depend on?
> 
> I am not sure what you are asking here. Speedup is defined as the time T on 1 process divided
> by the time T_p on p processes:
> 
>   S = T/T_p
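A minimal sketch, not part of the original thread, applying S = T/T_p to the last row of the posted table. It also computes parallel efficiency E = S/p, a standard companion measure that the thread itself does not mention. The 100 s serial time is a made-up value; only the ratio matters.

```python
# Speedup column printed by the streams test (np -> S).
reported = {1: 1.0, 2: 1.75, 3: 1.86, 4: 1.84, 5: 1.85,
            6: 1.83, 7: 1.76, 8: 1.79, 9: 1.8, 10: 1.8}

def speedup(t_serial: float, t_parallel: float) -> float:
    """S = T / T_p: how many times faster p processes are than one."""
    return t_serial / t_parallel

def efficiency(s: float, p: int) -> float:
    """E = S / p: fraction of ideal (linear) speedup actually achieved."""
    return s / p

# A hypothetical run taking 100 s on 1 process reproduces the np = 10 row
# if it takes 100 / 1.8 s on 10 processes.
t1 = 100.0
t10 = t1 / reported[10]
s10 = speedup(t1, t10)
print(f"T_10 = {t10:.1f} s, S = {s10:.2f}, E = {efficiency(s10, 10):.2f}")
# prints: T_10 = 55.6 s, S = 1.80, E = 0.18
```

So 10 processes finish in a bit more than half the single-process time, using about 18% of what linear scaling would give.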
>  
> 2) What are the hardware requirements for a higher speedup? (I was expecting at least a 5 times speedup with 10 processes.)
> 
> STREAMS measures the speedup of vector operations, which are very similar to sparse matrix operations. Both
> are limited by memory bandwidth.
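Why bandwidth caps the speedup can be sketched with a simple model of the STREAM triad kernel a[i] = b[i] + q*c[i], for which STREAM counts 24 bytes of traffic per iteration. This sketch assumes, as the "Estimation of possible speedup ... based on Streams benchmark" line suggests, that the speedup column is the ratio of aggregate triad rates. The single-process rate is therefore inferred from the reported 1.8 rather than taken from a measurement.

```python
# Triad a[i] = b[i] + q*c[i] moves 3 doubles (read b, read c, write a).
BYTES_PER_ITER = 3 * 8

def triad_seconds(n: int, bandwidth: float) -> float:
    """Run time of n triad iterations when memory bandwidth (bytes/s) is the limit."""
    return n * BYTES_PER_ITER / bandwidth

triad_np10 = 27787.1630e6     # bytes/s, all 10 processes together (posted output)
triad_np1 = triad_np10 / 1.8  # bytes/s, implied single-process rate (inferred)
peak = 59e9                   # bytes/s, the CPU's quoted maximum bandwidth

n = 10_000_000
s = triad_seconds(n, triad_np1) / triad_seconds(n, triad_np10)
print(f"speedup  = {s:.2f}")                   # ratio of bandwidths: 1.80
print(f"achieved = {triad_np10 / peak:.0%} of quoted peak")  # 47%
```

In this model the run time depends only on bytes moved divided by bandwidth, so once one or two processes saturate the memory bus, extra cores on the same node have nothing left to speed up.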
>  
> 3) What could possibly be done to improve this?
> 
> 1) You could buy more nodes, since each node has its own path to memory.
> 
> 2) You could change algorithms, but this has proven very difficult.
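Point 1), and the later question about clustering identical machines, can be illustrated with an optimistic toy model added here for illustration: within a node the speedup saturates at the bandwidth-limited value (1.8 from the posted table), and each additional node contributes its own memory bus. The function name and cap parameter are hypothetical, and network communication, which real runs cannot avoid, is ignored.

```python
def model_speedup(procs_per_node: int, nodes: int, node_cap: float = 1.8) -> float:
    """Toy speedup estimate for a memory-bandwidth-bound code.

    Per node, speedup is min(processes, node_cap); identical nodes add their
    memory bandwidth, so per-node speedups add up. Communication is ignored.
    """
    per_node = min(float(procs_per_node), node_cap)
    return per_node * nodes

for nodes in (1, 2, 4):
    print(f"{nodes} node(s) x 10 procs: S ~ {model_speedup(10, nodes):.1f}")
# prints S ~ 1.8, 3.6 and 7.2 for 1, 2 and 4 nodes
```

Under these assumptions, reaching the hoped-for 5x needs roughly three such machines, not more processes on one machine.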
> 
>    Thanks,
> 
>       Matt
>  
> I have an Intel® Core™ i7-4930K CPU @ 3.40GHz × 12 with 32 GB of RAM and 1 TB of disk space.
> 
> 
> Thanks
> Siddhesh M Godbole
> 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>