[petsc-users] MPI speedup
Barry Smith
bsmith at mcs.anl.gov
Thu Apr 23 12:42:33 CDT 2015
Please see http://www.mcs.anl.gov/petsc/documentation/faq.html#computers and note the information about "binding" options for MPICH and OpenMPI, which can sometimes improve the streams performance (and hence the performance of other algorithms) by a good amount.
Barry
> On Apr 23, 2015, at 6:02 AM, siddhesh godbole <siddhesh4godbole at gmail.com> wrote:
>
> Matt
>
> So that means running on 10 processes is merely 1.8 times faster than running on 1 process? This is quite difficult to digest! Okay, so if memory bandwidth is the controlling factor here, how will forming a cluster of the same machines solve this problem?
> My CPU has a maximum memory bandwidth of 59 GB/s.
>
>
> Apologies if the questions are too silly!
>
> Siddhesh M Godbole
>
> 5th year Dual Degree,
> Civil Eng & Applied Mech.
> IIT Madras
>
> On Thu, Apr 23, 2015 at 4:22 PM, Matthew Knepley <knepley at gmail.com> wrote:
> On Thu, Apr 23, 2015 at 5:47 AM, siddhesh godbole <siddhesh4godbole at gmail.com> wrote:
> Hello,
>
> I want to know about the test that is run just after PETSc is configured on the system to assess the possible speedup from MPI processes. I have saved the result file, which says:
> Number of MPI processes 10
> Process 0 iitm
> Process 1 iitm
> Process 2 iitm
> Process 3 iitm
> Process 4 iitm
> Process 5 iitm
> Process 6 iitm
> Process 7 iitm
> Process 8 iitm
> Process 9 iitm
> Function Rate (MB/s)
> Copy: 24186.8271
> Scale: 23914.0401
> Add: 27271.7149
> Triad: 27787.1630
> ------------------------------------------------
> np speedup
> 1 1.0
> 2 1.75
> 3 1.86
> 4 1.84
> 5 1.85
> 6 1.83
> 7 1.76
> 8 1.79
> 9 1.8
> 10 1.8
> Estimation of possible speedup of MPI programs based on Streams benchmark.
>
> 1) What parameters does the speedup depend on?
>
> I am not sure what you are asking here. Speedup is defined as the time T on 1 process divided
> by the time T_p on p processes:
>
> S = T/T_p
>
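To make the definition concrete, here is a minimal Python sketch that takes the speedup column from the streams output quoted above and derives the parallel efficiency E = S/p. Efficiency is a standard companion measure, not something the streams output reports, and the names `speedup` and `efficiency` are illustrative only.

```python
# Speedup values copied from the streams output in this thread (np -> speedup).
speedup = {1: 1.0, 2: 1.75, 3: 1.86, 4: 1.84, 5: 1.85,
           6: 1.83, 7: 1.76, 8: 1.79, 9: 1.8, 10: 1.8}


def efficiency(p, s):
    """Parallel efficiency E = S / p (1.0 would be ideal scaling)."""
    return s / p


for p, s in sorted(speedup.items()):
    print(f"np={p:2d}  speedup={s:4.2f}  efficiency={efficiency(p, s):4.2f}")
```

On these numbers the speedup peaks at 3 processes (1.86), and the efficiency at 10 processes is 0.18.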
> 2) What are the hardware requirements for higher speedup? (I was expecting at least a 5 times speedup with 10 processes.)
>
> STREAMS measures the speedup of vector operations, which are very similar to sparse matrix operations. Both
> are limited by memory bandwidth.
>
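For readers who have not seen the kind of vector operation being measured, the following is a rough NumPy sketch of a Triad-style kernel (out = b + scalar * c) with a simple timer. It is not the PETSc streams test or the official STREAM code: the function names, array size, and repeat count are assumptions chosen for illustration, and the rate it prints is only indicative.

```python
import time

import numpy as np


def triad(b, c, scalar, out):
    """STREAM-style Triad: out = b + scalar * c, without temporaries."""
    np.multiply(c, scalar, out=out)
    np.add(out, b, out=out)
    return out


def triad_rate_mb_s(n=2_000_000, repeats=5, scalar=3.0):
    """Best observed Triad rate in MB/s over `repeats` timed passes."""
    b = np.full(n, 1.0)
    c = np.full(n, 2.0)
    out = np.empty(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        triad(b, c, scalar, out)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    # STREAM counts three 8-byte arrays per pass (read b, read c, write out).
    # The two-step NumPy version touches `out` more than once, so the real
    # memory traffic is higher than what is counted here.
    return 3 * 8 * n / best / 1e6


print(f"Triad: {triad_rate_mb_s():.1f} MB/s")
```

The kernel does one multiply and one add per 24 bytes counted, so the arithmetic is cheap relative to the data movement. That is why adding more processes on one node stops helping once the memory system is saturated.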
> 3) What could possibly be done to improve this?
>
> 1) You could buy more nodes, since each node has its own path to memory
>
> 2) You could change algorithms, but this has proven very difficult
>
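The first suggestion can be illustrated with a toy model, offered as a sketch under strong assumptions rather than a prediction: each node scales perfectly until it hits its own bandwidth-limited cap, identical nodes add their caps together, and inter-node communication costs nothing. The function name and parameters are hypothetical. The cap of 1.8 is the saturated speedup reported in this thread.

```python
def bandwidth_bound_speedup(nodes, procs_per_node, per_node_cap):
    """Toy estimate: each node contributes min(procs_per_node, per_node_cap).

    Assumes perfect scaling up to the cap and ignores inter-node
    communication, so it is an optimistic upper bound, not a prediction.
    """
    return nodes * min(float(procs_per_node), per_node_cap)


# per_node_cap=1.8 is the saturated single-node speedup quoted in this thread.
for nodes in (1, 2, 4):
    estimate = bandwidth_bound_speedup(nodes, procs_per_node=10, per_node_cap=1.8)
    print(f"nodes={nodes}  estimated speedup over 1 process={estimate:.1f}")
```

Under these assumptions, extra processes on one machine gain nothing beyond the cap, while each additional identical machine brings its own memory bandwidth and raises the bound proportionally.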
> Thanks,
>
> Matt
>
> I have an Intel® Core™ i7-4930K CPU @ 3.40GHz × 12 with 32 GB of RAM and 1 TB of disk space.
>
>
> Thanks
> Siddhesh M Godbole
>
> 5th year Dual Degree,
> Civil Eng & Applied Mech.
> IIT Madras
>
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>