[petsc-users] MPI speedup

Satish Balay balay at mcs.anl.gov
Thu Apr 23 16:53:29 CDT 2015


http://ark.intel.com/products/77780/Intel-Core-i7-4930K-Processor-12M-Cache-up-to-3_90-GHz

Looks like this CPU has 4 memory channels and supports DDR3-1866 memory.
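
As a rough sanity check, the theoretical peak can be estimated as channels x transfer rate x bytes per transfer. The sketch below assumes a standard 64-bit (8-byte) DDR3 channel and rounds DDR3-1866 to 1866 million transfers per second; the function and constant names are only illustrative.

```python
# Theoretical peak memory bandwidth = channels * transfer rate * bytes per transfer.
CHANNELS = 4                 # memory channels on the i7-4930K (per the Intel page above)
TRANSFERS_PER_SEC = 1866e6   # DDR3-1866: roughly 1866 million transfers/s per channel
BYTES_PER_TRANSFER = 8       # assumption: standard 64-bit wide DDR3 channel


def peak_bandwidth_gb_s(channels, transfers_per_sec=TRANSFERS_PER_SEC,
                        bytes_per_transfer=BYTES_PER_TRANSFER):
    """Return the theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * transfers_per_sec * bytes_per_transfer / 1e9


# Show how the ceiling changes with the number of populated channels.
for n in (1, 2, CHANNELS):
    print(f"{n} channel(s) populated: {peak_bandwidth_gb_s(n):.1f} GB/s")
```

With all 4 channels populated this comes to about 59.7 GB/s, which is consistent with the 59 GB/s figure quoted below. With fewer channels populated the ceiling drops proportionally.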

So to get the maximum memory bandwidth, you should make sure you have:

- DDR3-1866 [or PC3-14900] memory installed.

- 4 memory modules [or a multiple of 4] installed, so that all 4 channels are populated.
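
To see how far the STREAMS numbers quoted below are from that ceiling, here is a small sketch. The Triad rate and the speedup table are copied from the quoted output; the peak figure is an assumption (4 channels of DDR3-1866 at 8 bytes per transfer), and the helper name is only illustrative.

```python
# Figures copied from the STREAMS output quoted below.
TRIAD_MB_S = 27787.1630          # Triad rate reported for the 10-process run
SPEEDUP = {1: 1.0, 2: 1.75, 3: 1.86, 4: 1.84, 5: 1.85,
           6: 1.83, 7: 1.76, 8: 1.79, 9: 1.8, 10: 1.8}

# Assumption: 4 channels of DDR3-1866, 8 bytes per transfer (theoretical peak, MB/s).
PEAK_MB_S = 4 * 1866e6 * 8 / 1e6


def parallel_efficiency(speedup, nprocs):
    """Efficiency E = S / p; 1.0 would mean perfect scaling."""
    return speedup / nprocs


print(f"Triad / theoretical peak: {TRIAD_MB_S / PEAK_MB_S:.2f}")
for p, s in SPEEDUP.items():
    print(f"np={p:2d}  speedup={s:.2f}  efficiency={parallel_efficiency(s, p):.2f}")
```

The measured Triad rate is a bit under half of the assumed peak, and the efficiency falls off steadily once the speedup stops growing after 2 to 3 processes. That pattern fits a memory-bandwidth limit, but it does not by itself show how many channels are actually populated, so it is worth checking the installed modules.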

Satish

On Thu, 23 Apr 2015, siddhesh godbole wrote:

> Matt
> 
> So that means the speed on 10 processes is merely 1.8 times the speed on 1
> process?? This is quite difficult to digest! Okay, so if memory bandwidth
> is a controlling factor here, how will forming a cluster with the same machines
> solve this problem?
> My CPU has a max memory bandwidth of 59 GB/s.
> 
> 
> Apologies if the questions are too silly!
> 
> *Siddhesh M Godbole*
> 
> 5th year Dual Degree,
> Civil Eng & Applied Mech.
> IIT Madras
> 
> On Thu, Apr 23, 2015 at 4:22 PM, Matthew Knepley <knepley at gmail.com> wrote:
> 
> > On Thu, Apr 23, 2015 at 5:47 AM, siddhesh godbole <
> > siddhesh4godbole at gmail.com> wrote:
> >
> >> Hello,
> >>
> >> I want to know about the test that is run just after PETSc is
> >> configured on the system to assess the possible speedup from MPI processes. I
> >> have saved the result file, which says:
> >> *Number of MPI processes 10*
> >> *Process 0 iitm*
> >> *Process 1 iitm*
> >> *Process 2 iitm*
> >> *Process 3 iitm*
> >> *Process 4 iitm*
> >> *Process 5 iitm*
> >> *Process 6 iitm*
> >> *Process 7 iitm*
> >> *Process 8 iitm*
> >> *Process 9 iitm*
> >> *Function      Rate (MB/s) *
> >> *Copy:       24186.8271*
> >> *Scale:      23914.0401*
> >> *Add:        27271.7149*
> >> *Triad:      27787.1630*
> >> *------------------------------------------------*
> >> *np  speedup*
> >> *1 1.0*
> >> *2 1.75*
> >> *3 1.86*
> >> *4 1.84*
> >> *5 1.85*
> >> *6 1.83*
> >> *7 1.76*
> >> *8 1.79*
> >> *9 1.8*
> >> *10 1.8*
> >> *Estimation of possible speedup of MPI programs based on Streams
> >> benchmark.*
> >>
> >> 1) What parameters does the speedup depend on?
> >>
> >
> > I am not sure what you are asking here. Speedup is defined as the time T
> > on 1 process divided
> > by the time T_p on p processes:
> >
> >   S = T/T_p
> >
> >
> >> 2) What are the hardware requirements for higher speedup? (I was
> >> expecting at least a 5 times speedup after generating 10 processes.)
> >>
> >
> > STREAMS measures the speedup of vector operations, which are very similar
> > to sparse matrix operations. Both
> > are limited by memory bandwidth.
> >
> >
> >> 3) What could possibly be done to improve this?
> >>
> >
> > 1) You could buy more nodes, since each node has its own path to memory
> >
> > 2) You could change algorithms, but this has proven very difficult
> >
> >    Thanks,
> >
> >       Matt
> >
> >
> >> I have an Intel® Core™ i7-4930K CPU @ 3.40GHz × 12 with 32 GB of RAM and 1
> >> TB disk space.
> >>
> >>
> >> Thanks
> >> *Siddhesh M Godbole*
> >>
> >> 5th year Dual Degree,
> >> Civil Eng & Applied Mech.
> >> IIT Madras
> >>
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to which their
> > experiments lead.
> > -- Norbert Wiener
> >
> 

