[petsc-users] MPI speedup

Matthew Knepley knepley at gmail.com
Thu Apr 23 06:10:23 CDT 2015


On Thu, Apr 23, 2015 at 6:02 AM, siddhesh godbole <
siddhesh4godbole at gmail.com> wrote:

> Matt
>
> So that means the run on 10 processes is merely 1.8 times faster than the
> run on 1 process?? This is quite difficult to digest! Okay, so if memory
> bandwidth is the controlling factor here, how will forming a cluster with
> the same machines solve this problem?
> My CPU has a maximum memory bandwidth of 59 GB/s.
>

It is tempting to say that computer manufacturers are lying when they
report performance. You can bring down 59 GB/s from memory, which is less
than 8 billion doubles/s. If you are adding two vectors, you do 1 flop for
every 2 doubles you bring down, so you can do about 4 GF/s. Your processor
has 6 cores, each running at 3.4 GHz, and each core can do 4 flops/cycle
using the vector instructions, so they report that it can do roughly 81
GF/s, but you cannot get values to it fast enough to compute at that rate.
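
A back-of-the-envelope sketch of that arithmetic, in C (the bandwidth, clock
rate, core count, and flops/cycle are just the figures quoted in this thread,
not measured values):

    #include <stdio.h>

    int main(void)
    {
      /* Figures quoted in this thread, not measured on the machine */
      double bandwidth   = 59.0e9;  /* bytes/s, vendor peak memory bandwidth  */
      double clock       = 3.4e9;   /* cycles/s per core                      */
      int    cores       = 6;
      int    flops_cycle = 4;       /* vector (SIMD) flops per cycle per core */

      double doubles_per_s = bandwidth / sizeof(double);        /* ~7.4e9     */
      /* Vector add z = x + y: stream in 2 doubles for every 1 flop */
      double add_gflops  = doubles_per_s / 2.0 / 1.0e9;         /* ~3.7 GF/s  */
      double peak_gflops = (double)cores * clock * flops_cycle / 1.0e9;

      printf("bandwidth-limited add rate: %.1f GF/s\n", add_gflops);
      printf("advertised peak flop rate:  %.1f GF/s\n", peak_gflops);
      return 0;
    }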

Now, you are getting only about 50% of peak bandwidth, which is bad. Maybe
Jed knows why you are not getting the 75-80% which is what we usually expect.
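
For reference, the Copy/Scale/Add/Triad rates in the streams output quoted
below come from simple vector loops. A minimal sketch of the Triad kernel in
plain C (the array length and initialization are illustrative, not the actual
benchmark source):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
      const long n = 20000000;      /* illustrative length, well beyond cache */
      double *a = malloc(n * sizeof *a);
      double *b = malloc(n * sizeof *b);
      double *c = malloc(n * sizeof *c);
      double scalar = 3.0;
      if (!a || !b || !c) return 1;

      for (long j = 0; j < n; j++) { b[j] = 2.0; c[j] = 1.0; }

      /* STREAM "Triad" kernel: 2 loads + 1 store (24 bytes) per 2 flops,
         so the loop speed is set by memory bandwidth, not by the cores. */
      for (long j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];

      printf("a[0] = %f\n", a[0]);  /* keep the compiler from eliding the loop */
      free(a); free(b); free(c);
      return 0;
    }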

  Thanks,

     Matt


> Apologies if the questions are too silly!
>
> Siddhesh M Godbole
>
> 5th year Dual Degree,
> Civil Eng & Applied Mech.
> IIT Madras
>
> On Thu, Apr 23, 2015 at 4:22 PM, Matthew Knepley <knepley at gmail.com>
> wrote:
>
>> On Thu, Apr 23, 2015 at 5:47 AM, siddhesh godbole <
>> siddhesh4godbole at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I want to know about the test which is run just after PETSc is
>>> configured on the system to assess the possible speedup from MPI
>>> processes. I have saved the result file, which says:
>>> Number of MPI processes 10
>>> Process 0 iitm
>>> Process 1 iitm
>>> Process 2 iitm
>>> Process 3 iitm
>>> Process 4 iitm
>>> Process 5 iitm
>>> Process 6 iitm
>>> Process 7 iitm
>>> Process 8 iitm
>>> Process 9 iitm
>>> Function      Rate (MB/s)
>>> Copy:       24186.8271
>>> Scale:      23914.0401
>>> Add:        27271.7149
>>> Triad:      27787.1630
>>> ------------------------------------------------
>>> np  speedup
>>> 1 1.0
>>> 2 1.75
>>> 3 1.86
>>> 4 1.84
>>> 5 1.85
>>> 6 1.83
>>> 7 1.76
>>> 8 1.79
>>> 9 1.8
>>> 10 1.8
>>> Estimation of possible speedup of MPI programs based on Streams
>>> benchmark.
>>>
>>> 1) What parameters does the speedup depend on?
>>>
>>
>> I am not sure what you are asking here. Speedup is defined as the time T
>> on 1 process divided
>> by the time T_p on p processes:
>>
>>   S = T/T_p
>>
>>
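
As a quick illustration with the numbers from the streams table quoted above
(a tiny sketch, not part of the benchmark output): a reported speedup of 1.8
on 10 processes means the 10-process run takes T/1.8, i.e. a parallel
efficiency of only 18%.

    #include <stdio.h>

    int main(void)
    {
      int    p       = 10;   /* processes, from the streams table above */
      double speedup = 1.8;  /* estimated speedup reported at np = 10   */

      /* S = T / T_p, so T_p = T / S and parallel efficiency E = S / p */
      double efficiency = speedup / p;
      printf("parallel efficiency at np=%d: %.0f%%\n", p, efficiency * 100.0);
      return 0;
    }
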
>>> 2) What are the hardware requirements for higher speedup? (I was
>>> expecting at least a 5 times speedup after launching 10 processes.)
>>>
>>
>> STREAMS measures the speedup of vector operations, which are very
>> similar to sparse matrix operations. Both are limited by memory bandwidth.
>>
>>
>>> 3) What could possibly be done to improve this?
>>>
>>
>> 1) You could buy more nodes, since each node has its own path to memory
>>
>> 2) You could change algorithms, but this has proven very difficult
>>
>>    Thanks,
>>
>>       Matt
>>
>>
>>> I have an Intel® Core™ i7-4930K CPU @ 3.40GHz × 12 with 32 GB of RAM and
>>> 1 TB of disk space.
>>>
>>>
>>> Thanks
>>> Siddhesh M Godbole
>>>
>>> 5th year Dual Degree,
>>> Civil Eng & Applied Mech.
>>> IIT Madras
>>>
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener