[petsc-dev] Feed back on report on performance of vector operations on Summit requested

Smith, Barry F. bsmith at mcs.anl.gov
Tue Oct 29 16:01:12 CDT 2019


  Karl,

    Thanks for your comments.


> On Oct 10, 2019, at 12:34 AM, Karl Rupp via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
> 
> Hi,
> 
> Table 2 reports negative latencies. This doesn't look right to me ;-)
> If it's the outcome of a parameter fit to the performance model, then use a parameter name (e.g. alpha) instead of the term 'latency'.

  Per Jed's suggestion we will include some plots and additional information to make clearer what happens on the CPU.

    
> 
> Figure 11 has a very narrow range in the y-coordinate and thus exaggerates the variation greatly. "GPU performance" should be adjusted to something like "execution time" to explain the meaning of the y-axis.

    Thanks. Fixed by also adding the next size, 10^7.

> 
> Page 12: The latency for VecDot is higher than for VecAXPY because VecDot requires the result to be copied back to the host. This is an additional operation.

   Good point. We will include this. 
> 
> Regarding performance measurements: Did you synchronize after each kernel launch? I.e. did you run (approach A)

For all our runs, as stated near the beginning of the text, many times == 1. This seems to work fine and the results are reproducible, so I don't see a need to run multiple times.

  Barry



> for (many times) {
>   synchronize();
>   start_timer();
>   kernel_launch();
>   synchronize();
>   stop_timer();
> }
> and then take averages over the timings obtained, or did you (approach B)
> synchronize();
> start_timer();
> for (many times) {
>   kernel_launch();
> }
> synchronize();
> stop_timer();
> and then divide the obtained time by the number of runs?
> 
> Approach A will report a much higher latency than approach B, because synchronizations are expensive (i.e. your latency consists of kernel launch latency plus device synchronization latency). Approach B is slightly over-optimistic, but I've found it to better match what one observes for an algorithm involving several kernel launches.
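
The difference between the two timing approaches can be illustrated with a rough cost model. This is only a sketch under stated assumptions, not a measurement and not the method used in the report: the function names, the three cost parameters (launch, kernel, sync), and the numbers are all hypothetical, and the model assumes kernel launches are asynchronous and execute in order on the device.

```python
def per_launch_time_a(launch, kernel, sync):
    """Approach A: synchronize after every launch and time each iteration."""
    # Each timed iteration pays the launch overhead, the kernel execution,
    # and one full device synchronization.
    return launch + kernel + sync


def per_launch_time_b(launch, kernel, sync, n):
    """Approach B: n back-to-back launches, one synchronization, divide by n."""
    # Assumption: the host can queue the next launch while the device is still
    # running the previous kernel, so each additional kernel costs
    # max(launch, kernel) rather than launch + kernel.
    total = launch + kernel + (n - 1) * max(launch, kernel) + sync
    return total / n


# Hypothetical costs in microseconds, chosen only for illustration.
launch, kernel, sync = 5, 2, 10

time_a = per_launch_time_a(launch, kernel, sync)          # 17
time_b = per_launch_time_b(launch, kernel, sync, n=1000)  # 5.012
print(time_a, time_b)
```

In this model the synchronization cost is paid on every iteration in approach A but is amortized over all n launches in approach B, which is why A reports the larger per-launch figure. With n = 1 the two expressions coincide.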
> 
> Best regards,
> Karli
> 
> 
> 
> On 10/10/19 12:34 AM, Smith, Barry F. via petsc-dev wrote:
>>    We've prepared a short report on the performance of vector operations on Summit and would appreciate any feedback, including inconsistencies, lack of clarity, incorrect notation or terminology, etc.
>>    Thanks
>>     Barry, Hannah, and Richard


