[petsc-dev] Feed back on report on performance of vector operations on Summit requested

Karl Rupp rupp at iue.tuwien.ac.at
Thu Oct 10 00:34:29 CDT 2019


Hi,

Table 2 reports negative latencies. This doesn't look right to me ;-)
If it's the outcome of a parameter fit to the performance model, then 
use a parameter name (e.g. alpha) instead of the term 'latency'.

Figure 11 has a very narrow y-axis range and thus greatly exaggerates the 
variation. The label "GPU performance" should be changed to something 
like "execution time" so the meaning of the y-axis is clear.

Page 12: The latency for VecDot is higher than for VecAXPY because 
VecDot requires the result to be copied back to the host. This is an 
additional operation.

Regarding performance measurements: Did you synchronize after each 
kernel launch? I.e. did you run (approach A)
  for (many times) {
    synchronize();
    start_timer();
    kernel_launch();
    synchronize();
    stop_timer();
  }
and then take averages over the timings obtained, or did you (approach B)
  synchronize();
  start_timer();
  for (many times) {
    kernel_launch();
  }
  synchronize();
  stop_timer();
and then divide the obtained time by the number of runs?

Approach A will report a much higher latency than approach B, because 
synchronizations are expensive (i.e. the measured latency consists of the 
kernel launch latency plus the device synchronization latency). Approach B 
is slightly over-optimistic, but I've found it to better match what one 
observes for an algorithm involving several kernel launches.

Best regards,
Karli



On 10/10/19 12:34 AM, Smith, Barry F. via petsc-dev wrote:
> 
>     We've prepared a short report on the performance of vector 
> operations on Summit and would appreciate any feed back including: 
> inconsistencies, lack of clarity, incorrect notation or terminology, etc.
> 
>     Thanks
> 
>      Barry, Hannah, and Richard

