[petsc-dev] Feed back on report on performance of vector operations on Summit requested
Karl Rupp
rupp at iue.tuwien.ac.at
Thu Oct 10 00:34:29 CDT 2019
Hi,
Table 2 reports negative latencies. This doesn't look right to me ;-)
If it's the outcome of a parameter fit to the performance model, then
use a parameter name (e.g. alpha) instead of the term 'latency'.
Figure 11 has a very narrow y-axis range and thus greatly exaggerates
the variation. The label "GPU performance" should be changed to
something like "execution time" to clarify what the y-axis actually shows.
Page 12: The latency for VecDot is higher than for VecAXPY because
VecDot requires the result to be copied back to the host, which is an
additional operation.
Regarding performance measurements: Did you synchronize after each
kernel launch? I.e. did you run (approach A)
  for (many times) {
    synchronize();
    start_timer();
    kernel_launch();
    synchronize();
    stop_timer();
  }
and then take averages over the timings obtained, or did you (approach B)
  synchronize();
  start_timer();
  for (many times) {
    kernel_launch();
  }
  synchronize();
  stop_timer();
and then divide the obtained time by the number of runs?
Approach A will report a much higher latency than approach B, because
synchronizations are expensive (i.e. your measured latency consists of
the kernel launch latency plus the device synchronization latency).
Approach B is slightly over-optimistic, but I've found it to better
match what one observes for an algorithm involving several kernel launches.
Best regards,
Karli
On 10/10/19 12:34 AM, Smith, Barry F. via petsc-dev wrote:
>
> We've prepared a short report on the performance of vector
> operations on Summit and would appreciate any feed back including:
> inconsistencies, lack of clarity, incorrect notation or terminology, etc.
>
> Thanks
>
> Barry, Hannah, and Richard