[petsc-users] Trying to understand -log_view when using HIP kernels (ex34)

Dave May dave.mayhem23 at gmail.com
Fri Jan 19 11:35:17 CST 2024


Hi all,

I am trying to understand the logging information associated with the
%flops-performed-on-the-gpu reported by -log_view when running
  src/ksp/ksp/tutorials/ex34
with the following options
-da_grid_x 192
-da_grid_y 192
-da_grid_z 192
-dm_mat_type seqaijhipsparse
-dm_vec_type seqhip
-ksp_max_it 10
-ksp_monitor
-ksp_type richardson
-ksp_view
-log_view
-mg_coarse_ksp_max_it 2
-mg_coarse_ksp_type richardson
-mg_coarse_pc_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_type none
-options_left
-pc_mg_levels 3
-pc_mg_log
-pc_type mg

This config is not intended to actually solve the problem; rather, it is a
stripped-down set of options designed to show which parts of the
smoothers are being executed on the GPU.

With respect to the log file attached, my first set of questions relates to
the data reported under "Event Stage 2: MG Apply".

[1] Why is the log littered with nan's?
* I don't understand how and why "GPU Mflop/s" should be reported as nan
when a value is given for "GPU %F" (see MatMult for example).

* For events executed on the GPU, I assume the column "Time (sec)" relates
to "CPU execute time". This would explain why we see a nan in "Time (sec)"
for MatMult.
If my assumption is correct, how should I interpret the column "Flop (Max)",
which is showing 1.92e+09?
I would assume that if "Time (sec)" relates to the CPU then "Flop (Max)" should
also relate to the CPU, and GPU flops would be logged in "GPU Mflop/s".
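
To make the first bullet concrete, here is a minimal sketch of the arithmetic
in question. It assumes (this is not confirmed anywhere in this message) that a
rate column such as "GPU Mflop/s" is a flop count divided by a measured time,
while "GPU %F" is a ratio of two flop counts. The function names are
hypothetical and are not PETSc API; only the value 1.92e+09 comes from the log.

```python
import math


def mflops(flops, seconds):
    """Rate in Mflop/s. If the time is nan, the result is nan as well."""
    return 1e-6 * flops / seconds


def percent_on_gpu(gpu_flops, total_flops):
    """Share of flops attributed to the GPU; no time measurement involved."""
    if total_flops == 0:
        return 0.0
    return 100.0 * gpu_flops / total_flops


# "Flop (Max)" reported for MatMult in the attached log.
matmult_flops = 1.92e9

# Assumption: the event time was not measured and is recorded as nan.
unmeasured_time = float("nan")

rate = mflops(matmult_flops, unmeasured_time)
share = percent_on_gpu(matmult_flops, matmult_flops)

print(math.isnan(rate), share)
```

Under those assumptions a nan rate and a well-defined percentage can coexist,
which is the combination seen for MatMult. Whether -log_view really computes
the columns this way is exactly what I would like confirmed.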

[2] More curious is that within "Event Stage 2: MG Apply" KSPSolve,
MGSmooth Level 0, MGSmooth Level 1, MGSmooth Level 2 all report "GPU %F" as
93. I believe this value should be 100 as the smoother (and coarse grid
solver) are configured as richardson(2)+none and thus should run entirely
on the GPU.
Furthermore, when one inspects all events listed under "Event Stage 2: MG
Apply" those events which do flops correctly report "GPU %F" as 100.
And the events showing "GPU %F" = 0 such as
  MatHIPSPARSCopyTo, VecCopy, VecSet, PCApply, DCtxSync
don't do any flops (on the CPU or GPU), which is also correct
(although non-GPU events should show nan??)

Hence I am wondering what is the explanation for the missing 7% from "GPU
%F" for KSPSolve and MGSmooth {0,1,2}??
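
One way to frame the puzzle is as a flop-weighted average. The sketch below
assumes that the "GPU %F" of an enclosing event (KSPSolve, MGSmooth) is the GPU
flops divided by the total flops logged inside it. All flop counts are invented
for illustration, and the "unlisted" entry is hypothetical: it stands for any
flops counted as CPU flops that do not show up as a separate event.

```python
def gpu_percent(events):
    """Flop-weighted GPU share over a dict of name -> (cpu_flops, gpu_flops)."""
    total = sum(cpu + gpu for cpu, gpu in events.values())
    on_gpu = sum(gpu for _, gpu in events.values())
    if total == 0:
        return 0.0
    return 100.0 * on_gpu / total


# Invented numbers: two child events run entirely on the GPU.
children = {
    "MatMult": (0.0, 80.0),
    "VecAXPY": (0.0, 13.0),
}

# Hypothetical CPU flops logged inside the parent but under no listed event.
with_unlisted = dict(children, unlisted=(7.0, 0.0))

print(gpu_percent(children))
print(gpu_percent(with_unlisted))
```

If the aggregation works like this, every listed child can report 100 while the
parent reports 93, provided roughly 7% of the parent's flops are being counted
on the CPU somewhere. That would narrow the question down to where those flops
are logged.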

Does anyone understand this -log_view output, or can anyone explain to me how to
interpret it?

It could simply be that:
a) something is messed up with -pc_mg_log
b) something is messed up with the PETSc build
c) I am putting too much faith in -log_view and should profile the code
differently.

Either way I'd really like to understand what is going on.


Cheers,
Dave
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex34_192_mg_seqhip_richardson_pcnone.o5748667
Type: application/octet-stream
Size: 25092 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20240119/f35978ee/attachment-0001.obj>

