[petsc-users] Trying to understand -log_view when using HIP kernels (ex34)

Anthony Jourdon jourdon.anthon at gmail.com
Fri Jan 26 05:48:17 CST 2024


Hello,

Thank you for your answers.
I am working with Dave May on this topic.

Still running src/ksp/ksp/tutorials/ex34 with the same options reported by
Dave, I added the option -log_view_gpu_time.
The log now reports GPU flop/s instead of nans.
However, I have trouble understanding the numbers reported in the log (file
attached).

   1. The numbers reported for Total Mflop/s and GPU Mflop/s are different
   even when 100% of the work is supposed to be done on the GPU.
   2. The numbers reported for GPU Mflop/s are always higher than the
   numbers reported for Total Mflop/s.

As I understand it, Total Mflop/s should be the sum of the GPU and CPU
flop/s. But if the GPU does 100% of the work, why do the GPU Mflop/s and
Total Mflop/s columns report different numbers, and why is GPU Mflop/s
always higher than Total Mflop/s?
Or am I missing something?
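
To make the question concrete, here is a minimal sketch of the arithmetic in
Python. It rests on an assumption that nothing in this thread confirms: that
each column is a flop count divided by a time, and that the two columns may
use different times (device time for GPU Mflop/s, wall-clock time of the whole
event for Total Mflop/s). The flop count is the "Flop (Max)" value Dave quoted
for MatMult; both times are made up and are not taken from the attached log.

```python
def mflops(flop_count, seconds):
    """Return a flop rate in Mflop/s for a flop count and an elapsed time."""
    return flop_count / seconds / 1.0e6


# "Flop (Max)" quoted for MatMult earlier in the thread.
flop_count = 1.92e9

# Hypothetical times, chosen only to illustrate the effect (assumptions).
gpu_time = 0.010   # seconds measured on the device
wall_time = 0.012  # seconds for the whole event, including host-side overhead

gpu_rate = mflops(flop_count, gpu_time)     # rate using device time
total_rate = mflops(flop_count, wall_time)  # rate using wall-clock time

print(f"GPU Mflop/s   (assumed definition): {gpu_rate:.0f}")
print(f"Total Mflop/s (assumed definition): {total_rate:.0f}")
```

Under that assumption, the same flop count divided by a shorter device time
would give a higher GPU rate even when 100% of the flops run on the GPU.
Whether -log_view actually computes the columns this way is exactly what I
would like to have confirmed.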

Thank you for your attention.
Anthony Jourdon



On Sat, Jan 20, 2024 at 02:25, Barry Smith <bsmith at petsc.dev> wrote:

>
>    Nans indicate we do not have valid computational times for these
> operations; think of them as Not Available. Providing valid times for the
> "inner" operations listed with Nans requires inaccurate times (higher) for
> the outer operations, since extra synchronization between the CPU and GPU
> must be done to get valid times for the inner operations. We opted to have the
> best valid times for the outer operations since those times reflect the
> time of the application.
>
>
>
>
>
> > On Jan 19, 2024, at 12:35 PM, Dave May <dave.mayhem23 at gmail.com> wrote:
> >
> > Hi all,
> >
> > I am trying to understand the logging information associated with the
> %flops-performed-on-the-gpu reported by -log_view when running
> >   src/ksp/ksp/tutorials/ex34
> > with the following options
> > -da_grid_x 192
> > -da_grid_y 192
> > -da_grid_z 192
> > -dm_mat_type seqaijhipsparse
> > -dm_vec_type seqhip
> > -ksp_max_it 10
> > -ksp_monitor
> > -ksp_type richardson
> > -ksp_view
> > -log_view
> > -mg_coarse_ksp_max_it 2
> > -mg_coarse_ksp_type richardson
> > -mg_coarse_pc_type none
> > -mg_levels_ksp_type richardson
> > -mg_levels_pc_type none
> > -options_left
> > -pc_mg_levels 3
> > -pc_mg_log
> > -pc_type mg
> >
> > This config is not intended to actually solve the problem, rather it is
> a stripped down set of options designed to understand what parts of the
> smoothers are being executed on the GPU.
> >
> > With respect to the log file attached, my first set of questions relates
> to the data reported under "Event Stage 2: MG Apply".
> >
> > [1] Why is the log littered with nan's?
> > * I don't understand how and why "GPU Mflop/s" should be reported as nan
> when a value is given for "GPU %F" (see MatMult for example).
> >
> > * For events executed on the GPU, I assume the column "Time (sec)"
> relates to "CPU execute time", this would explain why we see a nan in "Time
> (sec)" for MatMult.
> > If my assumption is correct, how should I interpret the column "Flop
> (Max)" which is showing 1.92e+09?
> > I would assume that if "Time (sec)" relates to the CPU, then "Flop (Max)"
> should also relate to the CPU, and GPU flops would be logged in "GPU Mflop/s".
> >
> > [2] More curious is that within "Event Stage 2: MG Apply" KSPSolve,
> MGSmooth Level 0, MGSmooth Level 1, MGSmooth Level 2 all report "GPU %F" as
> 93. I believe this value should be 100 as the smoother (and coarse grid
> solver) are configured as richardson(2)+none and thus should run entirely
> on the GPU.
> > Furthermore, when one inspects all events listed under "Event Stage 2:
> MG Apply" those events which do flops correctly report "GPU %F" as 100.
> > And the events showing "GPU %F" = 0 such as
> >   MatHIPSPARSCopyTo, VecCopy, VecSet, PCApply, DCtxSync
> > don't do any flops (on the CPU or GPU) - which is also correct (although
> non GPU events should show nan??)
> >
> > Hence I am wondering what is the explanation for the missing 7% from
> "GPU %F" for KSPSolve and MGSmooth {0,1,2}??
> >
> > Does anyone understand this -log_view, or can explain to me how to
> interpret it?
> >
> > It could simply be that:
> > a) something is messed up with -pc_mg_log
> > b) something is messed up with the PETSc build
> > c) I am putting too much faith in -log_view and should profile the code
> differently.
> >
> > Either way I'd really like to understand what is going on.
> >
> >
> > Cheers,
> > Dave
> >
> >
> >
> > <ex34_192_mg_seqhip_richardson_pcnone.o5748667>
>
>
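
For reference, the run discussed in this thread can be assembled as a single
command line. The sketch below, in Python, joins the options listed in Dave's
message with the added -log_view_gpu_time option. The executable path ./ex34
is an assumption (it depends on where the tutorial was built), and no MPI
launcher is included because none is named in the thread.

```python
import shlex

# Executable path is an assumption; adjust to the local build location.
executable = "./ex34"

# Options exactly as listed in the thread, plus -log_view_gpu_time.
argv = [
    executable,
    "-da_grid_x", "192",
    "-da_grid_y", "192",
    "-da_grid_z", "192",
    "-dm_mat_type", "seqaijhipsparse",
    "-dm_vec_type", "seqhip",
    "-ksp_max_it", "10",
    "-ksp_monitor",
    "-ksp_type", "richardson",
    "-ksp_view",
    "-log_view",
    "-mg_coarse_ksp_max_it", "2",
    "-mg_coarse_ksp_type", "richardson",
    "-mg_coarse_pc_type", "none",
    "-mg_levels_ksp_type", "richardson",
    "-mg_levels_pc_type", "none",
    "-options_left",
    "-pc_mg_levels", "3",
    "-pc_mg_log",
    "-pc_type", "mg",
    "-log_view_gpu_time",
]

# shlex.join quotes each argument safely for a POSIX shell.
command = shlex.join(argv)
print(command)
```

The printed line can be pasted into a shell or a batch script; the list form
can also be passed directly to subprocess.run if the run is scripted.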
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex34_192_mg_seqhip_richardson_pcnone_gpulog.out
Type: application/octet-stream
Size: 22577 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20240126/336c1fae/attachment-0001.obj>