[petsc-dev] odd log behavior

Tue May 17 06:24:15 CDT 2022

Oh, this is not hard:

      /* /\* Put NaN into the time for all events that may not be time
accurately since they may happen asynchronously on the GPU *\/ */
      /* #if defined(PETSC_HAVE_DEVICE) */
      /* if (!PetscLogGpuTimeFlag && petsc_gflops > 0) { */
      /*   memcpy(&gmaxt,&nas,sizeof(PetscLogDouble)); */
      /*   PetscCall(PetscEventRegLogGetEvent(stageLog->eventLog, name,
&eventid)); */
      /*   if (eventid != SNES_Solve && eventid != KSP_Solve && eventid !=
TS_Step && eventid != TAO_Solve) { */
      /*     memcpy(&mint,&nas,sizeof(PetscLogDouble)); */
      /*     memcpy(&maxt,&nas,sizeof(PetscLogDouble)); */
      /*   } */
      /* } */
      /* #endif */

On Tue, May 17, 2022 at 7:17 AM Mark Adams <mfadams at lbl.gov> wrote:

> Can I get some advice on how to add/hack an event now into the reported
> events?
>
> I am noticing that all of the difference in my clean (warmed up) stage
> comes for KSPSolve:
>
> KSPSolve             144 1.0 *2.1745e+00* 1.0 7.50e+10 1.0 0.0e+00
> 0.0e+00 0.0e+00 42 89  0  0  0  68 98  0  0  0 34511   52718      0
> 0.00e+00    0 0.00e+00 100
> KSPSolve             144 1.0 *1.2275e+00* 1.0 7.50e+10 1.0 0.0e+00
> 0.0e+00 0.0e+00 29 89  0  0  0  56 98  0  0  0 61138   * -nan*      0
> 0.00e+00    0 0.00e+00 100
>
> Maybe it is easier for me to just run it twice and get my Landau numbers
> from the full report and get KSPSolve numbers from the nan report.
> (Not great for a reproducibility doc)
>
> Thanks,
> Mark
>
>
>
> On Mon, May 16, 2022 at 10:38 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>> Ok thanks,
>>
>> And looking at the actual tests that I am doing more carefully I am not
>> seeing much difference in the numbers that I care about with or without the
>> GPU timers.
>>
>> And FWIW, from a user experience point of view, having -log_view present
>> you with a sea of nans by default is a bit disheartening for a first time
>> user. And they have to send an email and we have to respond ....
>>
>>
>> On Mon, May 16, 2022 at 9:36 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>   Mark,
>>>
>>>     Yes, Jed has already asked for ways to select certain operations at
>>> runtime that would get timed via the GPU timer. I am not sure of the ideal
>>> way to handle it. One possibility is PetscLogGpuTimeEvent(event number)
>>> which indicates events you want the time logged for. Another is
>>> PetscLogGpuTimeObjectID(object ID) which indicates all events for a
>>> particular object class would be logged. In either or both cases the
>>> PetscLogGPUTime() calls would need to become smarter to determine when they
>>> are turned on and off depending on the current state of the logging. Since
>>> the events are nested and there can be multiple objects related to the
>>> timings I don't have a clear plan in my mind.
>>>
>>>    Barry
>>>
>>>
>>> On May 16, 2022, at 7:31 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> I am not sure I understand the logic, we print the ratio of max/min.
>>> I report max and look at the ratio to see if I might be catching some
>>> load imbalance or whatever. Is there a problem with that workflow?
>>>
>>> I assume there is or you would not have done this, so can I add a method
>>> that I think also has legitimate values?
>>>
>>> I did observe a 15% hit on my whole stage with the GPU timers but I
>>> would still like to take my chances with:
>>>
>>> Landau Operator       29 1.0 1.1200e-02 1.0 4.54e+06 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  3  0  0  0  0  40  4  0  0  0   405     509      0 0.00e+00    0
>>> 0.00e+00 100
>>> Landau Jacobian       15 1.0 7.8189e-03 1.0 4.54e+06 1.0 0.0e+00 0.0e+00
>>> 0.0e+00  2  0  0  0  0  28  4  0  0  0   580     731      0 0.00e+00    0
>>> 0.00e+00 100
>>> Landau Mass           14 1.0 3.3759e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  1  0  0  0  0  12  0  0  0  0     0       0      0 0.00e+00    0
>>> 0.00e+00  0
>>>
>>> These methods require barriers at the end before they
>>> call MatSetValuesCOO, which is like < 1% of these total matrix times, so I
>>> trust the data w/o GPU times set.
>>>
>>> (The rest of the stage is the rest of the time step, which is mostly the
>>> linear solver, and I guess that is where the 15% comes from but I have not
>>> dug into it)
>>>
>>> Thanks,
>>> Mark
>>>
>>>
>>> On Wed, Apr 27, 2022 at 11:06 AM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>>
>>>>   Only KSPSolve, SNESSolve, TSSolve will have legitimate values.
>>>> "High-level" functions like PtAP can be asynchronous (meaning the GPU
>>>> returns to the CPU to do more stuff) before they are complete, hence their
>>>> timings would be incorrect and so they must be labeled with a special
>>>> marker and not have numbers for time.
>>>>
>>>>   Jed reports a possible 10% smaller time for KSPSolve() in this mode
>>>> in some of his runs, compared to timing correctly the inner operations, and
>>>> thus feels it is the best default.
>>>>
>>>>   Barry
>>>>
>>>>
>>>> On Apr 27, 2022, at 10:08 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Apr 26, 2022 at 8:00 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>
>>>>>
>>>>>   The current nan output has to be replaced to get the column
>>>>> alignment correct, I just didn't feel like making that change also in the
>>>>> same MR.
>>>>>
>>>>>   Something like Unknown or anything that fits in the column space
>>>>> would be fine. It just means for the given run the timing numbers are not
>>>>> meaningful/correct for those events.
>>>>>
>>>>
>>>> Just a note, just about every event is NAN for me. My GAMG setup that
>>>> is all CPU is NAN. High level functions like PtAP as well.
>>>> That said, adding -log_view_gpu_time is fine. Not worth the churn.
>>>>
>>>>
>>>>> This is to obtain the best meaningful results for the outer events per
>>>>> Jed since timing the inner events accurately introduces extra time in the
>>>>> outer events. That is it is not possible to have the best accurate times
>>>>> for both inner events and outer events in the same run. So if you want to
>>>>> compare KSPSolve timings, for example, you run as-is, it you want to
>>>>> examine, low-level vector operations run also with -log_view_gpu_time but
>>>>> know that the KSP times are higher than need be.
>>>>>
>>>>>   Sorry for the confusion.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Apr 26, 2022, at 3:49 PM, Matthew Knepley <knepley at gmail.com>
>>>>> wrote:
>>>>>
>>>>> On Tue, Apr 26, 2022 at 12:03 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>> Well, Nans are a clear sign that something is very wrong.
>>>>>>
>>>>>
>>>>> Barry chose them so that it could not be mistaken for an actual number.
>>>>>
>>>>>    Matt
>>>>>
>>>>>
>>>>>> On Tue, Apr 26, 2022 at 11:52 AM Jacob Faibussowitsch <
>>>>>> jacob.fai at gmail.com> wrote:
>>>>>>
>>>>>>> There is an automatic warning that shows when you do run with
>>>>>>> `-log_view_gpu_time`, but perhaps there should also be an automatic warning
>>>>>>> when *not* running with it. It is unfortunate that NaN is the value printed
>>>>>>> as this implies a bug but AFAIK it is unavoidable (Barry can say more on
>>>>>>> this though).
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Jacob Faibussowitsch
>>>>>>> (Jacob Fai - booss - oh - vitch)
>>>>>>>
>>>>>>> > On Apr 26, 2022, at 09:48, Jose E. Roman <jroman at dsic.upv.es>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > You have to add -log_view_gpu_time
>>>>>>> > See https://gitlab.com/petsc/petsc/-/merge_requests/5056
>>>>>>> >
>>>>>>> > Jose
>>>>>>> >
>>>>>>> >
>>>>>>> >> El 26 abr 2022, a las 16:39, Mark Adams <mfadams at lbl.gov>
>>>>>>> escribió:
>>>>>>> >>
>>>>>>> >> I'm seeing this on Perlmutter with Kokkos-CUDA. Nans in most log
>>>>>>> timing data except the two 'Solve' lines.
>>>>>>> >> Just cg/jacobi on snes/ex56.
>>>>>>> >>
>>>>>>> >> Any ideas?
>>>>>>> >>
>>>>>>> >> VecTDot                2 1.0   nan nan 1.20e+01 1.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0
>>>>>>> 0.00e+00    0 0.00e+00 100
>>>>>>> >> VecNorm                2 1.0   nan nan 1.00e+01 1.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0
>>>>>>> 0.00e+00    0 0.00e+00 100
>>>>>>> >> VecCopy                2 1.0   nan nan 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0
>>>>>>> 0.00e+00    0 0.00e+00  0
>>>>>>> >> VecSet                 5 1.0   nan nan 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0
>>>>>>> 0.00e+00    0 0.00e+00  0
>>>>>>> >> VecAXPY                4 1.0   nan nan 2.40e+01 1.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0  -nan    -nan      0
>>>>>>> 0.00e+00    0 0.00e+00 100
>>>>>>> >> VecPointwiseMult       1 1.0   nan nan 3.00e+00 1.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0
>>>>>>> 0.00e+00    0 0.00e+00 100
>>>>>>> >> KSPSetUp               1 1.0   nan nan 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0
>>>>>>> 0.00e+00    0 0.00e+00  0
>>>>>>> >> KSPSolve               1 1.0 4.0514e-04 1.0 5.50e+01 1.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0    -nan      0
>>>>>>> 0.00e+00    0 0.00e+00 100
>>>>>>> >> SNESSolve              1 1.0 2.2128e-02 1.0 5.55e+05 1.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00 72 56  0  0  0 100100  0  0  0    25    -nan      0
>>>>>>> 0.00e+00    0 0.00e+00  0
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>
>>>>>
>>>>>
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20220517/598a6317/attachment.html>