[petsc-dev] odd log behavior

Mon May 16 21:38:34 CDT 2022

Ok thanks,

And looking at the actual tests that I am doing more carefully I am not
seeing much difference in the numbers that I care about with or without the
GPU timers.

And FWIW, from a user experience point of view, having -log_view present
you with a sea of nans by default is a bit disheartening for a first time
user. And they have to send an email and we have to respond ....

On Mon, May 16, 2022 at 9:36 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>   Mark,
>
>     Yes, Jed has already asked for ways to select certain operations at
> runtime that would get timed via the GPU timer. I am not sure of the ideal
> way to handle it. One possibility is PetscLogGpuTimeEvent(event number)
> which indicates events you want the time logged for. Another is
> PetscLogGpuTimeObjectID(object ID) which indicates all events for a
> particular object class would be logged. In either or both cases the
> PetscLogGPUTime() calls would need to become smarter to determine when they
> are turned on and off depending on the current state of the logging. Since
> the events are nested and there can be multiple objects related to the
> timings I don't have a clear plan in my mind.
>
>    Barry
>
>
> On May 16, 2022, at 7:31 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> I am not sure I understand the logic, we print the ratio of max/min.
> I report max and look at the ratio to see if I might be catching some load
> imbalance or whatever. Is there a problem with that workflow?
>
> I assume there is or you would not have done this, so can I add a method
> that I think also has legitimate values?
>
> I did observe a 15% hit on my whole stage with the GPU timers but I would
> still like to take my chances with:
>
> Landau Operator       29 1.0 1.1200e-02 1.0 4.54e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  3  0  0  0  0  40  4  0  0  0   405     509      0 0.00e+00    0
> 0.00e+00 100
> Landau Jacobian       15 1.0 7.8189e-03 1.0 4.54e+06 1.0 0.0e+00 0.0e+00
> 0.0e+00  2  0  0  0  0  28  4  0  0  0   580     731      0 0.00e+00    0
> 0.00e+00 100
> Landau Mass           14 1.0 3.3759e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0  12  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0
>
> These methods require barriers at the end before they
> call MatSetValuesCOO, which is like < 1% of these total matrix times, so I
> trust the data w/o GPU times set.
>
> (The rest of the stage is the rest of the time step, which is mostly the
> linear solver, and I guess that is where the 15% comes from but I have not
> dug into it)
>
> Thanks,
> Mark
>
>
> On Wed, Apr 27, 2022 at 11:06 AM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>   Only KSPSolve, SNESSolve, TSSolve will have legitimate values.
>> "High-level" functions like PtAP can be asynchronous (meaning the GPU
>> returns to the CPU to do more stuff) before they are complete, hence their
>> timings would be incorrect and so they must be labeled with a special
>> marker and not have numbers for time.
>>
>>   Jed reports a possible 10% smaller time for KSPSolve() in this mode in
>> some of his runs, compared to timing correctly the inner operations, and
>> thus feels it is the best default.
>>
>>   Barry
>>
>>
>> On Apr 27, 2022, at 10:08 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>>
>>
>> On Tue, Apr 26, 2022 at 8:00 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>   The current nan output has to be replaced to get the column alignment
>>> correct, I just didn't feel like making that change also in the same MR.
>>>
>>>   Something like Unknown or anything that fits in the column space would
>>> be fine. It just means for the given run the timing numbers are not
>>> meaningful/correct for those events.
>>>
>>
>> Just a note, just about every event is NAN for me. My GAMG setup that is
>> all CPU is NAN. High level functions like PtAP as well.
>> That said, adding -log_view_gpu_time is fine. Not worth the churn.
>>
>>
>>> This is to obtain the best meaningful results for the outer events per
>>> Jed since timing the inner events accurately introduces extra time in the
>>> outer events. That is it is not possible to have the best accurate times
>>> for both inner events and outer events in the same run. So if you want to
>>> compare KSPSolve timings, for example, you run as-is, it you want to
>>> examine, low-level vector operations run also with -log_view_gpu_time but
>>> know that the KSP times are higher than need be.
>>>
>>>   Sorry for the confusion.
>>>
>>>
>>>
>>>
>>> On Apr 26, 2022, at 3:49 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>
>>> On Tue, Apr 26, 2022 at 12:03 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> Well, Nans are a clear sign that something is very wrong.
>>>>
>>>
>>> Barry chose them so that it could not be mistaken for an actual number.
>>>
>>>    Matt
>>>
>>>
>>>> On Tue, Apr 26, 2022 at 11:52 AM Jacob Faibussowitsch <
>>>> jacob.fai at gmail.com> wrote:
>>>>
>>>>> There is an automatic warning that shows when you do run with
>>>>> `-log_view_gpu_time`, but perhaps there should also be an automatic warning
>>>>> when *not* running with it. It is unfortunate that NaN is the value printed
>>>>> as this implies a bug but AFAIK it is unavoidable (Barry can say more on
>>>>> this though).
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Jacob Faibussowitsch
>>>>> (Jacob Fai - booss - oh - vitch)
>>>>>
>>>>> > On Apr 26, 2022, at 09:48, Jose E. Roman <jroman at dsic.upv.es> wrote:
>>>>> >
>>>>> > You have to add -log_view_gpu_time
>>>>> > See https://gitlab.com/petsc/petsc/-/merge_requests/5056
>>>>> >
>>>>> > Jose
>>>>> >
>>>>> >
>>>>> >> El 26 abr 2022, a las 16:39, Mark Adams <mfadams at lbl.gov> escribió:
>>>>> >>
>>>>> >> I'm seeing this on Perlmutter with Kokkos-CUDA. Nans in most log
>>>>> timing data except the two 'Solve' lines.
>>>>> >> Just cg/jacobi on snes/ex56.
>>>>> >>
>>>>> >> Any ideas?
>>>>> >>
>>>>> >> VecTDot                2 1.0   nan nan 1.20e+01 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>>> 0.00e+00 100
>>>>> >> VecNorm                2 1.0   nan nan 1.00e+01 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>>> 0.00e+00 100
>>>>> >> VecCopy                2 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>>> 0.00e+00  0
>>>>> >> VecSet                 5 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>>> 0.00e+00  0
>>>>> >> VecAXPY                4 1.0   nan nan 2.40e+01 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>>> 0.00e+00 100
>>>>> >> VecPointwiseMult       1 1.0   nan nan 3.00e+00 1.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>>> 0.00e+00 100
>>>>> >> KSPSetUp               1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
>>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>>> 0.00e+00  0
>>>>> >> KSPSolve               1 1.0 4.0514e-04 1.0 5.50e+01 1.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0    -nan      0
>>>>> 0.00e+00    0 0.00e+00 100
>>>>> >> SNESSolve              1 1.0 2.2128e-02 1.0 5.55e+05 1.0 0.0e+00
>>>>> 0.0e+00 0.0e+00 72 56  0  0  0 100100  0  0  0    25    -nan      0
>>>>> 0.00e+00    0 0.00e+00  0
>>>>> >
>>>>>
>>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20220516/c1d6024a/attachment.html>