[petsc-dev] odd log behavior
Mark Adams
mfadams at lbl.gov
Mon May 16 18:31:33 CDT 2022
I am not sure I understand the logic: we print the ratio of max/min.
I report the max and look at the ratio to see whether I might be catching some
load imbalance. Is there a problem with that workflow?
I assume there is, or you would not have made this change, so can I add a method
that I think also has legitimate values?
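For concreteness, the kind of event I have in mind is just a user-registered one
wrapped with the usual logging calls. A minimal sketch (names are made up, error
handling trimmed):

#include <petscmat.h>

static PetscLogEvent MY_ASSEMBLY; /* hypothetical event, analogous to the Landau events below */

PetscErrorCode MyAssemblyRegisterEvent(void)
{
  PetscFunctionBeginUser;
  /* registered once; the event then gets its own row in -log_view */
  PetscCall(PetscLogEventRegister("MyAssembly", MAT_CLASSID, &MY_ASSEMBLY));
  PetscFunctionReturn(0);
}

PetscErrorCode MyAssemblyApply(void)
{
  PetscFunctionBeginUser;
  PetscCall(PetscLogEventBegin(MY_ASSEMBLY, 0, 0, 0, 0));
  PetscCall(PetscLogGpuTimeBegin());   /* GPU timer section for this event */
  /* ... launch device kernels for the method ... */
  PetscCall(PetscLogGpuTimeEnd());
  PetscCall(PetscLogGpuFlops(4.54e6)); /* flop count only as an example, taken from the Landau rows below */
  PetscCall(PetscLogEventEnd(MY_ASSEMBLY, 0, 0, 0, 0));
  PetscFunctionReturn(0);
}

(My understanding is that the PetscLogGpuTime calls are what -log_view_gpu_time
turns on; without the option those timers are skipped.)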
I did observe a 15% hit on my whole stage with the GPU timers, but I would
still like to take my chances with:
Landau Operator 29 1.0 1.1200e-02 1.0 4.54e+06 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 40 4 0 0 0 405 509 0 0.00e+00 0 0.00e+00 100
Landau Jacobian 15 1.0 7.8189e-03 1.0 4.54e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 28 4 0 0 0 580 731 0 0.00e+00 0 0.00e+00 100
Landau Mass 14 1.0 3.3759e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 12 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
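(Reading those rows with the workflow above: the time and flop columns are the max
over the ranks and the 1.0s next to them are the max/min ratios, and the rates are
consistent with that; for the operator row, 4.54e+06 flop / 1.1200e-02 s ≈ 4.05e+08
flop/s ≈ 405 Mflop/s, which matches the first rate column, and I read the 509 as
the GPU-only rate.)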
These methods require barriers at the end, before they call MatSetValuesCOO,
which is less than 1% of these total matrix times, so I trust the data without
the GPU timers (-log_view_gpu_time) set.
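Schematically, the pattern is something like this (a sketch, not the actual Landau
code: the kernel routine is made up, error checking is omitted, the event is the
MY_ASSEMBLY one from the sketch above, and with the Kokkos backend the sync would
be a Kokkos::fence() rather than cudaDeviceSynchronize(); it also assumes a device
matrix type whose MatSetValuesCOO accepts a device array):

#include <petscmat.h>
#include <cuda_runtime.h>

extern void ComputeCOOValuesOnDevice(PetscScalar *coo_v_device); /* hypothetical device assembly */

PetscErrorCode MyAssembleCOO(Mat A, PetscScalar *coo_v_device)
{
  PetscFunctionBeginUser;
  PetscCall(PetscLogEventBegin(MY_ASSEMBLY, 0, 0, 0, 0));
  ComputeCOOValuesOnDevice(coo_v_device);  /* asynchronous kernel launches */
  (void)cudaDeviceSynchronize();           /* the barrier: without it the host timer stops before the GPU work finishes */
  PetscCall(MatSetValuesCOO(A, coo_v_device, INSERT_VALUES)); /* small relative to the kernels above */
  PetscCall(PetscLogEventEnd(MY_ASSEMBLY, 0, 0, 0, 0));
  PetscFunctionReturn(0);
}

The point is that the sync sits inside the event, so the host-side Begin/End
interval covers the kernels even without the GPU timers.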
(The rest of the stage is the rest of the time step, which is mostly the
linear solver, and I guess that is where the 15% comes from, but I have not
dug into it.)
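For example, with the snes/ex56 case from the thread below, the comparison would
be two runs of the same job (illustrative command lines only; the device and
problem options are whatever the job normally uses):

./ex56 -ksp_type cg -pc_type jacobi -log_view
./ex56 -ksp_type cg -pc_type jacobi -log_view -log_view_gpu_time

and then comparing the stage and solver lines between the two, per Barry's note
below that the outer times are the accurate ones in the first run.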
Thanks,
Mark
On Wed, Apr 27, 2022 at 11:06 AM Barry Smith <bsmith at petsc.dev> wrote:
>
> Only KSPSolve, SNESSolve, and TSSolve will have legitimate values.
> "High-level" functions like PtAP can be asynchronous (meaning control returns
> to the CPU to do more work before the GPU is complete), hence their
> timings would be incorrect, and so they must be labeled with a special
> marker and not have numbers for time.
>
> Jed reports a possible 10% smaller time for KSPSolve() in this mode in
> some of his runs, compared to timing the inner operations correctly, and
> thus feels it is the best default.
>
> Barry
>
>
> On Apr 27, 2022, at 10:08 AM, Mark Adams <mfadams at lbl.gov> wrote:
>
>
>
> On Tue, Apr 26, 2022 at 8:00 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>> The current nan output has to be replaced to get the column alignment
>> correct; I just didn't feel like making that change in the same MR.
>>
>> Something like "Unknown", or anything that fits in the column space, would
>> be fine. It just means that for the given run the timing numbers are not
>> meaningful/correct for those events.
>>
>
> Just a note: just about every event is NaN for me. My GAMG setup, which is
> all CPU, is NaN, and high-level functions like PtAP as well.
> That said, adding -log_view_gpu_time is fine. Not worth the churn.
>
>
>> This is to obtain the most meaningful results for the outer events, per
>> Jed, since timing the inner events accurately introduces extra time in the
>> outer events. That is, it is not possible to have the most accurate times
>> for both inner events and outer events in the same run. So if you want to
>> compare KSPSolve timings, for example, run as-is; if you want to
>> examine low-level vector operations, run also with -log_view_gpu_time, but
>> know that the KSP times are higher than they need be.
>>
>> Sorry for the confusion.
>>
>>
>>
>>
>> On Apr 26, 2022, at 3:49 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> On Tue, Apr 26, 2022 at 12:03 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> Well, NaNs are a clear sign that something is very wrong.
>>>
>>
>> Barry chose them so that they could not be mistaken for actual numbers.
>>
>> Matt
>>
>>
>>> On Tue, Apr 26, 2022 at 11:52 AM Jacob Faibussowitsch <
>>> jacob.fai at gmail.com> wrote:
>>>
>>>> There is an automatic warning that shows when you do run with
>>>> `-log_view_gpu_time`, but perhaps there should also be an automatic warning
>>>> when *not* running with it. It is unfortunate that NaN is the value printed,
>>>> as this implies a bug, but AFAIK it is unavoidable (Barry can say more on
>>>> this, though).
>>>>
>>>> Best regards,
>>>>
>>>> Jacob Faibussowitsch
>>>> (Jacob Fai - booss - oh - vitch)
>>>>
>>>> > On Apr 26, 2022, at 09:48, Jose E. Roman <jroman at dsic.upv.es> wrote:
>>>> >
>>>> > You have to add -log_view_gpu_time
>>>> > See https://gitlab.com/petsc/petsc/-/merge_requests/5056
>>>> >
>>>> > Jose
>>>> >
>>>> >
>>>> >> On Apr 26, 2022, at 16:39, Mark Adams <mfadams at lbl.gov> wrote:
>>>> >>
>>>> >> I'm seeing this on Perlmutter with Kokkos-CUDA. NaNs in most of the log timing data except the two 'Solve' lines.
>>>> >> Just cg/jacobi on snes/ex56.
>>>> >>
>>>> >> Any ideas?
>>>> >>
>>>> >> VecTDot 2 1.0 nan nan 1.20e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100
>>>> >> VecNorm 2 1.0 nan nan 1.00e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100
>>>> >> VecCopy 2 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0
>>>> >> VecSet 5 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0
>>>> >> VecAXPY 4 1.0 nan nan 2.40e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100
>>>> >> VecPointwiseMult 1 1.0 nan nan 3.00e+00 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100
>>>> >> KSPSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0
>>>> >> KSPSolve 1 1.0 4.0514e-04 1.0 5.50e+01 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 -nan 0 0.00e+00 0 0.00e+00 100
>>>> >> SNESSolve 1 1.0 2.2128e-02 1.0 5.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 72 56 0 0 0 100100 0 0 0 25 -nan 0 0.00e+00 0 0.00e+00 0
>>>> >
>>>>
>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
>>
>>
>