[petsc-dev] odd log behavior

Mark Adams mfadams at lbl.gov
Mon May 16 18:31:33 CDT 2022


I am not sure I understand the logic, we print the ratio of max/min.
I report max and look at the ratio to see if I might be catching some load
imbalance or whatever. Is there a problem with that workflow?

I assume there is or you would not have done this, so can I add a method
that I think also has legitimate values?

I did observe a 15% hit on my whole stage with the GPU timers but I would
still like to take my chances with:

Landau Operator       29 1.0 1.1200e-02 1.0 4.54e+06 1.0 0.0e+00 0.0e+00
0.0e+00  3  0  0  0  0  40  4  0  0  0   405     509      0 0.00e+00    0
0.00e+00 100
Landau Jacobian       15 1.0 7.8189e-03 1.0 4.54e+06 1.0 0.0e+00 0.0e+00
0.0e+00  2  0  0  0  0  28  4  0  0  0   580     731      0 0.00e+00    0
0.00e+00 100
Landau Mass           14 1.0 3.3759e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  1  0  0  0  0  12  0  0  0  0     0       0      0 0.00e+00    0
0.00e+00  0

These methods require barriers at the end before they call MatSetValuesCOO,
which is like < 1% of these total matrix times, so I trust the data w/o GPU
times set.

(The rest of the stage is the rest of the time step, which is mostly the
linear solver, and I guess that is where the 15% comes from but I have not
dug into it)

Thanks,
Mark


On Wed, Apr 27, 2022 at 11:06 AM Barry Smith <bsmith at petsc.dev> wrote:

>
>   Only KSPSolve, SNESSolve, TSSolve will have legitimate values.
> "High-level" functions like PtAP can be asynchronous (meaning the GPU
> returns to the CPU to do more stuff) before they are complete, hence their
> timings would be incorrect and so they must be labeled with a special
> marker and not have numbers for time.
>
>   Jed reports a possible 10% smaller time for KSPSolve() in this mode in
> some of his runs, compared to timing correctly the inner operations, and
> thus feels it is the best default.
>
>   Barry
>
>
> On Apr 27, 2022, at 10:08 AM, Mark Adams <mfadams at lbl.gov> wrote:
>
>
>
> On Tue, Apr 26, 2022 at 8:00 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>   The current nan output has to be replaced to get the column alignment
>> correct, I just didn't feel like making that change also in the same MR.
>>
>>   Something like Unknown or anything that fits in the column space would
>> be fine. It just means for the given run the timing numbers are not
>> meaningful/correct for those events.
>>
>
> Just a note, just about every event is NAN for me. My GAMG setup that is
> all CPU is NAN. High level functions like PtAP as well.
> That said, adding -log_view_gpu_time is fine. Not worth the churn.
>
>
>> This is to obtain the best meaningful results for the outer events per
>> Jed since timing the inner events accurately introduces extra time in the
>> outer events. That is it is not possible to have the best accurate times
>> for both inner events and outer events in the same run. So if you want to
>> compare KSPSolve timings, for example, you run as-is, it you want to
>> examine, low-level vector operations run also with -log_view_gpu_time but
>> know that the KSP times are higher than need be.
>>
>>   Sorry for the confusion.
>>
>>
>>
>>
>> On Apr 26, 2022, at 3:49 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> On Tue, Apr 26, 2022 at 12:03 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> Well, Nans are a clear sign that something is very wrong.
>>>
>>
>> Barry chose them so that it could not be mistaken for an actual number.
>>
>>    Matt
>>
>>
>>> On Tue, Apr 26, 2022 at 11:52 AM Jacob Faibussowitsch <
>>> jacob.fai at gmail.com> wrote:
>>>
>>>> There is an automatic warning that shows when you do run with
>>>> `-log_view_gpu_time`, but perhaps there should also be an automatic warning
>>>> when *not* running with it. It is unfortunate that NaN is the value printed
>>>> as this implies a bug but AFAIK it is unavoidable (Barry can say more on
>>>> this though).
>>>>
>>>> Best regards,
>>>>
>>>> Jacob Faibussowitsch
>>>> (Jacob Fai - booss - oh - vitch)
>>>>
>>>> > On Apr 26, 2022, at 09:48, Jose E. Roman <jroman at dsic.upv.es> wrote:
>>>> >
>>>> > You have to add -log_view_gpu_time
>>>> > See https://gitlab.com/petsc/petsc/-/merge_requests/5056
>>>> >
>>>> > Jose
>>>> >
>>>> >
>>>> >> El 26 abr 2022, a las 16:39, Mark Adams <mfadams at lbl.gov> escribió:
>>>> >>
>>>> >> I'm seeing this on Perlmutter with Kokkos-CUDA. Nans in most log
>>>> timing data except the two 'Solve' lines.
>>>> >> Just cg/jacobi on snes/ex56.
>>>> >>
>>>> >> Any ideas?
>>>> >>
>>>> >> VecTDot                2 1.0   nan nan 1.20e+01 1.0 0.0e+00 0.0e+00
>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>> 0.00e+00 100
>>>> >> VecNorm                2 1.0   nan nan 1.00e+01 1.0 0.0e+00 0.0e+00
>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>> 0.00e+00 100
>>>> >> VecCopy                2 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>> 0.00e+00  0
>>>> >> VecSet                 5 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>> 0.00e+00  0
>>>> >> VecAXPY                4 1.0   nan nan 2.40e+01 1.0 0.0e+00 0.0e+00
>>>> 0.0e+00  0  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>> 0.00e+00 100
>>>> >> VecPointwiseMult       1 1.0   nan nan 3.00e+00 1.0 0.0e+00 0.0e+00
>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>> 0.00e+00 100
>>>> >> KSPSetUp               1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00
>>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0
>>>> 0.00e+00  0
>>>> >> KSPSolve               1 1.0 4.0514e-04 1.0 5.50e+01 1.0 0.0e+00
>>>> 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0    -nan      0
>>>> 0.00e+00    0 0.00e+00 100
>>>> >> SNESSolve              1 1.0 2.2128e-02 1.0 5.55e+05 1.0 0.0e+00
>>>> 0.0e+00 0.0e+00 72 56  0  0  0 100100  0  0  0    25    -nan      0
>>>> 0.00e+00    0 0.00e+00  0
>>>> >
>>>>
>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20220516/0390ac97/attachment-0001.html>


More information about the petsc-dev mailing list