[petsc-dev] Kokkos/Crusher performance
Jed Brown
jed at jedbrown.org
Tue Jan 25 11:25:48 CST 2022
Barry Smith <bsmith at petsc.dev> writes:
>> On Jan 25, 2022, at 11:55 AM, Jed Brown <jed at jedbrown.org> wrote:
>>
>> Barry Smith <bsmith at petsc.dev> writes:
>>
>>> Thanks Mark, far more interesting. I've improved the formatting to make it easier to read (use a fixed-width font when reading in email)
>>>
>>> * Can you do the same run with, say, 10 iterations of the Jacobi PC?
>>>
>>> * PCApply performance (looks like GAMG) is terrible! Problems too small?
>>
>> This is -pc_type jacobi.
>
> Dang, how come it doesn't warn about all the GAMG arguments passed to the program? I saw them and jumped to the wrong conclusion.
We don't have -options_left by default. Mark has a big .petscrc or PETSC_OPTIONS.
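For reference, -options_left makes PetscFinalize() report every option in the database that nothing ever queried, which would have flagged the stray GAMG arguments here. A minimal sketch (the solver body is elided; this is not Mark's actual driver):

    #include <petsc.h>

    /* Minimal sketch: with -options_left on the command line (or set in
       .petscrc / PETSC_OPTIONS), PetscFinalize() warns about every option
       in the database that was never queried -- e.g. leftover -pc_gamg_*
       arguments when the run actually uses -pc_type jacobi. */
    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      /* ... set up and solve; options nothing reads stay "unused" ... */
      ierr = PetscFinalize(); /* with -options_left: lists the unused options */
      return ierr;
    }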
> How come the PCApply rate is so low while the pointwise mult (which should be all of PCApply) is high?
I also think that's weird.
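For context, with -pc_type jacobi a PCApply is little more than one pointwise multiply by the stored inverse diagonal, so the two events should report nearly identical cost. A minimal sketch of the operation (names assumed; not PETSc's actual PCJACOBI source, which does the diagonal setup once in PCSetUp):

    #include <petscmat.h>

    /* Sketch: applying Jacobi is a single pointwise multiply,
       y = diag(A)^{-1} x, so PCApply and VecPointwiseMult should
       show about the same time and rate in -log_view. */
    static PetscErrorCode JacobiApplySketch(Mat A, Vec x, Vec y, Vec idiag)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MatGetDiagonal(A, idiag);CHKERRQ(ierr);      /* idiag = diag(A)     */
      ierr = VecReciprocal(idiag);CHKERRQ(ierr);          /* idiag = 1/diag(A)   */
      ierr = VecPointwiseMult(y, idiag, x);CHKERRQ(ierr); /* y = idiag .* x      */
      PetscFunctionReturn(0);
    }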
>>
>>> * VecScatter time is completely dominated by SFPack! Junchao what's up with that? Lots of little kernels in the PCApply? PCJACOBI run will help clarify where that is coming from.
>>
>> It's all in MatMult.
>>
>> I'd like to see a run that doesn't wait for the GPU.
>
> Indeed
What is the command-line option to turn PetscLogGpuTimeBegin/PetscLogGpuTimeEnd into no-ops even when -log_view is on? I know it'll mess up attribution, but it'll still tell us how long the solve took.
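The idea, as a hypothetical sketch (the flag and function names below are made up, not an existing PETSc option): guard the GPU timers behind a runtime boolean so they become no-ops, since each one synchronizes the device just to attribute time to a log event.

    #include <petscsys.h>

    /* Hypothetical sketch only -- "gpuTimingEnabled" is not a real PETSc
       flag or variable. Skipping the body avoids the per-event device
       synchronization while leaving the rest of -log_view intact. */
    static PetscBool gpuTimingEnabled = PETSC_TRUE;

    PetscErrorCode SketchLogGpuTimeBegin(void)
    {
      PetscFunctionBegin;
      if (!gpuTimingEnabled) PetscFunctionReturn(0); /* no-op: no GPU sync */
      /* ... otherwise synchronize the device and start the event timer ... */
      PetscFunctionReturn(0);
    }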
Also, can we make WaitForKokkos a no-op? I don't think it's necessary for correctness (the docs indicate Kokkos::fence() synchronizes).
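For reference, Kokkos::fence() blocks until all outstanding device work has completed, so an extra wait wrapper after it should be redundant. A minimal self-contained sketch (the kernel is a placeholder):

    #include <Kokkos_Core.hpp>

    int main(int argc, char **argv)
    {
      Kokkos::initialize(argc, argv);
      {
        Kokkos::View<double *> x("x", 1000);
        /* Kernel launches are asynchronous on device backends... */
        Kokkos::parallel_for("fill", x.extent(0), KOKKOS_LAMBDA(const int i) {
          x(i) = 2.0 * i;
        });
        /* ...but fence() blocks until all outstanding device work is done,
           so no additional wait should be needed for correctness. */
        Kokkos::fence();
      }
      Kokkos::finalize();
      return 0;
    }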