[petsc-users] Amortizing calls to PetscOptionsFindPair_Private()
Griffith, Boyce Eugene
boyceg at email.unc.edu
Sun Jan 17 10:13:15 CST 2016
Barry --
Another random thought --- are these smallish direct solves things that make sense to (try to) offload to a GPU?
Thanks,
-- Boyce
> On Jan 16, 2016, at 10:46 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>
> Boyce,
>
> Of course anything is possible in software. But I expect an optimization that avoids rebuilding common submatrices/factorizations would require a custom PCSetUp_ASM() rather than some PETSc option we could add (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE).
>
> I would start by copying PCSetUp_ASM(), stripping out all the setup stuff that doesn't relate to your code, and then mark identical domains so you don't need to call MatGetSubMatrices() on those domains and don't create a new KSP for each of those subdomains (but instead reuse a common one). The PCApply_ASM() should hopefully be reusable so long as you have created the full array of KSP objects (some of which will be common). If you increase the reference counts of the common KSPs in
> PCSetUp_ASM() (and maybe the common submatrices) then PCDestroy_ASM() should also work unchanged.
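
A minimal sketch of that reuse pattern, assuming a rep[] array that maps each subdomain to the index of its representative (with rep[i] <= i); the function name and arguments here are illustrative, not PETSc's internal ones:

#include <petscksp.h>

/* Create one KSP per unique subdomain matrix and share it (via an extra
   reference) with every subdomain marked as identical to it, so that a
   PCDestroy_ASM()-style cleanup that destroys each entry stays correct. */
static PetscErrorCode CreateSubdomainKSPs(PetscInt n,Mat *submat,const PetscInt *rep,KSP *subksp)
{
  PetscErrorCode ierr;
  PetscInt       i;
  PC             subpc;

  PetscFunctionBegin;
  for (i = 0; i < n; i++) {
    if (rep[i] == i) {                        /* first occurrence: build a solver */
      ierr = KSPCreate(PETSC_COMM_SELF,&subksp[i]);CHKERRQ(ierr);
      ierr = KSPSetType(subksp[i],KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(subksp[i],&subpc);CHKERRQ(ierr);
      ierr = PCSetType(subpc,PCLU);CHKERRQ(ierr);
      ierr = KSPSetOperators(subksp[i],submat[i],submat[i]);CHKERRQ(ierr);
    } else {                                  /* duplicate: reuse the representative's KSP */
      subksp[i] = subksp[rep[i]];
      ierr = PetscObjectReference((PetscObject)subksp[i]);CHKERRQ(ierr);
    }
  }
  PetscFunctionReturn(0);
}

Since the duplicates share a single KSP, the small LU factorization is computed only once per unique block.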
>
> Good luck,
>
> Barry
>
>> On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene <boyceg at email.unc.edu> wrote:
>>
>>
>>> On Jan 16, 2016, at 7:00 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>>
>>> Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves, and the matrix-vector products, as it should be. Not much can be done to speed these up except running on machines with high memory bandwidth.
>>
>> Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks?
>>
>> Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance.
>>
>> -- Boyce
>>
>>> If you are using the master branch of PETSc, two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers' time, flops, etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example, open -a Safari filename.xml) or email the file.
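
For reference, a toy driver (a plain 1D tridiagonal solve, not the code discussed in this thread) that can be run as, e.g., ./ex -log_view :timing.xml:ascii_xml to produce an XML file with the nested solver timings:

#include <petscksp.h>

int main(int argc,char **argv)
{
  PetscErrorCode ierr;
  PetscInt       i,Istart,Iend,n = 100;
  Mat            A;
  Vec            x,b;
  KSP            ksp;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* Assemble a 1D Laplacian just so there is something to solve and profile */
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    ierr = MatSetValue(A,i,i,2.0,INSERT_VALUES);CHKERRQ(ierr);
    if (i > 0)   {ierr = MatSetValue(A,i,i-1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n-1) {ierr = MatSetValue(A,i,i+1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatCreateVecs(A,&x,&b);CHKERRQ(ierr);
  ierr = VecSet(b,1.0);CHKERRQ(ierr);
  /* The KSP/PC hierarchy configured here (e.g. -pc_type asm -sub_pc_type lu)
     is what the XML log view lays out with nested timings */
  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}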
>>>
>>> Barry
>>>
>>>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S <amneetb at live.unc.edu> wrote:
>>>>
>>>>
>>>>
>>>>> On Jan 16, 2016, at 1:13 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>>
>>>>> Either way is fine so long as I don't have to install a ton of stuff, which it sounds like I won't.
>>>>
>>>> http://hpctoolkit.org/download/hpcviewer/
>>>>
>>>> Unzip HPCViewer for Mac OS X from the command line and drag the unzipped folder to Applications. You will be able to
>>>> launch HPCViewer from Launchpad. Point it to the attached directory. You will see three different kinds of profiling
>>>> under the Calling Context View, Callers View, and Flat View.
>>>>
>>>>
>>>>
>>>> <hpctoolkit-main2d-database.zip>
>>>>
>>>
>>
>