On Sun, Jan 17, 2016 at 10:13 AM, Griffith, Boyce Eugene <boyceg@email.unc.edu> wrote:
> Barry --
>
> Another random thought --- are these smallish direct solves things that make sense to (try to) offload to a GPU?

Possibly, but the only clear-cut wins are for BLAS3, so we would need to stack up the identical solves.

  Matt
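A minimal, hypothetical sketch of what "stacking up the identical solves" could look like on the PETSc side: when several subdomains share the same factored matrix, their right-hand sides can be gathered into the columns of a dense matrix and solved together with a single MatMatSolve() (a BLAS3-style multi-right-hand-side solve) instead of one MatSolve() per block. The helper name below and the assumption that the right-hand sides are already gathered into dense matrices are illustrative only, not part of the thread.

#include <petscksp.h>

/* Hypothetical helper: one multi-RHS solve for many identical blocks.
 * Afact has already been factored (e.g. with MatLUFactor); B and X are dense
 * matrices whose columns hold one right-hand side and one solution per
 * identical subdomain.  Gathering/scattering the subdomain vectors into and
 * out of B and X is omitted. */
static PetscErrorCode SolveIdenticalBlocksTogether(Mat Afact, Mat B, Mat X)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatMatSolve(Afact, B, X);CHKERRQ(ierr);  /* all columns solved at once */
  PetscFunctionReturn(0);
}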
> Thanks,
>
> -- Boyce
>
>> On Jan 16, 2016, at 10:46 PM, Barry Smith <bsmith@mcs.anl.gov> wrote:
>>
>>
>> Boyce,
>>
>> Of course anything is possible in software. But I expect that an optimization to avoid rebuilding common submatrices/factorizations would require a custom PCSetUp_ASM() rather than some PETSc option that we could add (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE).
>>
>> I would start by copying PCSetUp_ASM(), stripping out all the setup code that doesn't relate to your case, and then marking identical domains so that you don't need to call MatGetSubMatrices() on those domains and don't create a new KSP for each of those subdomains (instead reusing a common one). PCApply_ASM() should hopefully be reusable so long as you have created the full array of KSP objects (some of which will be common). If you increase the reference counts of the common KSP in PCSetUp_ASM() (and maybe the common submatrices), then PCDestroy_ASM() should also work unchanged.
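A minimal sketch of the KSP-sharing setup described above (this is not PETSc's actual PCSetUp_ASM(); the helper name SetUpSubKSPsWithSharing, the is_same_as bookkeeping array, and the preonly+LU sub-solver configuration are illustrative assumptions, and obtaining the subdomain matrices, including skipping MatGetSubMatrices() for duplicate domains, is left out):

#include <petscksp.h>

/* For each local block i, is_same_as[i] is -1 ("unique") or the index of an
 * earlier, identical block.  Identical blocks reuse the earlier KSP (and so
 * its factorization); the extra reference keeps a uniform
 * KSPDestroy()-per-entry cleanup, as in PCDestroy_ASM(), correct. */
static PetscErrorCode SetUpSubKSPsWithSharing(PetscInt n, Mat submats[],
                                              const PetscInt is_same_as[],
                                              KSP subksp[])
{
  PetscErrorCode ierr;
  PetscInt       i;

  PetscFunctionBeginUser;
  for (i = 0; i < n; ++i) {
    if (is_same_as[i] >= 0) {
      /* Identical block: point at the twin's KSP and bump its reference count. */
      subksp[i] = subksp[is_same_as[i]];
      ierr = PetscObjectReference((PetscObject)subksp[i]);CHKERRQ(ierr);
    } else {
      /* Unique block: build its own direct solver, as PCASM normally would. */
      PC subpc;
      ierr = KSPCreate(PETSC_COMM_SELF, &subksp[i]);CHKERRQ(ierr);
      ierr = KSPSetType(subksp[i], KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPSetOperators(subksp[i], submats[i], submats[i]);CHKERRQ(ierr);
      ierr = KSPGetPC(subksp[i], &subpc);CHKERRQ(ierr);
      ierr = PCSetType(subpc, PCLU);CHKERRQ(ierr);
    }
  }
  PetscFunctionReturn(0);
}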
>>
>> Good luck,
>>
>> Barry
>>
>>> On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene <boyceg@email.unc.edu> wrote:
>>>
>>>
>>>> On Jan 16, 2016, at 7:00 PM, Barry Smith <bsmith@mcs.anl.gov> wrote:
>>>>
>>>>
>>>> Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves, and the matrix-vector products, as it should be. Not much can be done to speed these up except running on machines with higher memory bandwidth.
>>>
>>> Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant-coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks?
>>>
>>> Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance.
>>>
>>> -- Boyce
>>>
>>>> If you are using the master branch of PETSc, two users gave us a nifty new profiler that is "PETSc style" but shows the time, flops, etc. for the hierarchy of PETSc solvers. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example, open -a Safari filename.xml) or email the file.
>>>>
>>>> Barry
>>>>
>>>>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S <amneetb@live.unc.edu> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On Jan 16, 2016, at 1:13 PM, Barry Smith <bsmith@mcs.anl.gov> wrote:
>>>>>>
>>>>>> Either way is fine so long as I don't have to install a ton of stuff, which it sounds like I won’t.
>>>>>
>>>>> http://hpctoolkit.org/download/hpcviewer/
>>>>>
>>>>> Unzip the HPCViewer for Mac OS X (command line) package and drag the unzipped folder to Applications. You will then be able to
>>>>> launch HPCViewer from Launchpad. Point it to the attached directory. You will be able to see three different kinds of profiling
>>>>> under the Calling Context View, Callers View, and Flat View.
>>>>>
>>>>>
>>>>>
>>>>> <hpctoolkit-main2d-database.zip>
>>>>>
>>>>
>>>
>>
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener