[petsc-users] GAMG scaling

Mark Adams mfadams at lbl.gov
Sat Dec 22 02:39:36 CST 2018


OK, so this thread has drifted, see title :)

On Fri, Dec 21, 2018 at 10:01 PM Fande Kong <fdkong.jd at gmail.com> wrote:

> Sorry, hit the wrong button.
>
>
>
> On Fri, Dec 21, 2018 at 7:56 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>
>>
>>
>> On Fri, Dec 21, 2018 at 9:44 AM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> Also, you mentioned that you are using 10 levels. This is very strange
>>> with GAMG. You can run with -info and grep on GAMG to see the sizes and the
>>> number of non-zeros per level. You should coarsen at a rate of about 2^D to
>>> 3^D with GAMG (with 10 levels this would imply a very large fine-grid
>>> problem, so I suspect something strange is going on with the coarsening).
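>>>
>>> For example, adding -info to a run you already have and grepping the
>>> output for GAMG prints the per-level information (the executable name,
>>> process count, and solver options below are only placeholders):
>>>
>>>   mpiexec -n 8 ./your_app <your usual options> -info | grep GAMG
>>>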
>>> Mark
>>>
>>
>> Hi Mark,
>>
>>
> Thanks for your email. We have not tried GAMG much for our problems, since
> we are still having trouble figuring out how to use GAMG effectively.
> Instead, we are building our own customized AMG that uses PtAP to
> construct the coarse matrices. The customized AMG works pretty well for our
> specific simulations. The bottleneck right now is that PtAP may
> take too much memory, and the code crashes inside the function "PtAP". I
> definitely need a memory profiler to confirm that.
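>
> As a first rough check (just a sketch: A and P are assumed to be already
> assembled, and the fill value 2.0 is only a guess), PETSc's own memory
> counter can be queried around the PtAP call:
>
>   #include <petscmat.h>
>
>   /* Form C = P^T*A*P and report how much the resident set grew on this rank. */
>   static PetscErrorCode FormCoarseOperator(Mat A, Mat P, Mat *C)
>   {
>     PetscErrorCode ierr;
>     PetscLogDouble rss0, rss1;
>
>     PetscFunctionBeginUser;
>     ierr = PetscMemoryGetCurrentUsage(&rss0);CHKERRQ(ierr);
>     ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, C);CHKERRQ(ierr);
>     ierr = PetscMemoryGetCurrentUsage(&rss1);CHKERRQ(ierr);
>     ierr = PetscPrintf(PetscObjectComm((PetscObject)A),
>                        "MatPtAP grew resident memory by %g MB (rank 0)\n",
>                        (rss1 - rss0)/1048576.0);CHKERRQ(ierr);
>     PetscFunctionReturn(0);
>   }
>
> Running with -memory_view should also print a memory summary at the end of
> the run.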
>
> Thanks,
>
> Fande Kong,
>
>
>
>>
>>
>>
>>>
>>> On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
>>> petsc-users at mcs.anl.gov> wrote:
>>>
>>>> Fande:
>>>> I will explore it and get back to you.
>>>> Does anyone know how to profile memory usage?
>>>> Hong
>>>>
>>>> Thanks, Hong,
>>>>>
>>>>> I just briefly went through the code. I was wondering whether it is
>>>>> possible to destroy "c->ptap" (which caches a lot of intermediate data) to
>>>>> release that memory after the coarse matrix is assembled. I understand you
>>>>> may still want to reuse these data structures by default, but for my
>>>>> simulation the preconditioner is fixed and there is no reason to keep
>>>>> "c->ptap" around.
>>>>>
>>>>
>>>>> It would be great if we could have this as optional functionality.
>>>>>
>>>>> Fande Kong,
>>>>>
>>>>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong <hzhang at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>> We use the nonscalable implementation by default, and switch to the
>>>>>> scalable one for matrices on finer (larger) grids. You may use the option
>>>>>> '-matptap_via scalable' to force the scalable PtAP implementation for all
>>>>>> PtAP products. Let me know if it works.
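>>>>>>
>>>>>> For example (the executable and the other options are just placeholders
>>>>>> for whatever you already run; -log_view reports the MatPtAP timings so
>>>>>> you can compare):
>>>>>>
>>>>>>   mpiexec -n 1000 ./your_app <your usual options> -matptap_via scalable -log_view
>>>>>>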
>>>>>> Hong
>>>>>>
>>>>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. <bsmith at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to the scalable version
>>>>>>> automatically for "large" problems, as determined by some heuristic.
>>>>>>>
>>>>>>>    Barry
>>>>>>>
>>>>>>>
>>>>>>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong <hzhang at mcs.anl.gov>
>>>>>>> wrote:
>>>>>>> > Fande:
>>>>>>> > Hong,
>>>>>>> > Thanks for your improvements to PtAP, which is critical for MG-type
>>>>>>> algorithms.
>>>>>>> >
>>>>>>> > On Wed, May 3, 2017 at 10:17 AM Hong <hzhang at mcs.anl.gov> wrote:
>>>>>>> > Mark,
>>>>>>> > Below is the copy of my email sent to you on Feb 27:
>>>>>>> >
>>>>>>> > I implemented a scalable MatPtAP and compared three implementations
>>>>>>> using ex56.c on the ALCF Cetus machine (this machine has little memory,
>>>>>>> 1GB/core):
>>>>>>> > - nonscalable PtAP: uses an array of length PN to do a dense axpy
>>>>>>> > - scalable PtAP: does a sparse axpy without using a length-PN array
>>>>>>> (see the sketch after this list)
>>>>>>> >
>>>>>>> > What does PN mean here?
>>>>>>> > Global number of columns of P.
>>>>>>> >
>>>>>>> > - hypre PtAP.
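>>>>>>> >
>>>>>>> > Purely as an illustration of the difference between the first two
>>>>>>> > implementations (a sketch, not the actual PETSc source), adding one
>>>>>>> > sparse row contribution into a row of C = P^T*A*P looks roughly like
>>>>>>> > this in the two flavors:
>>>>>>> >
>>>>>>> >   #include <petscsys.h>
>>>>>>> >
>>>>>>> >   /* Nonscalable flavor: dense axpy into a work array of length PN (the
>>>>>>> >      global number of columns of P), so O(PN) scratch per process. */
>>>>>>> >   static void dense_axpy(PetscScalar *work_PN, const PetscInt *cols,
>>>>>>> >                          const PetscScalar *vals, PetscInt nnz)
>>>>>>> >   {
>>>>>>> >     PetscInt k;
>>>>>>> >     for (k = 0; k < nnz; k++) work_PN[cols[k]] += vals[k];
>>>>>>> >   }
>>>>>>> >
>>>>>>> >   /* Scalable flavor: sparse axpy -- keep only the (column, value) pairs
>>>>>>> >      that occur and merge duplicates later, so the scratch space is
>>>>>>> >      proportional to the row's nonzeros rather than to PN. */
>>>>>>> >   static void sparse_axpy(PetscInt *ccols, PetscScalar *cvals, PetscInt *cnt,
>>>>>>> >                           const PetscInt *cols, const PetscScalar *vals,
>>>>>>> >                           PetscInt nnz)
>>>>>>> >   {
>>>>>>> >     PetscInt k;
>>>>>>> >     for (k = 0; k < nnz; k++) {
>>>>>>> >       ccols[*cnt] = cols[k]; cvals[*cnt] = vals[k]; (*cnt)++;
>>>>>>> >     }
>>>>>>> >   }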
>>>>>>> >
>>>>>>> > The results are attached. Summary:
>>>>>>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than
>>>>>>> hypre PtAP
>>>>>>> > - scalable PtAP is 4x faster than hypre PtAP
>>>>>>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>>>>>> >
>>>>>>> > I was wondering how much more memory PETSc's PtAP uses than hypre's. I
>>>>>>> am implementing an AMG algorithm based on PETSc right now, and it is
>>>>>>> working well, but we have found a bottleneck with PtAP. For the same P and
>>>>>>> A, PETSc's PtAP fails to generate the coarse matrix because it runs out of
>>>>>>> memory, while hypre can still generate it.
>>>>>>> >
>>>>>>> > I do not want to just use the HYPRE one, because we would have to
>>>>>>> duplicate matrices if we used the HYPRE PtAP.
>>>>>>> >
>>>>>>> > It would be nice if you have already done some comparisons of the
>>>>>>> memory usage of these implementations.
>>>>>>> > Do you encounter memory issues with the scalable PtAP?
>>>>>>> >
>>>>>>> > Do we use the scalable PtAP by default? Do we have to specify
>>>>>>> some option to use the scalable version of PtAP? If so, it would be nice
>>>>>>> to use the scalable version by default. I am totally missing something
>>>>>>> here.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> >
>>>>>>> > Fande
>>>>>>> >
>>>>>>> >
>>>>>>> > Karl had a student in the summer who improved MatPtAP(). Do you
>>>>>>> use the latest version of petsc?
>>>>>>> > HYPRE may use less memory than PETSc because it does not save and
>>>>>>> reuse the matrices.
>>>>>>> >
>>>>>>> > I do not understand why generating the coarse matrix fails by running
>>>>>>> out of memory. Do you use a direct solver on the coarse grid?
>>>>>>> > Hong
>>>>>>> >
>>>>>>> > Based on the above observation, I set the default PtAP algorithm to
>>>>>>> 'nonscalable'.
>>>>>>> > When PN > the locally estimated number of nonzeros of C=PtAP, the
>>>>>>> default switches to 'scalable'.
>>>>>>> > The user can override the default.
>>>>>>> >
>>>>>>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I
>>>>>>> get
>>>>>>> > MatPtAP              3.6224e+01  (nonscalable for small mats, scalable for larger ones)
>>>>>>> > scalable MatPtAP     4.6129e+01
>>>>>>> > hypre                1.9389e+02
>>>>>>> >
>>>>>>> > This work is in petsc-master. Give it a try. If you encounter any
>>>>>>> problems, let me know.
>>>>>>> >
>>>>>>> > Hong
>>>>>>> >
>>>>>>> > On Wed, May 3, 2017 at 10:01 AM, Mark Adams <mfadams at lbl.gov>
>>>>>>> wrote:
>>>>>>> > (Hong), what is the current state of optimizing RAP for scaling?
>>>>>>> >
>>>>>>> > Nate is driving 3D elasticity problems at scale with GAMG, and
>>>>>>> we are working out performance problems. They are hitting problems at ~1.5B
>>>>>>> dofs on a basic Cray (an XC30, I think).
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Mark
>>>>>>> >
>>>>>>>
>>>>>>>