[petsc-users] GAMG scaling
Fande Kong
fdkong.jd at gmail.com
Fri Dec 21 20:52:52 CST 2018
Thanks so much, Hong,
If there are any new findings, please let me know.
On Fri, Dec 21, 2018 at 9:36 AM Zhang, Hong <hzhang at mcs.anl.gov> wrote:
> Fande:
> I will explore it and get back to you.
> Does anyone know how to profile memory usage?
>
We are using gperftools
https://gperftools.github.io/gperftools/heapprofile.html
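
A minimal sketch of how one can bracket the setup/solve with the heap profiler
(assuming the executable is linked against libtcmalloc; the prefix and label
strings below are just examples):

  #include <petscksp.h>
  #include <gperftools/heap-profiler.h>  /* HeapProfilerStart/Dump/Stop */

  int main(int argc, char **argv)
  {
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

    /* Profiles are written as ptap.<n>.heap and can be inspected with pprof. */
    HeapProfilerStart("ptap");
    /* ... assemble A and P, run PCSetUp()/KSPSolve() here ... */
    HeapProfilerDump("after setup");
    HeapProfilerStop();

    ierr = PetscFinalize();
    return ierr;
  }
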
Fande,
> Hong
>
> Thanks, Hong,
>>
>> I just briefly went through the code. I was wondering if it is possible
>> to destroy "c->ptap" (which caches a lot of intermediate data) to release
>> the memory after the coarse matrix is assembled. I understand you may still
>> want to reuse these data structures by default, but for my simulation the
>> preconditioner is fixed and there is no reason to keep "c->ptap".
>>
>
>> It would be great if we could have this as optional functionality.
>>
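>> (To make the request concrete: below is a rough user-level sketch of how one
>> can at least measure what is retained across a MatPtAP() call;
>> "ReportPtAPMemory" is just an illustrative helper name, and the actual
>> freeing of "c->ptap" would of course have to happen inside PETSc.)
>>
>>   #include <petscmat.h>
>>
>>   /* Report the resident-memory growth across one MatPtAP() call; this
>>      includes both the product C itself and the cached intermediate data. */
>>   static PetscErrorCode ReportPtAPMemory(Mat A, Mat P, Mat *C)
>>   {
>>     PetscErrorCode ierr;
>>     PetscLogDouble before, after;
>>
>>     PetscFunctionBegin;
>>     ierr = PetscMemoryGetCurrentUsage(&before);CHKERRQ(ierr);
>>     ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, C);CHKERRQ(ierr);
>>     ierr = PetscMemoryGetCurrentUsage(&after);CHKERRQ(ierr);
>>     ierr = PetscPrintf(PetscObjectComm((PetscObject)A),
>>                        "resident growth across PtAP: %g MB\n",
>>                        (after - before)/1048576.0);CHKERRQ(ierr);
>>     PetscFunctionReturn(0);
>>   }
>>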
>> Fande Kong,
>>
>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong <hzhang at mcs.anl.gov> wrote:
>>
>>> We use the nonscalable implementation by default, and switch to the
>>> scalable one for matrices on finer grids. You may use the option
>>> '-matptap_via scalable' to force the scalable PtAP implementation for all
>>> PtAP products. Let me know if it works.
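>>> For example, the same thing can be hardwired in code via the options
>>> database (a minimal sketch; call it after PetscInitialize() and before the
>>> preconditioner is set up):
>>>
>>>   /* Equivalent to passing -matptap_via scalable on the command line:
>>>      force the scalable PtAP algorithm for all products. */
>>>   ierr = PetscOptionsSetValue(NULL, "-matptap_via", "scalable");CHKERRQ(ierr);
>>>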
>>> Hong
>>>
>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. <bsmith at mcs.anl.gov>
>>> wrote:
>>>
>>>>
>>>> See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
>>>> for "large" problems, which is determined by some heuristic.
>>>>
>>>> Barry
>>>>
>>>>
>>>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>>>> petsc-users at mcs.anl.gov> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong <hzhang at mcs.anl.gov>
>>>> wrote:
>>>> > Fande:
>>>> > Hong,
>>>> > Thanks for your improvements to PtAP, which is critical for MG-type
>>>> algorithms.
>>>> >
>>>> > On Wed, May 3, 2017 at 10:17 AM Hong <hzhang at mcs.anl.gov> wrote:
>>>> > Mark,
>>>> > Below is a copy of my email sent to you on Feb 27:
>>>> >
>>>> > I implemented a scalable MatPtAP and compared three implementations
>>>> using ex56.c on the ALCF Cetus machine (this machine has little memory,
>>>> 1 GB/core):
>>>> > - nonscalable PtAP: uses an array of length PN to do dense axpy (see
>>>> the sketch further below)
>>>> > - scalable PtAP: does sparse axpy without using the PN-length array
>>>> >
>>>> > What does PN mean here?
>>>> > Global number of columns of P.
>>>> >
>>>> > - hypre PtAP.
>>>> >
>>>> > The results are attached. Summary:
>>>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
>>>> PtAP
>>>> > - scalable PtAP is 4x faster than hypre PtAP
>>>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>>> >
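>>>> > (A toy sketch of the difference between the first two variants; this is
>>>> > not the actual PETSc kernel code, only an illustration of why the
>>>> > nonscalable one needs O(PN) workspace per process:)
>>>> >
>>>> >   #include <petscsys.h>
>>>> >
>>>> >   /* Nonscalable flavor: accumulate a scaled sparse row into a dense
>>>> >      buffer indexed by global column, so the buffer must have length PN
>>>> >      (the global number of columns of P) regardless of sparsity. */
>>>> >   static void AXPYDense(PetscInt nz, const PetscInt *cols,
>>>> >                         const PetscScalar *vals, PetscScalar alpha,
>>>> >                         PetscScalar *dense /* length PN */)
>>>> >   {
>>>> >     PetscInt k;
>>>> >     for (k = 0; k < nz; k++) dense[cols[k]] += alpha*vals[k];
>>>> >   }
>>>> >
>>>> >   /* The scalable flavor instead combines the sparse rows directly
>>>> >      (e.g. a sorted merge of column indices), so its workspace scales
>>>> >      with the local number of nonzeros rather than with PN. */
>>>> >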
>>>> > I was wondering how much more memory PETSc PtAP uses than hypre. I am
>>>> implementing an AMG algorithm based on PETSc right now, and it is working
>>>> well. But we have found a bottleneck with PtAP: for the same P and A, PETSc
>>>> PtAP fails to generate a coarse matrix because it runs out of memory, while
>>>> hypre can still generate the coarse matrix.
>>>> >
>>>> > I do not want to just use the HYPRE one, because we would have to
>>>> duplicate matrices if we used the HYPRE PtAP.
>>>> >
>>>> > It would be nice if you guys have already done some comparisons of the
>>>> memory usage of these implementations.
>>>> > Do you encounter memory issues with the scalable PtAP?
>>>> >
>>>> > Do we use the scalable PtAP by default? Do we have to specify some
>>>> option to use the scalable version of PtAP? If so, it would be nice to
>>>> use the scalable version by default. I am totally missing something here.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Fande
>>>> >
>>>> >
>>>> > Karl had a student over the summer who improved MatPtAP(). Do you use
>>>> the latest version of PETSc?
>>>> > HYPRE may use less memory than PETSc because it does not save and
>>>> reuse the matrices.
>>>> >
>>>> > I do not understand why generating the coarse matrix fails due to
>>>> running out of memory. Do you use a direct solver on the coarse grid?
>>>> > Hong
>>>> >
>>>> > Based on the above observations, I set the default PtAP algorithm to
>>>> 'nonscalable'.
>>>> > When PN > the local estimated number of nonzeros of C=PtAP, the default
>>>> switches to 'scalable'.
>>>> > Users can override the default.
>>>> >
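>>>> > (A sketch of that selection rule; the real logic lives in
>>>> > MatPtAP_MPIAIJ_MPIAIJ() and may differ in detail:)
>>>> >
>>>> >   #include <petscsys.h>
>>>> >
>>>> >   /* Pick the dense-buffer variant only while its O(PN) workspace stays
>>>> >      below the estimated local nonzero count of C = P^t*A*P. */
>>>> >   static const char *ChoosePtAPAlg(PetscInt PN, PetscInt nzC_local_est)
>>>> >   {
>>>> >     return (PN > nzC_local_est) ? "scalable" : "nonscalable";
>>>> >   }
>>>> >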
>>>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>>>> >   MatPtAP            3.6224e+01   (nonscalable for small mats, scalable for larger ones)
>>>> >   scalable MatPtAP   4.6129e+01
>>>> >   hypre              1.9389e+02
>>>> >
>>>> > This work is in petsc-master. Give it a try. If you encounter any
>>>> problems, let me know.
>>>> >
>>>> > Hong
>>>> >
>>>> > On Wed, May 3, 2017 at 10:01 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>>> > (Hong), what is the current state of optimizing RAP for scaling?
>>>> >
>>>> > Nate is driving 3D elasticity problems at scale with GAMG, and we are
>>>> working out performance problems. They are hitting problems at ~1.5B dof
>>>> on a basic Cray (an XC30, I think).
>>>> >
>>>> > Thanks,
>>>> > Mark
>>>> >
>>>>
>>>>