[petsc-users] GAMG scaling

Fande Kong fdkong.jd at gmail.com
Fri Dec 21 00:46:48 CST 2018


Thanks, Hong,

I just briefly went through the code. I was wondering whether it is possible to
destroy "c->ptap" (which caches a lot of intermediate data) to release the
memory after the coarse matrix is assembled. I understand you may still
want to reuse these data structures by default, but for my simulation the
preconditioner is fixed and there is no reason to keep "c->ptap" around.

It would be great if we could have this as optional functionality.
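[For illustration, a minimal sketch of the kind of optional clean-up being requested here, using only public KSP/PCGAMG calls; the routine name in the commented-out line is hypothetical and is not claimed to be an existing PETSc interface.]

/* Minimal sketch of the requested behaviour: set up a fixed GAMG
 * preconditioner, then (hypothetically) release the cached PtAP data. */
#include <petscksp.h>

PetscErrorCode SetupFixedGAMG(KSP ksp, Mat A)
{
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);
  ierr = PCSetUp(pc);CHKERRQ(ierr);   /* coarse operators (PtAP) are built here */

  /* Hypothetical optional call: the preconditioner is fixed, so the
   * intermediate data cached for re-computing PtAP could be released.
   * ierr = MatFreeCachedPtAPData(A);CHKERRQ(ierr); */
  PetscFunctionReturn(0);
}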

Fande Kong,

On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong <hzhang at mcs.anl.gov> wrote:

> We use the nonscalable implementation by default and switch to the scalable
> one for matrices on finer grids. You may use the option '-matptap_via scalable'
> to force the scalable PtAP implementation for all PtAP products. Let me know if it works.
> Hong
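[For reference, a minimal sketch of how the option mentioned above could be supplied either on the command line or programmatically; only the option name '-matptap_via scalable' is taken from the message, the rest is a generic PETSc driver skeleton.]

/* Force the scalable PtAP implementation for every PtAP product,
 * equivalent to running the application with "-matptap_via scalable". */
#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = PetscOptionsSetValue(NULL, "-matptap_via", "scalable");CHKERRQ(ierr);

  /* ... build A and P, set up PCGAMG, run the solve as usual ... */

  ierr = PetscFinalize();
  return ierr;
}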
>
> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
>
>>
>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to the scalable implementation
>> automatically for "large" problems, as determined by a heuristic.
>>
>>    Barry
>>
>>
>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> >
>> >
>> >
>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong <hzhang at mcs.anl.gov> wrote:
>> > Fande:
>> > Hong,
>> > Thanks for your improvements to PtAP, which is critical for MG-type
>> > algorithms.
>> >
>> > On Wed, May 3, 2017 at 10:17 AM Hong <hzhang at mcs.anl.gov> wrote:
>> > Mark,
>> > Below is the copy of my email sent to you on Feb 27:
>> >
>> > I implemented a scalable MatPtAP and compared three implementations
>> > using ex56.c on the ALCF Cetus machine (this machine has little memory,
>> > 1 GB/core):
>> > - nonscalable PtAP: uses an array of length PN to do dense axpy
>> > - scalable PtAP: does sparse axpy without using a length-PN array
>> >
>> > What does PN mean here?
>> > Global number of columns of P.
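[A schematic sketch of the difference described above, not PETSc's actual source: the nonscalable variant accumulates each row of C = P^T*A*P into a dense work array of length PN, so its workspace grows with the global coarse-grid size, whereas the scalable variant only stores the columns that actually appear.]

/* Schematic C sketch only; PN = global number of columns of P
 * (the global coarse-grid size). */

/* Nonscalable: dense axpy into a work array of length PN.  Per-process
 * workspace is O(PN), i.e. it grows with the global coarse problem size. */
static void row_axpy_dense(double *work /* length PN */,
                           const int *cols, const double *vals, int nz)
{
  for (int k = 0; k < nz; k++) work[cols[k]] += vals[k];
}

/* Scalable: sparse axpy that only stores the columns actually hit by this
 * row; workspace is proportional to the row's nonzero count.  A real
 * implementation would use a sorted list or hash table rather than the
 * linear search below. */
static void row_axpy_sparse(int *ccols, double *cvals, int *cnz,
                            const int *cols, const double *vals, int nz)
{
  for (int k = 0; k < nz; k++) {
    int j, found = 0;
    for (j = 0; j < *cnz; j++) {
      if (ccols[j] == cols[k]) { found = 1; break; }
    }
    if (found) cvals[j] += vals[k];
    else { ccols[*cnz] = cols[k]; cvals[*cnz] = vals[k]; (*cnz)++; }
  }
}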
>> >
>> > - hypre PtAP.
>> >
>> > The results are attached. Summary:
>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
>> > - scalable PtAP is 4x faster than hypre PtAP
>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>> >
>> > I was wondering how much more memory PETSc PtAP uses than hypre. I am
>> > implementing an AMG algorithm based on PETSc right now, and it is working
>> > well. But we have found a bottleneck with PtAP. For the same P and A, PETSc
>> > PtAP fails to generate a coarse matrix because it runs out of memory, while
>> > hypre can still generate the coarse matrix.
>> >
>> > I do not want to just use the HYPRE one, because we would have to duplicate
>> > matrices if we used HYPRE PtAP.
>> >
>> > It would be nice if you have already done some comparisons of the memory
>> > usage of these implementations.
>> > Do you encounter memory issues with the scalable PtAP?
>> >
>> > Do we use the scalable PtAP by default? Do we have to specify some
>> > options to use the scalable version of PtAP? If so, it would be nice to
>> > use the scalable version by default. I am totally missing something here.
>> >
>> > Thanks,
>> >
>> > Fande
>> >
>> >
>> > Karl had a student in the summer who improved MatPtAP(). Do you use the
>> > latest version of PETSc?
>> > HYPRE may use less memory than PETSc because it does not save and reuse
>> the matrices.
>> >
>> > I do not understand why generating the coarse matrix fails due to running
>> > out of memory. Do you use a direct solver on the coarse grid?
>> > Hong
>> >
>> > Based on the above observations, I set the default PtAP algorithm to
>> > 'nonscalable'.
>> > When PN > the locally estimated number of nonzeros of C=PtAP, the default
>> > switches to 'scalable'.
>> > The user can override the default.
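[A small sketch paraphrasing the selection rule just described, not the actual PETSc code; the type and function names are made up for illustration.]

/* Paraphrase of the default choice described above: use the dense-workspace
 * ("nonscalable") algorithm unless PN exceeds the locally estimated number
 * of nonzeros of C = PtAP, or the user forces one variant explicitly. */
typedef enum { PTAP_NONSCALABLE, PTAP_SCALABLE } PtAPAlgorithm;

static PtAPAlgorithm ChoosePtAPAlgorithm(long PN, long est_local_nnz_C,
                                         int user_forced_scalable)
{
  if (user_forced_scalable) return PTAP_SCALABLE;   /* e.g. -matptap_via scalable */
  return (PN > est_local_nnz_C) ? PTAP_SCALABLE : PTAP_NONSCALABLE;
}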
>> >
>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>> > MatPtAP (nonscalable for small mats, scalable for larger ones)   3.6224e+01
>> > scalable MatPtAP                                                 4.6129e+01
>> > hypre                                                            1.9389e+02
>> >
>> > This work is in petsc-master. Give it a try. If you encounter any
>> > problems, let me know.
>> >
>> > Hong
>> >
>> > On Wed, May 3, 2017 at 10:01 AM, Mark Adams <mfadams at lbl.gov> wrote:
>> > (Hong), what is the current state of optimizing RAP for scaling?
>> >
>> > Nate is driving 3D elasticity problems at scale with GAMG, and we are
>> > working out performance problems. They are hitting problems at ~1.5B DOF
>> > on a basic Cray (an XC30, I think).
>> >
>> > Thanks,
>> > Mark
>> >
>>
>>