[petsc-dev] [petsc-users] Bad memory scaling with PETSc 3.10

Fande Kong fdkong.jd at gmail.com
Thu Mar 21 17:56:55 CDT 2019


Here are some memory profiling results, in case anyone is interested. A
problem with 100,852,704 unknowns was run on 144 processor cores.

petsc_ptap_scalable.png was generated using "-matptap_via scalable"

hypre_ptap.png was generated using "-matptap_via hypre"

Look at the leaf MatPtAP_MPIAIJ_MPIAIJ: HYPRE-PtAP takes 215 MB there, while
PETSc-PtAP takes 869.4 MB (roughly four times as much).
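
For readers who want to see where the extra memory can come from, here is a minimal sketch contrasting a two-step product, which stores the intermediate AP, with an all-at-once product written as four nested loops. It is an illustration only: it is not PETSc's or hypre's implementation, and it is just one plausible reading of the "4 nested loops" mentioned in the quoted thread. The dict-of-dicts storage, the function names, and the toy matrices are all assumptions made for this example.

```python
# Sparse matrices are stored as {row: {col: value}}.
# This layout is illustrative only; it is not PETSc's AIJ format.

def ptap_two_step(A, P):
    """Form AP explicitly, then C = P^T (AP).

    Returns (C, number of stored entries in the intermediate AP).
    """
    AP = {}
    for i, a_row in A.items():
        ap_row = AP.setdefault(i, {})
        for k, a_ik in a_row.items():
            for j, p_kj in P.get(k, {}).items():
                ap_row[j] = ap_row.get(j, 0.0) + a_ik * p_kj

    C = {}
    for i, p_row in P.items():
        for r, p_ir in p_row.items():
            c_row = C.setdefault(r, {})
            for j, ap_ij in AP.get(i, {}).items():
                c_row[j] = c_row.get(j, 0.0) + p_ir * ap_ij

    return C, sum(len(row) for row in AP.values())


def ptap_all_at_once(A, P):
    """Accumulate C[r][j] += P[i][r] * A[i][k] * P[k][j] directly.

    No intermediate matrix is stored; only C itself grows.
    """
    C = {}
    for i, a_row in A.items():                        # fine row i of A
        for r, p_ir in P.get(i, {}).items():          # coarse row r from P[i]
            c_row = C.setdefault(r, {})
            for k, a_ik in a_row.items():             # nonzeros in row i of A
                for j, p_kj in P.get(k, {}).items():  # nonzeros in row k of P
                    c_row[j] = c_row.get(j, 0.0) + p_ir * a_ik * p_kj
    return C


# Toy data (hypothetical): a 4x4 1D Laplacian and a 4x2 piecewise-constant
# prolongation that aggregates fine rows {0, 1} and {2, 3}.
A = {
    0: {0: 2.0, 1: -1.0},
    1: {0: -1.0, 1: 2.0, 2: -1.0},
    2: {1: -1.0, 2: 2.0, 3: -1.0},
    3: {2: -1.0, 3: 2.0},
}
P = {0: {0: 1.0}, 1: {0: 1.0}, 2: {1: 1.0}, 3: {1: 1.0}}

C_two_step, ap_nnz = ptap_two_step(A, P)
C_all_at_once = ptap_all_at_once(A, P)
```

Both functions produce the same coarse matrix. The two-step version holds every entry of AP until the second pass finishes, whereas the all-at-once version repeats the A-times-P work once per nonzero in each row of P, so in this sketch the memory saving is paid for with extra arithmetic.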


Thanks,

Fande,


On Thu, Mar 21, 2019 at 11:08 AM Fande Kong <fdkong.jd at gmail.com> wrote:

> Hi Mark,
>
> Thanks for your email.
>
> On Thu, Mar 21, 2019 at 6:39 AM Mark Adams via petsc-dev <
> petsc-dev at mcs.anl.gov> wrote:
>
>> I'm probably screwing up some sort of history by jumping into dev, but
>> this is a dev comment ...
>>
>> (1) -matptap_via hypre: This calls the hypre package to do the PtAP through
>>> an all-at-once triple product. In our experience, it is the most memory
>>> efficient, but could be slow.
>>>
>>
>> FYI,
>>
>> I visited LLNL in about 1997 and told them how I did RAP. Simple 4 nested
>> loops. They were very interested. Clearly they did it this way after I
>> talked to them. This approach came up here a while back (e.g., we should
>> offer this as an option).
>>
>> Anecdotally, I don't see a noticeable difference in performance on my 3D
>> elasticity problems between my old code (still used by the bone modeling
>> people) and ex56 ...
>>
>
> You may not see differences when the problem is small.  What I observed is
> that the HYPRE PtAP was ten times slower than the PETSc scalable PtAP when
> we ran a 3-billion-unknown problem on 10K processor cores.
>
>
>>
>> My kernel is an unrolled dense matrix triple product. I doubt Hypre did
>> this. It ran at about 2x+ the flop rate of the mat-vec at scale on the SP3
>> in 2004.
>>
>
> Could you explain this more by adding some small examples?
>
> I am profiling the current PETSc algorithms on some real simulations. If
> PETSc PtAP still takes more memory than desired with my fix (
> https://bitbucket.org/petsc/petsc/pull-requests/1452), I am going to
> implement the all-at-once triple product, which drops all intermediate
> data.  If you have any documents (other than the code you posted before),
> they would be a great help.
>
> Fande,
>
>
>> Mark
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hypre_ptap.png
Type: image/png
Size: 582936 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20190321/6b200d86/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_ptap_scalable.png
Type: image/png
Size: 681917 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20190321/6b200d86/attachment-0003.png>