[petsc-dev] New implementation of PtAP based on all-at-once algorithm
Smith, Barry F.
bsmith at mcs.anl.gov
Thu Apr 11 19:46:36 CDT 2019
Excellent! Thanks
Barry
> On Apr 11, 2019, at 6:08 PM, Fande Kong via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
>
> Hi Developers,
>
> I just want to share some good news. PETSc's ptap-scalable is known to take too much memory for some applications because it needs to build intermediate data structures. Following Mark's suggestion, I implemented the all-at-once algorithm, which does not cache any intermediate data.
>
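To illustrate the idea of forming C = P^T A P without storing an intermediate product, here is a minimal serial sketch. It is not PETSc's implementation: the function name `ptap_all_at_once` and the dict-of-dicts sparse storage are assumptions made for illustration, and the parallel communication that dominates the real algorithm is left out. Each row of A*P is built, scattered into C, and then discarded, so A*P is never held as a whole matrix.

```python
from collections import defaultdict


def ptap_all_at_once(A, P):
    """Return C = P^T A P for sparse matrices stored as {row: {col: value}}.

    Hypothetical helper for illustration only; not a PETSc routine.
    """
    C = defaultdict(lambda: defaultdict(float))
    for i, a_row in A.items():
        # Row i of A*P, built on the fly and dropped after this iteration.
        ap_row = defaultdict(float)
        for k, a_ik in a_row.items():
            for j, p_kj in P.get(k, {}).items():
                ap_row[j] += a_ik * p_kj
        # Scatter step: C[l, :] += P[i, l] * (A*P)[i, :]
        for l, p_il in P.get(i, {}).items():
            c_row = C[l]
            for j, value in ap_row.items():
                c_row[j] += p_il * value
    return {row: dict(cols) for row, cols in C.items()}


# 1D Laplacian on 4 points, coarsened by aggregating points in pairs.
A = {
    0: {0: 2.0, 1: -1.0},
    1: {0: -1.0, 1: 2.0, 2: -1.0},
    2: {1: -1.0, 2: 2.0, 3: -1.0},
    3: {2: -1.0, 3: 2.0},
}
P = {0: {0: 1.0}, 1: {0: 1.0}, 2: {1: 1.0}, 3: {1: 1.0}}
C = ptap_all_at_once(A, P)
```

The only temporary here is a single sparse row, which is the memory trade-off the message describes: less storage, at the cost of not reusing a cached intermediate.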
> I did some comparisons. The new implementation is scalable in terms of both memory usage and compute time, even though it is still slower than "ptap-scalable". Memory profiling results are in the attachments. The new all-at-once implementation uses a similar amount of memory to hypre, but it is much faster than hypre.
>
> For example, for a problem with 14,893,346,880 unknowns on 10,000 processor cores, here are the timing results:
>
> Hypre algorithm:
>
> MatPtAP            50 1.0 3.5353e+03  1.0 0.00e+00 0.0 1.9e+07 3.3e+04 6.0e+02 33  0  1  0 17  33  0  1  0 17      0
> MatPtAPSymbolic    50 1.0 2.3969e-02 13.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> MatPtAPNumeric     50 1.0 3.5353e+03  1.0 0.00e+00 0.0 1.9e+07 3.3e+04 6.0e+02 33  0  1  0 17  33  0  1  0 17      0
>
> PETSc scalable PtAP:
>
> MatPtAP            50 1.0 1.1453e+02  1.0 2.07e+09 3.8 6.6e+07 2.0e+05 7.5e+02  2  1  4  6 20   2  1  4  6 20 129418
> MatPtAPSymbolic    50 1.0 5.1562e+01  1.0 0.00e+00 0.0 4.1e+07 1.4e+05 3.5e+02  1  0  3  3  9   1  0  3  3  9      0
> MatPtAPNumeric     50 1.0 6.3072e+01  1.0 2.07e+09 3.8 2.4e+07 3.1e+05 4.0e+02  1  1  2  4 11   1  1  2  4 11 235011
>
> New implementation of the all-at-once algorithm:
>
> MatPtAP            50 1.0 2.2153e+02  1.0 0.00e+00 0.0 1.0e+08 1.4e+05 6.0e+02  4  0  7  7 17   4  0  7  7 17      0
> MatPtAPSymbolic    50 1.0 1.1055e+02  1.0 0.00e+00 0.0 7.9e+07 1.2e+05 2.0e+02  2  0  5  4  6   2  0  5  4  6      0
> MatPtAPNumeric     50 1.0 1.1102e+02  1.0 0.00e+00 0.0 2.6e+07 2.0e+05 4.0e+02  2  0  2  3 11   2  0  2  3 11      0
>
>
> You can see here that all-at-once is slower than ptap-scalable (about 222 s versus 115 s for MatPtAP), but it uses much less memory.
>
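As a quick check on the comparison, the MatPtAP wall-clock times from the three logs above can be turned into ratios. This is only arithmetic on the numbers quoted in the message; the dictionary keys and the helper name `ratio` are made up for this sketch.

```python
# Total MatPtAP time in seconds, copied from the three log excerpts.
matptap_seconds = {
    "hypre": 3.5353e+03,
    "ptap-scalable": 1.1453e+02,
    "all-at-once": 2.2153e+02,
}


def ratio(slower, faster):
    """How many times longer `slower` took than `faster`."""
    return matptap_seconds[slower] / matptap_seconds[faster]


print(f"all-at-once / ptap-scalable: {ratio('all-at-once', 'ptap-scalable'):.2f}")
print(f"hypre / all-at-once:         {ratio('hypre', 'all-at-once'):.1f}")
```

On these numbers, all-at-once takes a little under twice as long as ptap-scalable and is roughly 16 times faster than hypre.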
>
> Fande
>
> <hypre_ptap.png><petsc_ptap_allatonce.png><petsc_ptap_scalable.png>