[petsc-dev] New implementation of PtAP based on all-at-once algorithm

Thu Apr 11 18:08:21 CDT 2019

Hi Developers,

I just want to share some good news.  PETSc's ptap-scalable algorithm is
known to use too much memory for some applications because it needs to
build intermediate data structures.  Following Mark's suggestion, I
implemented the all-at-once algorithm, which does not cache any
intermediate data.
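
To illustrate the idea, here is a minimal serial sketch of an all-at-once
triple product C = P^T A P. It is not the PETSc implementation: the
dict-of-dicts matrix format and the function name ptap_all_at_once are
assumptions made for this example, and the real code also has to handle
off-process rows and communication. The point is only that each row of A*P
is formed, used immediately, and discarded, so the full product A*P is
never stored.

```python
# Toy sparse format (an assumption for this sketch): {row: {col: value}}.

def ptap_all_at_once(A, P):
    """Return C = P^T A P without ever storing the full product A*P."""
    C = {}
    for i, a_row in A.items():
        p_row_i = P.get(i, {})
        if not p_row_i:
            continue  # row i of P is empty, so row i of A*P adds nothing to C
        # Row i of A*P: a temporary that is dropped after this iteration.
        ap_row = {}
        for k, a_ik in a_row.items():
            for j, p_kj in P.get(k, {}).items():
                ap_row[j] = ap_row.get(j, 0.0) + a_ik * p_kj
        # Outer-product update: C[r, j] += P[i, r] * (A*P)[i, j].
        for r, p_ir in p_row_i.items():
            c_row = C.setdefault(r, {})
            for j, v in ap_row.items():
                c_row[j] = c_row.get(j, 0.0) + p_ir * v
    return C


# 1D Laplacian on 4 points, coarsened by aggregating points pairwise.
A = {0: {0: 2.0, 1: -1.0},
     1: {0: -1.0, 1: 2.0, 2: -1.0},
     2: {1: -1.0, 2: 2.0, 3: -1.0},
     3: {2: -1.0, 3: 2.0}}
P = {0: {0: 1.0}, 1: {0: 1.0}, 2: {1: 1.0}, 3: {1: 1.0}}
C = ptap_all_at_once(A, P)
```

The only temporary in this sketch is one row of A*P at a time, which is the
memory saving the all-at-once approach is after; the price is that the
updates to C are scattered across its rows.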

I ran some comparisons.  The new implementation scales well in terms of
both memory usage and compute time, although it is still slower than
"ptap-scalable".  Memory profiling results are in the attachments.  The
new all-at-once implementation uses about the same amount of memory as
hypre, but it is much faster than hypre.

For example, for a problem with 14,893,346,880 unknowns on 10,000
processor cores, the timing results are:

Hypre algorithm:

MatPtAP               50 1.0 3.5353e+03 1.0 0.00e+00 0.0 1.9e+07 3.3e+04 6.0e+02 33  0  1  0 17  33  0  1  0 17     0
MatPtAPSymbolic       50 1.0 2.3969e-02 13.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAPNumeric        50 1.0 3.5353e+03 1.0 0.00e+00 0.0 1.9e+07 3.3e+04 6.0e+02 33  0  1  0 17  33  0  1  0 17     0

PETSc scalable PtAP:

MatPtAP               50 1.0 1.1453e+02 1.0 2.07e+09 3.8 6.6e+07 2.0e+05 7.5e+02  2  1  4  6 20   2  1  4  6 20 129418
MatPtAPSymbolic       50 1.0 5.1562e+01 1.0 0.00e+00 0.0 4.1e+07 1.4e+05 3.5e+02  1  0  3  3  9   1  0  3  3  9     0
MatPtAPNumeric        50 1.0 6.3072e+01 1.0 2.07e+09 3.8 2.4e+07 3.1e+05 4.0e+02  1  1  2  4 11   1  1  2  4 11 235011

New implementation of the all-at-once algorithm:

MatPtAP               50 1.0 2.2153e+02 1.0 0.00e+00 0.0 1.0e+08 1.4e+05 6.0e+02  4  0  7  7 17   4  0  7  7 17     0
MatPtAPSymbolic       50 1.0 1.1055e+02 1.0 0.00e+00 0.0 7.9e+07 1.2e+05 2.0e+02  2  0  5  4  6   2  0  5  4  6     0
MatPtAPNumeric        50 1.0 1.1102e+02 1.0 0.00e+00 0.0 2.6e+07 2.0e+05 4.0e+02  2  0  2  3 11   2  0  2  3 11     0

As you can see, all-at-once is somewhat slower than ptap-scalable (about
222 s versus 115 s for MatPtAP over 50 calls), but it uses much less
memory.
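
For a quick sense of the ratios, the MatPtAP totals quoted above can be
compared directly. This is just arithmetic on the numbers in this message;
the variable names are made up for the example.

```python
# Total MatPtAP time in seconds over 50 calls, copied from the logs above.
t_hypre = 3.5353e+03
t_scalable = 1.1453e+02
t_allatonce = 2.2153e+02
calls = 50

# How much faster all-at-once is than hypre, and how much slower than ptap-scalable.
speedup_vs_hypre = t_hypre / t_allatonce
slowdown_vs_scalable = t_allatonce / t_scalable

# Average cost of one MatPtAP call with the all-at-once algorithm.
per_call_allatonce = t_allatonce / calls

# Problem size per core for this run.
unknowns_per_core = 14_893_346_880 / 10_000

print(f"all-at-once vs hypre:         {speedup_vs_hypre:.1f}x faster")
print(f"all-at-once vs ptap-scalable: {slowdown_vs_scalable:.2f}x slower")
print(f"all-at-once per call:         {per_call_allatonce:.2f} s")
print(f"unknowns per core:            {unknowns_per_core:,.0f}")
```

So all-at-once is roughly 16 times faster than hypre and a little under
2 times slower than ptap-scalable on this run.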

Fande
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hypre_ptap.png
Type: image/png
Size: 582936 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20190411/454a8770/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_ptap_allatonce.png
Type: image/png
Size: 555212 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20190411/454a8770/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_ptap_scalable.png
Type: image/png
Size: 681917 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20190411/454a8770/attachment-0005.png>