[petsc-dev] New implementation of PtAP based on all-at-once algorithm

Fande Kong fdkong.jd at gmail.com
Mon Apr 15 13:56:29 CDT 2019


On Mon, Apr 15, 2019 at 6:49 AM Matthew Knepley <knepley at gmail.com> wrote:

> On Mon, Apr 15, 2019 at 12:41 AM Fande Kong via petsc-dev <
> petsc-dev at mcs.anl.gov> wrote:
>
>> On Fri, Apr 12, 2019 at 7:27 AM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>>
>>>
>>> On Thu, Apr 11, 2019 at 11:42 PM Smith, Barry F. <bsmith at mcs.anl.gov>
>>> wrote:
>>>
>>>>
>>>>
>>>> > On Apr 11, 2019, at 9:07 PM, Mark Adams via petsc-dev <
>>>> petsc-dev at mcs.anl.gov> wrote:
>>>> >
>>>> > Interesting, nice work.
>>>> >
>>>> > It would be interesting to get the flop counters working.
>>>> >
>>>> > This looks like GMG, I assume 3D.
>>>> >
>>>> > The degree of parallelism is not very realistic. You should probably
>>>> run a 10x smaller problem, at least, or use 10x more processes.
>>>>
>>>>    Why do you say that? He's got his machine with a certain amount of
>>>> physical memory per node, are you saying he should ignore/not use 90% of
>>>> that physical memory for his simulation?
>>>
>>>
>>> In my experience 1.5M equations/process is about 50x more than applications
>>> run, but this is just anecdotal. Some apps are dominated by the linear
>>> solver in terms of memory but some apps use a lot of memory in the physics
>>> parts of the code.
>>>
>>
>> The test case is solving the multigroup neutron transport equations where
>> each mesh vertex could be associated with a hundred or a thousand
>> variables. The mesh is actually small so that it can be handled efficiently
>> in the physics part of the code. 90% of the memory is consumed by the
>> solver (SNES, KSP, PC). This is the reason I was trying to implement a
>> memory-friendly PtAP.
>>
>>
>>> The one app that I can think of where the memory usage is dominated by
>>> the solver does like 10 (pseudo) time steps with pretty hard nonlinear
>>> solves, so in the end they are not bound by turnaround time. But they are
>>> kind of an odd (academic) application and not very representative of what I
>>> see in the broader comp sci community. And these guys do have a scalable
>>> code so instead of waiting a week on the queue to run a 10 hour job that
>>> uses 10% of the machine, they wait a day to run a 2 hour job that takes 50%
>>> of the machine because centers' scheduling policies work that way.
>>>
>>
>> Our code is scalable but we do not have a huge machine unfortunately.
>>
>>
>>>
>>>> He should buy a machine 10x bigger just because it means having fewer
>>>> degrees of freedom per node (who's footing the bill for this purchase?). At
>>>> INL they run simulations for a purpose, not just for scalability studies
>>>> and there are no dang GPUs or barely used over-sized monstrosities sitting
>>>> around to brag about twice a year at SC.
>>>>
>>>
>>> I guess these are the nuke guys. I've never worked with them or seen this
>>> kind of complexity analysis in their talks, but OK if they fill up memory
>>> with the solver then this is representative of a significant (DOE) app.
>>>
>>
>> You do not see the complexity analysis in the talks because most of the
>> people at INL live in a different community. I will convince more people
>> to give talks in our community in the future.
>>
>> We focus on the nuclear energy simulations that involve multiphysics
>> (neutron transport, mechanics contact, computational materials,
>> compressible/incompressible flows, two-phase flows, etc.). We are
>> developing a flexible platform (open source) that allows different physics
>> guys to couple their codes together efficiently.
>> https://mooseframework.inl.gov/old
>>
>
> Fande, this is very interesting. Can you tell me:
>
>   1) A rough estimate of dofs/vertex (or cell or face) depending on where
> you put unknowns
>

The big run (neutron transport equations) posted earlier has 576 variables
on each mesh vertex. The physics guys think that, at the current stage,
100-1000 variables (the number of energy groups times the number of neutron
flight directions) on each mesh vertex will give us an acceptable simulation
result; 1000 variables are preferred.
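
As a rough check on the problem size, the arithmetic can be sketched as below. It assumes the 576-variable run is the same run as the 14,893,346,880-unknown, 10,000-core case quoted further down in this thread. The split of 576 into energy groups and directions is not stated here, so the sketch only enumerates the possible factor pairs rather than picking one.

```python
# Back-of-the-envelope sizing. Assumption: the 576-variable run is the
# 14,893,346,880-unknown, 10,000-core case quoted later in this thread.
total_unknowns = 14_893_346_880
vars_per_vertex = 576
cores = 10_000

vertices, remainder = divmod(total_unknowns, vars_per_vertex)
unknowns_per_core = total_unknowns / cores
vertices_per_core = vertices / cores

# The (groups, directions) split is not given in the thread, so only
# list the candidate factor pairs of 576.
factor_pairs = [(g, vars_per_vertex // g)
                for g in range(1, vars_per_vertex + 1)
                if vars_per_vertex % g == 0]

print(f"mesh vertices:     {vertices:,} (remainder {remainder})")
print(f"unknowns per core: {unknowns_per_core:,.0f}")
print(f"vertices per core: {vertices_per_core:,.1f}")
print(f"candidate (groups, directions) pairs: {len(factor_pairs)}")
```

Under that assumption each core holds about 1.5 million unknowns but only about 2,600 mesh vertices, which fits the picture of a small mesh with a solver-dominated memory footprint.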



>
>   2) Are all unknowns on the same vertex coupled together? If not, where
> do you specify block sparsity?
>

Yes, they are physically coupled together through the scattering and the
fission events. But we are using a matrix-free method, and the coupling
between variables is ignored in the preconditioning matrix so that the
system won't take that much memory.


>
>   3) How are the coefficients from the equation discretized on the mesh?
>

The coefficients (often referred to as cross sections by the neutron guys)
could be different for each variable, and they depend entirely on the
reactor configuration. My current simulation indeed uses heterogeneous
materials.

I actually have a preprint that presents more details on the simulation.
https://arxiv.org/abs/1903.03659

Thanks,

Fande,




>
>   Thanks!
>
>      Matt
>
>
>> Thanks,
>>
>> Fande,
>>
>>
>>>
>>>
>>>>
>>>>    Barry
>>>>
>>>>
>>>>
>>>> > I guess it does not matter. This is basically like a one-node run
>>>> because the subdomains are so large.
>>>> >
>>>> > And are you sure the numerics are the same with and without hypre?
>>>> Hypre is 15x slower. Any ideas what is going on?
>>>> >
>>>> > It might be interesting to scale this test down to a node to see if
>>>> this is from communication.
>>>> >
>>>> > Again, nice work,
>>>> > Mark
>>>> >
>>>> >
>>>> > On Thu, Apr 11, 2019 at 7:08 PM Fande Kong <fdkong.jd at gmail.com>
>>>> wrote:
>>>> > Hi Developers,
>>>> >
>>>> > I just want to share some good news.  It is known that PETSc-ptap-scalable
>>>> takes too much memory for some applications because it needs to build
>>>> intermediate data structures.  Following Mark's suggestions, I
>>>> implemented the all-at-once algorithm, which does not cache any intermediate
>>>> data.
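
The idea of not caching intermediate data can be illustrated with a serial toy sketch. This is not the PETSc implementation: the dict-of-dicts storage and the name `ptap_all_at_once` are made up for illustration, and the real code is parallel with separate symbolic and numeric phases. The sketch forms one row of A*P at a time, scatters it into C = P^T A P right away, and then discards it, so the full product A*P is never stored.

```python
from collections import defaultdict


def ptap_all_at_once(A, P):
    """Return C = P^T A P for sparse matrices stored as {row: {col: value}}.

    Only one row of A*P is alive at any time; it is accumulated into C
    immediately and then dropped. (Illustrative helper, not a PETSc API.)
    """
    C = defaultdict(lambda: defaultdict(float))
    for i, a_row in A.items():
        # Row i of A*P: sum over j of A[i][j] * P[j][:]
        ap_row = defaultdict(float)
        for j, a_ij in a_row.items():
            for l, p_jl in P.get(j, {}).items():
                ap_row[l] += a_ij * p_jl
        # Outer product of row i of P with row i of A*P, added into C
        for k, p_ik in P.get(i, {}).items():
            for l, ap_il in ap_row.items():
                C[k][l] += p_ik * ap_il
    return {k: dict(row) for k, row in C.items()}


# Tiny example: 1D Laplacian on 3 points, aggregated to 2 coarse points
A = {0: {0: 2.0, 1: -1.0},
     1: {0: -1.0, 1: 2.0, 2: -1.0},
     2: {1: -1.0, 2: 2.0}}
P = {0: {0: 1.0}, 1: {0: 1.0}, 2: {1: 1.0}}
C = ptap_all_at_once(A, P)
print(C)
```

The trade-off in this sketch is that each row of A*P is recomputed whenever it is needed instead of being kept around, which costs time but keeps the working set to a single sparse row.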
>>>> >
>>>> > I did some comparisons. The new implementation is actually scalable
>>>> in terms of memory usage and compute time, even though it is still
>>>> slower than "ptap-scalable".  Memory profiling results are in
>>>> the attachments. The new all-at-once implementation uses a similar amount
>>>> of memory to hypre, but it is way faster than hypre.
>>>> >
>>>> > For example, for a problem with 14,893,346,880 unknowns using 10,000
>>>> processor cores, here are the timing results:
>>>> >
>>>> > Hypre algorithm:
>>>> >
>>>> > MatPtAP               50 1.0 3.5353e+03 1.0 0.00e+00 0.0 1.9e+07 3.3e+04 6.0e+02 33  0  1  0 17  33  0  1  0 17     0
>>>> > MatPtAPSymbolic       50 1.0 2.3969e-02 13.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> > MatPtAPNumeric        50 1.0 3.5353e+03 1.0 0.00e+00 0.0 1.9e+07 3.3e+04 6.0e+02 33  0  1  0 17  33  0  1  0 17     0
>>>> >
>>>> > PETSc scalable PtAP:
>>>> >
>>>> > MatPtAP               50 1.0 1.1453e+02 1.0 2.07e+09 3.8 6.6e+07 2.0e+05 7.5e+02  2  1  4  6 20   2  1  4  6 20 129418
>>>> > MatPtAPSymbolic       50 1.0 5.1562e+01 1.0 0.00e+00 0.0 4.1e+07 1.4e+05 3.5e+02  1  0  3  3  9   1  0  3  3  9     0
>>>> > MatPtAPNumeric        50 1.0 6.3072e+01 1.0 2.07e+09 3.8 2.4e+07 3.1e+05 4.0e+02  1  1  2  4 11   1  1  2  4 11 235011
>>>> >
>>>> > New implementation of the all-at-once algorithm:
>>>> >
>>>> > MatPtAP               50 1.0 2.2153e+02 1.0 0.00e+00 0.0 1.0e+08 1.4e+05 6.0e+02  4  0  7  7 17   4  0  7  7 17     0
>>>> > MatPtAPSymbolic       50 1.0 1.1055e+02 1.0 0.00e+00 0.0 7.9e+07 1.2e+05 2.0e+02  2  0  5  4  6   2  0  5  4  6     0
>>>> > MatPtAPNumeric        50 1.0 1.1102e+02 1.0 0.00e+00 0.0 2.6e+07 2.0e+05 4.0e+02  2  0  2  3 11   2  0  2  3 11     0
>>>> >
>>>> >
>>>> > You can see here that all-at-once is a bit slower than ptap-scalable,
>>>> but it uses much less memory.
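
To put the three timings side by side, the ratios can be worked out as in the short sketch below. The times are copied from the MatPtAP rows in the logs above; choosing all-at-once as the baseline is only a convenience for this comparison.

```python
# Wall-clock times (seconds) from the MatPtAP rows of the three logs
matptap_seconds = {
    "hypre": 3.5353e+03,
    "ptap-scalable": 1.1453e+02,
    "all-at-once": 2.2153e+02,
}

# Express every time relative to the all-at-once implementation
baseline = matptap_seconds["all-at-once"]
ratios = {name: t / baseline for name, t in matptap_seconds.items()}

for name, t in matptap_seconds.items():
    print(f"{name:14s} {t:10.2f} s   {ratios[name]:6.2f}x all-at-once")
```

By these numbers hypre takes roughly 16x as long as all-at-once, and all-at-once takes a bit under 2x as long as ptap-scalable.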
>>>> >
>>>> >
>>>> > Fande
>>>> >
>>>>
>>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>