[petsc-users] Bad memory scaling with PETSc 3.10
Fande Kong
fdkong.jd at gmail.com
Tue Apr 30 10:00:56 CDT 2019
Hi Myriam,
We are interested in how the new algorithms perform. There are two new
algorithms you could try:
Algorithm 1:
-matptap_via allatonce -mat_freeintermediatedatastructures 1
Algorithm 2:
-matptap_via allatonce_merged -mat_freeintermediatedatastructures 1
Note that you need to use the current petsc-master, and please also put
"-snes_view" in your script so that we can confirm these options actually
get set.
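As a concrete example (the executable name and process count below are just
placeholders for your own run command), Algorithm 1 would be invoked as:

  mpiexec -n 16 ./my_app -matptap_via allatonce \
    -mat_freeintermediatedatastructures 1 -snes_view

and Algorithm 2 in the same way, with -matptap_via allatonce_merged instead.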
Thanks,
Fande,
On Tue, Apr 30, 2019 at 2:26 AM Myriam Peyrounette via petsc-users <
petsc-users at mcs.anl.gov> wrote:
> Hi,
>
> that's really good news for us, thanks! I will plot again the memory
> scaling using these new options and let you know. Next week I hope.
>
> Before that, I just need to clarify the situation. Throughout our
> discussions, we mentioned a number of options concerning scalability:
>
> -matptap_via scalable
> -inner_diag_matmatmult_via scalable
> -inner_offdiag_matmatmult_via scalable
> -mat_freeintermediatedatastructures
> -matptap_via allatonce
> -matptap_via allatonce_merged
>
> Which of them are compatible? Should I use all of them at the same
> time? Is there any redundancy?
>
> Thanks,
>
> Myriam
>
> On 04/25/19 at 21:47, Zhang, Hong wrote:
>
> Myriam:
> Checking MatPtAP() in petsc-3.6.4, I realized that it uses a different
> algorithm than petsc-3.10 and later versions. petsc-3.6 uses an outer product
> for C = P^T * A * P, while petsc-3.10 uses a local transpose of P. petsc-3.10
> accelerates data access, but doubles the memory used for P.
>
> Fande added two new implementations of MatPtAP() to petsc-master which
> use much less memory and scale better, at slightly higher computing time
> (still faster than hypre though). You may use these new implementations if
> you are concerned about memory scalability. The options for these new
> implementations are:
> -matptap_via allatonce
> -matptap_via allatonce_merged
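> For example (the executable name and process count here are placeholders
> only), switching from the existing scalable implementation to one of the new
> ones is just a change of run-time option:
>
>   mpiexec -n 64 ./my_app -matptap_via scalable
>   mpiexec -n 64 ./my_app -matptap_via allatonce -mat_freeintermediatedatastructures 1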
>
> Hong
>
> On Mon, Apr 15, 2019 at 12:10 PM hzhang at mcs.anl.gov <hzhang at mcs.anl.gov>
> wrote:
>
>> Myriam:
>> Thank you very much for providing these results!
>> I have put effort into accelerating execution time and avoiding the use of
>> global sizes in PtAP, for which the algorithm of transposing P_local and
>> P_other likely doubles the memory usage. I'll try to investigate why it
>> becomes unscalable.
>> Hong
>>
>>> Hi,
>>>
>>> you'll find the new scaling attached (green line). I used version
>>> 3.11 and the four scalability options:
>>> -matptap_via scalable
>>> -inner_diag_matmatmult_via scalable
>>> -inner_offdiag_matmatmult_via scalable
>>> -mat_freeintermediatedatastructures
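>>>
>>> For reference, a full run combining these options would look like the
>>> following (the executable name and process count are just placeholders
>>> for my application):
>>>
>>>   mpiexec -n 32 ./my_app -matptap_via scalable \
>>>     -inner_diag_matmatmult_via scalable \
>>>     -inner_offdiag_matmatmult_via scalable \
>>>     -mat_freeintermediatedatastructures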
>>>
>>> The scaling is much better! The code even uses less memory for the
>>> smallest cases. There is still an increase for the largest case.
>>>
>>> With regard to the time scaling, I used KSPView and LogView on the two
>>> previous scalings (blue and yellow lines) but not on the last one (green
>>> line). So we can't really compare them, am I right? However, we can see
>>> that the new time scaling looks quite good. It slightly increases from ~8s
>>> to ~27s.
>>>
>>> Unfortunately, the computations are expensive, so I would like to avoid
>>> re-running them if possible. How important would a proper time scaling be
>>> for you?
>>>
>>> Myriam
>>>
>>> On 04/12/19 at 18:18, Zhang, Hong wrote:
>>>
>>> Myriam:
>>> Thanks for your effort. It will help us improve PETSc.
>>> Hong
>>>
>>> Hi all,
>>>>
>>>> I used the wrong script, that's why it diverged... Sorry about that.
>>>> I tried again with the right script applied to a tiny problem (~200
>>>> elements). I can see a small difference in memory usage (a gain of ~1 MB)
>>>> when adding the -mat_freeintermediatedatastructures option. I still have to
>>>> run larger cases to plot the scaling. The supercomputer I usually run my
>>>> jobs on is really busy at the moment, so it takes a while. I hope to
>>>> send you the results on Monday.
>>>>
>>>> Thanks everyone,
>>>>
>>>> Myriam
>>>>
>>>>
>>>> On 04/11/19 at 06:01, Jed Brown wrote:
>>>> > "Zhang, Hong" <hzhang at mcs.anl.gov> writes:
>>>> >
>>>> >> Jed:
>>>> >>>> Myriam,
>>>> >>>> Thanks for the plot. '-mat_freeintermediatedatastructures' should
>>>> not affect the solution. It releases almost half of the memory in C=PtAP
>>>> if C is not reused.
>>>> >>> And yet if turning it on causes divergence, that would imply a bug.
>>>> >>> Hong, are you able to reproduce the experiment to see the memory
>>>> >>> scaling?
>>>> >> I'd like to test her code using an ALCF machine, but my hands are full
>>>> now. I'll try it as soon as I find time, hopefully next week.
>>>> > I have now compiled and run her code locally.
>>>> >
>>>> > Myriam, thanks for your last mail adding configuration and removing
>>>> the
>>>> > MemManager.h dependency. I ran with and without
>>>> > -mat_freeintermediatedatastructures and don't see a difference in
>>>> > convergence. What commands did you run to observe that difference?
>>>>
>>>> --
>>>> Myriam Peyrounette
>>>> CNRS/IDRIS - HLST
>>>> --
>>>>
>>>>
>>>>
>>> --
>>> Myriam Peyrounette
>>> CNRS/IDRIS - HLST
>>> --
>>>
>>>
> --
> Myriam Peyrounette
> CNRS/IDRIS - HLST
> --
>
>