[petsc-users] Bad memory scaling with PETSc 3.10

Myriam Peyrounette myriam.peyrounette at idris.fr
Fri May 3 09:14:23 CDT 2019


And the attached files... Sorry


On 05/03/19 at 16:11, Myriam Peyrounette wrote:
>
> Hi,
>
> I plotted new scalings (memory and time) using the new algorithms. I
> used the option -options_left true to make sure that the options are
> actually taken into account. They are.
>
> I don't have access to the platform I used to run my computations on,
> so I ran them on a different one. In particular, I can't reach a problem
> size of 1e8, and the values might differ from the previous scalings I
> sent you. But the comparison of the PETSc versions and options is still
> relevant.
>
> I plotted the reference scalings: the "good" one (PETSc 3.6.4) in
> green, and the "bad" one (PETSc 3.10.2) in blue.
>
> I used commit d330a26 (3.11.1) for all the other scalings, adding
> different sets of options (a sample command line is sketched after the
> list):
>
> Light blue -> -matptap_via allatonce
>               -mat_freeintermediatedatastructures 1
> Orange     -> -matptap_via allatonce_merged
>               -mat_freeintermediatedatastructures 1
> Purple     -> -matptap_via allatonce
>               -mat_freeintermediatedatastructures 1
>               -inner_diag_matmatmult_via scalable
>               -inner_offdiag_matmatmult_via scalable
> Yellow     -> -matptap_via allatonce_merged
>               -mat_freeintermediatedatastructures 1
>               -inner_diag_matmatmult_via scalable
>               -inner_offdiag_matmatmult_via scalable
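>
> For instance, the "Purple" set corresponds to a run along these lines
> (the launcher, core count and executable name below are placeholders,
> not my actual job script):
>
>   # placeholders: adapt the launcher, core count and binary to your setup
>   mpirun -n 64 ./my_app \
>     -matptap_via allatonce \
>     -mat_freeintermediatedatastructures 1 \
>     -inner_diag_matmatmult_via scalable \
>     -inner_offdiag_matmatmult_via scalable \
>     -options_left true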
>
> Conclusion: with regard to memory, the two algorithms bring a similarly
> good improvement in the scaling. Using the
> -inner_(off)diag_matmatmult_via options is also very interesting. The
> scaling is still not as good as with 3.6.4, though.
> With regard to time, I noted a real improvement in execution time! I
> used to spend 200-300s on these runs; now they take 10-15s. Besides
> that, the "_merged" versions are more efficient, and the
> -inner_(off)diag_matmatmult_via options are slightly more expensive, but
> not critically so.
>
> What do you think? Is it possible to match the scaling of PETSc 3.6.4
> again? Is it worth investigating further?
>
> Myriam
>
>
> On 04/30/19 at 17:00, Fande Kong wrote:
>> Hi Myriam,
>>
>> We are interested in how the new algorithms perform. There are two
>> new algorithms you could try.
>>
>> Algorithm 1:
>>
>> -matptap_via allatonce  -mat_freeintermediatedatastructures 1
>>
>> Algorithm 2:
>>
>> -matptap_via allatonce_merged -mat_freeintermediatedatastructures 1
>>
>>
>> Note that you need to use the current petsc-master, and please put
>> "-snes_view" in your script so that we can confirm these options
>> actually get set.
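>>
>> If it is more convenient to hard-code them than to pass them on the
>> command line, a rough, untested sketch (not taken from your code) that
>> sets the Algorithm 1 options through PetscOptionsSetValue before the
>> solve would look like this:
>>
>>   #include <petscsys.h>
>>
>>   int main(int argc, char **argv)
>>   {
>>     PetscErrorCode ierr;
>>
>>     ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>>     /* same effect as the command-line flags; set them before the
>>        solver setup so that MatPtAP sees them */
>>     ierr = PetscOptionsSetValue(NULL, "-matptap_via", "allatonce");CHKERRQ(ierr);
>>     ierr = PetscOptionsSetValue(NULL, "-mat_freeintermediatedatastructures", "1");CHKERRQ(ierr);
>>     /* ... usual setup and solve here ... */
>>     ierr = PetscFinalize();
>>     return ierr;
>>   }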
>>
>> Thanks,
>>
>> Fande,
>>
>>
>> On Tue, Apr 30, 2019 at 2:26 AM Myriam Peyrounette via petsc-users
>> <petsc-users at mcs.anl.gov> wrote:
>>
>>     Hi,
>>
>>     that's really good news for us, thanks! I will plot the memory
>>     scaling again using these new options and let you know, hopefully
>>     next week.
>>
>>     Before that, I just need to clarify the situation. Throughout our
>>     discussions, we mentioned a number of options concerning
>>     scalability:
>>
>>     -matptap_via scalable
>>     -inner_diag_matmatmult_via scalable
>>     -inner_offdiag_matmatmult_via scalable
>>     -mat_freeintermediatedatastructures
>>     -matptap_via allatonce
>>     -matptap_via allatonce_merged
>>
>>     Which of these are compatible? Should I use all of them at the
>>     same time? Is there any redundancy?
>>
>>     Thanks,
>>
>>     Myriam
>>
>>
>>     On 04/25/19 at 21:47, Zhang, Hong wrote:
>>>     Myriam:
>>>     Checking MatPtAP() in petsc-3.6.4, I realized that it uses a
>>>     different algorithm than petsc-3.10 and later versions. petsc-3.6
>>>     uses an outer product for C = P^T * A * P, while petsc-3.10 uses a
>>>     local transpose of P. petsc-3.10 accelerates data access, but
>>>     doubles the memory for P.
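>>>
>>>     Roughly, in my notation (not verbatim from either implementation):
>>>     with W = A*P, the outer-product form accumulates
>>>         C = P^T * W = sum_i p_i^T * w_i,
>>>     where p_i and w_i are row i of P and of W, so P^T is never stored
>>>     explicitly. petsc-3.10 instead builds the local transpose of P
>>>     explicitly and computes C with it, which is faster to traverse but
>>>     keeps a second copy of P in memory.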
>>>
>>>     Fande added two new implementations of MatPtAP() to petsc-master
>>>     which use much less memory and scale better, at a slightly higher
>>>     computing time (still faster than hypre, though). You may use
>>>     these new implementations if you are concerned about memory
>>>     scalability. The options for these new implementations are:
>>>     -matptap_via allatonce
>>>     -matptap_via allatonce_merged
>>>
>>>     Hong
>>>
>>>     On Mon, Apr 15, 2019 at 12:10 PM hzhang at mcs.anl.gov
>>>     <hzhang at mcs.anl.gov> wrote:
>>>
>>>         Myriam:
>>>         Thank you very much for providing these results!
>>>         I put effort into accelerating execution time and avoiding
>>>         global sizes in PtAP, for which the algorithm based on the
>>>         transpose of P_local and P_other likely doubles the memory
>>>         usage. I'll try to investigate why it becomes unscalable.
>>>         Hong
>>>
>>>             Hi,
>>>
>>>             you'll find the new scaling attached (green line). I
>>>             used version 3.11 and the four scalability options:
>>>             -matptap_via scalable
>>>             -inner_diag_matmatmult_via scalable
>>>             -inner_offdiag_matmatmult_via scalable
>>>             -mat_freeintermediatedatastructures
>>>
>>>             The scaling is much better! The code even uses less
>>>             memory for the smallest cases. There is still an
>>>             increase for the larger one.
>>>
>>>             With regard to the time scaling, I used KSPView and
>>>             LogView on the two previous scalings (blue and yellow
>>>             lines) but not on the last one (green line). So we can't
>>>             really compare them, am I right? However, we can see
>>>             that the new time scaling looks quite good. It slightly
>>>             increases from ~8s to ~27s.
>>>
>>>             Unfortunately, the computations are expensive, so I would
>>>             like to avoid re-running them if possible. How useful
>>>             would a proper time scaling be for you?
>>>
>>>             Myriam
>>>
>>>
>>>             On 04/12/19 at 18:18, Zhang, Hong wrote:
>>>>             Myriam:
>>>>             Thanks for your effort. It will help us improve PETSc.
>>>>             Hong
>>>>
>>>>                 Hi all,
>>>>
>>>>                 I used the wrong script, that's why it diverged...
>>>>                 Sorry about that.
>>>>                 I tried again with the right script applied to a tiny
>>>>                 problem (~200 elements). I can see a small difference
>>>>                 in memory usage (a gain of ~1 MB) when adding the
>>>>                 -mat_freeintermediatedatastructures option. I still
>>>>                 have to run larger cases to plot the scaling. The
>>>>                 supercomputer I usually run my jobs on is really busy
>>>>                 at the moment, so it takes a while. I hope I'll send
>>>>                 you the results on Monday.
>>>>
>>>>                 Thanks everyone,
>>>>
>>>>                 Myriam
>>>>
>>>>
>>>>                 On 04/11/19 at 06:01, Jed Brown wrote:
>>>>                 > "Zhang, Hong" <hzhang at mcs.anl.gov> writes:
>>>>                 >
>>>>                 >> Jed:
>>>>                 >>>> Myriam,
>>>>                 >>>> Thanks for the plot.
>>>>                 >>>> '-mat_freeintermediatedatastructures' should not
>>>>                 >>>> affect the solution. It releases almost half of
>>>>                 >>>> the memory in C=PtAP if C is not reused.
>>>>                 >>> And yet if turning it on causes divergence, that
>>>>                 >>> would imply a bug.
>>>>                 >>> Hong, are you able to reproduce the experiment to
>>>>                 >>> see the memory scaling?
>>>>                 >> I'd like to test her code on an ALCF machine, but
>>>>                 >> my hands are full now. I'll try it as soon as I
>>>>                 >> find time, hopefully next week.
>>>>                 > I have now compiled and run her code locally.
>>>>                 >
>>>>                 > Myriam, thanks for your last mail adding
>>>>                 > configuration and removing the MemManager.h
>>>>                 > dependency.  I ran with and without
>>>>                 > -mat_freeintermediatedatastructures and don't see a
>>>>                 > difference in convergence.  What commands did you
>>>>                 > run to observe that difference?
>>>>
>>>>                 -- 
>>>>                 Myriam Peyrounette
>>>>                 CNRS/IDRIS - HLST
>>>>                 --
>>>>
>>>>
>>>
>>>             -- 
>>>             Myriam Peyrounette
>>>             CNRS/IDRIS - HLST
>>>             --
>>>
>>
>>     -- 
>>     Myriam Peyrounette
>>     CNRS/IDRIS - HLST
>>     --
>>
>
> -- 
> Myriam Peyrounette
> CNRS/IDRIS - HLST
> --

-- 
Myriam Peyrounette
CNRS/IDRIS - HLST
--

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex42_mem_scaling_ada.png
Type: image/png
Size: 48984 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190503/1514baf4/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex42_time_scaling_ada.png
Type: image/png
Size: 36796 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190503/1514baf4/attachment-0003.png>

