<div dir="ltr"><div dir="ltr"><div><br clear="all"></div><div>Hi, Chris,</div><div>  I think I am done with the MR, <a href="https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/7651__;!!G_uCfscf7eWS!YrTW8A4OcJU-ZdMQHgpISTnTkIOdDgvE9JdeWugUHYhynmyVAiRsbC2alT9pMknGJMkb559Bgu3olNqbiDcYp3catWNW$">https://gitlab.com/petsc/petsc/-/merge_requests/7651</a></div><div>  You can look at the sample output there.  The array size is now very large,  supporting an aggregated L3 cache size of 1,920MB. </div><div><br></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr">--Junchao Zhang</div></div></div><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Tue, Oct 21, 2025 at 6:17 AM Klaij, Christiaan <<a href="mailto:C.Klaij@marin.nl">C.Klaij@marin.nl</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">OK, experiments will have to wait till we get the hardware.<br>
<br>
Can you give me a sign when you are done with the merge request? I<br>
would like to try with the increased array size, other vendors<br>
already warned me that "the array in stream is quite small".<br>
<br>
Chris<br>
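<br>
A rough sketch of how array size relates to cache size, assuming the usual STREAM guideline that each array should be at least 4 times the combined last-level cache. Whether the merge request applies exactly this factor is an assumption, and the function name and defaults are hypothetical.<br>
<pre>
```python
def min_stream_array_elements(cache_mb, factor=4, bytes_per_element=8):
    """Smallest element count so that one array is `factor` times the cache size."""
    # Decimal MB is assumed here; STREAM's guideline does not distinguish
    # between 10^6 and 2^20 bytes.
    cache_bytes = cache_mb * 1_000_000
    return (factor * cache_bytes) // bytes_per_element


# 1,920 MB is the aggregated L3 size mentioned for the merge request.
n = min_stream_array_elements(1920)
print(n)  # 960000000 doubles per array
```
</pre>
With three such arrays, as in the original STREAM benchmark, this would need roughly 23 GB of memory in total.<br>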
<br>
________________________________________<br>
From: Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
Sent: Monday, October 20, 2025 6:36 PM<br>
To: Klaij, Christiaan<br>
Cc: PETSc users list<br>
Subject: Re: [petsc-users] interpreting petsc streams result<br>
<br>
Hi, Chris,<br>
  Since we compute the speed-up relative to the bandwidth achieved by a single MPI process, and a single process can already drive all memory channels, the maximum speed-up can only be determined by experiment, not derived from the number of memory channels.<br>
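<br>
  To make the definition concrete, here is a minimal sketch of the speed-up as a ratio to the single-process bandwidth. The function name and the bandwidth values are hypothetical and are not read from the attached plot.<br>
<pre>
```python
def speedups(bandwidth_mb_s):
    """Map process count to speed-up relative to the 1-process bandwidth."""
    base = bandwidth_mb_s[1]
    return {n: bw / base for n, bw in sorted(bandwidth_mb_s.items())}


# Hypothetical measurements (MB/s), chosen only for illustration.
measured = {1: 37500.0, 8: 150000.0, 64: 900000.0}
for n, s in speedups(measured).items():
    print(n, s)
```
</pre>
  Because the baseline is whatever one process achieves, a faster single process lowers the reported speed-up even if the node's total bandwidth is unchanged.<br>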
<br>
  --Junchao Zhang<br>
<br>
<br>
On Mon, Oct 20, 2025 at 9:45 AM Klaij, Christiaan <<a href="mailto:C.Klaij@marin.nl" target="_blank">C.Klaij@marin.nl</a>> wrote:<br>
Hi Junchao,<br>
<br>
Thanks for your answer. Regarding the speed-up, what would you expect if not 24 out of 64, and why?<br>
<br>
Chris<br>
<br>
________________________________________<br>
<br>
dr. ir. Christiaan Klaij | senior researcher<br>
Research & Development | CFD Development<br>
T +31 317 49 33 44 | <a href="https://urldefense.us/v3/__http://www.marin.nl__;!!G_uCfscf7eWS!YrTW8A4OcJU-ZdMQHgpISTnTkIOdDgvE9JdeWugUHYhynmyVAiRsbC2alT9pMknGJMkb559Bgu3olNqbiDcYp84_t49H$" rel="noreferrer" target="_blank">www.marin.nl</a><br>
<br>
From: Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
Sent: Friday, October 17, 2025 5:01 PM<br>
To: Klaij, Christiaan<br>
Cc: PETSc users list<br>
Subject: Re: [petsc-users] interpreting petsc streams result<br>
<br>
Hi, Chris,<br>
I did have an MR, <a href="https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/7651__;!!G_uCfscf7eWS!YrTW8A4OcJU-ZdMQHgpISTnTkIOdDgvE9JdeWugUHYhynmyVAiRsbC2alT9pMknGJMkb559Bgu3olNqbiDcYp3catWNW$" rel="noreferrer" target="_blank">https://gitlab.com/petsc/petsc/-/merge_requests/7651</a>, to improve mpistream. I should rework it after Barry's !6903. See my inline comments on your questions below.<br>
<br>
On Fri, Oct 17, 2025 at 3:37 AM Klaij, Christiaan via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br>
Attached is a petsc streams result kindly provided by a hardware<br>
vendor for a single dual-socket compute node with two AMD EPYC<br>
9355 processors. Each processor has 32 cores, 12 DDR5 memory<br>
channels and a memory bandwidth of around 600 GB/s.<br>
<br>
* It is not immediately clear which line corresponds to which<br>
y-axis. Could future versions of petsc please color the axis<br>
label with the matching line color?<br>
Definitely.<br>
<br>
<br>
* Why would the achieved bandwidth be roughly 0.9 x 1e6 MB/s =<br>
900 GB/s and not closer to 1200 GB/s?<br>
As I recall, reaching the theoretical maximum bandwidth is actually not simple. One has to use special SIMD instructions, compiler flags, streaming stores, etc.<br>
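<br>
For reference, the arithmetic behind the two figures in the question can be written out as below. The per-socket peak is the approximate value quoted above, and the achieved value is the one read off the plot, so the resulting fraction is only indicative.<br>
<pre>
```python
sockets = 2
peak_per_socket_gb_s = 600.0   # "around 600 GB/s" per processor, as quoted above
achieved_mb_s = 0.9e6          # value read off the plot, as quoted above

peak_gb_s = sockets * peak_per_socket_gb_s
achieved_gb_s = achieved_mb_s / 1000.0   # decimal units: 1 GB/s = 1000 MB/s
fraction = achieved_gb_s / peak_gb_s

print(f"{achieved_gb_s:.0f} GB/s of {peak_gb_s:.0f} GB/s = {fraction:.0%}")
# prints: 900 GB/s of 1200 GB/s = 75%
```
</pre>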
<br>
<br>
* The speed-up seems to be 12 out of 64, provided multiples of 8<br>
cores are used. As expected given 12 memory channels?<br>
Maybe not; otherwise the speed-up would be 24, since you have 24 channels.<br>
<br>
<br>
* Does the zig-zag pattern indicate a pinning problem, or is it<br>
unavoidable given the 8-core building block of this type of<br>
processor?<br>
I checked and found "make mpistream" uses --map-by core. I think we should use --map-by socket or --map-by l3cache.<br>
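<br>
A toy model of the difference between the two mapping policies, assuming Open MPI-style meanings: by-core fills consecutive cores, by-socket places ranks round-robin across sockets. The function name is hypothetical, and real placement also depends on the launcher, the binding options and the hardware topology.<br>
<pre>
```python
def ranks_per_socket(n_ranks, policy, n_sockets=2, cores_per_socket=32):
    """Count how many ranks land on each socket under a simplified mapping policy."""
    if policy == "core":
        # Consecutive cores: socket 0 fills completely before socket 1 is used.
        placement = [r // cores_per_socket for r in range(n_ranks)]
    elif policy == "socket":
        # Round-robin: ranks alternate between sockets.
        placement = [r % n_sockets for r in range(n_ranks)]
    else:
        raise ValueError(f"unknown policy: {policy}")
    return [placement.count(s) for s in range(n_sockets)]


# Node from this thread: 2 sockets with 32 cores each.
for n in (16, 40):
    print(n, ranks_per_socket(n, "core"), ranks_per_socket(n, "socket"))
# prints:
# 16 [16, 0] [8, 8]
# 40 [32, 8] [20, 20]
```
</pre>
In this model, by-core leaves the second socket's memory channels idle until more than 32 ranks are used, while by-socket spreads the ranks evenly from the start. The same round-robin idea would apply to L3 cache domains.<br>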
<br>
<br>
Chris<br>
</blockquote></div></div>