<div dir="ltr"><div class="gmail_default" style="font-family:comic sans ms,sans-serif;color:#38761d">On my list of things to do in PIO is rewriting the error handling code - but that issue is the same for both cases and</div><div class="gmail_default" style="font-family:comic sans ms,sans-serif;color:#38761d">so I don't think it would play a role in the difference we are seeing.<br></div><div class="gmail_default" style="font-family:comic sans ms,sans-serif;color:#38761d"><br></div><div class="gmail_default" style="font-family:comic sans ms,sans-serif;color:#38761d">The lfs getstripe output of the two files is nearly identical, I show only the cesm file here.</div><div class="gmail_default" style="font-family:comic sans ms,sans-serif;color:#38761d"><br></div><div class="gmail_default" style="font-family:comic sans ms,sans-serif;color:#38761d">  lcm_layout_gen:    7<br>  lcm_mirror_count:  1<br>  lcm_entry_count:   4<br>    lcme_id:             1<br>    lcme_mirror_id:      0<br>    lcme_flags:          init<br>    lcme_extent.e_start: 0<br>    lcme_extent.e_end:   16777216<br>      lmm_stripe_count:  1<br>      lmm_stripe_size:   1048576<br>      lmm_pattern:       raid0<br>      lmm_layout_gen:    0<br>      lmm_stripe_offset: 27<br>      lmm_objects:<br>      - 0: { l_ost_idx: 27, l_fid: [0xa80000401:0x3f2ef8:0x0] }<br><br>    lcme_id:             2<br>    lcme_mirror_id:      0<br>    lcme_flags:          init<br>    lcme_extent.e_start: 16777216<br>    lcme_extent.e_end:   17179869184<br>      lmm_stripe_count:  4<br>      lmm_stripe_size:   16777216<br>      lmm_pattern:       raid0<br>      lmm_layout_gen:    0<br>      lmm_stripe_offset: 35<br>      lmm_objects:<br>      - 0: { l_ost_idx: 35, l_fid: [0xc80000401:0x3f32e5:0x0] }<br>      - 1: { l_ost_idx: 39, l_fid: [0xcc0000402:0x3f3162:0x0] }<br>      - 2: { l_ost_idx: 43, l_fid: [0xe80000402:0x3f2f3a:0x0] }<br>      - 3: { l_ost_idx: 47, l_fid: [0xec0000401:0x3f3017:0x0] }<br><br>    lcme_id:             
3<br>    lcme_mirror_id:      0<br>    lcme_flags:          init<br>    lcme_extent.e_start: 17179869184<br>    lcme_extent.e_end:   68719476736<br>      lmm_stripe_count:  12<br>      lmm_stripe_size:   16777216<br>      lmm_pattern:       raid0<br>      lmm_layout_gen:    0<br>      lmm_stripe_offset: 16<br>      lmm_objects:<br>      - 0: { l_ost_idx: 16, l_fid: [0x700000402:0x4021eb:0x0] }<br>      - 1: { l_ost_idx: 20, l_fid: [0x740000402:0x4020ae:0x0] }<br>      - 2: { l_ost_idx: 24, l_fid: [0x900000400:0x401f68:0x0] }<br>      - 3: { l_ost_idx: 28, l_fid: [0x940000400:0x401f71:0x0] }<br>      - 4: { l_ost_idx: 32, l_fid: [0xb00000402:0x40220c:0x0] }<br>      - 5: { l_ost_idx: 36, l_fid: [0xb40000402:0x40210b:0x0] }<br>      - 6: { l_ost_idx: 40, l_fid: [0xd00000402:0x402141:0x0] }<br>      - 7: { l_ost_idx: 44, l_fid: [0xd40000400:0x401e90:0x0] }<br>      - 8: { l_ost_idx: 48, l_fid: [0xf00000401:0x401e08:0x0] }<br>      - 9: { l_ost_idx: 52, l_fid: [0xf40000400:0x401e32:0x0] }<br>      - 10: { l_ost_idx: 56, l_fid: [0x1100000402:0x4022e0:0x0] }<br>      - 11: { l_ost_idx: 60, l_fid: [0x1140000402:0x4020a6:0x0] }<br><br>    lcme_id:             4<br>    lcme_mirror_id:      0<br>    lcme_flags:          init<br>    lcme_extent.e_start: 68719476736<br>    lcme_extent.e_end:   EOF<br>      lmm_stripe_count:  24<br>      lmm_stripe_size:   16777216<br>      lmm_pattern:       raid0<br>      lmm_layout_gen:    0<br>      lmm_stripe_offset: 51<br>      lmm_objects:<br>      - 0: { l_ost_idx: 51, l_fid: [0x1080000400:0x402881:0x0] }<br>      - 1: { l_ost_idx: 55, l_fid: [0x10c0000400:0x402955:0x0] }<br>      - 2: { l_ost_idx: 59, l_fid: [0x1280000400:0x4027d6:0x0] }<br>      - 3: { l_ost_idx: 63, l_fid: [0x12c0000401:0x402ab2:0x0] }<br>      - 4: { l_ost_idx: 67, l_fid: [0x1480000400:0x402b75:0x0] }<br>      - 5: { l_ost_idx: 71, l_fid: [0x14c0000400:0x4028d2:0x0] }<br>      - 6: { l_ost_idx: 75, l_fid: [0x1680000401:0x402a3b:0x0] }<br>      - 7: { l_ost_idx: 79, 
l_fid: [0x16c0000402:0x40294d:0x0] }<br>      - 8: { l_ost_idx: 83, l_fid: [0x1880000401:0x40299c:0x0] }<br>      - 9: { l_ost_idx: 87, l_fid: [0x18c0000402:0x402f5e:0x0] }<br>      - 10: { l_ost_idx: 91, l_fid: [0x1a80000400:0x402a16:0x0] }<br>      - 11: { l_ost_idx: 95, l_fid: [0x1ac0000400:0x402bd2:0x0] }<br>      - 12: { l_ost_idx: 0, l_fid: [0x300000401:0x402a2e:0x0] }<br>      - 13: { l_ost_idx: 4, l_fid: [0x340000402:0x4027d2:0x0] }<br>      - 14: { l_ost_idx: 8, l_fid: [0x500000402:0x402a26:0x0] }<br>      - 15: { l_ost_idx: 12, l_fid: [0x540000400:0x402943:0x0] }<br>      - 16: { l_ost_idx: 64, l_fid: [0x1300000402:0x402c10:0x0] }<br>      - 17: { l_ost_idx: 68, l_fid: [0x1340000401:0x4029c6:0x0] }<br>      - 18: { l_ost_idx: 72, l_fid: [0x1500000402:0x402d11:0x0] }<br>      - 19: { l_ost_idx: 76, l_fid: [0x1540000402:0x402be2:0x0] }<br>      - 20: { l_ost_idx: 80, l_fid: [0x1700000400:0x402a64:0x0] }<br>      - 21: { l_ost_idx: 84, l_fid: [0x1740000401:0x402b11:0x0] }<br>      - 22: { l_ost_idx: 88, l_fid: [0x1900000402:0x402cb8:0x0] }<br>      - 23: { l_ost_idx: 92, l_fid: [0x1940000400:0x402cd3:0x0] }<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Aug 4, 2023 at 12:12 PM Wei-Keng Liao <<a href="mailto:wkliao@northwestern.edu">wkliao@northwestern.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">



<div>
<br>
<div>I can see the file header size of 20620 bytes. Because all attributes are stored</div>
<div>in the header, the cost of writing them should not be an issue. I also see no gap</div>
<div>between 2 consecutive variables, which is good, meaning the write requests made</div>
<div>by MPI-IO will be contiguous.
<div><br>
</div>
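<div>A minimal sketch of the contiguity check described above: the gap before a variable is its start offset minus the end offset of whatever precedes it. The offsets are copied from the ncoffsets listing of the cesm file quoted later in this thread; the helper name gap_from_prev is made up for illustration and is not part of PnetCDF.</div>

```python
# Offsets copied from the "ncoffsets -sg" output of the cesm file in this thread.
HEADER_SIZE = 20620           # bytes actually used by the file header
LAT = (16777216, 100663312)   # (start, end) file offsets of variable "lat"
LON = (100663312, 184549408)  # (start, end) file offsets of variable "lon"


def gap_from_prev(prev_end, start):
    """Hypothetical helper: unused bytes between two consecutive items."""
    return start - prev_end


# The first variable starts at the header extent, so this gap is header padding.
print(gap_from_prev(HEADER_SIZE, LAT[0]))  # 16756596
# lon starts exactly where lat ends, so the two regions are back to back.
print(gap_from_prev(LAT[1], LON[0]))       # 0
```

<div>A gap of 0 between variables is what allows the underlying write requests to cover one contiguous byte range.</div>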
<div>If the call sequence of PnetCDF APIs is the same between pioperf and cesm, then</div>
<div>the performance should be similar. Can you check the Lustre striping settings</div>
<div>of the 2 output files, using the command "lfs getstripe"?</div>
<div><br>
</div>
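<div>For reading a progressive file layout such as the one "lfs getstripe" reports at the top of this thread, here is a rough sketch that looks up which component, and hence which stripe count, covers a given file offset. The extents and stripe counts are copied from that output; the function name stripe_count_at is hypothetical, and the sketch ignores stripe size and OST placement.</div>

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# (extent start, extent end, stripe count) per component, copied from the
# "lfs getstripe" output in this thread. None stands for EOF.
COMPONENTS = [
    (0, 16 * MIB, 1),
    (16 * MIB, 16 * GIB, 4),
    (16 * GIB, 64 * GIB, 12),
    (64 * GIB, None, 24),
]


def stripe_count_at(offset):
    """Hypothetical helper: stripe count of the component holding offset."""
    if 0 > offset:
        raise ValueError("offset must be non-negative")
    for start, end, count in COMPONENTS:
        if offset >= start and (end is None or end > offset):
            return count
    raise ValueError("offset not covered by any component")


print(stripe_count_at(0))             # header region: 1
print(stripe_count_at(16777216))      # start of "lat" in the cesm file: 4
print(stripe_count_at(306872117448))  # start of STEND_CLUBB: 24
```

<div>Under this layout, data written early in the file lands on fewer OSTs than data written past 64 GiB.</div>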
<div>If you set any MPI-IO hints, they can also play a role in performance.</div>
<div>See the example in PnetCDF for how to dump all hints (function print_info()).</div>
<div><a href="https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/C/get_info.c" target="_blank">https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/C/get_info.c</a></div>
<div><br>
<div>If all of the above checks out, then using Darshan should reveal more information.</div>
<div><br>
</div>
<div>
<div style="color:rgb(0,0,0)">BTW, what PnetCDF version is being used?</div>
<div style="color:rgb(0,0,0)"><br>
</div>
</div>
<div>A comment about PIOc_put_att_tc.</div>
<div>* Calling MPI_Bcast to check the error code may not be necessary. PnetCDF performs this</div>
<div>  check, along with all metadata consistency checks, at ncmpi_enddef. If the number of variables</div>
<div>  and attributes is high, then the many MPI_Bcast calls can be expensive.</div>
<div>
<div><a href="https://github.com/NCAR/ParallelIO/blob/f45ba898bec31e6cd662ac41f43e0cff14f928b2/src/clib/pio_getput_int.c#L213" target="_blank">https://github.com/NCAR/ParallelIO/blob/f45ba898bec31e6cd662ac41f43e0cff14f928b2/src/clib/pio_getput_int.c#L213</a></div>
<div><br>
</div>
<div><br>
</div>
</div>
<div>Wei-keng</div>
<div><br>
<blockquote type="cite">
<div>On Aug 4, 2023, at 12:32 PM, Jim Edwards <<a href="mailto:jedwards@ucar.edu" target="_blank">jedwards@ucar.edu</a>> wrote:</div>
<br>
<div>
<div dir="ltr">
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
Yes, _enddef is called only once. <br>
</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
<br>
</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
<a href="https://github.com/NCAR/ParallelIO/blob/main/src/clib/pio_getput_int.c#L128" target="_blank">Here</a> is
 the code that writes attributes. <a href="https://github.com/NCAR/ParallelIO/blob/main/src/clib/pio_darray_int.c#L661" target="_blank">Here</a> is
 where variables are written.  <br>
</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
<br>
</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
ncoffsets -sg pioperf.2-0256-1.nc<br>
netcdf pioperf.2-0256-1.nc {<br>
// file format: CDF-5<br>
<br>
file header:<br>
size   = 7804 bytes<br>
extent = 8192 bytes<br>
<br>
dimensions:<br>
dim000001 = 10485762<br>
dim000002 = 58<br>
time = UNLIMITED // (1 currently)<br>
<br>
record variables:<br>
double vard0001(time, dim000002, dim000001):<br>
      start file offset =        8192    (0th record)<br>
      end   file offset =  4865401760    (0th record)<br>
      size in bytes     =  4865393568    (of one record)<br>
      gap from prev var =         388<br>
double vard0002(time, dim000002, dim000001):<br>
      start file offset =  4865401760    (0th record)<br>
      end   file offset =  9730795328    (0th record)<br>
      size in bytes     =  4865393568    (of one record)<br>
      gap from prev var =           0<br>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
snip</div>
<div><br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
double vard0064(time, dim000002, dim000001):<br>
      start file offset =306519802976    (0th record)<br>
      end   file offset =311385196544    (0th record)<br>
      size in bytes     =  4865393568    (of one record)<br>
      gap from prev var =           0<br>
}</div>
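<div>As a sanity check on the pioperf listing above, the following sketch recomputes the record size and start offsets from the dimensions. All numbers are copied from the ncoffsets output; the function name start_offset is made up, and it assumes the variables sit back to back with no gap and that only one record has been written, as the listing reports.</div>

```python
# Values copied from the "ncoffsets -sg" output of the pioperf file.
HEADER_EXTENT = 8192
DIM1 = 10485762   # dim000001
DIM2 = 58         # dim000002
DOUBLE_BYTES = 8

# Bytes in one record of a double variable shaped (time, dim000002, dim000001).
record_size = DIM1 * DIM2 * DOUBLE_BYTES


def start_offset(var_index):
    """Hypothetical helper: start of the 0th record of the var_index-th
    variable (0-based), assuming no gaps between variables."""
    return HEADER_EXTENT + var_index * record_size


print(record_size)       # 4865393568
print(start_offset(1))   # 4865401760, matches vard0002
print(start_offset(63))  # 306519802976, matches vard0064
```

<div>The computed values agree with the listing, so the 64 variables occupy one contiguous region of roughly 311 GB after the header.</div>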
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
<br>
</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
 ncoffsets -sg run/SMS_D_Ln9.mpasa7p5_mpasa7p5_mg17.QPC6.derecho_intel.cam-outfrq9s.20230726_094231_iz24v6.cam.h0.0001-01-01-03600.nc<br>
netcdf run/SMS_D_Ln9.mpasa7p5_mpasa7p5_mg17.QPC6.derecho_intel.cam-outfrq9s.20230726_094231_iz24v6.cam.h0.0001-01-01-03600.nc {<br>
// file format: CDF-5<br>
<br>
file header:<br>
size   = 20620 bytes<br>
extent = 16777216 bytes<br>
<br>
dimensions:<br>
ncol = 10485762<br>
time = UNLIMITED // (1 currently)<br>
nbnd = 2<br>
chars = 8<br>
lev = 58<br>
ilev = 59<br>
<br>
fixed-size variables:<br>
double lat(ncol):<br>
      start file offset =    16777216<br>
      end   file offset =   100663312<br>
      size in bytes     =    83886096<br>
      gap from prev var =    16756596<br>
double lon(ncol):<br>
      start file offset =   100663312<br>
      end   file offset =   184549408<br>
      size in bytes     =    83886096<br>
      gap from prev var =           0<br>
</div>
</div>
</blockquote>
<div><br>
</div>
snip</div>
<div><br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
int    mdt:<br>
      start file offset =   352322552<br>
      end   file offset =   352322556<br>
      size in bytes     =           4<br>
      gap from prev var =           0<br>
<br>
record variables:<br>
double time(time):<br>
      start file offset =   352322556    (0th record)<br>
      end   file offset =   352322564    (0th record)<br>
      size in bytes     =           8    (of one record)<br>
      gap from prev var =           0<br>
int    date(time):<br>
      start file offset =   352322564    (0th record)<br>
      end   file offset =   352322568    (0th record)<br>
      size in bytes     =           4    (of one record)<br>
      gap from prev var =           0<br>
</div>
</div>
</blockquote>
<div><br>
</div>
snip</div>
<div><br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
double STEND_CLUBB(time, lev, ncol):<br>
      start file offset =306872117448    (0th record)<br>
      end   file offset =311737511016    (0th record)<br>
      size in bytes     =  4865393568    (of one record)<br>
      gap from prev var =           0<br>
}<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Aug 4, 2023 at 10:35 AM Wei-Keng Liao <<a href="mailto:wkliao@northwestern.edu" target="_blank">wkliao@northwestern.edu</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>Can you run the command "ncoffsets -sg file.nc", which shows the sizes
 of the file header</div>
<div>and all variables? For the cesm case, is _enddef called only once?</div>
<div><br>
</div>
<div>Could you also point me to the program files that call PnetCDF APIs, including</div>
<div>writing attributes and variables?</div>
<div><br>
</div>
<div><br>
</div>
<div>
<div>Wei-keng </div>
<div><br>
<blockquote type="cite">
<div>On Aug 4, 2023, at 11:05 AM, Jim Edwards <<a href="mailto:jedwards@ucar.edu" target="_blank">jedwards@ucar.edu</a>> wrote:</div>
<br>
<div>
<div dir="ltr">
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
I am using the new NCAR system, <a href="https://arc.ucar.edu/knowledge_base/74317833" target="_blank">derecho</a>,
 which has a Lustre parallel file system.</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
<br>
</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
Looking at the difference between the two headers below makes me wonder if the issue is with variable attributes?<br>
</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
<br>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
snip</div>
<div><br>
<blockquote type="cite"><br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Aug 4, 2023 at 9:39 AM Wei-Keng Liao <<a href="mailto:wkliao@northwestern.edu" target="_blank">wkliao@northwestern.edu</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Hi, Jim
<div><br>
</div>
<div>Can you provide the test program and the file header dumped by "ncdump -h", if available?</div>
<div>Also, what machine was used in the tests, and what is its parallel file system configuration?</div>
<div>These can help diagnose.</div>
<div><br>
</div>
<div>
<div>Wei-keng </div>
<div><br>
<blockquote type="cite">
<div>On Aug 4, 2023, at 8:49 AM, Jim Edwards <<a href="mailto:jedwards@ucar.edu" target="_blank">jedwards@ucar.edu</a>> wrote:</div>
<br>
<div>
<div dir="ltr">
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
I am using ncmpi_iput_varn and ncmpi_wait_all to write output from my model.   I have a test program that does nothing but test the <br>
</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
performance of the write operation.   Attached is a plot of performance in the model and in the standalone application.   I'm looking for <br>
</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
clues as to why the model performance is scaling so badly with the number of variables but the standalone program performance is fine. <br>
</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
<br>
</div>
<div class="gmail_default" style="font-family:"comic sans ms",sans-serif;color:rgb(56,118,29)">
<br clear="all">
</div>
<br>
<span class="gmail_signature_prefix">-- </span><br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>
<div>
<div>Jim Edwards<br>
<br>
</div>
<font size="1">CESM Software Engineer<br>
</font></div>
<font size="1">National Center for Atmospheric Research<br>
</font></div>
<font size="1">Boulder, CO</font> <br>
</div>
</div>
</div>
<span id="m_8242488052351017746m_-8733781311756881360m_4857238735649148917m_2599353008599558804m_3752102915513421155m_3416875553179895269m_1419731810550872517m_-4661512969452605678cid:f_lkwn4kse0">[Attached plot: Screenshot 2023-07-27 at 11.49.03 AM.png]</span></div>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br clear="all">
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br clear="all">
<br>
</blockquote>
</div>
</div>
<br style="color:rgb(0,0,0)">
</div>
</div>

</blockquote></div><br clear="all"><br><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div><div>Jim Edwards<br><br></div><font size="1">CESM Software Engineer<br></font></div><font size="1">National Center for Atmospheric Research<br></font></div><font size="1">Boulder, CO</font> <br></div></div>