performance issue

Wei-Keng Liao wkliao at northwestern.edu
Fri Aug 4 14:28:00 CDT 2023


Hi, Jim

Could you print all the MPI-IO hints in effect for the file that uses the Lustre progressive file layout?
That may reveal some information.

Wei-keng

On Aug 4, 2023, at 1:45 PM, Wei-Keng Liao <wkliao at northwestern.edu> wrote:

It looks like the CESM file is using the "Lustre progressive file layout" (PFL),
a newer striping strategy. My guess is that it is the center-wide default.
Rob Latham at ANL has more experience with this feature; he may have some suggestions.

In the meantime, can you write to a new folder and explicitly set its
Lustre stripe count to a larger number, since the total write amount is more
than 300 GB? This old-fashioned setting may give consistent timings.
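For a plain (non-PFL) layout, Lustre's raid0 pattern places consecutive stripe-size chunks on the file's OSTs round-robin. The sketch below is a minimal illustration of that mapping, assuming a single fixed stripe count and stripe size for the whole file; the function names are hypothetical helpers, not part of any Lustre API. It can be used to check how evenly a 300+ GB file would be spread over a chosen stripe count.

```c
#include <assert.h>
#include <stdint.h>

/* Index (0 .. stripe_count-1) of the OST slot holding byte `offset` in a
 * plain raid0 layout: chunks of stripe_size bytes are dealt round-robin. */
static uint32_t ost_slot(uint64_t offset, uint64_t stripe_size,
                         uint32_t stripe_count)
{
    assert(stripe_size > 0 && stripe_count > 0);
    return (uint32_t)((offset / stripe_size) % stripe_count);
}

/* Number of bytes of a file of `file_size` bytes stored on OST slot `slot`. */
static uint64_t bytes_on_slot(uint64_t file_size, uint64_t stripe_size,
                              uint32_t stripe_count, uint32_t slot)
{
    assert(stripe_size > 0 && stripe_count > 0 && slot < stripe_count);
    uint64_t full_chunks = file_size / stripe_size; /* complete stripes */
    uint64_t tail = file_size % stripe_size;        /* partial last stripe */
    uint32_t extra = (uint32_t)(full_chunks % stripe_count);

    /* every slot gets the same share of complete chunks ... */
    uint64_t bytes = (full_chunks / stripe_count) * stripe_size;
    /* ... the first `extra` slots get one more complete chunk ... */
    if (slot < extra)
        bytes += stripe_size;
    /* ... and the partial chunk (possibly 0 bytes) lands on slot `extra` */
    if (slot == extra)
        bytes += tail;
    return bytes;
}
```

For example, with the pioperf file size from the ncoffsets output in this thread (311385196544 bytes), 16 MiB stripes and a stripe count of 24, each OST slot ends up holding 773 or 774 stripes, about 13 GB.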


Wei-keng

On Aug 4, 2023, at 1:27 PM, Jim Edwards <jedwards at ucar.edu> wrote:

Oh, and the PnetCDF version is 1.12.3.

On Fri, Aug 4, 2023 at 12:25 PM Jim Edwards <jedwards at ucar.edu> wrote:
Rewriting the error-handling code is on my to-do list for PIO, but that code is the same in both cases,
so I don't think it plays a role in the difference we are seeing.

The lfs getstripe output for the two files is nearly identical, so I show only the CESM file here:

  lcm_layout_gen:    7
  lcm_mirror_count:  1
  lcm_entry_count:   4
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   16777216
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 27
      lmm_objects:
      - 0: { l_ost_idx: 27, l_fid: [0xa80000401:0x3f2ef8:0x0] }

    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 16777216
    lcme_extent.e_end:   17179869184
      lmm_stripe_count:  4
      lmm_stripe_size:   16777216
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 35
      lmm_objects:
      - 0: { l_ost_idx: 35, l_fid: [0xc80000401:0x3f32e5:0x0] }
      - 1: { l_ost_idx: 39, l_fid: [0xcc0000402:0x3f3162:0x0] }
      - 2: { l_ost_idx: 43, l_fid: [0xe80000402:0x3f2f3a:0x0] }
      - 3: { l_ost_idx: 47, l_fid: [0xec0000401:0x3f3017:0x0] }

    lcme_id:             3
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 17179869184
    lcme_extent.e_end:   68719476736
      lmm_stripe_count:  12
      lmm_stripe_size:   16777216
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 16
      lmm_objects:
      - 0: { l_ost_idx: 16, l_fid: [0x700000402:0x4021eb:0x0] }
      - 1: { l_ost_idx: 20, l_fid: [0x740000402:0x4020ae:0x0] }
      - 2: { l_ost_idx: 24, l_fid: [0x900000400:0x401f68:0x0] }
      - 3: { l_ost_idx: 28, l_fid: [0x940000400:0x401f71:0x0] }
      - 4: { l_ost_idx: 32, l_fid: [0xb00000402:0x40220c:0x0] }
      - 5: { l_ost_idx: 36, l_fid: [0xb40000402:0x40210b:0x0] }
      - 6: { l_ost_idx: 40, l_fid: [0xd00000402:0x402141:0x0] }
      - 7: { l_ost_idx: 44, l_fid: [0xd40000400:0x401e90:0x0] }
      - 8: { l_ost_idx: 48, l_fid: [0xf00000401:0x401e08:0x0] }
      - 9: { l_ost_idx: 52, l_fid: [0xf40000400:0x401e32:0x0] }
      - 10: { l_ost_idx: 56, l_fid: [0x1100000402:0x4022e0:0x0] }
      - 11: { l_ost_idx: 60, l_fid: [0x1140000402:0x4020a6:0x0] }

    lcme_id:             4
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 68719476736
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  24
      lmm_stripe_size:   16777216
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 51
      lmm_objects:
      - 0: { l_ost_idx: 51, l_fid: [0x1080000400:0x402881:0x0] }
      - 1: { l_ost_idx: 55, l_fid: [0x10c0000400:0x402955:0x0] }
      - 2: { l_ost_idx: 59, l_fid: [0x1280000400:0x4027d6:0x0] }
      - 3: { l_ost_idx: 63, l_fid: [0x12c0000401:0x402ab2:0x0] }
      - 4: { l_ost_idx: 67, l_fid: [0x1480000400:0x402b75:0x0] }
      - 5: { l_ost_idx: 71, l_fid: [0x14c0000400:0x4028d2:0x0] }
      - 6: { l_ost_idx: 75, l_fid: [0x1680000401:0x402a3b:0x0] }
      - 7: { l_ost_idx: 79, l_fid: [0x16c0000402:0x40294d:0x0] }
      - 8: { l_ost_idx: 83, l_fid: [0x1880000401:0x40299c:0x0] }
      - 9: { l_ost_idx: 87, l_fid: [0x18c0000402:0x402f5e:0x0] }
      - 10: { l_ost_idx: 91, l_fid: [0x1a80000400:0x402a16:0x0] }
      - 11: { l_ost_idx: 95, l_fid: [0x1ac0000400:0x402bd2:0x0] }
      - 12: { l_ost_idx: 0, l_fid: [0x300000401:0x402a2e:0x0] }
      - 13: { l_ost_idx: 4, l_fid: [0x340000402:0x4027d2:0x0] }
      - 14: { l_ost_idx: 8, l_fid: [0x500000402:0x402a26:0x0] }
      - 15: { l_ost_idx: 12, l_fid: [0x540000400:0x402943:0x0] }
      - 16: { l_ost_idx: 64, l_fid: [0x1300000402:0x402c10:0x0] }
      - 17: { l_ost_idx: 68, l_fid: [0x1340000401:0x4029c6:0x0] }
      - 18: { l_ost_idx: 72, l_fid: [0x1500000402:0x402d11:0x0] }
      - 19: { l_ost_idx: 76, l_fid: [0x1540000402:0x402be2:0x0] }
      - 20: { l_ost_idx: 80, l_fid: [0x1700000400:0x402a64:0x0] }
      - 21: { l_ost_idx: 84, l_fid: [0x1740000401:0x402b11:0x0] }
      - 22: { l_ost_idx: 88, l_fid: [0x1900000402:0x402cb8:0x0] }
      - 23: { l_ost_idx: 92, l_fid: [0x1940000400:0x402cd3:0x0] }
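The four components above can be read as a lookup table from file offset to stripe settings. A minimal sketch, with the extents and stripe settings copied from this getstripe output and "EOF" represented by UINT64_MAX (an assumption of the sketch; the struct and function names are hypothetical), shows which component a byte falls in and how many bytes of a file land in each one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One PFL component, copied from the lfs getstripe output.
 * e_end == UINT64_MAX stands in for "EOF". */
struct pfl_component {
    uint64_t e_start;      /* lcme_extent.e_start (inclusive) */
    uint64_t e_end;        /* lcme_extent.e_end (exclusive) */
    uint32_t stripe_count; /* lmm_stripe_count */
    uint64_t stripe_size;  /* lmm_stripe_size */
};

static const struct pfl_component cesm_layout[] = {
    {0ULL,           16777216ULL,     1, 1048576ULL},
    {16777216ULL,    17179869184ULL,  4, 16777216ULL},
    {17179869184ULL, 68719476736ULL, 12, 16777216ULL},
    {68719476736ULL, UINT64_MAX,     24, 16777216ULL},
};

/* Index of the component containing byte `offset`, or -1 if none does. */
static int pfl_component_of(const struct pfl_component *c, size_t n,
                            uint64_t offset)
{
    for (size_t i = 0; i < n; i++)
        if (offset >= c[i].e_start && offset < c[i].e_end)
            return (int)i;
    return -1;
}

/* Bytes of a file of `file_size` bytes that fall inside component i. */
static uint64_t pfl_bytes_in(const struct pfl_component *c, size_t i,
                             uint64_t file_size)
{
    uint64_t lo = c[i].e_start;
    uint64_t hi = c[i].e_end < file_size ? c[i].e_end : file_size;
    return hi > lo ? hi - lo : 0;
}
```

For the CESM file, whose last variable ends at offset 311737511016, about 243 GB (roughly 78%) falls in the last, 24-stripe component, while the first 16 MiB, which is exactly the CESM header extent, sits on a single OST with 1 MiB stripes.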

On Fri, Aug 4, 2023 at 12:12 PM Wei-Keng Liao <wkliao at northwestern.edu> wrote:

I see that the file header is 20620 bytes. Because all attributes are stored
in the header, the cost of writing them should not be an issue. I also see no gap
between any two consecutive variables, which is good: it means the write requests
made by MPI-IO will be contiguous.
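The "gap from prev var" column printed by ncoffsets is the start offset of a variable minus the end offset of the variable before it. A minimal sketch of that check, using hypothetical names and assuming the variables are listed in file order, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* [start, end) byte range of one variable, as printed by ncoffsets. */
struct var_extent {
    uint64_t start;
    uint64_t end;
};

/* Sum of the gaps between consecutive variables, assuming the array is
 * sorted by start offset. A result of 0 means the variables are packed
 * back to back with no holes between them. */
static uint64_t total_gap(const struct var_extent *v, size_t n)
{
    uint64_t gap = 0;
    for (size_t i = 1; i < n; i++) {
        assert(v[i].start >= v[i - 1].end); /* variables must not overlap */
        gap += v[i].start - v[i - 1].end;
    }
    return gap;
}
```

With the vard0001/vard0002 offsets from the pioperf listing and the lat/lon offsets from the CESM listing in this thread, the total gap is 0 in both files.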

If the sequence of PnetCDF API calls is the same in pioperf and CESM, then
the performance should be similar. Can you check the Lustre striping settings
of the two output files using the command "lfs getstripe"?

If you set any MPI-IO hints, they can also play a role in performance.
See the PnetCDF example showing how to dump all hints (function print_info()):
https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/C/get_info.c

If all of the above checks out, Darshan should reveal more information.

BTW, what PnetCDF version is being used?

A comment about PIOc_put_att_tc:
* Calling MPI_Bcast to check the error code may not be necessary. PnetCDF performs that
  check, along with all metadata consistency checks, in ncmpi_enddef. If the number of
  variables and attributes is large, calling MPI_Bcast many times can be expensive.
https://github.com/NCAR/ParallelIO/blob/f45ba898bec31e6cd662ac41f43e0cff14f928b2/src/clib/pio_getput_int.c#L213


Wei-keng

On Aug 4, 2023, at 12:32 PM, Jim Edwards <jedwards at ucar.edu> wrote:

Yes, _enddef is called only once.

Here is the code that writes attributes: https://github.com/NCAR/ParallelIO/blob/main/src/clib/pio_getput_int.c#L128
And here is where variables are written: https://github.com/NCAR/ParallelIO/blob/main/src/clib/pio_darray_int.c#L661

ncoffsets -sg pioperf.2-0256-1.nc
netcdf pioperf.2-0256-1.nc {
// file format: CDF-5

file header:
size   = 7804 bytes
extent = 8192 bytes

dimensions:
dim000001 = 10485762
dim000002 = 58
time = UNLIMITED // (1 currently)

record variables:
double vard0001(time, dim000002, dim000001):
      start file offset =        8192    (0th record)
      end   file offset =  4865401760    (0th record)
      size in bytes     =  4865393568    (of one record)
      gap from prev var =         388
double vard0002(time, dim000002, dim000001):
      start file offset =  4865401760    (0th record)
      end   file offset =  9730795328    (0th record)
      size in bytes     =  4865393568    (of one record)
      gap from prev var =           0

snip

double vard0064(time, dim000002, dim000001):
      start file offset =306519802976    (0th record)
      end   file offset =311385196544    (0th record)
      size in bytes     =  4865393568    (of one record)
      gap from prev var =           0
}
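The sizes in this listing follow directly from the dimensions: each record variable holds 58 * 10485762 doubles of 8 bytes per record, and the 64 variables are packed back to back after the 8192-byte header extent. A small sketch of that arithmetic, with hypothetical helper names and assuming a single record and no gaps (as the listing shows):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes in one record of a record variable: the product of the non-record
 * dimension lengths times the element size (8 for double). */
static uint64_t record_bytes(const uint64_t *dims, size_t ndims,
                             uint64_t elem_size)
{
    assert(elem_size > 0);
    uint64_t n = elem_size;
    for (size_t i = 0; i < ndims; i++)
        n *= dims[i];
    return n;
}

/* End offset of the last of `nvars` equally sized variables laid out back
 * to back starting at `first_offset` (single record, no gaps). */
static uint64_t end_offset(uint64_t first_offset, uint64_t nvars,
                           uint64_t var_bytes)
{
    return first_offset + nvars * var_bytes;
}
```

With dimensions {58, 10485762}, record_bytes gives 4865393568, and 8192 + 64 * 4865393568 = 311385196544, matching the end offset of vard0064. Called with just ncol, it gives 83886096, the size of lat and lon in the CESM file.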

ncoffsets -sg run/SMS_D_Ln9.mpasa7p5_mpasa7p5_mg17.QPC6.derecho_intel.cam-outfrq9s.20230726_094231_iz24v6.cam.h0.0001-01-01-03600.nc
netcdf run/SMS_D_Ln9.mpasa7p5_mpasa7p5_mg17.QPC6.derecho_intel.cam-outfrq9s.20230726_094231_iz24v6.cam.h0.0001-01-01-03600.nc {
// file format: CDF-5

file header:
size   = 20620 bytes
extent = 16777216 bytes

dimensions:
ncol = 10485762
time = UNLIMITED // (1 currently)
nbnd = 2
chars = 8
lev = 58
ilev = 59

fixed-size variables:
double lat(ncol):
      start file offset =    16777216
      end   file offset =   100663312
      size in bytes     =    83886096
      gap from prev var =    16756596
double lon(ncol):
      start file offset =   100663312
      end   file offset =   184549408
      size in bytes     =    83886096
      gap from prev var =           0

snip

int    mdt:
      start file offset =   352322552
      end   file offset =   352322556
      size in bytes     =           4
      gap from prev var =           0

record variables:
double time(time):
      start file offset =   352322556    (0th record)
      end   file offset =   352322564    (0th record)
      size in bytes     =           8    (of one record)
      gap from prev var =           0
int    date(time):
      start file offset =   352322564    (0th record)
      end   file offset =   352322568    (0th record)
      size in bytes     =           4    (of one record)
      gap from prev var =           0

snip

double STEND_CLUBB(time, lev, ncol):
      start file offset =306872117448    (0th record)
      end   file offset =311737511016    (0th record)
      size in bytes     =  4865393568    (of one record)
      gap from prev var =           0
}
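The header numbers in the two ncoffsets listings are consistent with the header extent being the header size rounded up to an alignment boundary: 8192 bytes for pioperf and 16777216 bytes (16 MiB) for CESM. The difference shows up as the "gap from prev var" of the first variable (388 and 16756596 bytes). The alignment values here are inferred from the output, not read from any PnetCDF setting; this sketch, with hypothetical names, only reproduces the arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Round `size` up to the next multiple of `align`. */
static uint64_t align_up(uint64_t size, uint64_t align)
{
    assert(align > 0);
    return (size + align - 1) / align * align;
}

/* Padding between the end of the header and the first variable. */
static uint64_t header_padding(uint64_t header_size, uint64_t align)
{
    return align_up(header_size, align) - header_size;
}
```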

On Fri, Aug 4, 2023 at 10:35 AM Wei-Keng Liao <wkliao at northwestern.edu> wrote:
Can you run the command "ncoffsets -sg file.nc"? It shows the sizes of the file header
and all variables. For the CESM case, is _enddef called only once?

Could you also point me to the source files that call the PnetCDF APIs, including
those that write attributes and variables?


Wei-keng

On Aug 4, 2023, at 11:05 AM, Jim Edwards <jedwards at ucar.edu> wrote:

I am using the new NCAR system, Derecho (https://arc.ucar.edu/knowledge_base/74317833), which has a Lustre parallel file system.

Looking at the differences between the two headers below, I wonder whether the issue is with variable attributes.


snip


On Fri, Aug 4, 2023 at 9:39 AM Wei-Keng Liao <wkliao at northwestern.edu> wrote:
Hi, Jim

Can you provide the test program and the file header dumped by "ncdump -h", if available?
Also, what machine was used for the tests, and what is its parallel file system configuration?
This information can help with the diagnosis.

Wei-keng

On Aug 4, 2023, at 8:49 AM, Jim Edwards <jedwards at ucar.edu> wrote:

I am using ncmpi_iput_varn and ncmpi_wait_all to write output from my model. I have a test program that does nothing but test the
performance of the write operation. Attached is a plot of performance in the model and in the standalone application. I'm looking for
clues as to why the model performance scales so badly with the number of variables while the standalone program's performance is fine.



--
Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO
<Screenshot 2023-07-27 at 11.49.03 AM.png>



More information about the parallel-netcdf mailing list