performance issue
Wei-Keng Liao
wkliao at northwestern.edu
Fri Aug 4 14:28:00 CDT 2023
Hi, Jim
Could you print all the MPI-IO hints in effect when the file uses the Lustre progressive file layout?
That may reveal some useful information.
Wei-keng
On Aug 4, 2023, at 1:45 PM, Wei-Keng Liao <wkliao at northwestern.edu> wrote:
It looks like the cesm file is using the "Lustre progressive file layout", which is
a new striping strategy. My guess is that it is the center-wide default.
Rob Latham at ANL has more experience with this feature and may have some suggestions.
In the meantime, can you write to a new folder and explicitly set its
Lustre stripe count to a larger number, since the total write amount is more
than 300 GB? This old-fashioned setting may give consistent timing.
Wei-keng
On Aug 4, 2023, at 1:27 PM, Jim Edwards <jedwards at ucar.edu> wrote:
oh and the pnetcdf version is 1.12.3
On Fri, Aug 4, 2023 at 12:25 PM Jim Edwards <jedwards at ucar.edu> wrote:
On my list of things to do in PIO is rewriting the error handling code - but that issue is the same for both cases and
so I don't think it would play a role in the difference we are seeing.
The lfs getstripe output of the two files is nearly identical, so I show only the cesm file here.
lcm_layout_gen: 7
lcm_mirror_count: 1
lcm_entry_count: 4
lcme_id: 1
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 16777216
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 27
lmm_objects:
- 0: { l_ost_idx: 27, l_fid: [0xa80000401:0x3f2ef8:0x0] }
lcme_id: 2
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 16777216
lcme_extent.e_end: 17179869184
lmm_stripe_count: 4
lmm_stripe_size: 16777216
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 35
lmm_objects:
- 0: { l_ost_idx: 35, l_fid: [0xc80000401:0x3f32e5:0x0] }
- 1: { l_ost_idx: 39, l_fid: [0xcc0000402:0x3f3162:0x0] }
- 2: { l_ost_idx: 43, l_fid: [0xe80000402:0x3f2f3a:0x0] }
- 3: { l_ost_idx: 47, l_fid: [0xec0000401:0x3f3017:0x0] }
lcme_id: 3
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 17179869184
lcme_extent.e_end: 68719476736
lmm_stripe_count: 12
lmm_stripe_size: 16777216
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 16
lmm_objects:
- 0: { l_ost_idx: 16, l_fid: [0x700000402:0x4021eb:0x0] }
- 1: { l_ost_idx: 20, l_fid: [0x740000402:0x4020ae:0x0] }
- 2: { l_ost_idx: 24, l_fid: [0x900000400:0x401f68:0x0] }
- 3: { l_ost_idx: 28, l_fid: [0x940000400:0x401f71:0x0] }
- 4: { l_ost_idx: 32, l_fid: [0xb00000402:0x40220c:0x0] }
- 5: { l_ost_idx: 36, l_fid: [0xb40000402:0x40210b:0x0] }
- 6: { l_ost_idx: 40, l_fid: [0xd00000402:0x402141:0x0] }
- 7: { l_ost_idx: 44, l_fid: [0xd40000400:0x401e90:0x0] }
- 8: { l_ost_idx: 48, l_fid: [0xf00000401:0x401e08:0x0] }
- 9: { l_ost_idx: 52, l_fid: [0xf40000400:0x401e32:0x0] }
- 10: { l_ost_idx: 56, l_fid: [0x1100000402:0x4022e0:0x0] }
- 11: { l_ost_idx: 60, l_fid: [0x1140000402:0x4020a6:0x0] }
lcme_id: 4
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 68719476736
lcme_extent.e_end: EOF
lmm_stripe_count: 24
lmm_stripe_size: 16777216
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 51
lmm_objects:
- 0: { l_ost_idx: 51, l_fid: [0x1080000400:0x402881:0x0] }
- 1: { l_ost_idx: 55, l_fid: [0x10c0000400:0x402955:0x0] }
- 2: { l_ost_idx: 59, l_fid: [0x1280000400:0x4027d6:0x0] }
- 3: { l_ost_idx: 63, l_fid: [0x12c0000401:0x402ab2:0x0] }
- 4: { l_ost_idx: 67, l_fid: [0x1480000400:0x402b75:0x0] }
- 5: { l_ost_idx: 71, l_fid: [0x14c0000400:0x4028d2:0x0] }
- 6: { l_ost_idx: 75, l_fid: [0x1680000401:0x402a3b:0x0] }
- 7: { l_ost_idx: 79, l_fid: [0x16c0000402:0x40294d:0x0] }
- 8: { l_ost_idx: 83, l_fid: [0x1880000401:0x40299c:0x0] }
- 9: { l_ost_idx: 87, l_fid: [0x18c0000402:0x402f5e:0x0] }
- 10: { l_ost_idx: 91, l_fid: [0x1a80000400:0x402a16:0x0] }
- 11: { l_ost_idx: 95, l_fid: [0x1ac0000400:0x402bd2:0x0] }
- 12: { l_ost_idx: 0, l_fid: [0x300000401:0x402a2e:0x0] }
- 13: { l_ost_idx: 4, l_fid: [0x340000402:0x4027d2:0x0] }
- 14: { l_ost_idx: 8, l_fid: [0x500000402:0x402a26:0x0] }
- 15: { l_ost_idx: 12, l_fid: [0x540000400:0x402943:0x0] }
- 16: { l_ost_idx: 64, l_fid: [0x1300000402:0x402c10:0x0] }
- 17: { l_ost_idx: 68, l_fid: [0x1340000401:0x4029c6:0x0] }
- 18: { l_ost_idx: 72, l_fid: [0x1500000402:0x402d11:0x0] }
- 19: { l_ost_idx: 76, l_fid: [0x1540000402:0x402be2:0x0] }
- 20: { l_ost_idx: 80, l_fid: [0x1700000400:0x402a64:0x0] }
- 21: { l_ost_idx: 84, l_fid: [0x1740000401:0x402b11:0x0] }
- 22: { l_ost_idx: 88, l_fid: [0x1900000402:0x402cb8:0x0] }
- 23: { l_ost_idx: 92, l_fid: [0x1940000400:0x402cd3:0x0] }
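To make the layout above easier to read, here is a minimal sketch that maps a file offset to its PFL component. The extents and stripe counts are copied from the lfs getstripe output; the function names are hypothetical, and the file size is taken from the end offset of the last cesm variable in the ncoffsets output quoted later in this thread. It treats each lcme_extent.e_end as exclusive, which is an assumption.

```python
# PFL components copied from the lfs getstripe output:
# (extent start, extent end or None for EOF, stripe count, stripe size)
COMPONENTS = [
    (0, 16777216, 1, 1048576),
    (16777216, 17179869184, 4, 16777216),
    (17179869184, 68719476736, 12, 16777216),
    (68719476736, None, 24, 16777216),
]


def component_for_offset(offset):
    """Return (lcme_id, stripe_count) of the component holding `offset`.

    Assumes extents are half-open intervals [e_start, e_end).
    """
    for lcme_id, (start, end, count, _size) in enumerate(COMPONENTS, start=1):
        if offset >= start and (end is None or offset < end):
            return lcme_id, count
    raise ValueError("offset must be non-negative")


def bytes_per_component(file_size):
    """Return [(stripe_count, bytes)] for a file of `file_size` bytes."""
    result = []
    for start, end, count, _size in COMPONENTS:
        upper = file_size if end is None else min(end, file_size)
        result.append((count, max(0, upper - start)))
    return result


# End offset of the last cesm variable, per the ncoffsets output in this thread.
FILE_SIZE = 311737511016

for count, nbytes in bytes_per_component(FILE_SIZE):
    print(f"stripe_count={count:2d}: {nbytes / 2**30:8.2f} GiB")
```

With these extents, the first 16 GiB of the file are spread over at most 4 OSTs, and only data beyond 64 GiB reaches the 24-stripe component.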
On Fri, Aug 4, 2023 at 12:12 PM Wei-Keng Liao <wkliao at northwestern.edu> wrote:
I can see the file header size of 20620 bytes. Because all attributes are stored
in the header, the cost of writing them should not be an issue. I also see no gap
between 2 consecutive variables, which is good, meaning the write requests made
by MPI-IO will be contiguous.
If the call sequence of PnetCDF APIs is the same between pioperf and cesm, then
the performance should be similar. Can you check the Lustre striping settings
of the 2 output files, using the command "lfs getstripe"?
If you set any MPI-IO hints, they can also play a role in performance.
See the example in PnetCDF for how to dump all hints (function print_info()).
https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/C/get_info.c
If all of the above checks out, then using Darshan should reveal more information.
BTW, what PnetCDF version is being used?
A comment about PIOc_put_att_tc.
* Calling MPI_Bcast to check the error code may not be necessary. PnetCDF performs this
check, along with all metadata consistency checks, at ncmpi_enddef. If the number of
variables and their attributes is high, then calling lots of MPI_Bcast can be expensive.
https://github.com/NCAR/ParallelIO/blob/f45ba898bec31e6cd662ac41f43e0cff14f928b2/src/clib/pio_getput_int.c#L213
Wei-keng
On Aug 4, 2023, at 12:32 PM, Jim Edwards <jedwards at ucar.edu> wrote:
Yes, _enddef is called only once.
Here <https://github.com/NCAR/ParallelIO/blob/main/src/clib/pio_getput_int.c#L128> is the code that writes attributes. Here <https://github.com/NCAR/ParallelIO/blob/main/src/clib/pio_darray_int.c#L661> is where variables are written.
ncoffsets -sg pioperf.2-0256-1.nc
netcdf pioperf.2-0256-1.nc {
// file format: CDF-5
file header:
size = 7804 bytes
extent = 8192 bytes
dimensions:
dim000001 = 10485762
dim000002 = 58
time = UNLIMITED // (1 currently)
record variables:
double vard0001(time, dim000002, dim000001):
start file offset = 8192 (0th record)
end file offset = 4865401760 (0th record)
size in bytes = 4865393568 (of one record)
gap from prev var = 388
double vard0002(time, dim000002, dim000001):
start file offset = 4865401760 (0th record)
end file offset = 9730795328 (0th record)
size in bytes = 4865393568 (of one record)
gap from prev var = 0
snip
double vard0064(time, dim000002, dim000001):
start file offset =306519802976 (0th record)
end file offset =311385196544 (0th record)
size in bytes = 4865393568 (of one record)
gap from prev var = 0
}
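As a sanity check on the pioperf offsets above, the following sketch recomputes them from the dimensions. It assumes all 64 variables have the same shape as vard0001 (the listing is snipped, so this is inferred from the first, second, and last entries) and that a double occupies 8 bytes; the names are illustrative only.

```python
NCOL = 10485762        # dim000001
NLEV = 58              # dim000002
DOUBLE_BYTES = 8       # size of one double value
HEADER_EXTENT = 8192   # file header extent reported by ncoffsets

# Bytes in one record of a (time, dim000002, dim000001) double variable.
record_bytes = NCOL * NLEV * DOUBLE_BYTES


def var_offsets(index):
    """Start and end offset of the index-th variable (1-based), assuming no gaps."""
    start = HEADER_EXTENT + (index - 1) * record_bytes
    return start, start + record_bytes


print(record_bytes)      # 4865393568
print(var_offsets(2))    # (4865401760, 9730795328)
print(var_offsets(64))   # (306519802976, 311385196544)
```

The computed values agree with the reported offsets of vard0002 and vard0064, which is consistent with the variables being laid out back to back after the 8192-byte header extent.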
ncoffsets -sg run/SMS_D_Ln9.mpasa7p5_mpasa7p5_mg17.QPC6.derecho_intel.cam-outfrq9s.20230726_094231_iz24v6.cam.h0.0001-01-01-03600.nc
netcdf run/SMS_D_Ln9.mpasa7p5_mpasa7p5_mg17.QPC6.derecho_intel.cam-outfrq9s.20230726_094231_iz24v6.cam.h0.0001-01-01-03600.nc {
// file format: CDF-5
file header:
size = 20620 bytes
extent = 16777216 bytes
dimensions:
ncol = 10485762
time = UNLIMITED // (1 currently)
nbnd = 2
chars = 8
lev = 58
ilev = 59
fixed-size variables:
double lat(ncol):
start file offset = 16777216
end file offset = 100663312
size in bytes = 83886096
gap from prev var = 16756596
double lon(ncol):
start file offset = 100663312
end file offset = 184549408
size in bytes = 83886096
gap from prev var = 0
snip
int mdt:
start file offset = 352322552
end file offset = 352322556
size in bytes = 4
gap from prev var = 0
record variables:
double time(time):
start file offset = 352322556 (0th record)
end file offset = 352322564 (0th record)
size in bytes = 8 (of one record)
gap from prev var = 0
int date(time):
start file offset = 352322564 (0th record)
end file offset = 352322568 (0th record)
size in bytes = 4 (of one record)
gap from prev var = 0
snip
double STEND_CLUBB(time, lev, ncol):
start file offset =306872117448 (0th record)
end file offset =311737511016 (0th record)
size in bytes = 4865393568 (of one record)
gap from prev var = 0
}
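One visible difference between the two dumps is the header padding. The sketch below only restates numbers from the ncoffsets output (header size and extent of each file); the dictionary layout and function name are hypothetical. The cesm header extent is exactly 16 MiB, which coincides with the end of the first PFL component in the lfs getstripe output earlier in this thread. Whether that has any bearing on the timing difference is not established here.

```python
MIB = 2**20

# Header size and extent in bytes, as reported by ncoffsets for each file.
FILES = {
    "pioperf": {"header_size": 7804, "header_extent": 8192},
    "cesm": {"header_size": 20620, "header_extent": 16777216},
}


def header_padding(info):
    """Unused bytes between the end of the header and the first variable."""
    return info["header_extent"] - info["header_size"]


for name, info in FILES.items():
    aligned = info["header_extent"] % (16 * MIB) == 0
    print(f"{name}: padding={header_padding(info)} bytes, "
          f"first variable on a 16 MiB boundary: {aligned}")
```

The paddings (388 and 16756596 bytes) match the "gap from prev var" values reported for the first variable in each file.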
On Fri, Aug 4, 2023 at 10:35 AM Wei-Keng Liao <wkliao at northwestern.edu> wrote:
Can you run the command "ncoffsets -sg file.nc", which shows the sizes of the file header
and all variables? For the cesm case, is _enddef called only once?
Could you also point me to the program files that call PnetCDF APIs, including
those that write attributes and variables?
Wei-keng
On Aug 4, 2023, at 11:05 AM, Jim Edwards <jedwards at ucar.edu> wrote:
I am using the new NCAR system, derecho <https://arc.ucar.edu/knowledge_base/74317833>, which has a Lustre parallel file system.
Looking at the difference between the two headers below makes me wonder if the issue is with variable attributes?
snip
On Fri, Aug 4, 2023 at 9:39 AM Wei-Keng Liao <wkliao at northwestern.edu> wrote:
Hi, Jim
Can you provide the test program and the file header dumped by "ncdump -h", if that is available?
Also, what machine was used in the tests, and what is its parallel file system configuration?
These can help diagnose the problem.
Wei-keng
On Aug 4, 2023, at 8:49 AM, Jim Edwards <jedwards at ucar.edu> wrote:
I am using ncmpi_iput_varn and ncmpi_wait_all to write output from my model. I have a test program that does nothing but test the
performance of the write operation. Attached is a plot of performance in the model and in the standalone application. I'm looking for
clues as to why the model performance is scaling so badly with the number of variables but the standalone program performance is fine.
--
Jim Edwards
CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO
<Screenshot 2023-07-27 at 11.49.03 AM.png>