performance issue

Jim Edwards jedwards at ucar.edu
Fri Aug 11 09:17:15 CDT 2023


Hi Wei-Keng,

I realized that the numbers in this table all show the slow-performing
file; the fast file (the one without the scalar variable) is not
represented. I will rerun and present these numbers again.

Here are corrected numbers for a few cases:
GPFS (/glade/work on derecho):
RESULT: write    SUBSET         1        16        64     4570.2078677815        4.0610844270
RESULT: write    SUBSET         1        16        64     4470.3231494386        4.1518251320

Lustre, default PFLs:
RESULT: write    SUBSET         1        16        64     2808.6570137094        6.6081404420
RESULT: write    SUBSET         1        16        64     1025.1671656858       18.1043644600

Lustre, no PFLs and very wide stripe:
RESULT: write    SUBSET         1        16        64     4687.6852437580        3.9593102000
RESULT: write    SUBSET         1        16        64     3001.4741125579        6.1836282120
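
A quick way to sanity-check the corrected numbers above is the minimal
Python sketch below. It assumes, following the guess quoted later in this
thread, that the last two columns are bandwidth and elapsed time; the
units are not stated here, so none are assumed. The function names are
illustrative only and are not part of pioperf.

```python
# Corrected RESULT lines copied from this message, with wrapped lines rejoined.
# Assumption: the last two columns are (bandwidth, elapsed time).
CORRECTED = """\
GPFS
RESULT: write    SUBSET         1        16        64     4570.2078677815        4.0610844270
RESULT: write    SUBSET         1        16        64     4470.3231494386        4.1518251320
Lustre, default PFLs
RESULT: write    SUBSET         1        16        64     2808.6570137094        6.6081404420
RESULT: write    SUBSET         1        16        64     1025.1671656858       18.1043644600
Lustre, no PFLs and very wide stripe
RESULT: write    SUBSET         1        16        64     4687.6852437580        3.9593102000
RESULT: write    SUBSET         1        16        64     3001.4741125579        6.1836282120
"""


def parse_results(text):
    """Return {file system label: [(bandwidth, time), ...]}."""
    results = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("RESULT:"):
            fields = line.split()
            results[label].append((float(fields[-2]), float(fields[-1])))
        else:
            # Any non-RESULT line starts a new group.
            label = line
            results[label] = []
    return results


def summarize(results):
    """For each group: bandwidth*time per row, and higher/lower bandwidth ratio."""
    summary = {}
    for label, runs in results.items():
        products = [bw * t for bw, t in runs]
        bandwidths = [bw for bw, _ in runs]
        summary[label] = (products, max(bandwidths) / min(bandwidths))
    return summary


if __name__ == "__main__":
    for label, (products, ratio) in summarize(parse_results(CORRECTED)).items():
        rounded = [round(p, 1) for p in products]
        print(f"{label}: bandwidth*time = {rounded}, ratio = {ratio:.2f}")
```

For all six rows the product of the two columns comes out at about 18560,
which is consistent with reading them as bandwidth and time for a fixed
amount of data. The ratio between the two rows in each group is about 1.02
on GPFS, 2.74 on Lustre with default PFLs, and 1.56 on Lustre with the
wide stripe.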

On Thu, Aug 10, 2023 at 11:34 AM Jim Edwards <jedwards at ucar.edu> wrote:

> the stripe settings
> lfs setstripe -c 96 -S 128M
>
> logs/c96_S128M/
>
>
> On Thu, Aug 10, 2023 at 11:30 AM Wei-Keng Liao <wkliao at northwestern.edu>
> wrote:
>
>>
>> I guess the last 2 columns are bandwidth and time.
>> What is each row?
>>
>> Wei-keng
>>
>> On Aug 10, 2023, at 12:22 PM, Jim Edwards <jedwards at ucar.edu> wrote:
>>
>> Here is our sweep of the parameter space:
>>
>> ./logs/c02_S001M/pioperf-c02_S001M.o1254947:dec2083 : RESULT:      91.2568175876      203.3820649310
>> ./logs/c02_S001M/pioperf-c02_S001M.o1254947:dec2083 : RESULT:      92.1182210732      201.4802259940
>> ./logs/c02_S004M/pioperf-c02_S004M.o1254940:dec0577 : RESULT:     329.6537701115       56.3014947280
>> ./logs/c02_S004M/pioperf-c02_S004M.o1254940:dec0577 : RESULT:     331.4727932376       55.9925290360
>> ./logs/c02_S016M/pioperf-c02_S016M.o1254933:dec0161 : RESULT:     769.7340678648       24.1122236560
>> ./logs/c02_S016M/pioperf-c02_S016M.o1254933:dec0161 : RESULT:     783.5713927539       23.6864185850
>> ./logs/c02_S032M/pioperf-c02_S032M.o1254926:dec0577 : RESULT:    1035.1709708580       17.9294054050
>> ./logs/c02_S032M/pioperf-c02_S032M.o1254926:dec0577 : RESULT:    1047.1429289828       17.7244189750
>> ./logs/c02_S064M/pioperf-c02_S064M.o1254919:dec0161 : RESULT:    1148.3652258918       16.1621055580
>> ./logs/c02_S064M/pioperf-c02_S064M.o1254919:dec0161 : RESULT:    1151.0578818115       16.1242977380
>> ./logs/c02_S128M/pioperf-c02_S128M.o1254912:dec0577 : RESULT:    1274.3409716484       14.5643908600
>> ./logs/c02_S128M/pioperf-c02_S128M.o1254912:dec0577 : RESULT:    1317.0599727821       14.0919930630
>> ./logs/c04_S004M/pioperf-c04_S004M.o1254941:dec0161 : RESULT:     343.0181778797       54.1079196290
>> ./logs/c04_S004M/pioperf-c04_S004M.o1254941:dec0161 : RESULT:     354.2258712407       52.3959470690
>> ./logs/c04_S016M/pioperf-c04_S016M.o1254934:dec0577 : RESULT:     942.6368460375       19.6894488880
>> ./logs/c04_S016M/pioperf-c04_S016M.o1254934:dec0577 : RESULT:     944.2729162573       19.6553344700
>> ./logs/c04_S032M/pioperf-c04_S032M.o1254927:dec0161 : RESULT:    1313.3396548286       14.1319116740
>> ./logs/c04_S032M/pioperf-c04_S032M.o1254927:dec0161 : RESULT:    1335.6182195583       13.8961865960
>> ./logs/c04_S064M/pioperf-c04_S064M.o1254920:dec0577 : RESULT:    1609.7018723157       11.5300853650
>> ./logs/c04_S064M/pioperf-c04_S064M.o1254920:dec0577 : RESULT:    1664.6891018193       11.1492289940
>> ./logs/c04_S128M/pioperf-c04_S128M.o1254913:dec0161 : RESULT:    1921.4885407968        9.6591780830
>> ./logs/c04_S128M/pioperf-c04_S128M.o1254913:dec0161 : RESULT:    1950.4420873454        9.5157913790
>> ./logs/c06_S004M/pioperf-c06_S004M.o1254942:dec0577 : RESULT:     362.3356698753       51.2232207400
>> ./logs/c06_S004M/pioperf-c06_S004M.o1254942:dec0577 : RESULT:     363.0765787757       51.1186925430
>> ./logs/c06_S016M/pioperf-c06_S016M.o1254935:dec0161 : RESULT:     952.6217959754       19.4830730080
>> ./logs/c06_S016M/pioperf-c06_S016M.o1254935:dec0161 : RESULT:     972.3610628258       19.0875598680
>> ./logs/c06_S032M/pioperf-c06_S032M.o1254928:dec0577 : RESULT:    1440.7021713493       12.8826070850
>> ./logs/c06_S032M/pioperf-c06_S032M.o1254928:dec0577 : RESULT:    1453.8729235808       12.7659025070
>> ./logs/c06_S064M/pioperf-c06_S064M.o1254921:dec0161 : RESULT:    1822.0246479036       10.1864703210
>> ./logs/c06_S064M/pioperf-c06_S064M.o1254921:dec0161 : RESULT:    1863.8737349725        9.9577560710
>> ./logs/c06_S128M/pioperf-c06_S128M.o1254914:dec0577 : RESULT:    2277.4926061236        8.1493129550
>> ./logs/c06_S128M/pioperf-c06_S128M.o1254914:dec0577 : RESULT:    2325.8628081953        7.9798343800
>> ./logs/c12_S004M/pioperf-c12_S004M.o1254943:dec2083 : RESULT:     349.2503347567       53.1423971660
>> ./logs/c12_S004M/pioperf-c12_S004M.o1254943:dec2083 : RESULT:     375.3362671288       49.4489918120
>> ./logs/c12_S016M/pioperf-c12_S016M.o1254936:dec0577 : RESULT:    1034.4932215248       17.9411518740
>> ./logs/c12_S016M/pioperf-c12_S016M.o1254936:dec0577 : RESULT:    1078.9753283698       17.2015054580
>> ./logs/c12_S032M/pioperf-c12_S032M.o1254929:dec0161 : RESULT:    1564.2446068140       11.8651519840
>> ./logs/c12_S032M/pioperf-c12_S032M.o1254929:dec0161 : RESULT:    1615.7171524077       11.4871591060
>> ./logs/c12_S064M/pioperf-c12_S064M.o1254922:dec0577 : RESULT:    2240.1838384079        8.2850343270
>> ./logs/c12_S064M/pioperf-c12_S064M.o1254922:dec0577 : RESULT:    2245.5085567722        8.2653882320
>> ./logs/c12_S128M/pioperf-c12_S128M.o1254915:dec0161 : RESULT:    2798.8898011180        6.6312006970
>> ./logs/c12_S128M/pioperf-c12_S128M.o1254915:dec0161 : RESULT:    2877.7394474960        6.4495067530
>> ./logs/c24_S004M/pioperf-c24_S004M.o1254944:dec0577 : RESULT:     345.1242519527       53.7777333670
>> ./logs/c24_S004M/pioperf-c24_S004M.o1254944:dec0577 : RESULT:     351.4297493271       52.8128311150
>> ./logs/c24_S016M/pioperf-c24_S016M.o1254937:dec0161 : RESULT:     925.2242106049       20.0600025240
>> ./logs/c24_S016M/pioperf-c24_S016M.o1254937:dec0161 : RESULT:     954.0638835219       19.4536239350
>> ./logs/c24_S032M/pioperf-c24_S032M.o1254930:dec0577 : RESULT:    1452.2920020184       12.7797990860
>> ./logs/c24_S032M/pioperf-c24_S032M.o1254930:dec0577 : RESULT:    1471.4735322672       12.6132068250
>> ./logs/c24_S064M/pioperf-c24_S064M.o1254923:dec0161 : RESULT:    2050.9979556500        9.0492532910
>> ./logs/c24_S064M/pioperf-c24_S064M.o1254923:dec0161 : RESULT:    2114.0203423275        8.7794803240
>> ./logs/c24_S128M/pioperf-c24_S128M.o1254916:dec0577 : RESULT:    2675.1011803584        6.9380553290
>> ./logs/c24_S128M/pioperf-c24_S128M.o1254916:dec0577 : RESULT:    2693.2253543755        6.8913653920
>> ./logs/c48_S004M/pioperf-c48_S004M.o1254945:dec2083 : RESULT:     340.1527179740       54.5637268770
>> ./logs/c48_S004M/pioperf-c48_S004M.o1254945:dec2083 : RESULT:     371.8511430332       49.9124457400
>> ./logs/c48_S016M/pioperf-c48_S016M.o1254938:dec0577 : RESULT:    1093.3036084645       16.9760712910
>> ./logs/c48_S016M/pioperf-c48_S016M.o1254938:dec0577 : RESULT:    1110.7761821018       16.7090367070
>> ./logs/c48_S032M/pioperf-c48_S032M.o1254931:dec0161 : RESULT:    1542.1770323116       12.0349347780
>> ./logs/c48_S032M/pioperf-c48_S032M.o1254931:dec0161 : RESULT:    1546.3824217323       12.0022057540
>> ./logs/c48_S064M/pioperf-c48_S064M.o1254924:dec0577 : RESULT:    2217.8654888939        8.3684065120
>> ./logs/c48_S064M/pioperf-c48_S064M.o1254924:dec0577 : RESULT:    2257.5814223090        8.2211874250
>> ./logs/c48_S128M/pioperf-c48_S128M.o1254917:dec0161 : RESULT:    2944.0786904926        6.3041793210
>> ./logs/c48_S128M/pioperf-c48_S128M.o1254917:dec0161 : RESULT:    3011.5947346521        6.1628478050
>> ./logs/c96_S004M/pioperf-c96_S004M.o1254946:dec0577 : RESULT:     339.0226605915       54.7456030450
>> ./logs/c96_S004M/pioperf-c96_S004M.o1254946:dec0577 : RESULT:     340.8120077996       54.4581751090
>> ./logs/c96_S016M/pioperf-c96_S016M.o1254939:dec0161 : RESULT:    1081.0376152838       17.1686902820
>> ./logs/c96_S016M/pioperf-c96_S016M.o1254939:dec0161 : RESULT:     786.6881169129       23.5925770340
>> ./logs/c96_S032M/pioperf-c96_S032M.o1254932:dec0577 : RESULT:    1485.9194264343       12.4905830490
>> ./logs/c96_S032M/pioperf-c96_S032M.o1254932:dec0577 : RESULT:    1550.1228426459       11.9732446290
>> ./logs/c96_S064M/pioperf-c96_S064M.o1254925:dec0161 : RESULT:    2264.4052929602        8.1964125670
>> ./logs/c96_S064M/pioperf-c96_S064M.o1254925:dec0161 : RESULT:    2295.6724959924        8.0847769150
>> ./logs/c96_S128M/pioperf-c96_S128M.o1254918:dec0577 : RESULT:    2762.3676993794        6.7188738140
>> ./logs/c96_S128M/pioperf-c96_S128M.o1254918:dec0577 : RESULT:    2813.8936765002        6.5958426770
>>
>>
>> On Thu, Aug 10, 2023 at 11:09 AM Wei-Keng Liao <wkliao at northwestern.edu>
>> wrote:
>>
>>> File striping size of 128 MB seems too big to me.
>>> I have never tried anything larger than 16 MB and
>>> learned that large sizes often performed worse.
>>>
>>> striping_unit=134217728
>>>
>>>
>>> Have you tried just 1 MB?
>>>
>>> Wei-keng
>>>
>>> On Aug 10, 2023, at 12:01 PM, Jim Edwards <jedwards at ucar.edu> wrote:
>>>
>>>
>>>
>>> On Thu, Aug 10, 2023 at 10:53 AM Wei-Keng Liao <wkliao at northwestern.edu>
>>> wrote:
>>>
>>>> From the file header dump you sent earlier, no PnetCDF hint is
>>>> necessary.
>>>>
>>>> What MPI-IO hints are you using?
>>>>
>>> I am now using:
>>>
>>> MPICH_MPIIO_HINTS=*:romio_cb_read=enable:romio_cb_write=enable:striping_factor=48:striping_unit=134217728
>>>
>>> We explored the parameter space, and this seems to be about the best
>>> for this particular file. But I think there is still something wrong
>>> at a low level, and I am trying to figure that out.
>>>
>>>
>>>> One question. Did you delete the output file before each run?
>>>>
>>> Yes
>>>
>>>
>>>
>>>
>>>> Wei-keng
>>>>
>>>> On Aug 10, 2023, at 11:35 AM, Jim Edwards <jedwards at ucar.edu> wrote:
>>>>
>>>> We are having a lot of problems with Darshan on our system - I'm
>>>> working with the Darshan developers to resolve them.
>>>>
>>>> I am setting MPI-IO hints, but I see that there are some hints
>>>> specific to PnetCDF - do you have any recommendations?
>>>>
>>>> On Thu, Aug 10, 2023 at 10:15 AM Wei-Keng Liao <wkliao at northwestern.edu>
>>>> wrote:
>>>>
>>>>> Hi, Jim
>>>>>
>>>>> FYI. Darshan can now capture the I/O activities of PnetCDF,
>>>>> in addition to the already supported MPI-IO and POSIX I/O.
>>>>>
>>>>> Wei-keng
>>>>>
>>>>> On Aug 9, 2023, at 6:22 PM, Wei-Keng Liao <wkliao at northwestern.edu>
>>>>> wrote:
>>>>>
>>>>> In that case, I have the E3SM-IO benchmark, which has a fairly
>>>>> complicated I/O partitioning pattern. It uses the decomposition
>>>>> maps generated from PIO.
>>>>> https://github.com/Parallel-NetCDF/E3SM-IO
>>>>>
>>>>> Wei-keng
>>>>>
>>>>> On Aug 9, 2023, at 6:17 PM, Jim Edwards <jedwards at ucar.edu> wrote:
>>>>>
>>>>> I think that your example case is too simple - it's doing a simple
>>>>> block decomposition. In order to get the performance difference I am
>>>>> observing, I need to do a more complicated mapping. I will work on a
>>>>> program that reproduces the problem without PIO, but it may take a
>>>>> while.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Jim Edwards
>>>>
>>>> CESM Software Engineer
>>>> National Center for Atmospheric Research
>>>> Boulder, CO
>>>>
>>>>
>>>>
>>>
>>
>

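To relate the sweep quoted above to the stripe settings and the MPI-IO
hint string, here is a minimal Python sketch. It assumes that a log
directory name such as c96_S128M encodes the stripe count and stripe size
passed to lfs setstripe (inferred from the quoted pairing of
"lfs setstripe -c 96 -S 128M" with logs/c96_S128M/), and that the two
numbers after RESULT: are bandwidth and time. The helper names are
hypothetical, and only a few of the quoted log lines are included as
sample input.

```python
import re

# A few lines copied verbatim from the quoted sweep (quote prefixes removed).
SWEEP_SAMPLE = """\
./logs/c02_S001M/pioperf-c02_S001M.o1254947:dec2083 : RESULT:      91.2568175876      203.3820649310
./logs/c02_S001M/pioperf-c02_S001M.o1254947:dec2083 : RESULT:      92.1182210732      201.4802259940
./logs/c48_S128M/pioperf-c48_S128M.o1254917:dec0161 : RESULT:    2944.0786904926        6.3041793210
./logs/c48_S128M/pioperf-c48_S128M.o1254917:dec0161 : RESULT:    3011.5947346521        6.1628478050
./logs/c96_S128M/pioperf-c96_S128M.o1254918:dec0577 : RESULT:    2762.3676993794        6.7188738140
./logs/c96_S128M/pioperf-c96_S128M.o1254918:dec0577 : RESULT:    2813.8936765002        6.5958426770
"""


def parse_config(name):
    """Split a name like 'c96_S128M' into (stripe_count, stripe_size_bytes).

    Assumption: 'cNN' is the stripe count and 'SnnnM' the stripe size in MiB.
    """
    match = re.fullmatch(r"c(\d+)_S(\d+)M", name)
    if match is None:
        raise ValueError(f"unrecognized config name: {name!r}")
    return int(match.group(1)), int(match.group(2)) * 1024 * 1024


def mpiio_hints(stripe_count, stripe_size_bytes):
    """Build a hint string in the same form as the one quoted in the thread."""
    return (
        "*:romio_cb_read=enable:romio_cb_write=enable"
        f":striping_factor={stripe_count}"
        f":striping_unit={stripe_size_bytes}"
    )


def best_per_config(lines):
    """Return {config name: highest value of the first RESULT column}."""
    pattern = re.compile(r"logs/(c\d+_S\d+M)/.*RESULT:\s+([\d.]+)\s+([\d.]+)")
    best = {}
    for line in lines:
        match = pattern.search(line)
        if match is None:
            continue
        name, bandwidth = match.group(1), float(match.group(2))
        best[name] = max(best.get(name, 0.0), bandwidth)
    return best


if __name__ == "__main__":
    best = best_per_config(SWEEP_SAMPLE.splitlines())
    top = max(best, key=best.get)
    count, size = parse_config(top)
    print(top, best[top])
    print("MPICH_MPIIO_HINTS=" + mpiio_hints(count, size))
```

With these sample lines the top entry is c48_S128M, and the generated
string matches the hints quoted above (striping_factor=48,
striping_unit=134217728, i.e. 128 * 1024 * 1024 bytes).
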

-- 
Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO