performance issue

Jim Edwards jedwards at ucar.edu
Fri Aug 11 16:25:49 CDT 2023


Yes - src/clib/pio_darray_int.c and src/clib/pio_getput_int.c

I've also attached a couple of Darshan DXT profiles. Both use the same
Lustre file parameters; the first (dxt1.out) is the fast one without the
scalar write, and dxt2.out is the slow one. It seems like adding the scalar
causes all of the other writes to get broken up into smaller pieces.
I've also tried moving around where the scalar variable is defined and
written with respect to the record variables - that doesn't seem to make
any difference.
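
For reference, the pattern being compared boils down to something like the
sketch below: 64 record variables written collectively, with and without one
extra scalar variable. This is not the PIO code itself (that lives in the
files above); the file names, variable names, sizes, and the 1-D
decomposition are just illustrative, and error checking is omitted.

#include <stdio.h>
#include <mpi.h>
#include <pnetcdf.h>

#define NVARS     64
#define LOCAL_LEN 18560   /* per-task element count; loosely based on varsize in pioperf.nl */

/* Write NVARS record variables collectively; optionally define and write one
   scalar variable first, which is the only difference between the two runs. */
static void write_case(MPI_Comm comm, int with_scalar, const char *fname)
{
    int ncid, dimid[2], varid[NVARS], scalarid = -1, rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    ncmpi_create(comm, fname, NC_CLOBBER | NC_64BIT_DATA, MPI_INFO_NULL, &ncid);

    ncmpi_def_dim(ncid, "time", NC_UNLIMITED, &dimid[0]);
    ncmpi_def_dim(ncid, "x", (MPI_Offset)LOCAL_LEN * nprocs, &dimid[1]);

    char name[32];
    for (int v = 0; v < NVARS; v++) {
        snprintf(name, sizeof(name), "var%03d", v);
        ncmpi_def_var(ncid, name, NC_DOUBLE, 2, dimid, &varid[v]);
    }
    if (with_scalar)   /* the one extra (non-record) variable under suspicion */
        ncmpi_def_var(ncid, "scalar", NC_DOUBLE, 0, NULL, &scalarid);
    ncmpi_enddef(ncid);

    if (with_scalar) {
        double val = 1.0;
        /* collective scalar write; every rank provides the same value */
        ncmpi_put_var_double_all(ncid, scalarid, &val);
    }

    static double buf[LOCAL_LEN];   /* zero-filled payload is fine for a timing sketch */
    MPI_Offset start[2] = {0, (MPI_Offset)rank * LOCAL_LEN};
    MPI_Offset count[2] = {1, LOCAL_LEN};
    for (int v = 0; v < NVARS; v++)
        ncmpi_put_vara_double_all(ncid, varid[v], start, count, buf);

    ncmpi_close(ncid);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    write_case(MPI_COMM_WORLD, 0, "fast.nc");   /* record variables only */
    write_case(MPI_COMM_WORLD, 1, "slow.nc");   /* same, plus the scalar */
    MPI_Finalize();
    return 0;
}

Build with something like "mpicc sketch.c -o sketch -lpnetcdf" and run it
under srun as in the script below.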

On Fri, Aug 11, 2023 at 3:21 PM Wei-Keng Liao <wkliao at northwestern.edu>
wrote:

> Yes, I have.
>
> Can you let me know the source codes files that make the PnetCDF API calls?
>
>
> Wei-keng
>
> On Aug 11, 2023, at 4:10 PM, Jim Edwards <jedwards at ucar.edu> wrote:
>
> Hi Wei-Keng,
>
> Sorry about the miscommunication earlier today - I just wanted to confirm
> that you've been able to reproduce the issue now?
>
> On Fri, Aug 11, 2023 at 1:01 PM Jim Edwards <jedwards at ucar.edu> wrote:
>
>> I'm sorry - I thought that I had provided that, but I guess not.
>> repo: git at github.com:jedwards4b/ParallelIO.git
>> branch: bugtest/lustre
>>
>> On Fri, Aug 11, 2023 at 12:46 PM Wei-Keng Liao <wkliao at northwestern.edu>
>> wrote:
>>
>>> Any particular github branch I should use?
>>>
>>> I got an error during make.
>>> /global/homes/w/wkliao/PIO/Github/ParallelIO/src/clib/pio_nc4.c:1481:18: error: call to undeclared function 'nc_inq_var_filter_ids'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>>>           ierr = nc_inq_var_filter_ids(file->fh, varid, nfiltersp, ids);
>>>                  ^
>>>
>>>
>>> Setting these 2 does not help.
>>> #undef NC_HAS_ZSTD
>>> #undef NC_HAS_BZ2
>>>
>>>
>>> Wei-keng
>>>
>>> > On Aug 11, 2023, at 12:44 PM, Jim Edwards <jedwards at ucar.edu> wrote:
>>> >
>>> > I see I missed answering one question - total 2048 tasks.  (16 nodes)
>>> >
>>> > On Fri, Aug 11, 2023 at 11:35 AM Jim Edwards <jedwards at ucar.edu>
>>> wrote:
>>> > Here is my run script on perlmutter:
>>> >
>>> > #!/usr/bin/env python
>>> > #
>>> > #SBATCH -A mp9
>>> > #SBATCH -C cpu
>>> > #SBATCH --qos=regular
>>> > #SBATCH --time=15
>>> > #SBATCH --nodes=16
>>> > #SBATCH --ntasks-per-node=128
>>> >
>>> > import os
>>> > import glob
>>> >
>>> > with open("pioperf.nl","w") as fd:
>>> >     fd.write("&pioperf\n")
>>> >     fd.write("  decompfile='ROUNDROBIN'\n")
>>> > #    for filename in decompfiles:
>>> > #        fd.write("   '"+filename+"',\n")
>>> >     fd.write(" varsize=18560\n");
>>> >     fd.write(" pio_typenames = 'pnetcdf','pnetcdf'\n");
>>> >     fd.write(" rearrangers = 2\n");
>>> >     fd.write(" nframes = 1\n");
>>> >     fd.write(" nvars = 64\n");
>>> >     fd.write(" niotasks = 16\n");
>>> >     fd.write(" /\n")
>>> >
>>> > os.system("srun -n 2048 ~/parallelio/bld/tests/performance/pioperf ")
>>> >
>>> >
>>> > Module environment:
>>> > Currently Loaded Modules:
>>> >   1) craype-x86-milan                         6) cpe/23.03                 11) craype-accel-nvidia80          16) craype/2.7.20           21) cmake/3.24.3
>>> >   2) libfabric/1.15.2.0                       7) xalt/2.10.2               12) gpu/1.0                        17) cray-dsmml/0.2.2        22) cray-parallel-netcdf/1.12.3.3
>>> >   3) craype-network-ofi                       8) Nsight-Compute/2022.1.1   13) evp-patch                      18) cray-mpich/8.1.25       23) cray-hdf5/1.12.2.3
>>> >   4) xpmem/2.5.2-2.4_3.49__gd0f7936.shasta    9) Nsight-Systems/2022.2.1   14) python/3.9-anaconda-2021.11    19) cray-libsci/23.02.1.1   24) cray-netcdf/4.9.0.3
>>> >   5) perftools-base/23.03.0                  10) cudatoolkit/11.7          15) intel/2023.1.0                 20) PrgEnv-intel/8.3.3
>>> >
>>> > cmake command:
>>> >  CC=mpicc FC=mpifort cmake -DPNETCDF_DIR=$CRAY_PARALLEL_NETCDF_DIR/intel/19.0 -DNETCDF_DIR=$CRAY_NETCDF_PREFIX -DHAVE_PAR_FILTERS=OFF ../
>>> >
>>> > There are a couple of issues with the build that can be fixed by editing file config.h (created in the bld directory by cmake)
>>> >
>>> > Add the following to config.h:
>>> >
>>> > #undef NC_HAS_ZSTD
>>> > #undef NC_HAS_BZ2
>>> >
>>> > then:
>>> > make pioperf
>>> >
>>> > Once it's built, run the submit script from $SCRATCH.
>>> >
>>> > On Fri, Aug 11, 2023 at 11:13 AM Wei-Keng Liao <
>>> wkliao at northwestern.edu> wrote:
>>> > OK. I will test it myself on Perlmutter.
>>> > Do you have a small test program to reproduce or is it still pioperf?
>>> > If pioperf, are the build instructions on Perlmutter the same?
>>> >
>>> > Please let me know how you run on Perlmutter, i.e. no. of processes, nodes,
>>> > Lustre striping, problem size, etc.
>>> >
>>> > Does "1 16 64" in your results mean 16 I/O tasks and 64 variables,
>>> > Yes, this is correct.
>>> >
>>> >   and only 16 MPI processes out of total ? processes call PnetCDF APIs?
>>> >
>>> > Yes, this is also correct.
>>> >
>>> >   Wei-keng
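
(As an aside, a minimal sketch of the layout just confirmed - 2048 MPI tasks
in total, of which only the 16 I/O tasks ever call PnetCDF - might look like
the following. This is not PIO's rearranger code; the stride-based choice of
I/O ranks and the file name are assumptions for illustration, and error
checking is omitted.)

#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv)
{
    int world_rank, world_size, ncid;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);    /* e.g. 2048 */

    int niotasks = 16;
    int stride = world_size / niotasks;            /* e.g. 128: one I/O task per node */
    int is_io_task = (world_rank % stride == 0);

    /* Only the I/O tasks join the communicator passed to PnetCDF. */
    MPI_Comm io_comm;
    MPI_Comm_split(MPI_COMM_WORLD,
                   is_io_task ? 0 : MPI_UNDEFINED, world_rank, &io_comm);

    if (is_io_task) {
        /* The compute tasks rearrange their data to these ranks (not shown);
           the I/O tasks then issue the collective PnetCDF calls. */
        ncmpi_create(io_comm, "out.nc", NC_64BIT_DATA, MPI_INFO_NULL, &ncid);
        /* ... define dims/vars and call ncmpi_put_vara_*_all here ... */
        ncmpi_close(ncid);
        MPI_Comm_free(&io_comm);
    }

    MPI_Finalize();
    return 0;
}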
>>> >
>>> >> On Aug 11, 2023, at 9:35 AM, Jim Edwards <jedwards at ucar.edu> wrote:
>>> >>
>>> >> I tried on Perlmutter and am seeing the same issue, only maybe even worse:
>>> >>
>>> >> RESULT: write    SUBSET         1        16        64    1261.0737058071       14.7176171500
>>> >> RESULT: write    SUBSET         1        16        64      90.3736534450      205.3695882870
>>> >>
>>> >>
>>> >> On Fri, Aug 11, 2023 at 8:17 AM Jim Edwards <jedwards at ucar.edu>
>>> wrote:
>>> >> Hi Wei-Keng,
>>> >>
>>> >> I realized that the numbers in this table are all showing the slow-performing
>>> >> file, and the fast file (the one without the scalar variable) is not
>>> >> represented - I will rerun and present these numbers again.
>>> >>
>>> >> Here are corrected numbers for a few cases:
>>> >> GPFS (/glade/work on Derecho):
>>> >> RESULT: write    SUBSET         1        16        64    4570.2078677815        4.0610844270
>>> >> RESULT: write    SUBSET         1        16        64    4470.3231494386        4.1518251320
>>> >>
>>> >> Lustre, default PFLs:
>>> >> RESULT: write    SUBSET         1        16        64    2808.6570137094        6.6081404420
>>> >> RESULT: write    SUBSET         1        16        64    1025.1671656858       18.1043644600
>>> >>
>>> >> Lustre, no PFLs and a very wide stripe:
>>> >> RESULT: write    SUBSET         1        16        64    4687.6852437580        3.9593102000
>>> >> RESULT: write    SUBSET         1        16        64    3001.4741125579        6.1836282120
>>> >>
>>> >> On Thu, Aug 10, 2023 at 11:34 AM Jim Edwards <jedwards at ucar.edu>
>>> wrote:
>>> >> the stripe settings
>>> >> lfs setstripe -c 96 -S 128M
>>> >> logs/c96_S128M/
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Jim Edwards
>>> >
>>> > CESM Software Engineer
>>> > National Center for Atmospheric Research
>>> > Boulder, CO
>>> >
>>> >
>>> > --
>>> > Jim Edwards
>>> >
>>> > CESM Software Engineer
>>> > National Center for Atmospheric Research
>>> > Boulder, CO
>>>
>>>
>>
>> --
>> Jim Edwards
>>
>> CESM Software Engineer
>> National Center for Atmospheric Research
>> Boulder, CO
>>
>
>
> --
> Jim Edwards
>
> CESM Software Engineer
> National Center for Atmospheric Research
> Boulder, CO
>
>
>

-- 
Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dxt2.out
Type: application/octet-stream
Size: 206371 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/parallel-netcdf/attachments/20230811/66fa246c/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dxt1.out
Type: application/octet-stream
Size: 115855 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/parallel-netcdf/attachments/20230811/66fa246c/attachment-0003.obj>
