performance issue
Wei-Keng Liao
wkliao at northwestern.edu
Fri Aug 11 17:41:17 CDT 2023
I can see line 344 of pioperformance.F90 is called by all processes.
nvarmult= pio_put_var(File, rundate, date//' '//time(1:4))
How do I change it, so it is called by rank 0 only?
Wei-keng
On Aug 11, 2023, at 4:25 PM, Jim Edwards <jedwards at ucar.edu> wrote:
Yes - src/clib/pio_darray_int.c and src/clib/pio_getput_int.c
I've also attached a couple of darshan dxt profiles. Both use the same
Lustre file parameters. The first (dxt1.out) is the fast one without the scalar write;
dxt2.out is the slow one. It seems like adding the scalar is causing all of the other writes to get broken up into smaller pieces.
I've also tried moving around where the scalar variable is defined and written with respect to the record variables - that doesn't seem to make any difference.
On Fri, Aug 11, 2023 at 3:21 PM Wei-Keng Liao <wkliao at northwestern.edu> wrote:
Yes, I have.
Can you let me know which source code files make the PnetCDF API calls?
Wei-keng
On Aug 11, 2023, at 4:10 PM, Jim Edwards <jedwards at ucar.edu> wrote:
Hi Wei-Keng,
Sorry about the miscommunication earlier today - I just wanted to confirm that you've been able to reproduce the issue now?
On Fri, Aug 11, 2023 at 1:01 PM Jim Edwards <jedwards at ucar.edu> wrote:
I'm sorry - I thought that I had provided that, but I guess not.
repo: git at github.com:jedwards4b/ParallelIO.git
branch: bugtest/lustre
On Fri, Aug 11, 2023 at 12:46 PM Wei-Keng Liao <wkliao at northwestern.edu> wrote:
Any particular github branch I should use?
I got an error during make.
/global/homes/w/wkliao/PIO/Github/ParallelIO/src/clib/pio_nc4.c:1481:18: error: call to undeclared function 'nc_inq_var_filter_ids'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
ierr = nc_inq_var_filter_ids(file->fh, varid, nfiltersp, ids);
^
Setting these 2 does not help.
#undef NC_HAS_ZSTD
#undef NC_HAS_BZ2
Wei-keng
> On Aug 11, 2023, at 12:44 PM, Jim Edwards <jedwards at ucar.edu> wrote:
>
> I see I missed answering one question - total 2048 tasks. (16 nodes)
>
> On Fri, Aug 11, 2023 at 11:35 AM Jim Edwards <jedwards at ucar.edu> wrote:
> Here is my run script on perlmutter:
>
> #!/usr/bin/env python
> #
> #SBATCH -A mp9
> #SBATCH -C cpu
> #SBATCH --qos=regular
> #SBATCH --time=15
> #SBATCH --nodes=16
> #SBATCH --ntasks-per-node=128
>
> import os
> import glob
>
> with open("pioperf.nl","w") as fd:
>     fd.write("&pioperf\n")
>     fd.write(" decompfile='ROUNDROBIN'\n")
>     # for filename in decompfiles:
>     #     fd.write(" '"+filename+"',\n")
>     fd.write(" varsize=18560\n")
>     fd.write(" pio_typenames = 'pnetcdf','pnetcdf'\n")
>     fd.write(" rearrangers = 2\n")
>     fd.write(" nframes = 1\n")
>     fd.write(" nvars = 64\n")
>     fd.write(" niotasks = 16\n")
>     fd.write(" /\n")
>
> os.system("srun -n 2048 ~/parallelio/bld/tests/performance/pioperf ")
>
>
> Module environment:
> Currently Loaded Modules:
> 1) craype-x86-milan 6) cpe/23.03 11) craype-accel-nvidia80 16) craype/2.7.20 21) cmake/3.24.3
> 2) libfabric/1.15.2.0 7) xalt/2.10.2 12) gpu/1.0 17) cray-dsmml/0.2.2 22) cray-parallel-netcdf/1.12.3.3
> 3) craype-network-ofi 8) Nsight-Compute/2022.1.1 13) evp-patch 18) cray-mpich/8.1.25 23) cray-hdf5/1.12.2.3
> 4) xpmem/2.5.2-2.4_3.49__gd0f7936.shasta 9) Nsight-Systems/2022.2.1 14) python/3.9-anaconda-2021.11 19) cray-libsci/23.02.1.1 24) cray-netcdf/4.9.0.3
> 5) perftools-base/23.03.0 10) cudatoolkit/11.7 15) intel/2023.1.0 20) PrgEnv-intel/8.3.3
>
> cmake command:
> CC=mpicc FC=mpifort cmake -DPNETCDF_DIR=$CRAY_PARALLEL_NETCDF_DIR/intel/19.0 -DNETCDF_DIR=$CRAY_NETCDF_PREFIX -DHAVE_PAR_FILTERS=OFF ../
>
> There are a couple of issues with the build that can be fixed by editing file config.h (created in the bld directory by cmake)
>
> Add the following to config.h:
>
> #undef NC_HAS_ZSTD
> #undef NC_HAS_BZ2
>
> then:
> make pioperf
>
> once it's built run the submit script from $SCRATCH
>
> On Fri, Aug 11, 2023 at 11:13 AM Wei-Keng Liao <wkliao at northwestern.edu> wrote:
> OK. I will test it myself on Perlmutter.
> Do you have a small test program to reproduce or is it still pioperf?
> If pioperf, are the build instructions on Perlmutter the same?
>
> Please let me know how you run on Perlmutter, i.e., number of processes, nodes,
> Lustre striping, problem size, etc.
>
> Does "1 16 64" in your results mean 16 I/O tasks and 64 variables,
> yes this is correct
>
> and only 16 MPI processes out of total ? processes call PnetCDF APIs?
>
> yes this is also correct.
>
> Wei-keng
>
>> On Aug 11, 2023, at 9:35 AM, Jim Edwards <jedwards at ucar.edu> wrote:
>>
>> I tried on perlmutter and am seeing the same issue only maybe even worse:
>>
>> RESULT: write SUBSET 1 16 64 1261.0737058071 14.7176171500
>> RESULT: write SUBSET 1 16 64 90.3736534450 205.3695882870
>>
>>
>> On Fri, Aug 11, 2023 at 8:17 AM Jim Edwards <jedwards at ucar.edu> wrote:
>> Hi Wei-Keng,
>>
>> I realized that the numbers in this table all show the slow-performing file, and the fast file
>> (the one without the scalar variable) is not represented. I will rerun and present these numbers again.
>>
>> Here are corrected numbers for a few cases:
>> GPFS (/glade/work on derecho):
>> RESULT: write SUBSET 1 16 64 4570.2078677815 4.0610844270
>> RESULT: write SUBSET 1 16 64 4470.3231494386 4.1518251320
>>
>> Lustre, default PFL's:
>> RESULT: write SUBSET 1 16 64 2808.6570137094 6.6081404420
>> RESULT: write SUBSET 1 16 64 1025.1671656858 18.1043644600
>>
>> LUSTRE, no PFL's and very wide stripe:
>> RESULT: write SUBSET 1 16 64 4687.6852437580 3.9593102000
>> RESULT: write SUBSET 1 16 64 3001.4741125579 6.1836282120
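To put the three pairs of RESULT lines above on a common footing, the ratio of elapsed times within each pair can be computed with a short script. This is only a sketch: it assumes the last two fields of each RESULT line are write bandwidth and elapsed time in seconds (the thread does not state the column meanings), and the names RESULTS, parse_result and slowdown are made up for this illustration and are not part of pioperf.

```python
# RESULT lines copied verbatim from the message above, grouped by file system.
RESULTS = {
    "GPFS": [
        "RESULT: write SUBSET 1 16 64 4570.2078677815 4.0610844270",
        "RESULT: write SUBSET 1 16 64 4470.3231494386 4.1518251320",
    ],
    "Lustre, default PFLs": [
        "RESULT: write SUBSET 1 16 64 2808.6570137094 6.6081404420",
        "RESULT: write SUBSET 1 16 64 1025.1671656858 18.1043644600",
    ],
    "Lustre, no PFLs, wide stripe": [
        "RESULT: write SUBSET 1 16 64 4687.6852437580 3.9593102000",
        "RESULT: write SUBSET 1 16 64 3001.4741125579 6.1836282120",
    ],
}


def parse_result(line):
    """Return the last two fields of a RESULT line as floats.

    Assumption: these are (bandwidth, elapsed seconds).
    """
    fields = line.split()
    return float(fields[-2]), float(fields[-1])


def slowdown(lines):
    """Elapsed time of the second line divided by that of the first."""
    (_, t_first), (_, t_second) = (parse_result(line) for line in lines)
    return t_second / t_first


for name, lines in RESULTS.items():
    print(f"{name}: second run takes {slowdown(lines):.2f}x as long as the first")
```

Comparing elapsed-time ratios rather than raw numbers makes it easier to see how much the gap between the two runs depends on the file system and striping.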
>>
>> On Thu, Aug 10, 2023 at 11:34 AM Jim Edwards <jedwards at ucar.edu> wrote:
>> the stripe settings
>> lfs setstripe -c 96 -S 128M
>> logs/c96_S128M/
>>
>>
>
>
>
> --
> Jim Edwards
>
> CESM Software Engineer
> National Center for Atmospheric Research
> Boulder, CO
<dxt2.out><dxt1.out>