performance issue

Wei-Keng Liao wkliao at northwestern.edu
Wed Aug 9 16:13:27 CDT 2023


Googling gave me this.
"The number of the gaps and the size is related to how many seek operation happens and how much is the size of the file in bytes that is skipped to write the next part."


Are you still using the default file striping settings?

Wei-keng

On Aug 9, 2023, at 3:51 PM, Jim Edwards <jedwards at ucar.edu> wrote:

I spent a little time trying to do this but gave up and went back to using cray profiling tools to get more info.
One thing really stands out to me:

This is for the fast write:
dec1793.hsn.de.hpc.ucar.edu<https://urldefense.com/v3/__http://dec1793.hsn.de.hpc.ucar.edu__;!!Dq0X2DkFhyF93HkjWTBQKhk!SP4va2rvVHU4KEb9PEsINCVGTkdEiITT61-aKfQXjWRmlCGrqiRw6rNt8YvXMwO2eOwRL2T7qyjXj_-eNzr_M-E$> 0: | number of write gaps = 2
dec1793.hsn.de.hpc.ucar.edu<https://urldefense.com/v3/__http://dec1793.hsn.de.hpc.ucar.edu__;!!Dq0X2DkFhyF93HkjWTBQKhk!SP4va2rvVHU4KEb9PEsINCVGTkdEiITT61-aKfQXjWRmlCGrqiRw6rNt8YvXMwO2eOwRL2T7qyjXj_-eNzr_M-E$> 0: | ave write gap size = 9722924978
dec1793.hsn.de.hpc.ucar.edu<https://urldefense.com/v3/__http://dec1793.hsn.de.hpc.ucar.edu__;!!Dq0X2DkFhyF93HkjWTBQKhk!SP4va2rvVHU4KEb9PEsINCVGTkdEiITT61-aKfQXjWRmlCGrqiRw6rNt8YvXMwO2eOwRL2T7qyjXj_-eNzr_M-E$> 0: --------------------------------------------------------
dec1793.hsn.de.hpc.ucar.edu<https://urldefense.com/v3/__http://dec1793.hsn.de.hpc.ucar.edu__;!!Dq0X2DkFhyF93HkjWTBQKhk!SP4va2rvVHU4KEb9PEsINCVGTkdEiITT61-aKfQXjWRmlCGrqiRw6rNt8YvXMwO2eOwRL2T7qyjXj_-eNzr_M-E$> 0: RESULT: write SUBSET 1 16 64 4060.0217755460 4.5714040530

And this is for the slow one:
dec1793.hsn.de.hpc.ucar.edu<https://urldefense.com/v3/__http://dec1793.hsn.de.hpc.ucar.edu__;!!Dq0X2DkFhyF93HkjWTBQKhk!SP4va2rvVHU4KEb9PEsINCVGTkdEiITT61-aKfQXjWRmlCGrqiRw6rNt8YvXMwO2eOwRL2T7qyjXj_-eNzr_M-E$> 0: | number of write gaps = 1020
dec1793.hsn.de.hpc.ucar.edu<https://urldefense.com/v3/__http://dec1793.hsn.de.hpc.ucar.edu__;!!Dq0X2DkFhyF93HkjWTBQKhk!SP4va2rvVHU4KEb9PEsINCVGTkdEiITT61-aKfQXjWRmlCGrqiRw6rNt8YvXMwO2eOwRL2T7qyjXj_-eNzr_M-E$> 0: | ave write gap size = 19079761
dec1793.hsn.de.hpc.ucar.edu<https://urldefense.com/v3/__http://dec1793.hsn.de.hpc.ucar.edu__;!!Dq0X2DkFhyF93HkjWTBQKhk!SP4va2rvVHU4KEb9PEsINCVGTkdEiITT61-aKfQXjWRmlCGrqiRw6rNt8YvXMwO2eOwRL2T7qyjXj_-eNzr_M-E$> 0: --------------------------------------------------------
dec1793.hsn.de.hpc.ucar.edu<https://urldefense.com/v3/__http://dec1793.hsn.de.hpc.ucar.edu__;!!Dq0X2DkFhyF93HkjWTBQKhk!SP4va2rvVHU4KEb9PEsINCVGTkdEiITT61-aKfQXjWRmlCGrqiRw6rNt8YvXMwO2eOwRL2T7qyjXj_-eNzr_M-E$> 0: RESULT: write SUBSET 1 16 64 76.2558020443 243.3913158400


Do you understand?

On Tue, Aug 8, 2023 at 11:50 AM Wei-Keng Liao <wkliao at northwestern.edu<mailto:wkliao at northwestern.edu>> wrote:
I have revised the example program to add writes to scalar and record variables.
Let me know if that works for you. URL again is below.

https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/C/nonblocking_write.c<https://urldefense.com/v3/__https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/C/nonblocking_write.c__;!!Dq0X2DkFhyF93HkjWTBQKhk!SP4va2rvVHU4KEb9PEsINCVGTkdEiITT61-aKfQXjWRmlCGrqiRw6rNt8YvXMwO2eOwRL2T7qyjXj_-e9eaX1iY$>

Wei-keng

On Aug 7, 2023, at 6:10 PM, Jim Edwards <jedwards at ucar.edu<mailto:jedwards at ucar.edu>> wrote:

That example doesn't include record variables.  Do you have a similar one with record vars?



On Mon, Aug 7, 2023 at 4:32 PM Wei-Keng Liao <wkliao at northwestern.edu<mailto:wkliao at northwestern.edu>> wrote:
Hi, Jim

To eliminate the overheads of PIO, I suggest to use this PnetCDF example program
and add a scalar variable to see if the same happens.

https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/C/nonblocking_write.c<https://urldefense.com/v3/__https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/C/nonblocking_write.c__;!!Dq0X2DkFhyF93HkjWTBQKhk!RGlLkVUbuYrrGrSkShv42nz4KqtPJK0FiNzPuYKV-esdwU5UcgKr0xLvQpOooAfY4n2UMB8meSG2ZanhcYgGU_Q$>

Wei-keng

On Aug 7, 2023, at 4:28 PM, Jim Edwards <jedwards at ucar.edu<mailto:jedwards at ucar.edu>> wrote:

Hi Wei-Keng,

The cb_nodes doesn't seem to be affected.

Not using independent mode doesn't seem to have helped.  I have the pioperf program now writing two files.  One with only
decomposed fields and one with one additional field, rundate, which is a string with the date in it.

The performance is drastically different:
                                                         IO tasks   vars      Mb/s                           Time (s)
 RESULT: write    SUBSET         1       256        64    12067.7548254854       25.1577347560    (without scalar)
 RESULT: write    SUBSET         1       256        64      286.4615089145     1059.8190875640      (with scalar)


On Mon, Aug 7, 2023 at 1:47 PM Wei-Keng Liao <wkliao at northwestern.edu<mailto:wkliao at northwestern.edu>> wrote:
Is that the reason for why cb_nodes is 1?
Strange, because cb_nodes is set at the file open time.

Entering the independent data mode in PnetCDF can be completely avoided
if using the nonblocking APIs.

I would suggest your codes to use the nonblocking APIs in the following way.

/* for non-partitioned variables */
if (rank == 0) {
    ncmpi_iput_var_int(fh, varid[0], data[0], &req[0]); /* write the whole variable */
    ncmpi_iput_var_int(fh, varid[1], data[1], &req[1]);
    ...
}
/* for partitioned variables */
ncmpi_iput_vara_int(fh, varid[j], data[j], starts[j], counts[j], &req[j]);
...


/* commit all posted nonblocking requests */
ncmpi_wait_all(ncid, NC_REQ_ALL, NC_REQ_NULL, NULL);


Wei-keng

> On Aug 7, 2023, at 2:12 PM, Jim Edwards <jedwards at ucar.edu<mailto:jedwards at ucar.edu>> wrote:
>
> Hi Wei-Keng,
>
> I think that I've found the problem.   In the model I am writing a number of scalar variables to the file as well as the decomposed variables.
> for the scalar variables I use a code structure like:
>
> ncmpi_begin_indep_data(fh);
> ncmpi_put_vars_int(fh, varid, start, count, stride, data);
> ncmpi_end_indep_data(fh);
>
> In my pioperf test code I didn't write any scalars - this morning I added one and the write performance for the decomposed variables got very very
> bad.  What can I do about it?
>
> Jim
>
>
> --
> Jim Edwards
>
> CESM Software Engineer
> National Center for Atmospheric Research
> Boulder, CO



--
Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO



--
Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO



--
Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/parallel-netcdf/attachments/20230809/e1049531/attachment-0001.html>


More information about the parallel-netcdf mailing list