performance issue

Jim Edwards jedwards at ucar.edu
Wed Aug 9 15:51:28 CDT 2023


I spent a little time trying to do this but gave up and went back to using
Cray profiling tools to get more info.
One thing really stands out to me:

This is for the fast write:
dec1793.hsn.de.hpc.ucar.edu 0: | number of write gaps = 2
dec1793.hsn.de.hpc.ucar.edu 0: | ave write gap size = 9722924978
dec1793.hsn.de.hpc.ucar.edu 0: --------------------------------------------------------
dec1793.hsn.de.hpc.ucar.edu 0: RESULT: write SUBSET 1 16 64 4060.0217755460 4.5714040530

And this is for the slow one:
dec1793.hsn.de.hpc.ucar.edu 0: | number of write gaps = 1020
dec1793.hsn.de.hpc.ucar.edu 0: | ave write gap size = 19079761
dec1793.hsn.de.hpc.ucar.edu 0: --------------------------------------------------------
dec1793.hsn.de.hpc.ucar.edu 0: RESULT: write SUBSET 1 16 64 76.2558020443 243.3913158400


Do you understand what is going on here?

On Tue, Aug 8, 2023 at 11:50 AM Wei-Keng Liao <wkliao at northwestern.edu>
wrote:

> I have revised the example program to add writes to scalar and record
> variables.
> Let me know if that works for you. URL again is below.
>
>
> https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/C/nonblocking_write.c
>
> Wei-keng
>
> On Aug 7, 2023, at 6:10 PM, Jim Edwards <jedwards at ucar.edu> wrote:
>
> That example doesn't include record variables.  Do you have a similar one
> with record vars?
>
>
>
> On Mon, Aug 7, 2023 at 4:32 PM Wei-Keng Liao <wkliao at northwestern.edu>
> wrote:
>
>> Hi, Jim
>>
>> To eliminate the overheads of PIO, I suggest using this PnetCDF example
>> program and adding a scalar variable to see if the same happens.
>>
>>
>> https://github.com/Parallel-NetCDF/PnetCDF/blob/master/examples/C/nonblocking_write.c
>>
>> Wei-keng
>>
>> On Aug 7, 2023, at 4:28 PM, Jim Edwards <jedwards at ucar.edu> wrote:
>>
>> Hi Wei-Keng,
>>
>> cb_nodes doesn't seem to be affected.
>>
>> Avoiding independent mode doesn't seem to have helped.  I now have the
>> pioperf program writing two files: one with only decomposed fields, and
>> one with a single additional field, rundate, which is a string holding
>> the date.
>>
>> The performance is drastically different:
>>                                     IO tasks  vars               Mb/s          Time (s)
>>  RESULT: write    SUBSET         1       256    64   12067.7548254854     25.1577347560    (without scalar)
>>  RESULT: write    SUBSET         1       256    64     286.4615089145   1059.8190875640    (with scalar)
>>
>>
>> On Mon, Aug 7, 2023 at 1:47 PM Wei-Keng Liao <wkliao at northwestern.edu>
>> wrote:
>>
>>> Is that the reason why cb_nodes is 1?
>>> Strange, because cb_nodes is set at the file open time.
>>>
>>> Entering the independent data mode in PnetCDF can be completely avoided
>>> if using the nonblocking APIs.
>>>
>>> I would suggest that your code use the nonblocking APIs in the
>>> following way.
>>>
>>> /* for non-partitioned variables: only rank 0 posts the write */
>>> if (rank == 0) {
>>>     /* write the whole variable */
>>>     ncmpi_iput_var_int(fh, varid[0], data[0], &req[0]);
>>>     ncmpi_iput_var_int(fh, varid[1], data[1], &req[1]);
>>>     ...
>>> }
>>> /* for partitioned variables: start and count precede the buffer */
>>> ncmpi_iput_vara_int(fh, varid[j], starts[j], counts[j], data[j],
>>>                     &req[j]);
>>> ...
>>>
>>>
>>> /* commit all posted nonblocking requests (collective call) */
>>> ncmpi_wait_all(fh, NC_REQ_ALL, NULL, NULL);
>>>
>>>
>>> Wei-keng
>>>
>>> > On Aug 7, 2023, at 2:12 PM, Jim Edwards <jedwards at ucar.edu> wrote:
>>> >
>>> > Hi Wei-Keng,
>>> >
>>> > I think that I've found the problem.  In the model I am writing a
>>> > number of scalar variables to the file as well as the decomposed
>>> > variables.  For the scalar variables I use a code structure like:
>>> >
>>> > ncmpi_begin_indep_data(fh);
>>> > ncmpi_put_vars_int(fh, varid, start, count, stride, data);
>>> > ncmpi_end_indep_data(fh);
>>> >
>>> > In my pioperf test code I didn't write any scalars.  This morning I
>>> > added one and the write performance for the decomposed variables got
>>> > very, very bad.  What can I do about it?
>>> >
>>> > Jim
>>> >
>>> >
>>> > --
>>> > Jim Edwards
>>> >
>>> > CESM Software Engineer
>>> > National Center for Atmospheric Research
>>> > Boulder, CO
>>>
>>>

-- 
Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO