Performance when reading many small variables

Wei-keng Liao wkliao at eecs.northwestern.edu
Tue Dec 1 01:06:45 CST 2015


Hi, Michael

You can use the PnetCDF nonblocking APIs to read. A code fragment that issues
nonblocking reads looks like the following.

    int i, err, ncid, reqs[2000], statuses[2000];

    err = ncmpi_open(MPI_COMM_WORLD, filename, omode, MPI_INFO_NULL, &ncid);

    /* post all 2000 read requests; they are queued, not executed yet */
    for (i=0; i<2000; i++)
        err = ncmpi_iget_vara_int(ncid, varid[i], start, count, &buf[i], &reqs[i]);

    /* flush all queued requests in one collective call */
    err = ncmpi_wait_all(ncid, 2000, reqs, statuses);


If there is only one entry per variable, you can use the var APIs instead and
skip the start and count arguments. For example:

    for (i=0; i<2000; i++)
        err = ncmpi_iget_var_int(ncid, varid[i], &buf[i], &reqs[i]);



PnetCDF nonblocking APIs defer the requests until ncmpi_wait_all is called, where
all pending requests are aggregated into a single, large MPI-IO call. Many example
programs (in C and Fortran) are available under the examples directory of every
PnetCDF release. http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples

In addition, I suggest opening the input file with MPI_COMM_WORLD, so the
program can take advantage of MPI collective I/O for better performance, even
when all processes read the same data.

If your input file is generated by a PnetCDF program, I also suggest disabling
file-offset alignment for the fixed-size (non-record) variables, given that each
variable has only one entry. To disable alignment, create an MPI info object,
set the hint nc_var_align_size to 1, and pass the info object to the ncmpi_create
call. Alternatively, you can set the same hint at run time. Please see
https://trac.mcs.anl.gov/projects/parallel-netcdf/wiki/HintsForPnetcdf
and
http://cucis.ece.northwestern.edu/projects/PnetCDF/doc/pnetcdf-c/PNETCDF_005fHINTS.html
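The info-object approach described above can be sketched as follows. This is a
minimal, untested fragment; the function name create_unaligned and the use of
NC_CLOBBER as the create mode are illustrative choices, not part of the original
message.

```c
/* Sketch: create a PnetCDF file with fixed-size variable alignment disabled.
 * Assumes <mpi.h> and <pnetcdf.h> are included and MPI_Init has been called. */
#include <mpi.h>
#include <pnetcdf.h>

int create_unaligned(MPI_Comm comm, const char *filename, int *ncidp)
{
    MPI_Info info;
    int err;

    MPI_Info_create(&info);
    /* pack fixed-size variables back to back: no alignment padding */
    MPI_Info_set(info, "nc_var_align_size", "1");

    err = ncmpi_create(comm, filename, NC_CLOBBER, info, ncidp);

    MPI_Info_free(&info);   /* safe to free once ncmpi_create returns */
    return err;
}
```

With one entry per variable, padding each variable up to the default alignment
boundary would otherwise scatter 2000 tiny values across a much larger file
region, which hurts read performance.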

For further information, please check the Q&A at
http://cucis.ece.northwestern.edu/projects/PnetCDF/faq.html
and
http://cucis.ece.northwestern.edu/projects/PnetCDF

Wei-keng

On Nov 30, 2015, at 11:53 PM, Schlottke-Lakemper, Michael wrote:

> Dear all,
> 
> We recently converted all of our code to use the Parallel netCDF library instead of the NetCDF library (before we had a mix), also using Pnetcdf for non-parallel file access. We did not have any issues whatsoever, until one user notified us of a performance regression in a particular case.
> 
> He is trying to read many (O(2000)) variables from a single file in a loop, each variable with just one entry. Since this is very old code and usually only few variables are concerned, each process reads the same data individually. Before, the NetCDF library was used for this task, and during refactoring it was replaced by Pnetcdf with MPI_COMM_SELF. When using the code on a moderate number of MPI ranks (~500), the user noticed a severe performance degradation since switching to Pnetcdf:
> 
> Before, the read process for the 2000 variables cumulatively amounted to ~0.6s. After switching to Pnetcdf (using ncmpi_get_vara_int_all), this number increased to ~300s. Going from MPI_COMM_SELF to MPI_COMM_WORLD reduced it to ~30s, which is still high in comparison.
> 
> What, if anything, can we do to get similar performance when using Pnetcdf in this particular case? I know this is a rather degenerate case and that one possible fix would be to change the layout to one variable with 2000 entries, but I was hoping that someone here has a suggestion for what we could try anyway.
> 
> Thanks a lot in advance
> 
> Michael
> 
> 
> --
> Michael Schlottke-Lakemper
> 
> SimLab Highly Scalable Fluids & Solids Engineering
> Jülich Aachen Research Alliance (JARA-HPC)
> RWTH Aachen University
> Wüllnerstraße 5a
> 52062 Aachen
> Germany
> 
> Phone: +49 (241) 80 95188
> Fax: +49 (241) 80 92257
> Mail: m.schlottke-lakemper at aia.rwth-aachen.de
> Web: http://www.jara.org/jara-hpc
> 
