Looking for good benchmark to test filesystem performance

Wei-keng Liao wkliao at eecs.northwestern.edu
Fri Feb 12 21:39:24 CST 2016


Hi, Craig

If Lustre is used, the first thing I would ask about is the file striping
setting. Did you increase the file stripe count to a large value?
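
To illustrate why the stripe count matters, here is a minimal sketch of round-robin striping, assuming the usual RAID-0 style layout where consecutive stripe-size chunks of a file go to consecutive storage targets. The function names are made up for this example, and this is not Lustre code.

```python
# Illustrative model of round-robin file striping (not Lustre's implementation).
# stripe_target and targets_touched are hypothetical helper names.

def stripe_target(offset, stripe_size, stripe_count):
    """Return the index (0..stripe_count-1) of the target holding byte `offset`."""
    return (offset // stripe_size) % stripe_count


def targets_touched(file_size, stripe_size, stripe_count):
    """Return the set of target indices that hold any part of the file."""
    n_chunks = -(-file_size // stripe_size)  # ceiling division
    return {chunk % stripe_count for chunk in range(n_chunks)}


MiB = 1 << 20
# A 64 MiB file with a 1 MiB stripe size:
print(len(targets_touched(64 * MiB, 1 * MiB, 1)))   # stripe count 1
print(len(targets_touched(64 * MiB, 1 * MiB, 16)))  # stripe count 16
```

With a stripe count of 1, every byte of the file lands on the same target, so adding more writer processes cannot add bandwidth on the storage side.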

My understanding of WRF 3.6 is that the write requests from individual processes
are small and numerous (one for each variable). This pattern cannot fully
utilize the I/O bandwidth provided by parallel file systems such as Lustre.
The "IO Quilting" option in WRF is designed to tackle this problem by shipping
the requests to additional, set-aside MPI processes, where they are aggregated
or "quilted" into larger ones, thus achieving better performance.

PnetCDF nonblocking APIs can aggregate the requests without setting aside
additional MPI processes. PIO, developed by the team led by Jim Edwards, makes
use of this feature. I believe that through PIO, WRF should be able to obtain a
significant performance improvement.
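
The idea behind request aggregation can be sketched as follows. This is only a simplified illustration of merging many small pending writes into fewer large ones. It is not how PnetCDF implements it internally, and the coalesce helper and the file layout used here are assumptions made up for the example.

```python
# Simplified illustration of request aggregation (not PnetCDF's implementation).
# Each pending request is a (file_offset, length) pair in bytes.

def coalesce(requests):
    """Merge byte ranges that touch or overlap into larger contiguous ranges."""
    merged = []
    for off, length in sorted(requests):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            last_off, last_len = merged[-1]
            # Extend the previous range to cover this one.
            merged[-1] = (last_off, max(last_len, off + length - last_off))
        else:
            merged.append((off, length))
    return merged


KiB = 1024
# Hypothetical layout: 3 variables stored back to back, 4 ranks each
# posting one 4 KiB write per variable.
pending = [(var * 16 * KiB + rank * 4 * KiB, 4 * KiB)
           for var in range(3) for rank in range(4)]
print(len(pending), "->", len(coalesce(pending)))  # prints "12 -> 1"
```

In this toy layout, twelve 4 KiB requests collapse into a single 48 KiB write, which is the kind of request a parallel file system handles far better.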

If your users would still like to stick with the older version of WRF, then
I/O quilting is their best option. Without such aggregation, no parallel file
system can handle the I/O pattern described above well.

You can also use the FLASH-IO benchmark, which comes with the PnetCDF release.
It is in the folder benchmarks/FLASH-IO. The benchmark writes 24 variables
in parallel. The variable size is determined by the parameters nxb, nyb, and nzb
in the file physicaldata.fh. You can change their values manually. See the README
for further information and an example run on Edison at NERSC, which shows a
write bandwidth of 4.1 GB/sec on Lustre.
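
For a rough idea of how nxb, nyb, and nzb drive the amount of data written, here is a back-of-the-envelope sketch. Only the 24 variables and the three block dimensions come from the benchmark description above. The number of blocks per process, the 8 bytes per value, and the function names are assumptions for illustration, so check the README and source for the actual values.

```python
# Back-of-the-envelope size estimate; several parameters are assumptions.

def checkpoint_bytes(nxb, nyb, nzb, blocks_per_proc, nprocs,
                     nvars=24, bytes_per_value=8):
    """Estimate total bytes written for nvars block-structured variables.

    blocks_per_proc and bytes_per_value are assumed, not taken from the source.
    """
    cells_per_block = nxb * nyb * nzb
    return cells_per_block * blocks_per_proc * nprocs * nvars * bytes_per_value


def bandwidth_gb_per_sec(nbytes, seconds):
    """Convert a byte count and elapsed time into GB/sec (1 GB = 1e9 bytes)."""
    return nbytes / seconds / 1e9


# Hypothetical run: 16x16x16 blocks, 80 blocks per process, 512 processes.
total = checkpoint_bytes(16, 16, 16, blocks_per_proc=80, nprocs=512)
print(total / 2**30, "GiB")
```

Doubling each of nxb, nyb, and nzb multiplies the data volume by eight, so these parameters are a convenient way to scale the test to the size of your file system.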


Wei-keng

On Feb 12, 2016, at 5:12 PM, Jim Edwards wrote:

> Thanks for the feedback - good luck with your performance issue.
> 
> On Fri, Feb 12, 2016 at 4:01 PM, Craig Tierney - NOAA Affiliate <craig.tierney at noaa.gov> wrote:
> Jim,
> 
> My users aren't using the latest WRF.  Also, we have had a bad experience trying to build PIO on NOAA systems.  It has been a challenge and it isn't a direction these users want to go at this time. 
> 
> Craig
> 
> On Fri, Feb 12, 2016 at 3:56 PM, Jim Edwards <jedwards at ucar.edu> wrote:
> Hi Craig,
> 
> In more recent versions of WRF there is a PIO option that should improve PnetCDF I/O performance.  Also in the PIO distribution is a performance tool that can measure I/O performance based on the data decomposition you are using in WRF.  https://github.com/NCAR/ParallelIO
> 
> On Fri, Feb 12, 2016 at 3:34 PM, Craig Tierney - NOAA Affiliate <craig.tierney at noaa.gov> wrote:
> Hello All,
> 
> I have a user complaining about poor IO performance from WRF when using pnetcdf 1.6.1.  While I am waiting on real data from the user, I want to test the filesystem and see what it does to determine if it is WRF or something else.  I found the list of benchmarks on the website, but there are many to choose from!  Can someone recommend a single benchmark I should try?
> 
> I have tried the BTIO pnetcdf benchmark.  What I see is that the Lustre ADIO is no faster than the NFS ADIO when using Intel 15.0.3 and Intel IMPI.  I have set the variables that Intel MPI requires (I_MPI_EXTRA_FILESYSTEM and I_MPI_EXTRA_FILESYSTEM_LIST) and the benchmark is reporting that it is using Lustre.  I am getting 60 MB/s whether I use 1, 4 or 16 cores.   I would expect the results to be faster.  I want to see if there is a better benchmark and if so how the result compares to BTIO.
> 
> Thanks,
> Craig
> 
> 
> 
> -- 
> Jim Edwards
> 
> CESM Software Engineer
> National Center for Atmospheric Research
> Boulder, CO 
