<div dir="ltr">Wei-keng,<div><br></div><div>Thanks for your input. This user is already using quilting. They are trying to write IO every 15 minutes of model time and don't want to overuse quilting nodes as they want to keep the smallest footprint possible for operational use.</div><div><br></div><div>I need more information from them but it seems that the slowness they are reporting does not happen on other systems. That is information I am still trying to track down. </div><div><br></div><div>I will give FLASH-IO a try.</div><div><br></div><div>Thanks,</div><div>Craig</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Feb 12, 2016 at 8:39 PM, Wei-keng Liao <span dir="ltr"><<a href="mailto:wkliao@eecs.northwestern.edu" target="_blank">wkliao@eecs.northwestern.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi, Craig<br>

If Lustre is used, the first question I would ask is about the file striping setting. Did you increase the file stripe count to a large value?
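The stripe count can be raised with lfs setstripe on the output directory before the file is created, or programmatically through MPI-IO hints. Below is a minimal sketch of the hint approach, assuming the file is created through PnetCDF; the hint names are the standard ROMIO ones and the values are illustrative, not tuned recommendations.

    #include <mpi.h>
    #include <pnetcdf.h>

    int main(int argc, char **argv) {
        MPI_Info info;
        int ncid;

        MPI_Init(&argc, &argv);

        /* Standard ROMIO hints for Lustre striping; they take effect
         * only when the file is newly created. Values are illustrative;
         * error checking is omitted for brevity. */
        MPI_Info_create(&info);
        MPI_Info_set(info, "striping_factor", "16");     /* stripe count */
        MPI_Info_set(info, "striping_unit", "1048576");  /* 1 MiB stripe size */

        ncmpi_create(MPI_COMM_WORLD, "testfile.nc",
                     NC_CLOBBER | NC_64BIT_DATA, info, &ncid);
        MPI_Info_free(&info);

        /* ... define dimensions/variables and write as usual ... */

        ncmpi_close(ncid);
        MPI_Finalize();
        return 0;
    }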

My understanding of WRF 3.6 is that the write requests from individual processes are small and many (one per variable). This pattern cannot fully utilize the I/O bandwidth provided by parallel file systems such as Lustre. The "I/O quilting" option in WRF is designed to tackle this problem by shipping the requests to a set of additional, set-aside MPI processes, where they can be aggregated, or "quilted", into larger requests, achieving better performance.

PnetCDF's nonblocking APIs can aggregate the requests without setting aside additional MPI processes. PIO, developed by the team led by Jim Edwards, makes use of this feature. I believe that through PIO, WRF should be able to obtain a significant performance improvement.
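A minimal sketch of the same loop using the nonblocking APIs follows; the ncmpi_iput_* calls only post the requests, and ncmpi_wait_all() flushes them as one aggregated I/O operation. Again, this is an illustration, not PIO's actual implementation.

    #include <stdlib.h>
    #include <pnetcdf.h>

    /* Sketch: post all per-variable writes, then flush them together.
     * PnetCDF combines the pending requests into larger file-system
     * I/O; error checking omitted. */
    static void write_all_vars_nonblocking(int ncid, int nvars,
                                           const int *varids,
                                           const MPI_Offset *start,
                                           const MPI_Offset *count,
                                           float **bufs)
    {
        int i;
        int *reqs  = (int*) malloc(sizeof(int) * nvars);
        int *stats = (int*) malloc(sizeof(int) * nvars);

        for (i = 0; i < nvars; i++)  /* post; buffers must stay valid */
            ncmpi_iput_vara_float(ncid, varids[i], start, count, bufs[i],
                                  &reqs[i]);

        /* all posted requests are aggregated and written out here */
        ncmpi_wait_all(ncid, nvars, reqs, stats);

        free(reqs);
        free(stats);
    }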
<br>
If your users would still like to stick with the older version of WRF, then<br>
I/O quilting is their best option. Due to the I/O pattern described above,<br>
no parallel file system could handle the pattern well.<br>

You can also try the FLASH-IO benchmark, which comes with the PnetCDF release, under the benchmarks/FLASH-IO folder. The benchmark writes 24 variables in parallel. The variable sizes are determined by the parameters nxb, nyb, and nzb in the file physicaldata.fh; you can change their values manually. See the README for further information and an example run on Edison@NERSC, which shows a 4.1 GB/sec write bandwidth on Lustre.
<span class="HOEnZb"><font color="#888888"><br>
<br>
Wei-keng<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
On Feb 12, 2016, at 5:12 PM, Jim Edwards wrote:

> Thanks for the feedback - good luck with your performance issue.
>
> On Fri, Feb 12, 2016 at 4:01 PM, Craig Tierney - NOAA Affiliate <craig.tierney@noaa.gov> wrote:
> Jim,
>
> My users aren't using the latest WRF. Also, we have had a bad experience trying to build PIO on NOAA systems. It has been a challenge, and it isn't a direction these users want to go at this time.
>
> Craig
>
> On Fri, Feb 12, 2016 at 3:56 PM, Jim Edwards <jedwards@ucar.edu> wrote:
> Hi Craig,
>
> In more recent versions of WRF there is a PIO option that should improve PnetCDF I/O performance. Also in the PIO distribution is a performance tool that can measure I/O performance based on the data decomposition you are using in WRF. https://github.com/NCAR/ParallelIO
>
> On Fri, Feb 12, 2016 at 3:34 PM, Craig Tierney - NOAA Affiliate <craig.tierney@noaa.gov> wrote:
> Hello All,
>
> I have a user complaining about poor I/O performance from WRF when using PnetCDF 1.6.1. While I am waiting on real data from the user, I want to test the file system to determine whether the problem is WRF or something else. I found the list of benchmarks on the website, but there are many to choose from! Can someone recommend a single benchmark I should try?
>
> I have tried the BTIO PnetCDF benchmark. What I see is that the Lustre ADIO is no faster than the NFS ADIO when using Intel 15.0.3 and Intel MPI. I have set the variables that Intel MPI requires (I_MPI_EXTRA_FILESYSTEM and I_MPI_EXTRA_FILESYSTEM_LIST), and the benchmark reports that it is using Lustre. I am getting 60 MB/s whether I use 1, 4, or 16 cores. I would expect the results to be faster. I want to see if there is a better benchmark and, if so, how its results compare to BTIO.
>
> Thanks,
> Craig
>
>
>
> --
> Jim Edwards
>
> CESM Software Engineer
> National Center for Atmospheric Research
> Boulder, CO
>
>
>
>
> --
> Jim Edwards
>
> CESM Software Engineer
> National Center for Atmospheric Research
> Boulder, CO