Hints on improving performance with WRF and Pnetcdf

Gerry Creager gerry.creager at tamu.edu
Mon Sep 6 05:55:08 CDT 2010


Craig Tierney wrote:
> On 9/4/10 8:25 PM, Gerry Creager wrote:
>> Rob Latham wrote:
>>> On Thu, Sep 02, 2010 at 06:23:42PM -0600, Craig Tierney wrote:
>>>> I did try setting the hints myself by changing the code, and
>>>> performance still stinks (or is no faster). I was just looking for a
>>>> way to not have to modify WRF, or, more importantly, to not have every
>>>> user modify WRF.
>>>
>>> What's going slowly?
>>> If WRF is slow at writing record variables, you might want to try
>>> disabling collective I/O or carefully sizing the intermediate
>>> buffer to be as big as one record.
>>>
>>> That's the first place I'd look for bad performance.
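Sizing that intermediate buffer requires the byte length of one record. The C sketch below assumes the netCDF classic layout, in which a record holds one slice of every record variable and each slice is padded to a 4-byte boundary. The single-record-variable exception to that padding is ignored, and the helper names are hypothetical. The result, written as a decimal string, would be the value of the ROMIO hint cb_buffer_size. Disabling collective writes instead would use the ROMIO hint romio_cb_write=disable.

```c
#include <stddef.h>

/* Bytes in one record slice of a single record variable: the product of
 * its fixed (non-record) dimension lengths times the element size. */
size_t var_slice_bytes(const size_t *fixed_dims, int nfixed, size_t elem_size)
{
    size_t n = elem_size;
    for (int i = 0; i < nfixed; i++)
        n *= fixed_dims[i];
    return n;
}

/* Bytes in one full record of a classic-format file: one slice of every
 * record variable, each rounded up to a multiple of 4 bytes.
 * Assumption: more than one record variable (no special-case handling). */
size_t record_bytes(const size_t *slice_bytes, int nvars)
{
    size_t total = 0;
    for (int i = 0; i < nvars; i++)
        total += (slice_bytes[i] + 3) & ~(size_t)3;  /* pad to 4 bytes */
    return total;
}
```

For example, a 4-byte float field with hypothetical 27 x 100 x 100 non-record dimensions contributes 1,080,000 bytes to each record. The buffer would need to be at least the sum over all record variables.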
>>
>> Ah, but I'm seeing the same thing on Ranger (UTexas). I'm likely going
>> to have to modify the WRF pnetcdf code to specify a sufficiently large
>> stripe count (Lustre file system) to see any sort of real improvement.
>>
>> More to the point, I see worse performance than with plain Lustre and
>> regular netCDF. AND there's no way to set MPI-IO-HINTS in SGE as
>> configured on Ranger. We've tried, and their systems folks concur, so
>> it's not just me saying it.
>>
> 
> What do you mean you can't?  How would you set it in another batch system?

Pretty much that. In SGE as installed at TACC, it doesn't pass anything. 
That's not to say it won't work with SGE, but not with SGE as installed 
at TACC.

>> I will look at setting up the hints file, but I don't think that's going
>> to give me the equivalent of a stripe count of 64, which looks like the
>> sweet spot for the domain I'm testing on.
>>
> 
> So what hints are you passing, and is the key then to increase the number
> of stripes for the directory?

The key is stripe count, BUT only for the wrfout files. I've tried
changing the stripe count on the directory, and that did improve
performance transiently... until they killed my job and rebooted Ranger,
because the rsl.* files were ALSO being written with a stripe count of 64,
which crashed their Lustre file system. Unintended Consequences has
not been repealed.
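One way to keep wide striping away from the rsl.* logs is to request it per file, at creation time, rather than on the directory. ROMIO's Lustre driver accepts the striping_factor (stripe count) and striping_unit (stripe size) hints. Lustre striping is fixed when a file is created, so these only matter at create. The C sketch below is a hypothetical policy helper, not WRF code. It selects the wide stripe count only for files whose basename starts with "wrfout". Every other file inherits the directory default.

```c
#include <string.h>

/* Hypothetical policy: only WRF history files (basename starting "wrfout")
 * get wide striping; rsl.out.* and rsl.error.* keep the directory default. */
int wants_wide_striping(const char *path)
{
    const char *base = strrchr(path, '/');   /* strip any directory part */
    base = base ? base + 1 : path;
    return strncmp(base, "wrfout", 6) == 0;
}

/* Stripe count to request through the "striping_factor" hint for this file,
 * or 0 meaning "set no hint and inherit the directory's striping".
 *
 * Sketch of a call site (needs MPI and PnetCDF, not compiled here):
 *   int f = striping_factor_for(path, 64);
 *   if (f > 0) { char v[16]; snprintf(v, sizeof v, "%d", f);
 *                MPI_Info_set(info, "striping_factor", v); }
 *   ncmpi_create(comm, path, NC_CLOBBER, info, &ncid);
 */
int striping_factor_for(const char *path, int wide_count)
{
    return wants_wide_striping(path) ? wide_count : 0;
}
```

Keying on the file name rather than the directory means a misconfigured policy can only affect the files it names, which is the failure mode described above.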

>> Craig, once I have time to get back on to this, I think we can convince
>> NCAR to add this in a bug-fix release. I also anticipate the tweak will be
>> on the order of 4-5 lines.
>>
> 
> I already wrote code so that if you set the variable WRF_MPIIO_HINTS
> to a comma-delimited list of all the hints you want to set, the code
> in external/io_pnetcdf/wrf_IO.F90 will set the hints for you. Once
> I see that any of this actually helps, I will send the patch in for
> future use.
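Craig's patch itself is not shown in this thread, and wrf_IO.F90 is Fortran. As a rough illustration of the idea only, here is a C sketch that reads WRF_MPIIO_HINTS and splits it into key/value pairs. It assumes each comma-delimited entry has the form key=value, a separator the thread does not specify. The struct and function names are hypothetical. In real code each pair would be passed to MPI_Info_set before the file is opened.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_HINTS 32
#define MAX_LEN   128

/* Hypothetical container for one parsed hint. */
struct hint { char key[MAX_LEN]; char value[MAX_LEN]; };

/* Parse "key1=val1,key2=val2" into hints[]. Empty entries are skipped.
 * Returns the number of pairs, or -1 if any entry is malformed or too long. */
int parse_hints(const char *spec, struct hint *hints, int max)
{
    int n = 0;
    const char *p = spec;
    while (p && *p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > 0) {
            const char *eq = memchr(p, '=', len);  /* assumed separator */
            if (!eq || n >= max) return -1;
            size_t klen = (size_t)(eq - p);
            size_t vlen = len - klen - 1;
            if (klen == 0 || klen >= MAX_LEN || vlen >= MAX_LEN) return -1;
            memcpy(hints[n].key, p, klen);
            hints[n].key[klen] = '\0';
            memcpy(hints[n].value, eq + 1, vlen);
            hints[n].value[vlen] = '\0';
            n++;  /* real code: MPI_Info_set(info, key, value) */
        }
        p = end ? end + 1 : NULL;
    }
    return n;
}

/* Read the environment variable; returns 0 pairs if it is unset. */
int hints_from_env(struct hint *hints, int max)
{
    const char *spec = getenv("WRF_MPIIO_HINTS");
    return spec ? parse_hints(spec, hints, max) : 0;
}
```

For example, WRF_MPIIO_HINTS=striping_factor=64,romio_cb_write=disable would yield two pairs. Rejecting the whole string on a malformed entry avoids silently applying only part of what the user asked for.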

Care to share?

Thanks, gerry
-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


More information about the parallel-netcdf mailing list