Hints on improving performance with WRF and Pnetcdf

Wei-keng Liao wkliao at ece.northwestern.edu
Mon Sep 6 11:36:45 CDT 2010


Gerry,

I ran a 1024-PE job yesterday on Ranger using a stripe count of 32 without a problem.
Lustre should not have any problem simply because of the use of a large
stripe count. Do you use pnetcdf independent APIs in your program?
If you are using collective APIs only, do you access variables partially
(i.e. subarrays) or always entire variables? A large number of
noncontiguous file accesses may flood the I/O servers and slow down I/O
performance, but that still should not shut down Lustre. Maybe Ranger's
system administrators have a better answer on this.
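
As a rough illustration of why partial (subarray) access can generate many noncontiguous requests, the Python sketch below counts the contiguous file segments one process touches when it accesses a subarray of a fixed-size variable stored in row-major order. The function name `contiguous_segments` and the grid sizes are hypothetical, chosen only for the example; they are not taken from WRF or from the runs discussed in this thread.

```python
from math import prod


def contiguous_segments(shape, count):
    """Return (segments, elements_per_segment) for one subarray access.

    shape: dimensions of the whole variable, slowest-varying first.
    count: dimensions of the subarray being accessed, in the same order.
    Assumes a row-major layout and counts between 1 and the full extent.
    """
    if len(shape) != len(count):
        raise ValueError("shape and count must have the same rank")
    if any(c < 1 or c > s for c, s in zip(count, shape)):
        raise ValueError("each count must be between 1 and the dimension size")
    # Walk in from the fastest-varying dimension while the subarray spans
    # the full extent; those dimensions merge into one contiguous run.
    k = len(shape) - 1
    while k > 0 and count[k] == shape[k]:
        k -= 1
    return prod(count[:k]), prod(count[k:])


# Hypothetical 3-D variable: (levels, south-north, west-east).
shape = (40, 400, 400)
print(contiguous_segments(shape, shape))           # whole variable
print(contiguous_segments(shape, (40, 100, 100)))  # one tile of a 4x4 decomposition
```

For this made-up grid, the whole-variable access is a single run, while one tile breaks into 4000 runs of 100 elements each. Collective I/O exists largely to merge such pieces into fewer, larger requests before they reach the file system.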

Wei-keng

On Sep 6, 2010, at 8:47 AM, Gerry Creager wrote:

> Wei-keng
> 
> Thanks. Useful information. I'll look at your ROMIO library later today (about to go into a meeting for the rest of the morning).  Last time I set stripe-count to something above 16, rsl files were also "taking advantage" of that and shut down the LFS. Have you seen this or do you address this in ROMIO?
> 
> gerry
> 
> Wei-keng Liao wrote:
>> Hi, Gerry and Craig,
>> I would like to provide my experience on Ranger.
>> First, I agree with Rob that the most recent optimizations for the Lustre
>> ADIO driver might not yet be installed on Ranger, because in my
>> experiments on Ranger the MPI collective write performance is poor.
>> I have built a ROMIO library with the recent optimizations for Lustre
>> in my home directory and you are welcome to give it a try. Below is
>> a usage example of the library:
>> %  mpif90 myprogram.o -L/share/home/00531/tg457823/ROMIO/lib -lmpio
>> Please note that this library is built using mvapich2 on Ranger. Run the
>> command below before compiling or linking your programs.
>> %  module load mvapich2
>> I usually set the Lustre striping configuration for the output directory
>> before I run applications. I use a 1MB stripe size, a stripe count of 32,
>> 64, or 128, and a stripe offset of -1. Since by Lustre default all files
>> created under a directory inherit the striping configuration of
>> that directory, and my ROMIO build detects these striping configurations
>> automatically, there is no need for me to set ROMIO hints in my programs.
>> You can verify the striping configuration of a newly created file by
>> this command, for example:
>> % lfs getstripe -v /scratch/00531/tg457823/FS_1M_32/testfile.dat  | grep stripe
>>  lmm_stripe_count:   32
>>  lmm_stripe_size:    1048576
>>  lmm_stripe_pattern: 1
>> If you use pnetcdf collective I/O, I recommend giving my ROMIO library a try.
>> Wei-keng
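
The striping check quoted above can also be scripted. The Python sketch below parses the `lmm_stripe_*` lines of `lfs getstripe -v` output into integers; the sample text is copied from the message above, and the helper name `parse_stripe_info` is hypothetical. The output format may differ between Lustre versions, so treat the field names as an assumption that holds only for output resembling this sample.

```python
def parse_stripe_info(text):
    """Extract lmm_stripe_* fields from `lfs getstripe -v` output as ints."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.startswith("lmm_stripe_"):
            try:
                info[key] = int(value.strip())
            except ValueError:
                # Skip fields that are not plain integers in this output.
                continue
    return info


# Sample text taken from the quoted message (1MB stripe size, 32 stripes).
sample = """\
 lmm_stripe_count:   32
 lmm_stripe_size:    1048576
 lmm_stripe_pattern: 1
"""
info = parse_stripe_info(sample)
print(info["lmm_stripe_count"], info["lmm_stripe_size"])
```

In practice the text would come from running the `lfs getstripe -v` command on the output file and capturing its standard output.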
>> On Sep 5, 2010, at 10:28 AM, Craig Tierney wrote:
>>> On 9/4/10 8:25 PM, Gerry Creager wrote:
>>>> Rob Latham wrote:
>>>>> On Thu, Sep 02, 2010 at 06:23:42PM -0600, Craig Tierney wrote:
>>>>>> I did try setting the hints myself by changing the code, and performance
>>>>>> still stinks (or is no faster). I was just looking for a way to not
>>>>>> have to modify WRF, or more importantly have every user modify WRF.
>>>>> What's going slowly?
>>>>> If wrf is slowly writing record variables, you might want to try
>>>>> disabling collective I/O or carefully selecting the intermediate
>>>>> buffer to be as big as one record.
>>>>> 
>>>>> That's the first place I'd look for bad performance.
>>>> Ah, but I'm seeing the same thing on Ranger (UTexas). I'm likely going
>>>> to have to modify the WRF pnetcdf code to identify a sufficiently large
>>>> stripe count (Lustre file system) to see any sort of real improvement.
>>>> 
>>>> More to the point, I see worse performance than with normal Lustre and
>>>> regular netcdf. AND, there's no way to set MPI-IO-HINTS in the SGE as
>>>> configured on Ranger. We've tried and their systems folk concur, so it's
>>>> not just me saying it.
>>>> 
>>> What do you mean you can't?  How would you set it in another batch system?
>>> 
>>>> I will look at setting the hints file up but I don't think that's going
>>>> to give me the equivalent of 64 stripe counts, which looks like the
>>>> sweet spot for the domain I'm testing on.
>>>> 
>>> So what hints are you passing, and is the key then to increase the number
>>> of stripes for the directory?
>>> 
>>>> Craig, once I have time to get back on to this, I think we can convince
>>>> NCAR to add this as a bug release. I also anticipate the tweak will be
>>>> on the order of 4-5 lines.
>>>> 
>>> I already wrote code so that if you set the variable WRF_MPIIO_HINTS, and list all the hints you want to set (comma delimited), then the code in external/io_pnetcdf/wrf_IO.F90 will set the hints for you.  When
>>> I see that any of this actually helps I will send the patch in for future use.
>>> 
>>> Craig
>>> 
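
Craig's patch is not shown in this thread, so the following is only a sketch of the idea in Python rather than the actual Fortran in external/io_pnetcdf/wrf_IO.F90. It assumes each comma-delimited entry in WRF_MPIIO_HINTS has the form `key=value`, which the message does not state, and the helper name `parse_hints` is hypothetical. The hint names in the default string (`striping_factor`, `striping_unit`) are standard MPI-IO hints; the values mirror the 64-stripe, 1MB configuration mentioned in the thread.

```python
import os


def parse_hints(spec):
    """Split 'key=value,key=value' into a dict, ignoring empty items."""
    hints = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed hint: {item!r}")
        hints[key.strip()] = value.strip()
    return hints


# Fall back to an example string when the variable is not set.
spec = os.environ.get("WRF_MPIIO_HINTS",
                      "striping_factor=64,striping_unit=1048576")
for key, value in parse_hints(spec).items():
    print(f"{key} = {value}")
```

In a real I/O layer each pair would be passed to `MPI_Info_set` on the info object given to the file-create call, so that users can change hints without editing the source.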
> 
> 
> -- 
> Gerry Creager -- gerry.creager at tamu.edu
> Texas Mesonet -- AATLT, Texas A&M University
> Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
> Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


