Hints on improving performance with WRF and Pnetcdf
Craig Tierney
Craig.Tierney at noaa.gov
Tue Sep 7 12:37:53 CDT 2010
On 9/6/10 10:36 AM, Wei-keng Liao wrote:
> Gerry,
>
> I ran a 1024-PE job yesterday on Ranger using a stripe count of 32 without a problem.
> Lustre should not have any problem simply because of the use of a large
> stripe count. Do you use pnetcdf independent APIs in your program?
> If you are using collective APIs only, do you access variables partially
> (i.e. subarrays) or always entire variables? A large number of
> noncontiguous file accesses may flood the I/O servers and slow down I/O
> performance, but that still should not shut down Lustre. Maybe Ranger's
> admins have a better answer on this.
>
> Wei-keng
>
Wei-keng,
Can you characterize how much faster your I/O is going when you are using
pnetcdf? I am now trying a stripe count of 16, and I am not seeing any
improvement.
Craig
> On Sep 6, 2010, at 8:47 AM, Gerry Creager wrote:
>
>> Wei-keng
>>
>> Thanks. Useful information. I'll look at your ROMIO library later today (about to go into a meeting for the rest of the morning). Last time I set the stripe count to something above 16, rsl files were also "taking advantage" of that and shut down the LFS. Have you seen this, or do you address it in ROMIO?
>>
>> gerry
>>
>> Wei-keng Liao wrote:
>>> Hi, Gerry and Craig,
>>> I would like to provide my experience on Ranger.
>>> First, I agree with Rob that the most recent optimizations for the Lustre
>>> ADIO driver might not yet be installed on Ranger; in my experiments
>>> on Ranger, MPI collective write performance is poor.
>>> I have built a ROMIO library with the recent Lustre optimizations
>>> in my home directory, and you are welcome to give it a try. Below is
>>> a usage example of the library:
>>> % mpif90 myprogram.o -L/share/home/00531/tg457823/ROMIO/lib -lmpio
>>> Please note that this library is built with mvapich2 on Ranger. Run the
>>> command below before compiling/linking your programs.
>>> % module load mvapich2
>>> I usually set the Lustre striping configuration for the output directory
>>> before I run applications. I use a 1MB stripe size, a stripe count of 32,
>>> 64, or 128, and a stripe offset of -1. Since by default all files
>>> created under a Lustre directory inherit the striping configuration of
>>> that directory, and my ROMIO build detects these striping configurations
>>> automatically, there is no need for me to set ROMIO hints in my programs.
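[Editor's note: the per-directory striping setup described above is done with `lfs setstripe`; a minimal sketch, assuming a hypothetical output directory path. Option spellings vary across Lustre versions (newer releases use `-S` for stripe size), so check `lfs setstripe --help` on your system.]

```shell
# Set a 1MB stripe size, a stripe count of 32, and the default
# OST offset (-1) on a hypothetical output directory. Files
# created in this directory afterwards inherit these settings.
lfs setstripe -S 1m -c 32 -i -1 /scratch/myrun/output
```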
>>> You can verify the striping configuration of a newly created file by
>>> this command, for example:
>>> % lfs getstripe -v /scratch/00531/tg457823/FS_1M_32/testfile.dat | grep stripe
>>> lmm_stripe_count: 32
>>> lmm_stripe_size: 1048576
>>> lmm_stripe_pattern: 1
>>> If you use pnetcdf collective I/O, I recommend giving my ROMIO library a try.
>>> Wei-keng
>>> On Sep 5, 2010, at 10:28 AM, Craig Tierney wrote:
>>>> On 9/4/10 8:25 PM, Gerry Creager wrote:
>>>>> Rob Latham wrote:
>>>>>> On Thu, Sep 02, 2010 at 06:23:42PM -0600, Craig Tierney wrote:
>>>>>>> I did try setting the hints myself by changing the code, and performance
>>>>>>> still stinks (or is no faster). I was just looking for a way to not
>>>>>>> have to modify WRF, or more importantly have every user modify WRF.
>>>>>> What's going slowly?
>>>>>> If wrf is slowly writing record variables, you might want to try
>>>>>> disabling collective I/O or carefully selecting the intermediate
>>>>>> buffer to be as big as one record.
>>>>>>
>>>>>> That's the first place I'd look for bad performance.
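[Editor's note: the two remedies Rob mentions map onto standard ROMIO hints; a sketch only, with illustrative values — `cb_buffer_size` should be at least one record's worth of bytes for the second approach to help.]

```
romio_cb_write disable      # turn off collective buffering for writes
cb_buffer_size 16777216     # or: keep it on, with a 16MB collective buffer
```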
>>>>> Ah, but I'm seeing the same thing on Ranger (UTexas). I'm likely going
>>>>> to have to modify the WRF pnetcdf code to identify a sufficiently large
>>>>> stripe count (Lustre file system) to see any sort of real improvement.
>>>>>
>>>>> More to the point, I see worse performance than with normal Lustre and
>>>>> regular netcdf. AND, there's no way to set MPI-IO-HINTS in the SGE as
>>>>> configured on Ranger. We've tried and their systems folk concur, so it's
>>>>> not just me saying it.
>>>>>
>>>> What do you mean you can't? How would you set it in another batch system?
>>>>
>>>>> I will look at setting the hints file up but I don't think that's going
>>>>> to give me the equivalent of 64 stripe counts, which looks like the
>>>>> sweet spot for the domain I'm testing on.
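[Editor's note: the hints file Gerry refers to is ROMIO's `ROMIO_HINTS` mechanism — an environment variable naming a file with one `hint value` pair per line, read at MPI_File_open time without code changes. A sketch matching the stripe count discussed here; whether `striping_factor` is honored depends on the ADIO driver in use.]

```
striping_factor 64
striping_unit 1048576
```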
>>>>>
>>>> So what hints are you passing, and is the key then to increase the number
>>>> of stripes for the directory?
>>>>
>>>>> Craig, once I have time to get back onto this, I think we can convince
>>>>> NCAR to add this in a bug-fix release. I also anticipate the tweak will be
>>>>> on the order of 4-5 lines.
>>>>>
>>>> I already wrote code so that if you set the environment variable
>>>> WRF_MPIIO_HINTS to a comma-delimited list of the hints you want, the code
>>>> in external/io_pnetcdf/wrf_IO.F90 will set them for you. Once I see that
>>>> any of this actually helps, I will send the patch in for future use.
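[Editor's note: as an illustration only — Craig's actual patch and the exact WRF_MPIIO_HINTS syntax are not shown in this thread — a comma-delimited string of assumed key=value pairs could be parsed like this before each pair is handed to MPI_Info_set.]

```python
import os

def parse_mpiio_hints(raw):
    """Split a comma-delimited string of assumed key=value pairs
    into a dict of MPI-IO hint names and values."""
    hints = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        hints[key.strip()] = value.strip()
    return hints

# Read the environment variable the way a patched WRF might,
# with a hypothetical default for demonstration.
raw = os.environ.get("WRF_MPIIO_HINTS",
                     "striping_factor=64,striping_unit=1048576")
print(parse_mpiio_hints(raw))
```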
>>>>
>>>> Craig
>>>>
>>
>>
>> --
>> Gerry Creager -- gerry.creager at tamu.edu
>> Texas Mesonet -- AATLT, Texas A&M University
>> Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
>> Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
>