Hints on improving performance with WRF and Pnetcdf

Wei-keng Liao w-liao2 at northwestern.edu
Mon Sep 6 00:10:48 CDT 2010


Hi, Gerry and Craig,

I would like to provide my experience on Ranger.

First, I agree with Rob that the most recent optimizations for the Lustre
ADIO driver may not yet be installed on Ranger: in my experiments on Ranger,
the MPI collective write performance is poor.

I have built a ROMIO library with the recent Lustre optimizations in my home
directory, and you are welcome to give it a try. Here is an example of
linking against the library:
%  mpif90 myprogram.o -L/share/home/00531/tg457823/ROMIO/lib -lmpio

Please note that this library was built with mvapich2 on Ranger, so run the
command below before compiling and linking your programs:
%  module load mvapich2

I usually set the Lustre striping configuration for the output directory
before running my applications. I use a 1 MB stripe size; a stripe count of
32, 64, or 128; and a stripe offset of -1. By default, all files created
under a Lustre directory inherit that directory's striping configuration,
and my ROMIO build detects the configuration automatically, so there is no
need to set ROMIO hints in my programs.
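For example, a directory can be given a 1 MB stripe size, a stripe count of
64, and a stripe offset of -1 with a command like the one below (the
directory name is only an illustration, and the option letters may differ
across Lustre versions):

%  lfs setstripe -s 1m -c 64 -i -1 /scratch/00531/tg457823/FS_1M_64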
You can verify the striping configuration of a newly created file with a
command like this:

% lfs getstripe -v /scratch/00531/tg457823/FS_1M_32/testfile.dat  | grep stripe

  lmm_stripe_count:   32
  lmm_stripe_size:    1048576
  lmm_stripe_pattern: 1

If you use PnetCDF collective I/O, I recommend giving my ROMIO library a try.
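For completeness, if you would rather set the hints inside a program, the
sketch below shows the general idea using PnetCDF's Fortran interface. The
file name and hint values are only illustrative; striping_unit and
striping_factor are the standard ROMIO hint names for the Lustre stripe size
and stripe count, and on Lustre they take effect only at file-creation time.

      program set_hints
        use mpi
        implicit none
        include 'pnetcdf.inc'
        integer :: info, ncid, err

        call MPI_Init(err)

        ! Build an MPI info object carrying the ROMIO hints.
        call MPI_Info_create(info, err)
        call MPI_Info_set(info, 'striping_unit',   '1048576', err)
        call MPI_Info_set(info, 'striping_factor', '64',      err)

        ! PnetCDF passes the info object down to MPI-IO when it
        ! creates the file, so the hints must be supplied here.
        err = nfmpi_create(MPI_COMM_WORLD, 'testfile.nc', NF_CLOBBER, &
                           info, ncid)
        call MPI_Info_free(info, err)

        err = nfmpi_close(ncid)
        call MPI_Finalize(err)
      end program set_hints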
 
Wei-keng


On Sep 5, 2010, at 10:28 AM, Craig Tierney wrote:

> On 9/4/10 8:25 PM, Gerry Creager wrote:
>> Rob Latham wrote:
>>> On Thu, Sep 02, 2010 at 06:23:42PM -0600, Craig Tierney wrote:
>>>> I did try setting the hints myself by changing the code, and performance
>>>> still stinks (or is no faster). I was just looking for a way to not
>>>> have to modify WRF, or more importantly have every user modify WRF.
>>> 
>>> What's going slowly?
>>> If wrf is slowly writing record variables, you might want to try
>>> disabling collective I/O or carefully selecting the intermediate
>>> buffer to be as big as one record.
>>> 
>>> That's the first place I'd look for bad performance.
>> 
>> Ah, but I'm seeing the same thing on Ranger (UTexas). I'm likely going
>> to have to modify the WRF pnetcdf code to identify a sufficiently large
>> stripe count (Lustre file system) to see any sort of real improvement.
>> 
>> More to the point, I see worse performance than with normal Lustre and
>> regular netcdf. AND, there's no way to set MPI-IO-HINTS in the SGE as
>> configured on Ranger. We've tried and their systems folk concur, so it's
>> not just me saying it.
>> 
> 
> What do you mean you can't?  How would you set it in another batch system?
> 
>> I will look at setting the hints file up but I don't think that's going
>> to give me the equivalent of 64 stripe counts, which looks like the
>> sweet spot for the domain I'm testing on.
>> 
> 
> So what hints are you passing, and is the key then to increase the number
> of stripes for the directory?
> 
>> Craig, once I have time to get back onto this, I think we can convince
>> NCAR to add this as a bug-fix release. I also anticipate the tweak will
>> be on the order of 4-5 lines.
>> 
> 
> I already wrote code so that if you set the environment variable
> WRF_MPIIO_HINTS to a comma-delimited list of the hints you want, the code in
> external/io_pnetcdf/wrf_IO.F90 will set them for you. Once I see that any of
> this actually helps, I will send the patch in for future use.
> 
> Craig
> 
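P.S. Regarding the hints file mentioned above: ROMIO can also read hints from
a plain-text file named by the ROMIO_HINTS environment variable, one
"key value" pair per line, so no WRF source change is needed at all, provided
the variable actually reaches the MPI processes under your batch system. A
minimal sketch follows (the file name and values are only illustrative; the
romio_cb_write line shows one way to try Rob's suggestion of disabling
collective buffering for writes):

%  cat my_hints
striping_factor 64
striping_unit 1048576
romio_cb_write disable
%  export ROMIO_HINTS=$PWD/my_hints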


