filename prefixes

Gerald Creager gerry.creager at tamu.edu
Wed Aug 11 18:47:04 CDT 2010


On the system I'm working with, I can't use the MPICH environment
variables such as:
MPICH_MPIIO_HINTS_DISPLAY 1
MPICH_MPIIO_HINTS "wrfout*:striping_factor=64"

Therefore, to set striping on the wrfout files, with a Lustre file
system and SGE as the batch queuing environment, I have to find where
the wrfout files get created and add a couple of lines of code so that
they're created with an appropriate stripe count (somewhere between 16
and 64, I think). Eventually, I intend to get that folded back into WRF
as a namelist parameter, so that those of us using pnetcdf (needed once
the processor count gets past ~512 or so on this system) have a simple,
granular way to set striping on parallel file systems.

I've looked at Johnsen's work on using Lustre on the Cray XT5. It
doesn't apply to my environment, more's the pity.


Thanks, Gerry

Don Morton wrote:
> I've used pnetcdf with WRF, using the nocolons option.  I'm not sure 
> specifically what you're asking now, but I can send you my notes if it 
> helps...
> 
> On Wed, Aug 11, 2010 at 3:13 PM, Gerald Creager
> <gerry.creager at tamu.edu> wrote:
> 
>     It's a namelist.input spec: NOCOLONS
> 
>     I'm sorting thru some other issues with pnetcdf and WRF right now...
>     I'm having to change it so it'll create wrfout_dxx files with the
>     striping info correct at file creation. If anyone's had to do this,
>     I'd appreciate a clue...
> 
>     gerry
> 
>     Jim Edwards wrote:
> 
>         Hi Johnny,
> 
>         I think that the real problem may be that WRF uses the colon
>         character in filenames and the filesystem reserves this same
>         character for special use.   I think that there is a compile
>         option for wrf not to use colons.
> 
>         Jim
> 
>         On Wed, Aug 11, 2010 at 4:44 PM, Johnny Chang
>         <Johnny.Chang at nasa.gov> wrote:
> 
>            Hello,
> 
>            I am helping a user trouble-shoot a runtime error using
>            parallel-netcdf version 1.1.1 and mvapich2/1.2p1/intel-PIC.
> 
>            The error message is:
> 
>             0: MPI_File_open : File does not exist, error stack:
>            ADIO_RESOLVEFILETYPE_PREFIX(546): Invalid file name
>            wrfout_d01_2006-07-25_00:00:00
>             open_hist_w : error opening wrfout_d01_2006-07-25_00:00:00 for
>            writing. ***
> 
>            While googling the ADIO_RESOLVEFILETYPE_PREFIX error, we found
>            the ad_fstype.c code containing:
> 
>            /*
>              ADIO_FileSysType_prefix - determines file system type for a
>              file using a prefix on the file name.  upper layer should
>              have already determined that a prefix is present.
> 
>            Input Parameters:
>            . filename - path to file, including prefix (xxx:)
> 
>            Output Parameters:
>            . fstype - pointer to integer in which to store file system
>              type (ADIO_XXX)
>            . error_code - pointer to integer in which to store error code
> 
>              Returns MPI_SUCCESS in error_code on success.  Filename not
>              having a prefix is considered an error. Except for on
>              Windows systems where the default is NTFS.
> 
>             */
>            static void ADIO_FileSysType_prefix(char *filename, int *fstype,
>                                                int *error_code)
>            {
>                static char myname[] = "ADIO_RESOLVEFILETYPE_PREFIX";
>                *error_code = MPI_SUCCESS;
> 
>                if (!strncmp(filename, "pfs:", 4) || !strncmp(filename, "PFS:", 4)) {
>                    *fstype = ADIO_PFS;
>                }
> 
>                    ...
> 
> 
>            #else
>                    *fstype = 0;
>                    /* --BEGIN ERROR HANDLING-- */
>                    *error_code = MPIO_Err_create_code(MPI_SUCCESS, MPIR_ERR_RECOVERABLE,
>                                                       myname, __LINE__, MPI_ERR_NO_SUCH_FILE,
>                                                       "**filename", "**filename %s", filename);
>                    /* --END ERROR HANDLING-- */
>            #endif
>                }
>            }
> 
>            which seems to indicate that the MVAPICH2 library is expecting
>            parallel-netcdf to pre-pend a prefix on the filename passed to
>            the MVAPICH2 library.
> 
>            We are running on a Lustre filesystem.  So, we think that the
>            parallel-netcdf library should have passed the "lustre:" or
>            "LUSTRE:" prefix along with the actual filename.  Are we right
>            in this interpretation of the error?
> 
>            If so, then perhaps the parallel-netcdf library was not built
>            correctly?
> 
>            Here is the beginning part of config.log:
> 
>          
>          ------------------------------------------------------------------------
> 
>            This file contains any messages produced by compilers while
>            running configure, to aid debugging if configure makes a mistake.
> 
>            It was created by configure, which was
>            generated by GNU Autoconf 2.61.  Invocation command line was
> 
>             $ ./configure --prefix=/nasa/parallel-netcdf/1.1.1/mvapich2
>            --with-mpi=/nasa/mvapich2/1.2p1/intel-PIC
> 
>            ## --------- ##
>            ## Platform. ##
>            ## --------- ##
> 
>            hostname = pbspl1
>            uname -m = x86_64
>            uname -r = 2.6.16.60-0.42.5.03schamp-nasa
>            uname -s = Linux
>            uname -v = #1 SMP Tue Nov 10 20:46:20 UTC 2009
> 
>            /usr/bin/uname -p = unknown
>            /bin/uname -X     = unknown
> 
>            /bin/arch              = x86_64
>            /usr/bin/arch -k       = unknown
>            /usr/convex/getsysinfo = unknown
>            /usr/bin/hostinfo      = unknown
>            /bin/machine           = unknown
>            /usr/bin/oslevel       = unknown
>            /bin/universe          = unknown
> 
>            PATH: /nasa/intel/Compiler/11.1/046/bin/intel64
>            PATH: /nasa/intel/Compiler/11.1/046/mkl/tools/environment
>            PATH: /nasa/mvapich2/1.2p1/intel-PIC/bin
>            PATH: /u/jrappley/bin
> 
>            If the problem is in the parallel-netcdf build, let us know
>            what is the fix.
> 
>            Thanks in advance!
> 
>            Johnny
>            --
>            Johnny Chang
>            650-604-4356
> 
> 
> 
>     -- 
>     Gerry Creager -- gerry.creager at tamu.edu
>     Texas Mesonet -- AATLT, Texas A&M University
>     Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
>     Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
> 
> 
> 
> 
> -- 
> Arctic Region Supercomputing Center
> http://weather.arsc.edu/
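
For anyone landing on this thread with the same error, here is a minimal
sketch, in C, of why the colons in the WRF history file name trip the
prefix check quoted above, and what renaming does about it. It assumes
that the MPI-IO layer treats any ':' in the name as the end of a
file-system prefix; the quoted comment only says the upper layer has
"already determined that a prefix is present", so the exact test may
differ. looks_prefixed and replace_colons are hypothetical helper names,
not part of ROMIO, pnetcdf, or WRF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the MPI-IO layer takes any ':' in the file name to mark the
 * end of a file-system prefix such as "pfs:" or "lustre:".
 * Returns 1 if the name would be read as prefixed, 0 otherwise. */
int looks_prefixed(const char *name)
{
    assert(name != NULL);
    return strchr(name, ':') != NULL;
}

/* Replace every ':' in the name with '_', in place.
 * Returns the number of characters replaced. */
size_t replace_colons(char *name)
{
    size_t replaced = 0;

    assert(name != NULL);
    for (char *p = name; *p != '\0'; ++p) {
        if (*p == ':') {
            *p = '_';
            ++replaced;
        }
    }
    return replaced;
}
```

With the name from the error message, wrfout_d01_2006-07-25_00:00:00,
looks_prefixed returns 1 before the rename and 0 after it, and the name
becomes wrfout_d01_2006-07-25_00_00_00. That is the same idea as the
nocolons namelist option mentioned above, as far as I understand it.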

-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843

