[mpich2-dev] improved scalability with O_NOATIME flag

Dave Goodell goodell at mcs.anl.gov
Tue Dec 14 12:32:49 CST 2010


Careful, you shouldn't be defining __USE_GNU yourself; it is an internal glibc macro that <features.h> sets based on the feature test macros: http://gcc.gnu.org/ml/fortran/2005-10/msg00365.html

Instead one of the approved "feature test" macros should be defined, probably _GNU_SOURCE: http://www.gnu.org/s/libc/manual/html_node/Feature-Test-Macros.html
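
To make the ordering constraint concrete, here is a minimal sketch, assuming glibc. The macro has to be defined before the first system header is pulled in, otherwise <features.h> has already been processed and the define has no effect. The helper name have_o_noatime is made up for illustration and is not anything in ROMIO.

```c
/* Sketch only: define the feature test macro before ANY system header.
 * Defining it after the first #include is too late on glibc. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif

#include <fcntl.h>

/* Hypothetical helper (not part of ROMIO): reports whether <fcntl.h>
 * exposed O_NOATIME in this translation unit. */
int have_o_noatime(void)
{
#ifdef O_NOATIME
    return 1;
#else
    return 0;   /* non-glibc system, or the macro was defined too late */
#endif
}
```

A configure test would compile essentially this same prologue, which is why the test and the real source file need to agree on whether _GNU_SOURCE is set.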

This sort of thing is a bit tricky because it doesn't play nicely with our autoconf setup in all cases.  If we don't set _GNU_SOURCE for most/all of our configure tests and only define it in some parts of ROMIO, the results of our configure tests won't necessarily be accurate when compiling those parts.  OTOH, it's heavy-handed to always define that macro at the top of configure, and it will take some work to prevent it from interfering with "--enable-strict=whatever".

Scanning the current codebase, it looks like we've opted for the first approach in a couple of non-critical places already (mpid_nem_lmt_vmsplice, ad_xfs).  I'd probably go with a configure test that assumed this approach, along with a configure option to disable it in the wild if it turns out to be a problem on some systems.
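
For what it's worth, the open-side logic from Rob's message (quoted below) could look roughly like the following. This is a sketch under assumptions, not the ROMIO code: the names SKETCH_NOATIME, sketch_open_flags and sketch_open are made up, the rank is passed in as a plain int rather than taken from a communicator, and in ROMIO the "#ifdef O_NOATIME" check would presumably be replaced by the HAVE_O_NOATIME result from configure. The EPERM retry is there because Linux refuses O_NOATIME when the caller does not own the file and lacks CAP_FOWNER.

```c
/* Sketch only; all sketch_* / SKETCH_* names are hypothetical. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif

#include <errno.h>
#include <fcntl.h>

/* Fall back to "no extra flag" where O_NOATIME is not available. */
#ifdef O_NOATIME
#define SKETCH_NOATIME O_NOATIME
#else
#define SKETCH_NOATIME 0
#endif

/* Every rank except 0 asks the kernel not to update atime. */
int sketch_open_flags(int rank, int amode)
{
    return (rank != 0) ? (amode | SKETCH_NOATIME) : amode;
}

/* Opens an existing file (O_CREAT is not handled: no mode argument).
 * If the kernel rejects O_NOATIME with EPERM, retry without it. */
int sketch_open(const char *path, int rank, int amode)
{
    int flags = sketch_open_flags(rank, amode);
    int fd = open(path, flags);

    if (fd < 0 && errno == EPERM && flags != amode)
        fd = open(path, amode);
    return fd;
}
```

Rank 0 still opens normally, so the atime gets updated once per collective open instead of once per process.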

-Dave

On Dec 14, 2010, at 12:01 PM CST, Rob Latham wrote:

> Came across an interesting optimization that should be easy to add to
> ROMIO, but I want to do so in a portable way.
> 
> the O_NOATIME flag does what it says on the tin: don't update atime
> when you open this file.  Now O_RDONLY really does only read -- no
> metadata update needed.
> 
> In a collective open we can set this flag on all processors save one,
> and presumably avoid a metadata storm (think Lustre and its single
> metadata server).
> 
> So, what's the "MPICH way" to make use of a gnu-libc flag?  On my
> laptop it's protected by an "#ifdef __USE_GNU".  Is it ok
> to write a configure-time check for O_NOATIME that defines __USE_GNU?
> But then I have to set __USE_GNU inside adio/common/ad_open.c if
> we HAVE_O_NOATIME .
> 
> ==rob
> 
> ----- Forwarded message from Mark Howison <mark.howison at gmail.com> -----
> 
> Sender: hdf-forum-bounces at hdfgroup.org
> From: Mark Howison <mark.howison at gmail.com>
> Reply-To: HDF Users Discussion List <hdf-forum at hdfgroup.org>
> Subject: Re: [Hdf-forum] round-robin (not parallel) access to single hdf5
> 	file
> Date: Tue, 14 Dec 2010 12:03:05 -0500
> Message-ID: <AANLkTin92tPWF1r3XhvaCxNB1sMpkQxtt0qx1FLaE==u at mail.gmail.com>
> To: HDF Users Discussion List <hdf-forum at hdfgroup.org>
> X-Spam-Status: No, score=-2.1
> 
> On Tue, Dec 14, 2010 at 8:08 AM, Quincey Koziol <koziol at hdfgroup.org> wrote:
>>> If not, there is another optimization that I think was reported in a
>>> paper on PLFS or Adios about passing a flag to the fopen call on each
>>> MPI task that tells it not to update the creation/modification time
>>> except on the root task. This can greatly reduce the load on the
>>> metadata server for a parallel file system.
>> 
>>        Interesting, can you send me a reference for this?
> 
> I'm pretty sure the trick was to use O_NOATIME in the open() call,
> except on task 0. (You can find this on NICS webpage on I/O best
> practices.)
> 
> I know I came across it in the lit review for our HDF5/Lustre paper,
> but I can't put my fingers on the paper. It might have... I vaguely recall
> a scaling graph showing how this outperformed a regular open() call,
> and I think the text was 1-column wide... I'll keep looking through my
> woefully unorganized pile of PDFs.
> 
> Mark
> 
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> Hdf-forum at hdfgroup.org
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
> 
> ----- End forwarded message -----
> 
> -- 
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA


