File locking failed in ADIOI_Set_lock

Jim Edwards jedwards at ucar.edu
Fri Sep 24 11:02:15 CDT 2010


Hi Wei,

Collective IO is the default for PIO.

Jim

On Fri, Sep 24, 2010 at 9:53 AM, Wei-keng Liao
<wkliao at ece.northwestern.edu> wrote:

> Hi, John,
>
> Rob is right, turning off data sieving just avoids the error messages and
> may significantly slow down performance (if your I/O is non-contiguous
> and uses only non-collective calls).
>
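> As a point of reference, disabling data sieving is typically done through
> MPI-IO hints.  A minimal sketch in C, assuming ROMIO's romio_ds_read and
> romio_ds_write hint keys and a placeholder file name:
>
>   #include <mpi.h>
>   #include <pnetcdf.h>
>
>   /* Sketch: disable ROMIO data sieving via hints and pass them to
>      pnetcdf at create time.  The file name is a placeholder. */
>   int create_without_sieving(int *ncidp)
>   {
>       MPI_Info info;
>       int err;
>
>       MPI_Info_create(&info);
>       MPI_Info_set(info, "romio_ds_read",  "disable");
>       MPI_Info_set(info, "romio_ds_write", "disable");
>       err = ncmpi_create(MPI_COMM_WORLD, "testfile.nc", NC_CLOBBER,
>                          info, ncidp);
>       MPI_Info_free(&info);
>       return err;
>   }
>
> This gets rid of the lock calls, but as noted above it does nothing for
> the underlying performance problem.
>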
> On Lustre, you need collective I/O to get high I/O bandwidth. Even if the
> write request from each process is already large and contiguous,
> non-collective writes will still give you poor results. This is not the
> case on other file systems, e.g. GPFS.
>
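> For illustration, the collective form of a pnetcdf write is the "_all"
> variant.  A rough sketch, where ncid and varid come from earlier define
> calls and this rank's offset, element count, and buffer are placeholders:
>
>   #include <pnetcdf.h>
>
>   /* Sketch: collective write of one rank's block of a 1-D variable. */
>   int write_block_collectively(int ncid, int varid, MPI_Offset offset,
>                                MPI_Offset nelems, const double *buf)
>   {
>       MPI_Offset start[1] = { offset };
>       MPI_Offset count[1] = { nelems };
>
>       /* The _all suffix makes the call collective: every rank that
>          opened the file must participate. */
>       return ncmpi_put_vara_double_all(ncid, varid, start, count, buf);
>   }
>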
> Is there an option to turn on collective I/O in WRF and PIO?
>
> If non-collective I/O is the only option (due to the irregular data
> distribution), then non-blocking I/O is another solution. In
> pnetcdf 1.2.0, non-blocking I/O can aggregate multiple
> non-collective requests into a single collective one. However, this
> approach requires changes to the pnetcdf calls in the I/O library
> used by WRF and PIO. The changes should be very simple, though.
> In general, I would suggest that Lustre users seek every opportunity to
> use collective I/O.
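>
> To make the aggregation idea concrete, here is a rough sketch using the
> non-blocking interface; the variable ids, starts, counts, and buffers
> stand in for whatever the I/O layer would otherwise write independently:
>
>   #include <pnetcdf.h>
>
>   /* Sketch: post two otherwise-independent writes with the
>      non-blocking API, then commit them in one collective step. */
>   int write_aggregated(int ncid, int varid1, int varid2,
>                        const MPI_Offset *start1, const MPI_Offset *count1,
>                        const double *buf1,
>                        const MPI_Offset *start2, const MPI_Offset *count2,
>                        const double *buf2)
>   {
>       int reqs[2], stats[2];
>
>       ncmpi_iput_vara_double(ncid, varid1, start1, count1, buf1, &reqs[0]);
>       ncmpi_iput_vara_double(ncid, varid2, start2, count2, buf2, &reqs[1]);
>
>       /* ncmpi_wait_all is collective and flushes all pending requests
>          in a single collective I/O operation. */
>       return ncmpi_wait_all(ncid, 2, reqs, stats);
>   }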
>
> Wei-keng
>
> On Sep 24, 2010, at 9:44 AM, Rob Latham wrote:
>
> > On Fri, Sep 24, 2010 at 07:49:06AM -0600, Mark Taylor wrote:
> >> Hi John,
> >>
> >> I had a very similar issue a while ago on several older Lustre
> >> filesystems at Sandia, and I can confirm that setting those hints did
> >> allow the code to run
> >
> > If you turn off data sieving, then there will be no more lock calls.
> > Depending on how your application partitions the arrays, that could be
> > fine, or it could result in a billion 8-byte operations.
> >
> >> (but I could never get pnetcdf to be any faster
> >> than netcdf).
> >
> > Unsurprising, honestly.  If you are dealing with Lustre, then you must
> > both use an updated ROMIO and use collective I/O.
> >
> > Here is the current list of MPI-IO implementations that work well with
> > Lustre:
> >
> > - Cray MPT 3.2 or newer
> > - MPICH2-1.3.0a1 or newer
> > - and that's it.
> >
> > I think the OpenMPI community is working on a re-sync with MPICH2
> > ROMIO.  I also think we can stitch together a patch against OpenMPI if
> > you really need the improved Lustre driver.  I'm not really in
> > patch-generating mode right now, but maybe once I'm back in the office
> > I can see how tricky it will be.
> >
> >> This was with CAM, with pnetcdf being called by PIO, and
> >> PIO has a compiler option to turn this on, -DPIO_LUSTRE_HINTS.
> >>
> >> However, on Sandia's redsky (more-or-less identical to RedMesa), I just
> >> tried these hints and I am also getting those same error messages you
> >> are seeing. So please let me know if you get this resolved.
> >
> > I can't think of any other code paths that use locking, unless your
> > system for some reason presents itself as nfs.
> >
> > That's why Rajeev suggested prefixing with "lustre:".  Unfortunately,
> > that won't help: it has only been since March of this year that (with
> > community support) the Lustre driver in MPICH2 passed all the ROMIO
> > tests, and now we need to get that into OpenMPI.
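> >
> > For reference, the prefix trick Rajeev suggested looks roughly like
> > the following sketch (the path is a placeholder); it asks ROMIO to use
> > its Lustre driver by name, though as noted it does not help here:
> >
> >   #include <mpi.h>
> >
> >   /* Sketch: select ROMIO's Lustre driver by prefixing the file name.
> >      "/scratch/testfile.nc" is a placeholder path. */
> >   int open_with_lustre_prefix(MPI_File *fh)
> >   {
> >       return MPI_File_open(MPI_COMM_WORLD,
> >                            "lustre:/scratch/testfile.nc",
> >                            MPI_MODE_RDWR, MPI_INFO_NULL, fh);
> >   }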
> >
> > ==rob
> >
> > --
> > Rob Latham
> > Mathematics and Computer Science Division
> > Argonne National Lab, IL USA
> >
>
>