Hi Wei,

Collective I/O is the default for PIO.

Jim

On Fri, Sep 24, 2010 at 9:53 AM, Wei-keng Liao <wkliao@ece.northwestern.edu> wrote:
Hi John,

Rob is right: turning off data sieving just avoids the error messages, and it
may significantly slow down performance (if your I/O is non-contiguous and
uses only non-collective calls).

On Lustre, you need collective I/O to get high I/O bandwidth. Even if the
write request from each process is already large and contiguous, non-collective
writes will still give you poor results. This is not the case on other file
systems, e.g. GPFS.
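
For example, with pnetcdf the difference is just the "_all" suffix on the
data-mode calls. A rough sketch (error checking omitted; the function and
variable names are only placeholders):

#include <pnetcdf.h>

void write_field(int ncid, int varid, const MPI_Offset start[],
                 const MPI_Offset count[], const double *buf)
{
    /* independent (non-collective) write: typically slow on Lustre */
    ncmpi_begin_indep_data(ncid);
    ncmpi_put_vara_double(ncid, varid, start, count, buf);
    ncmpi_end_indep_data(ncid);

    /* collective write: lets MPI-IO aggregate and align the requests */
    ncmpi_put_vara_double_all(ncid, varid, start, count, buf);
}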

Is there an option to turn on collective I/O in WRF and PIO?

If non-collective I/O is the only option (because of the irregular data
distribution), then non-blocking I/O is another solution. In pnetcdf 1.2.0,
non-blocking I/O can aggregate multiple non-collective requests into a
single collective one. However, this approach requires changes to the
pnetcdf calls in the I/O library used by WRF and PIO. The changes should be
very simple, though. In general, I would suggest that Lustre users take
every opportunity to use collective I/O.
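
Roughly, the change is to replace each blocking put with its "iput"
counterpart and then complete everything in one ncmpi_wait_all call.
A sketch, with the request bookkeeping and variable names made up for
illustration:

#include <pnetcdf.h>

#define MAX_REQS 64

/* Post the pending writes without performing them, then complete them all
 * at once; pnetcdf combines the queued requests into a single collective
 * write. */
void flush_pending(int ncid, int nreqs, const int *varids,
                   MPI_Offset **starts, MPI_Offset **counts, double **bufs)
{
    int i, reqs[MAX_REQS], statuses[MAX_REQS];

    for (i = 0; i < nreqs; i++)
        ncmpi_iput_vara_double(ncid, varids[i], starts[i], counts[i],
                               bufs[i], &reqs[i]);

    /* every process calls this together */
    ncmpi_wait_all(ncid, nreqs, reqs, statuses);
}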

Wei-keng

On Sep 24, 2010, at 9:44 AM, Rob Latham wrote:

> On Fri, Sep 24, 2010 at 07:49:06AM -0600, Mark Taylor wrote:
>> Hi John,
>>
>> I had a very similar issue a while ago on several older Lustre
>> filesystems at Sandia, and I can confirm that setting those hints did
>> allow the code to run
>
> If you turn off data sieving then there will be no more lock calls.
> Depending on how your application partitions the arrays, that could be
> fine, or it could result in a billion 8-byte operations.
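>
> (For reference, those hints go on the info object passed to MPI_File_open.
> Something like the sketch below; the hint names are ROMIO's, everything
> else is made up and error checking is dropped:)
>
> #include <mpi.h>
>
> /* open a file with ROMIO data sieving disabled for reads and writes */
> MPI_File open_no_sieving(const char *path)
> {
>     MPI_Info info;
>     MPI_File fh;
>
>     MPI_Info_create(&info);
>     MPI_Info_set(info, "romio_ds_read",  "disable");
>     MPI_Info_set(info, "romio_ds_write", "disable"); /* no read-modify-write, so no locks */
>
>     MPI_File_open(MPI_COMM_WORLD, path,
>                   MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
>     MPI_Info_free(&info);
>     return fh;
> }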
>
>> (but I could never get pnetcdf to be any faster
>> than netcdf).
>
> Unsurprising, honestly. If you are dealing with Lustre, then you must
> both use an updated ROMIO and use collective I/O.
>
> Here is the current list of MPI-IO implementations that work well with
> Lustre:
>
> - Cray MPT 3.2 or newer
> - MPICH2-1.3.0a1 or newer
> - and that's it.
>
> I think the OpenMPI community is working on a re-sync with MPICH2's
> ROMIO. I also think we can stitch together a patch against OpenMPI if
> you really need the improved Lustre driver. I'm not really in
> patch-generating mode right now, but maybe once I'm back in the office
> I can see how tricky it will be.
>
>> This was with CAM, with pnetcdf being called by PIO, and
>> PIO has a compiler option to turn this on, -DPIO_LUSTRE_HINTS.
>>
>> However, on Sandia's redsky (more-or-less identical to RedMesa), I just
>> tried these hints and I am also getting those same error messages you
>> are seeing. So please let me know if you get this resolved.
>
> I can't think of any other code paths that use locking, unless your
> system for some reason presents itself as NFS.
>
> That's why Rajeev suggested prefixing with "lustre:". Unfortunately, that
> won't help: it has only been since March of this year that (with
> community support) the Lustre driver in MPICH2 passed all the ROMIO
> tests, and now we need to get that into OpenMPI.
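>
> (By "prefixing" I mean handing ROMIO a file name like the one in the
> fragment below, which forces it to pick its Lustre driver; the path is
> made up:)
>
>     MPI_File fh;
>     MPI_File_open(MPI_COMM_WORLD, "lustre:/scratch/run/wrfout.nc",
>                   MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);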
>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>