[mpich-discuss] MPI IO on lustre
Rob Latham
robl at mcs.anl.gov
Thu Mar 4 08:44:49 CST 2010
Hi
You've touched upon an issue with a fairly long history. The short
answer is that the MPI-IO on Lustre situation is today in much
better shape than it has ever been.
For a long time there was no optimized MPI-IO driver for Lustre.
MPI-IO instead used the general-purpose "unix file system" driver.
This works well enough, except when doing collective I/O: the Lustre
locking scheme requires a fairly sophisticated algorithm to perform
well.
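To make the terminology concrete, here is a minimal sketch of a
collective write; the file name and block size are made up for
illustration. The "_all" suffix is what makes the call collective,
which is what gives the library room to rearrange the I/O:

#include <mpi.h>
#include <string.h>

#define BLOCK 65536  /* bytes per rank; arbitrary for illustration */

int main(int argc, char **argv)
{
    int rank;
    MPI_File fh;
    static char buf[BLOCK];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, rank & 0xff, BLOCK);  /* fill with dummy data */

    /* All ranks open the same file together. */
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* The _all suffix makes the call collective: the library sees
       every rank's request at once and can rearrange the writes to
       suit the underlying file system's locking and striping. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * BLOCK, buf, BLOCK,
                          MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}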
We now have two good options for MPI-IO on Lustre. First, Cray's
MPT-3.2 or newer has an MPI-IO library with the sophisticated "write
to lustre" collective I/O algorithm.
MPICH2 also now has an optimized Lustre driver, though it has taken
the community a while to work out the bugs. I think we might be at
that point now, but I am waiting to hear from more testers.
I fear the authors of the studies you have read have reached exactly
the wrong conclusion. It is not MPI-IO that has the defect. Rather,
Lustre's design makes it difficult to achieve high-performance
parallel I/O. Fortunately, the task is merely difficult, not
impossible, and the MPI-IO community has stepped up to the challenge.
The next MPICH2 release will contain the most recent Lustre driver. I
would very much like to hear about your experiences with your
simulation and the improved driver.
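If you do experiment with it, tuning hints are passed at open time
through an MPI_Info object. The sketch below uses the striping hints
ROMIO documents ("striping_factor", "striping_unit",
"romio_cb_write"); the numeric values are placeholders, not tuned
recommendations, and the striping hints are generally only honored
when the file is first created:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* Placeholder values, not recommendations; tune for your system. */
    MPI_Info_set(info, "striping_factor", "16");    /* stripe across 16 OSTs */
    MPI_Info_set(info, "striping_unit", "1048576"); /* 1 MiB stripe size */
    MPI_Info_set(info, "romio_cb_write", "enable"); /* collective buffering */

    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    MPI_Info_free(&info);

    /* ... collective writes as in the earlier sketch ... */

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}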
Thanks
==rob
On Wed, Mar 03, 2010 at 12:28:51PM -0800, burlen wrote:
> Our simulation is currently running on the order of 1E4 processes,
> and fast approaching 1E5. I/O is a substantial bottleneck.
>
> The book "Using MPI-2: Advanced Features of the Message-Passing
> Interface" makes the case that collective I/O is one of the best
> options for an HPC application. In contrast, I have read more recent
> studies showing that MPI-IO performs poorly on the Lustre file
> system, which is deployed ubiquitously on the systems we use. Some
> studies even advocate abandoning MPI-IO in favor of single-file
> direct access via the POSIX API, which is to say that MPI-IO
> delivers no optimization at all.
>
> I am very curious about the current state of the Lustre ADIO driver
> in MPICH2, and its future direction. Obviously, fine-tuning both
> Lustre parameters and MPI-IO hints for the specific situation is
> critical. If the fine-tuning is done reasonably well, can we expect
> high performance from MPI collective I/O on the Lustre file system,
> currently or in the near term?
>
> Thanks
> Burlen
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA