[mpich-discuss] MPI IO on lustre

Rob Latham robl at mcs.anl.gov
Thu Mar 4 08:44:49 CST 2010


Hi

You've touched upon an issue with a fairly long history.  The short
answer is that the MPI-IO on Lustre situation is today in much
better shape than it has ever been.

For a long time there was no optimized MPI-IO driver for Lustre.
MPI-IO instead used the general-purpose "unix file system" driver.
This works well enough, except when doing collective I/O: Lustre's
locking scheme requires a fairly sophisticated algorithm to perform
well.
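
For concreteness, here is roughly what the collective path looks like
from the application side (a minimal sketch; the file name and
per-rank layout are made up for illustration, and error checking is
omitted):

    /* each rank writes one contiguous block to a shared file */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int COUNT = 1024;
        MPI_File fh;
        MPI_Offset offset;
        double *buf;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = calloc(COUNT, sizeof(double));

        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* the _all variant is collective: the library sees the
         * whole request at once and can aggregate and align the
         * writes across ranks */
        offset = (MPI_Offset)rank * COUNT * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        free(buf);
        MPI_Finalize();
        return 0;
    }

The collective call is exactly where a Lustre-aware driver gets the
chance to carve the file into lock-friendly domains.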

We now have two good options for MPI-IO on Lustre:  Cray's MPT-3.2 or
newer has an MPI-IO library with the sophisticated "write to lustre"
collective I/O algorithm.  

MPICH2 also now has an optimized Lustre driver, though it has taken
the community a while to work out the bugs.  I think we might be at
that point now, but am waiting to hear from more testers.
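
If you want to experiment in the meantime, striping and collective
buffering are controlled through hints in an MPI_Info object passed
at open time; in the sketch above you would replace MPI_INFO_NULL
with something like the following (the hint values here are
arbitrary; call MPI_File_get_info afterwards to see what the driver
actually accepted):

    MPI_Info info;

    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "16");    /* # of OSTs */
    MPI_Info_set(info, "striping_unit", "1048576"); /* 1 MiB stripes */
    MPI_Info_set(info, "romio_cb_write", "enable"); /* collective buffering */

    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    MPI_Info_free(&info);

Note that striping hints generally only take effect when the file is
created, so they are silently ignored on an existing file.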

I fear the authors of the studies you have read have reached exactly
the wrong conclusion.  It is not MPI-IO that has the defect.  Rather,
Lustre's design makes it difficult to achieve high performance
parallel I/O.  Fortunately, the task is merely difficult, not
impossible, and the MPI-IO community has stepped up to the challenge.

The next MPICH2 release will contain the most recent Lustre driver.  I
would very much like to hear about your experiences running your
simulation with the improved driver.

Thanks
==rob

On Wed, Mar 03, 2010 at 12:28:51PM -0800, burlen wrote:
> Our simulation is currently running on the order of 1E4 processes,
> and fast approaching 1E5 processes. IO is a substantial bottleneck.
> 
> The book "Using MPI-2: Advanced Features of the Message-Passing
> Interface" makes the case that collective IO is one of the best
> options for an HPC application. In contrast, I have read more recent
> studies which show that MPI-IO performs poorly on the Lustre fs,
> which is deployed ubiquitously on the systems we use. Some studies
> even advocate abandoning MPI-IO in favor of single-file direct
> access via the POSIX API. Which is to say, MPI-IO delivers no
> optimization at all.
> 
> I am very curious as to the current state of the Lustre ADIO in
> MPICH2, and its future direction. Obviously, fine-tuning of both
> Lustre parameters and MPI-IO hints for the specific situation is
> critical. If the fine-tuning is done reasonably well, can we expect
> high performance from MPI collective IO on Lustre fs currently or in
> the near term?
> 
> Thanks
> Burlen
> 

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

