[mpich-discuss] Reading a large file
Rob Latham
robl at mcs.anl.gov
Mon Oct 17 15:55:05 CDT 2011
On Sun, Oct 16, 2011 at 09:47:15AM -0500, Tabrez Ali wrote:
> What is the fastest way for 1000+ ranks/cores to read a single 1+ GB
> file with irregular data?
Without knowing more details, the fastest way is usually MPI_Exscan
(so each rank learns how much data the ranks before it will
contribute, and hence its own offset into the file), followed by
MPI_File_read_at_all (to get every processor, even those with no
work to do, participating in the request).
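
Something like this rough sketch in C (the element type, file name,
and how my_count gets computed from your partitioning info are all
placeholders):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Offset my_count = 1000;  /* elements this rank owns: placeholder */
    MPI_Offset my_offset = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* exclusive prefix sum: offset = total owned by lower ranks */
    MPI_Exscan(&my_count, &my_offset, 1, MPI_OFFSET, MPI_SUM,
               MPI_COMM_WORLD);
    if (rank == 0) my_offset = 0;  /* Exscan leaves rank 0 undefined */

    double *buf = malloc(my_count * sizeof(double));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "mesh.dat", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);

    /* collective read: everyone participates, even if my_count == 0 */
    MPI_File_read_at_all(fh, my_offset * sizeof(double), buf,
                         (int)my_count, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}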
> Basically I have an unstructured FE code where all ranks need to
> read the mesh data they own from a single input file (based on the
> mesh partitioning info).
>
> Right now all ranks simultaneously open the file (using 'open' in
> Fortran), read in the values they own and skip the rest. For a
> problem with 16 million nodes (~1.5GB total file size) on 1024 cores
> of a Linux cluster (with Lustre) this takes up to 2 mins (I/O part)
> before all ranks have (owned) node/element info.
If you created the file with Fortran, then there might not be much we
can do to help you out. Fortran I/O differs significantly, and
non-portably, from C I/O. But let's assume you've altered the write
step of your simulation to also use MPI-IO.
MPI-IO collective I/O will probably help, especially with particle
data, in that several small I/O requests will get merged into a
smaller number of large requests. Depending on your workload, some
MPI-IO tuning hints may further improve performance.
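
For example, you can pass ROMIO hints through an MPI_Info object at
open time. The hint names below are real ROMIO hints; the values are
only illustrative starting points, and the right ones depend on your
Lustre stripe settings and node count:

#include <mpi.h>

/* open a file for collective reading with a few ROMIO tuning hints */
MPI_File open_with_hints(const char *path)
{
    MPI_Info info;
    MPI_File fh;

    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_read", "enable");    /* collective buffering on reads */
    MPI_Info_set(info, "cb_buffer_size", "16777216"); /* 16 MB per aggregator */
    MPI_Info_set(info, "cb_nodes", "32");             /* number of I/O aggregators */

    MPI_File_open(MPI_COMM_WORLD, path, MPI_MODE_RDONLY, info, &fh);
    MPI_Info_free(&info);
    return fh;
}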
> Would MPI I/O routines perform better in my situation especially for
> larger problems?
Something like HDF5 or Parallel-NetCDF might be helpful: both
libraries provide a somewhat higher-level approach to describing the
I/O, so you would, for example, deal in elements of an N-dimensional
array rather than byte offsets. They both use MPI-IO under the
covers, though, so you get all the performance benefits. Just
something to consider as you continue refining your I/O approach.
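
To give a flavor of that, here is a rough Parallel-NetCDF read sketch
(the variable name "coords", its 2-D shape, and the my_start/my_nrows
decomposition are made up for illustration):

#include <mpi.h>
#include <pnetcdf.h>

/* each rank collectively reads its rows of a 2-D "coords" variable */
void read_my_coords(MPI_Offset my_start, MPI_Offset my_nrows,
                    double *coords)
{
    int ncid, varid;
    MPI_Offset start[2], count[2];

    ncmpi_open(MPI_COMM_WORLD, "mesh.nc", NC_NOWRITE, MPI_INFO_NULL,
               &ncid);
    ncmpi_inq_varid(ncid, "coords", &varid);

    /* the slab is described in array elements, not byte offsets */
    start[0] = my_start;  start[1] = 0;
    count[0] = my_nrows;  count[1] = 3;   /* 3 coordinates per node */

    /* collective read of this rank's rows */
    ncmpi_get_vara_double_all(ncid, varid, start, count, coords);

    ncmpi_close(ncid);
}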
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA