[mpich-discuss] Reading a large file

Tabrez Ali stali at geology.wisc.edu
Tue Oct 18 07:58:32 CDT 2011


Rob

So the reason it is slow right now is that each rank is reading only
a line at a time from the same file, and MPI I/O avoids that.

I will explore Parallel-NetCDF (as I have used NetCDF before).

Thanks again.

Tabrez

  On 10/17/2011 03:55 PM, Rob Latham wrote:
> On Sun, Oct 16, 2011 at 09:47:15AM -0500, Tabrez Ali wrote:
>> What is the fastest way for 1000+ ranks/cores to read a single 1+ GB
>> file with irregular data?
> Without knowing more details, the fastest way is usually
> MPI_Exscan (to share how much data each processor will contribute to
> the I/O),
> followed by MPI_File_read_at_all (to get every processor, even those
> with no work to do, participating in the request).
>
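A minimal sketch of that Exscan-plus-collective-read pattern in C (the
file name "mesh.dat", the use of doubles, and the per-rank count are
placeholders, and it assumes each rank's data sits as one contiguous
run in rank order):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Offset nlocal, offset = 0;
    double *buf;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    nlocal = 1000;   /* placeholder: number of doubles this rank owns */

    /* Exclusive prefix sum: offset = total count on all lower ranks */
    MPI_Exscan(&nlocal, &offset, 1, MPI_OFFSET, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0)
        offset = 0;  /* Exscan leaves rank 0's result undefined */

    buf = malloc(nlocal * sizeof(double));

    MPI_File_open(MPI_COMM_WORLD, "mesh.dat", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);

    /* Collective read: every rank calls this, even with nlocal == 0 */
    MPI_File_read_at_all(fh, offset * sizeof(double), buf,
                         (int)nlocal, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

Since the mesh data here is irregular rather than one contiguous run
per rank, the same collective call can instead be combined with an
indexed file view (e.g. MPI_Type_create_indexed_block plus
MPI_File_set_view) to pull scattered records in one request.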
>> Basically I have an unstructured FE code where all ranks need to
>> read the mesh data they own from a single input file (based on the
>> mesh partitioning info).
>>
>> Right now all ranks simultaneously open the file (using 'open'  in
>> Fortran), read in the values they own and skip the rest. For a
>> problem with 16 million nodes (~1.5GB total file size) on 1024 cores
>> of a Linux cluster (with Lustre) this takes up to 2 mins (I/O part)
>> before all ranks have (owned) node/element info.
> If you created the file with Fortran, then there might not be much we
> can do to help you out.  Fortran I/O differs significantly, and
> non-portably, from C I/O.  But let's assume you've altered the write
> step of your simulation to also use MPI-IO.
>
> MPI-IO collective I/O will probably help, especially with particle
> data, in that several small I/O requests will get merged into a
> smaller number of large requests.  Depending on your workload, some
> MPI-IO tuning hints may help further improve performance.
>
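For illustration, such hints are passed through an MPI_Info object at
open time; the hint names below (romio_cb_read, cb_buffer_size) are
standard ROMIO hints, and whether particular values help depends on the
file system and access pattern:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Info info;
    MPI_File fh;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* force collective buffering for reads */
    MPI_Info_set(info, "romio_cb_read", "enable");
    /* 16 MB buffer on each I/O aggregator */
    MPI_Info_set(info, "cb_buffer_size", "16777216");

    MPI_File_open(MPI_COMM_WORLD, "mesh.dat", MPI_MODE_RDONLY,
                  info, &fh);
    MPI_Info_free(&info);

    /* ... collective reads as in the earlier sketch ... */

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}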
>> Would MPI I/O routines perform better in my situation specially for
>> larger problems?
> Something like HDF5 or Parallel-NetCDF might be helpful: both of those
> libraries provide a somewhat higher-level approach to describing the
> I/O, so you would not, for example, have to worry about byte offsets,
> but rather about elements of an N-dimensional array.  They both use
> MPI-IO under the covers, though, so you get all the performance
> benefits.  Just something to consider as you continue refining your
> I/O approach.
>
>
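As a rough sketch of what that looks like with Parallel-NetCDF (the
file and variable names and the per-rank count are assumptions, not
from this thread), the read is expressed in array indices instead of
byte offsets:

#include <mpi.h>
#include <pnetcdf.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int ncid, varid, rank;
    MPI_Offset start[1], count[1];
    double *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    ncmpi_open(MPI_COMM_WORLD, "mesh.nc", NC_NOWRITE,
               MPI_INFO_NULL, &ncid);
    ncmpi_inq_varid(ncid, "coords", &varid);   /* assumed variable name */

    count[0] = 1000;              /* placeholder per-rank element count */
    start[0] = rank * count[0];   /* this rank's starting index */
    buf = malloc(count[0] * sizeof(double));

    /* collective read of this rank's slab, in elements rather than bytes */
    ncmpi_get_vara_double_all(ncid, varid, start, count, buf);

    ncmpi_close(ncid);
    free(buf);
    MPI_Finalize();
    return 0;
}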


