[MPICH] question about MPI-IO Read_all

Wei-keng Liao wkliao at ece.northwestern.edu
Thu Apr 26 00:06:43 CDT 2007


Hi,

There may be a little misunderstanding about the MPI fileview. A fileview 
defines the file regions that are visible (readable/writable) to a 
process. It has nothing to do with the buffer datatype and count used in 
Read_all(). The count and datatype arguments of Read_all() describe the 
memory layout of the buffer argument. This datatype provides a convenient 
way to describe noncontiguous memory regions of an I/O buffer that are 
written to (or read from) a file.

To see the difference, consider the write case: the MPI-IO library "packs" 
the noncontiguous data in the write buffer into a contiguous byte stream 
and fills the visible file regions specified by the fileview with it, 
contiguously. The read case reverses this flow.

As for the 2 GB limitation, first I would say it is not very common for a 
single process to write (or read) more than 2 GB of data in one call; that 
would require more than 2 GB of memory to hold the buffer on a single 
compute node. Even in that case, one can still use a properly defined 
buffer datatype to avoid the 32-bit limit of the integer "count" argument. 
If the noncontiguous layout of the I/O buffer is regularly strided, one 
can define a datatype using MPI_Type_vector(), MPI_Type_hvector(), etc. 
For highly irregular layouts, one can use MPI_Type_struct(). If you want, 
you can describe your array memory layout and we may come up with a way to 
define a datatype for it.
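
For example, if the buffer's noncontiguity were regularly strided (the block 
count, block length, and stride below are made-up numbers), a vector 
datatype could describe it:

    MPI_Datatype buftype;
    /* Made-up geometry: 1024 blocks of 128 doubles each, with a stride of
       256 doubles between block starts. */
    MPI_Type_vector(1024, 128, 256, MPI_DOUBLE, &buftype);
    MPI_Type_commit(&buftype);
    /* Each buftype element now covers 1024*128 doubles (1 MB), so even a
       small "count" in Read_all()/Write_all() describes a large request. */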

The amount of data to be read/written by an MPI process in a 
Read_all()/Write_all() is the "count" multiplied by the size of the buffer 
datatype. Therefore, the read/write amount can still be larger than 2 GB 
even though the integer "count" is limited to 2^31. Note that the amount 
of data one can read/write is determined by these two arguments, not by 
the fileview.
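
As an illustration (the numbers are made up, and fh and buf are assumed to 
be an open file handle and a sufficiently large buffer), a 1 MB contiguous 
buffer datatype lets a small integer count describe a multi-gigabyte 
request, subject to the implementation caveat Rajeev mentions below:

    MPI_Datatype chunk;
    MPI_Type_contiguous(1048576, MPI_BYTE, &chunk);   /* 1 MB per element */
    MPI_Type_commit(&chunk);
    /* count = 4096 elements of 1 MB each = 4 GB in one collective read */
    MPI_File_read_all(fh, buf, 4096, chunk, MPI_STATUS_IGNORE);
    MPI_Type_free(&chunk);
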
Hope this helps.

Wei-keng



On Wed, 25 Apr 2007, Peter Diamessis wrote:

> Hi folks,
>
> If I may just comment that this is a very interesting topic.
> I ran into a similar situation when using MPI_WRITE_ALL &
> MPI_READ_ALL to output/read-in non-contiguous 3-D data
> in my CFD solver. The "global" size of the binary file was approximately
> 10 GB, consisting of 20 3-D variables. I would encounter errors when trying
> to output all 20 fields in one file. I then broke the file down into 10
> files with 2 fields each, with an approximate file size of 1.2 GB. Then
> everything worked smoothly. I'm wondering if this is a similar issue to
> what Russell has been pointing out?
>
> Sincerely,
>
> Pete Diamessis
>
>
>
>
> ----- Original Message ----- From: "Rajeev Thakur" <thakur at mcs.anl.gov>
> To: "'Russell L. Carter'" <rcarter at esturion.net>; <mpich-discuss at mcs.anl.gov>
> Sent: Wednesday, April 25, 2007 10:19 PM
> Subject: RE: [MPICH] question about MPI-IO Read_all
>
>
>> 2^31 is 2 Gbytes. If you are reading 2 GB per process with a single
>> Read_all, you are already doing quite well performance-wise. If you want to
>> read more than that you can create a derived datatype of say 10 contiguous
>> bytes and pass that as the datatype to Read_all. That would give you 20 GB.
>> You can read even more by using 100 or 1000 instead of 10.
>> 
>> In practice, you might encounter some errors, because the MPI-IO
>> implementation internally may use some types that are 32-bit, not expecting
>> anyone to read larger than that with a single call. So try it once, and if
>> it doesn't work, read in 2GB chunks.
>> 
>> Rajeev
>> 
>> 
>>> -----Original Message-----
>>> From: owner-mpich-discuss at mcs.anl.gov 
>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Russell L. Carter
>>> Sent: Wednesday, April 25, 2007 6:33 PM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: [MPICH] question about MPI-IO Read_all
>>> 
>>> Hi,
>>> I have a question about the amount of data it is possible to read
>>> using MPI::Datatype::Create_hindexed with a fundamental type of MPI::BYTE, and 
>>> MPI::File::Read_all.
>>> 
>>> Following the discussion about irregularly distributed arrays beginning
>>> on p. 78 of "Using MPI-2", I want to read my data by doing this:
>>> 
>>> double *buf = ...;
>>> int count, bufsize = ...;
>>> MPI::Offset offset = ...;
>>> MPI::File f = MPI::File::Open(...);
>>> MPI::Datatype filetype =
>>>     MPI::BYTE.Create_hindexed(count, blocks, displacements);
>>> filetype.Commit();
>>> f.Set_view(offset, MPI::BYTE, filetype, "native", info_);
>>> f.Read_all(buf, bufsize, MPI::BYTE);
>>> 
>>> What I am curious about is the amount of data that can
>>> be read with Read_all.  Since bufsize is an int, then
>>> that would seem to imply that the maximum Read_all (per node)
>>> is 2^31.  Which, in bytes, is not gigantic.
>>> 
>>> Is there some other technique I can use to increase the amount
>>> of data I can Read_all at one time?  I have different sized
>>> data interspersed, so I can't offset by a larger fundamental
>>> type.  My arrays are not contiguous in the fortran calling program,
>>> and are of int and 4 or 8 byte reals.  If I use a Create_struct
>>> to make a filetype that I use to Set_view, doesn't this have
>>> the same read size limitation?  Only now it is for all the
>>> arrays in the struct.  Hopefully I am missing something.
>>> 
>>> Thanks,
>>> Russell
>>> 
>>> 
>> 
>



