[mpich-discuss] ROMIO: 2 phase IO method and error handling

Rob Ross rross at mcs.anl.gov
Thu Sep 2 08:55:09 CDT 2010


A call to fstat() is probably more expensive than message passing.

-- Rob

On Sep 2, 2010, at 7:54 AM, Pascal Deveze <Pascal.Deveze at bull.net> wrote:

> I have another idea. Before the "2 phase method" starts, each process could run a test to
> see how much data can actually be read (count). The 2 phase method could then continue with this newly calculated count.
> 
> The size of the file can be obtained with a call to fstat(fd->fd_sys, &statbuf).
> statbuf.st_size then contains the size of the file in bytes.
> From that, it is possible (not easy for me, but possible) to calculate how much data each process can read,
> according to its own datatype and its own offset.
> 
> The advantage of this method is that it avoids message passing; its disadvantage is the call to fstat(). It
> also avoids having to modify the 2 phase method.
> 
> Pascal
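
As a rough illustration of the fstat() idea quoted above, here is a minimal sketch for the simplest case only: a single contiguous request. The helper names clamp_to_file_size() and readable_bytes() are hypothetical and are not part of ROMIO; a real fix would also have to walk each process's datatype and fileview, which this sketch does not attempt.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (not ROMIO code): given a file of `size` bytes,
 * return how many bytes of a contiguous request of `count` bytes
 * starting at `offset` lie inside the file. */
static long long clamp_to_file_size(long long size, long long offset,
                                    long long count)
{
    assert(size >= 0 && offset >= 0 && count >= 0);
    if (offset >= size)
        return 0;                 /* request starts at or past EOF */
    if (count > size - offset)
        return size - offset;     /* request is truncated by EOF */
    return count;                 /* request lies entirely inside the file */
}

/* Hypothetical wrapper: same computation, with the file size taken
 * from fstat() on an open descriptor. Returns -1 if fstat() fails. */
static long long readable_bytes(int fd, long long offset, long long count)
{
    struct stat statbuf;

    if (fstat(fd, &statbuf) != 0)
        return -1;
    return clamp_to_file_size((long long)statbuf.st_size, offset, count);
}
```

In ROMIO the descriptor would presumably be fd->fd_sys, as in the quoted message. For a 10-byte file, a request for 5 bytes at offset 8 would be reduced to 2 bytes, and a request starting at offset 10 or beyond would be reduced to 0.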
> 
> Wei-keng Liao wrote:
>> 
>> I don't think there is an easy fix for this problem.
>> 
>> To correctly return the read size for each MPI collective I/O call,
>> the actual read size of an aggregator must somehow be reported to all
>> requesting processes that access this aggregator's file domain.
>> Since each requesting process can have noncontiguous file accesses
>> to this aggregator's file domain, the fix must find out how much of the
>> short read overlaps each contiguous access and return the total
>> size of the overlaps as the true read size.
>> 
>> Furthermore, if a short read occurred, the contents of the missing part of
>> the read buffer should not be changed. Local memory copying must check for that.
>> 
>> What makes this problem even more complicated is that
>> 1. each process can request data from multiple aggregators,
>> 2. short reads can happen at all aggregators, and
>> 3. a local process's read fileview allows overlapping.
>> 
>> Wei-keng
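
The overlap computation described above can be sketched for a single aggregator as follows. This is only an illustration under stated assumptions: struct segment and bytes_actually_read() are hypothetical names, not ROMIO code; the bytes the aggregator actually read are assumed to form one contiguous range [read_start, read_end); and overlapping segments in a fileview are counted once per segment. Combining the results from multiple aggregators is left out.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous piece of a process's file access:
 * the byte range [offset, offset + len). */
struct segment {
    long long offset;
    long long len;
};

/* Hypothetical helper (not ROMIO code): total number of bytes in `segs`
 * that fall inside [read_start, read_end), the range the aggregator
 * actually managed to read. */
static long long bytes_actually_read(const struct segment *segs, size_t nsegs,
                                     long long read_start, long long read_end)
{
    long long total = 0;

    assert(read_start <= read_end);
    for (size_t i = 0; i < nsegs; i++) {
        /* Intersect the segment with the range that was really read. */
        long long lo = segs[i].offset;
        long long hi = segs[i].offset + segs[i].len;
        if (lo < read_start)
            lo = read_start;
        if (hi > read_end)
            hi = read_end;
        if (hi > lo)
            total += hi - lo;     /* non-empty intersection */
    }
    return total;
}
```

For example, a process requesting the segments [0,4), [8,12) and [16,20) from an aggregator whose read stopped at byte 10 would get 4 + 2 + 0 = 6 bytes rather than the 12 it asked for.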
>> 
>> On Sep 1, 2010, at 12:40 PM, Rob Latham wrote:
>> 
>>   
>>> On Mon, Aug 23, 2010 at 04:35:36PM +0200, Pascal Deveze wrote:
>>>     
>>>> I discovered that I can read past the end of the file!
>>>> 
>>>> After a look at the ROMIO source code, I see that the "2 phase IO"
>>>> method for read:
>>>> 1) Does not return the right count value in the status
>>>>       
>>> ...
>>>     
>>>> Does anybody have an idea of how to correct this?
>>>>       
>>> Well, I've got an idea, but it's not great.
>>> 
>>> We could report how much data each process actually read, but that
>>> would return surprising results in this test: rank 0 reads 5 bytes but
>>> rank 1 reads none. 
>>> 
>>> So we have to communicate the fact that we had a short read.  I guess
>>> another allreduce in the collective I/O path won't be so bad, but I
>>> still need to think some more about how to react to the fact that one
>>> aggregator had a short read.
>>> 
>>> ==rob
>>> 
>>> -- 
>>> Rob Latham
>>> Mathematics and Computer Science Division
>>> Argonne National Lab, IL USA
>>> _______________________________________________
>>> mpich-discuss mailing list
>>> mpich-discuss at mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>> 
>>>     
>> 
> 