[MPICH] slow IOR when using fileview
Weikuan Yu
wyu at ornl.gov
Wed Jul 11 13:26:30 CDT 2007
Hi,
As I commented on Wei-keng's fix in another context of our discussion, I
think this issue of avoiding read-modify-write (RMW) can be taken care of in
a slightly different way. Attached is the fix I put together along the lines I
suggested earlier. Basically, it exposes two additional APIs from adioi.h:
1. void ADIOI_Filetype_range_iscontig()
2. void ADIOI_Filetype_range_start()
The first one tests whether the target file range is contiguous for a
data input consisting of a _count_ number of datatypes.
The second one finds the starting parameters in the file for the range
targeted by the beginning of a data input. Here, I agree with Wei-keng's
recommendation to simplify the while loop that determines these starting
parameters. That can be incorporated easily into this API if so desired.
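To make the intent of API #1 concrete, here is a minimal, self-contained
sketch of a range contiguity test. It is not the code in the attached patch:
the flat_type struct, the name range_iscontig, and its parameters are
hypothetical stand-ins for ROMIO's flattened filetype representation, and the
offset is assumed to be counted in data bytes of the file view.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened filetype: n blocks per tile; the tile repeats
 * every `extent` bytes in the file. Not ROMIO's actual structure. */
typedef struct {
    size_t n;
    const long long *offs;   /* byte offset of each block within a tile */
    const long long *lens;   /* byte length of each block (may be 0) */
    long long extent;        /* byte distance between consecutive tiles */
} flat_type;

/* Returns 1 if `len` data bytes starting at view offset `off` land in one
 * contiguous file range, 0 otherwise. `disp` is the view displacement.
 * On return, *file_start holds the absolute file offset of the first byte. */
int range_iscontig(const flat_type *ft, long long disp,
                   long long off, long long len, long long *file_start)
{
    long long size = 0;
    for (size_t i = 0; i < ft->n; i++)
        size += ft->lens[i];
    assert(size > 0 && off >= 0 && len > 0);

    /* Locate the tile and the block that contain the first byte. */
    long long tile = off / size;
    long long rem = off % size;
    size_t j = 0;
    while (rem >= ft->lens[j]) {
        rem -= ft->lens[j];
        j++;
    }

    long long pos = disp + tile * ft->extent + ft->offs[j] + rem;
    long long avail = ft->lens[j] - rem;   /* bytes left in this block */
    *file_start = pos;

    /* Walk forward; every following block must start where the last ended. */
    while (len > avail) {
        len -= avail;
        pos += avail;
        do {                                /* skip zero-length blocks */
            if (++j == ft->n) {
                j = 0;
                tile++;
            }
        } while (ft->lens[j] == 0);
        if (disp + tile * ft->extent + ft->offs[j] != pos)
            return 0;                       /* hole in the file range */
        avail = ft->lens[j];
    }
    return 1;
}
```

For the IOR pattern discussed in this thread (eight 10MB segments, one per
proc), P1's filetype would be a single 10MB block at offset 10MB with an 80MB
extent. A 10MB write starting at view offset 0 is then reported as contiguous,
while anything longer spills into the next tile and is not.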
These additional calls bring a number of benefits.
-1- By calling API #1, for IO with a simple file view composed of
contiguous data from each proc, ADIOI_GEN_{Write,Read}StridedColl will no
longer trigger ADIO_{Write,Read}Strided().
-2- That means no more need to chunk data into 512KB pieces, and none of the
associated processing overhead.
-3- Over Cray XT, this also means a much reduced number of fcntl calls for
locking during RMW for data sieving. There is no need to disable data sieving
or to increase the ds buffer sizes over XT.
-4- API #2 can replace about 15 blocks of identical code in files
such as ad_{write,read}_str.c and others, which reduces code maintenance
effort and improves modularity. For this discussion, the cleanup is not
included in the patch yet, but it can be done quickly if these APIs are
accepted.
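As an illustration of the kind of logic API #2 would factor out, the sketch
below computes the starting parameters for a given view offset. The flat_type
and range_start_t structs and the function name range_start are hypothetical,
not the ones in the patch. It uses one division to find the tile and a short
scan within it, which may or may not match the loop simplification mentioned
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened filetype: n blocks per tile; the tile repeats
 * every `extent` bytes in the file. Not ROMIO's actual structure. */
typedef struct {
    size_t n;
    const long long *offs;   /* byte offset of each block within a tile */
    const long long *lens;   /* byte length of each block (may be 0) */
    long long extent;        /* byte distance between consecutive tiles */
} flat_type;

/* Hypothetical result record for the starting position of an access. */
typedef struct {
    long long tile;      /* number of whole filetypes before the start */
    size_t block;        /* index of the block holding the first byte */
    long long left;      /* bytes remaining in that block from the start */
    long long file_off;  /* absolute file offset of the first byte */
} range_start_t;

/* Maps view offset `off` (in data bytes) to its position in the file. */
range_start_t range_start(const flat_type *ft, long long disp, long long off)
{
    long long size = 0;
    for (size_t i = 0; i < ft->n; i++)
        size += ft->lens[i];
    assert(size > 0 && off >= 0);

    range_start_t rs;
    rs.tile = off / size;              /* replaces a loop over tiles */
    long long rem = off % size;
    rs.block = 0;
    while (rem >= ft->lens[rs.block]) {
        rem -= ft->lens[rs.block];
        rs.block++;
    }
    rs.left = ft->lens[rs.block] - rem;
    rs.file_off = disp + rs.tile * ft->extent + ft->offs[rs.block] + rem;
    return rs;
}
```

A single helper of this shape could be called from each place that currently
repeats the same search, with the callers reading the fields they need.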
BTW, this is also a fix I am suggesting to Cray for their incorporation.
Please consider it for upstream integration.
Thanks,
Weikuan
Yu, Weikuan wrote:
> The concept of buffertype is implicitly linked with a concrete memory
> buffer, so it is valid to report its contiguity. However, the filetype is
> a more abstract feature describing a process's view of a file and its
> own segments, so its contiguity needs to be reflected more accurately
> with the associated process and the intended file range. In addition, the
> buffertype describes the data source, while the filetype describes
> the data sink. So they really do not intersect.
>
> However, I think your idea points to the correct direction. Something
> like the following is what I have in mind for a process to test the
> contiguity of a file within a range:
> ADIOI_Filetype_iscontig(filetype, offset, len, &filetype_is_contig);
>
> This may avoid sharing the contiguity checking routine between datatype
> and filetype, as the current call does:
> ADIOI_Datatype_iscontig(filetype, &filetype_is_contig);
>
> Comments?
> --Weikuan
>
>> -----Original Message-----
>> From: owner-mpich-discuss at mcs.anl.gov
>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
>> Sent: Tuesday, July 03, 2007 1:04 AM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: RE: [MPICH] slow IOR when using fileview
>>
>>
>> I checked the ROMIO source for this particular access pattern.
>> First, a few words about the access pattern.
>>   1) MPI_Type_create_subarray() creates the file access regions like
>>      file: |----------|----------|----------| .... |----------|
>>                 P0         P1         P2                P7
>>      Each segment is of size 10MB.
>>   2) There is no overlapped, interleaved, or non-contiguous access across
>>      all processes. Every file access is a single contiguous write request.
>>   3) The write buffer is also contiguous. The write amount is 10 MB, the
>>      same across all MPI processes.
>>   4) The effect of using this file type should be the same as using an
>>      explicit file offset without a file type.
>>
>> In the ROMIO source file ad_write_coll.c, in function
>> ADIOI_GEN_WriteStridedColl(), ADIOI_Datatype_iscontig() is called in line
>> 141 to check if the file type is contiguous, and it returns 0. That means
>> the file type is not contiguous. In general, this is true, since the file
>> type is applied to the entire file space repeatedly. Therefore, in line
>> 153, ADIO_WriteStrided() is called instead of ADIO_WriteContig() in line
>> 150. So data sieving is performed by default in ADIO_WriteStrided(), which
>> chops the 10 MB write into 20 chunks of 512KB each. For each chunk, a
>> read-modify-write is carried out.
>>
>> In fact, this I/O pattern should trigger ADIO_WriteContig() for best
>> results. I suggest adding one more test here to check whether the
>> intersection of the buffertype and filetype is contiguous. If yes,
>> ADIO_WriteContig() is called. Here, the intersection operation will
>> involve the current file position. I don't know how complicated this
>> implementation would be.
>>
>> Wei-keng
>>
>>
>>
>>
>> On Mon, 2 Jul 2007, Yu, Weikuan wrote:
>>
>>>> If the independent
>>>> access is used instead, I don't know why each write is divided into
>>>> 512 KB chunks and locking is ever needed to guarantee the atomic
>>>> access of the 10 MB contiguous file range. For this particular access
>>>> pattern, ROMIO should not do read-modify-write at all.
>>> 512KB is the default buffer size for data sieving. So with a 512KB
>>> buffer size, each process is only able to write out 512KB of data in
>>> each call of ADIOI_GEN_WriteStrided. For 10MB, this results in 20
>>> iterations of write_all() and 40 fcntl() calls in total. CrayPat
>>> indicates that fcntl() takes 88% of the total wall clock time with
>>> fileview, and 0% without fileview.
>>> --Weikuan
>>>
>>
>
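
The chunk and fcntl counts quoted above can be checked with a few lines of C.
The helper names are made up for this note. The factor of two assumes one
lock call plus one unlock call per data-sieving chunk, which is consistent
with the 20 and 40 reported in the thread but is not taken from the ROMIO
source.

```c
#include <assert.h>

/* Number of data-sieving chunks needed to cover `bytes` with a buffer of
 * `bufsize` bytes (rounded up). */
long long sieve_chunks(long long bytes, long long bufsize)
{
    assert(bytes >= 0 && bufsize > 0);
    return (bytes + bufsize - 1) / bufsize;
}

/* Assumption: each chunk costs one fcntl() to lock and one to unlock. */
long long sieve_fcntl_calls(long long bytes, long long bufsize)
{
    return 2 * sieve_chunks(bytes, bufsize);
}
```

With bytes = 10MB and bufsize = 512KB, sieve_chunks gives 20 and
sieve_fcntl_calls gives 40 per process.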
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: adio-fcntl-fix-and-filerange-cleanup-02.patch
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20070711/c3ccfd08/attachment.diff>