[MPICH] slow IOR when using fileview
Wei-keng Liao
wkliao at ece.northwestern.edu
Tue Jul 3 00:03:49 CDT 2007
I checked the ROMIO source for this particular access pattern.
First, a few words about the access pattern.
1) MPI_Type_create_subarray() creates the file access regions like
   file: |----------|----------|----------| .... |----------|
              P0         P1         P2                P7
Each segment is of size 10MB.
2) There is no overlapping, interleaved, or non-contiguous access among
the processes. Every file access is a single contiguous write request.
3) The write buffer is also contiguous. Each MPI process writes the same
amount, 10 MB.
4) The effect of using this file type should be the same as using
explicit file offsets without a file type.
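To make point 4 concrete, here is a minimal sketch in plain C (no MPI calls) of the region each process ends up writing. It assumes a one-dimensional subarray of bytes, 8 processes (P0 to P7, as in the diagram), and 10 MB meaning 10 * 1024 * 1024 bytes. The names subarray_1d, subarray_for_rank, and explicit_offset are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumption: 10 MB = 10 * 1024 * 1024 bytes. */
#define SEGMENT_BYTES (10LL * 1024 * 1024)

/* Hypothetical description of the 1-D subarray a rank would pass to
 * MPI_Type_create_subarray(): total size, sub-size, and start. */
typedef struct {
    int64_t total;  /* size of the whole array (all segments) */
    int64_t size;   /* size of this rank's segment */
    int64_t start;  /* first byte of this rank's segment */
} subarray_1d;

/* Region selected by the file type for a given rank. */
subarray_1d subarray_for_rank(int rank, int nprocs)
{
    subarray_1d s;
    s.total = (int64_t)nprocs * SEGMENT_BYTES;
    s.size  = SEGMENT_BYTES;
    s.start = (int64_t)rank * SEGMENT_BYTES;
    return s;
}

/* Offset the same rank would use with an explicit-offset write and
 * no file type. */
int64_t explicit_offset(int rank)
{
    return (int64_t)rank * SEGMENT_BYTES;
}
```

For every rank, subarray_for_rank(rank, 8).start equals explicit_offset(rank), so both approaches describe the same single contiguous 10 MB range.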
In the ROMIO source file ad_write_coll.c, the function
ADIOI_GEN_WriteStridedColl() calls ADIOI_Datatype_iscontig() at line 141
to check whether the file type is contiguous, and the call returns 0,
meaning the file type is not contiguous. In general this is correct,
since the file type is tiled repeatedly over the entire file space. As a
result, ADIO_WriteStrided() is called at line 153 instead of
ADIO_WriteContig() at line 150. ADIO_WriteStrided() performs data sieving
by default, which chops the 10 MB write into 20 chunks of 512 KB each.
For each chunk, a read-modify-write is carried out.
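The chunk arithmetic can be checked with a small sketch. It assumes binary units (10 MB = 10 * 1024 * 1024 bytes, 512 KB = 512 * 1024 bytes) and one lock call plus one unlock call per chunk, which is an assumption that matches the 40 fcntl() calls reported in the quoted message below. The function names are hypothetical and are not ROMIO functions.

```c
#include <stdint.h>

/* Number of data-sieving passes needed to cover write_bytes when each
 * pass handles at most buf_bytes (ceiling division). */
int64_t sieve_chunks(int64_t write_bytes, int64_t buf_bytes)
{
    return (write_bytes + buf_bytes - 1) / buf_bytes;
}

/* Assumption: each pass takes one lock call and one unlock call. */
int64_t lock_calls(int64_t chunks)
{
    return 2 * chunks;
}
```

With a 10 MB write and a 512 KB buffer, sieve_chunks() gives 20 and lock_calls(20) gives 40, compared with a single write and no locking on the contiguous path.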
In fact, this I/O pattern should trigger ADIO_WriteContig() for the best
result. I suggest adding one more test that checks whether the
intersection of the buffer type and the file type is contiguous for the
current request. If it is, ADIO_WriteContig() should be called. The
intersection has to take the current file position into account. I don't
know how complicated this would be to implement.
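As a rough idea of what such a test could look like, here is a sketch for the simplest case only: a file type with a single data block per tile, a contiguous write buffer, and a zero view displacement. It is not ROMIO code. The struct flat_type and the function request_is_contiguous are hypothetical, and a real implementation would have to handle file types with many blocks, etypes, and the displacement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened file type with one data block per tile. */
typedef struct {
    int64_t block_off;  /* byte offset of the data block inside a tile */
    int64_t block_len;  /* bytes of data per tile */
    int64_t extent;     /* tile size; the type repeats every extent bytes */
} flat_type;

/* view_pos is the number of data bytes already consumed through the
 * view (the current file position in view space). Returns true when a
 * request of nbytes starting there maps to one contiguous file range,
 * and stores the starting file offset in *file_off. */
bool request_is_contiguous(const flat_type *ft, int64_t view_pos,
                           int64_t nbytes, int64_t *file_off)
{
    if (nbytes <= 0 || ft->block_len <= 0 || view_pos < 0)
        return false;

    int64_t tile     = view_pos / ft->block_len;  /* which repetition */
    int64_t in_block = view_pos % ft->block_len;  /* position in block */

    *file_off = tile * ft->extent + ft->block_off + in_block;

    /* A type without holes is contiguous across tile boundaries. */
    if (ft->block_len == ft->extent)
        return true;

    /* Otherwise the request must end inside the current block. */
    return in_block + nbytes <= ft->block_len;
}
```

For P1 in the pattern above (block_off = 10 MB, block_len = 10 MB, extent = 80 MB), a 10 MB request at view position 0 passes the check with a file offset of 10 MB, so the write could go to the contiguous path. A request that is one byte longer would cross a hole and fail the check.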
Wei-keng
On Mon, 2 Jul 2007, Yu, Weikuan wrote:
>> If independent
>> access is used instead, I don't know why each write is divided into 512 KB
>> chunks and why locking is needed at all to guarantee atomic access to the
>> 10 MB contiguous file range. For this particular access pattern, ROMIO
>> should not do read-modify-write at all.
>
> 512KB is the default buffer size for data sieving. So with a 512KB buffer, each process is only able to write out 512KB of data in each call of ADIOI_GEN_WriteStrided. For 10MB, this results in 20 iterations of write_all(), 40 fcntl() calls in total. CrayPat indicates that fcntl() takes 88% of the total wall-clock time with the fileview, and 0% without it.
>
> --Weikuan
>