[MPICH] slow IOR when using fileview

Wei-keng Liao wkliao at ece.northwestern.edu
Tue Jul 3 00:03:49 CDT 2007


I checked the ROMIO source for this particular access pattern.
First, a few words about the pattern itself.
1) MPI_Type_create_subarray() creates the file access regions like
    file: |----------|----------|----------| .... |----------|
              P0         P1          P2               P7
    Each segment is of size 10MB.
2) There is no overlapping, interleaved, or non-contiguous access across
    processes. Every file access is a single contiguous write request.
3) The write buffer is also contiguous. Each MPI process writes the same
    amount, 10 MB.
4) The effect of using this file type should be the same as using
    explicit file offsets without a file type.
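
To make point 4 concrete, here is a minimal sketch in plain C (no MPI calls) of how a 1-D subarray view maps positions in the view to absolute file offsets. It assumes a byte etype, zero displacement, and binary megabytes; view_to_file_offset() and rank_file_offset() are hypothetical helper names, not MPI or ROMIO functions.

```c
#include <stdint.h>

/* 10 MB per process, as in the post (binary megabytes assumed). */
#define SEG_BYTES ((int64_t)10 * 1024 * 1024)

/* Hypothetical model of a 1-D subarray file view with a byte etype and
 * zero displacement: each tile of `gsize` bytes exposes only the bytes
 * [start, start + subsize). Maps a position within the view to an
 * absolute file offset. */
int64_t view_to_file_offset(int64_t view_pos, int64_t gsize,
                            int64_t subsize, int64_t start)
{
    int64_t tile   = view_pos / subsize;  /* which repetition of the file type */
    int64_t within = view_pos % subsize;  /* offset inside the visible block */
    return tile * gsize + start + within;
}

/* Absolute file offset of view position `view_pos` for `rank` of `nprocs`,
 * where each rank owns one 10 MB segment per tile. */
int64_t rank_file_offset(int rank, int nprocs, int64_t view_pos)
{
    return view_to_file_offset(view_pos, (int64_t)nprocs * SEG_BYTES,
                               SEG_BYTES, (int64_t)rank * SEG_BYTES);
}
```

For the first 10 MB written by rank r, the view positions 0 .. 10 MB - 1 map to the single file range starting at r * 10 MB, which is exactly what an explicit offset would give. Only a write that continues past 10 MB in the view would jump to the next tile.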

In the ROMIO source file ad_write_coll.c, in function
ADIOI_GEN_WriteStridedColl(), ADIOI_Datatype_iscontig() is called at line
141 to check whether the file type is contiguous, and it returns 0, meaning
the file type is not contiguous. In general this is true, since the file
type is tiled repeatedly over the entire file space. Therefore
ADIO_WriteStrided() is called at line 153 instead of ADIO_WriteContig() at
line 150. ADIO_WriteStrided() performs data sieving by default, which
chops the 10 MB write into twenty 512 KB chunks. For each chunk, a
read-modify-write is carried out.
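
The chunk arithmetic can be checked with a small model. This is only a sketch of the loop structure described above, not ROMIO's code: struct sieve_stats and model_sieved_write() are hypothetical names, and the one-lock-plus-one-unlock cost per chunk is an assumption taken from the 40 fcntl() figure in the quoted message below.

```c
#include <stddef.h>

/* Counters for a hypothetical model of a sieved write. */
struct sieve_stats {
    size_t chunks;      /* read-modify-write rounds */
    size_t lock_calls;  /* lock requests plus unlock requests */
};

/* Model: split `total` bytes into pieces of at most `bufsize` bytes.
 * Each piece is locked, read, modified, written back, and unlocked. */
struct sieve_stats model_sieved_write(size_t total, size_t bufsize)
{
    struct sieve_stats s = {0, 0};
    size_t done = 0;

    if (bufsize == 0)
        return s;  /* nothing sensible to model without a buffer */

    while (done < total) {
        size_t left = total - done;
        size_t n = left < bufsize ? left : bufsize;

        s.lock_calls++;  /* acquire a byte-range lock on [done, done + n) */
        /* here: read n bytes, overlay the user data, write n bytes back */
        s.lock_calls++;  /* release the lock */

        s.chunks++;
        done += n;
    }
    return s;
}
```

With total = 10 MB and bufsize = 512 KB the model performs 20 rounds and 40 lock calls, whereas a single contiguous write would need none of them.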

This I/O pattern should really trigger ADIO_WriteContig() for the best
result. I suggest adding one more test here that checks whether the
intersection of the buffer type and the file type is contiguous. If it is,
ADIO_WriteContig() is called. The intersection has to take the current
file position into account. I don't know how complicated this
implementation would be.
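
As a rough idea of what such a test could look like, here is a sketch in plain C. It assumes the file type is available as a flattened list of (offset, length) pieces and that the buffer is contiguous; struct block and request_is_contig() are hypothetical and are not part of ROMIO. The check is deliberately conservative: it only says yes when the request fits inside one piece of the file type at the current position.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous piece of a flattened file type (hypothetical layout). */
struct block {
    int64_t offset;  /* byte offset from the start of the type */
    int64_t length;  /* bytes of data in this piece */
};

/* Returns 1 if a request of `len` bytes starting at data position
 * `view_pos` in the view lies inside a single piece of the file type,
 * so it could be issued as one contiguous write; returns 0 otherwise,
 * including for empty or malformed input. Conservative: pieces that
 * happen to be adjacent in the file are not merged. */
int request_is_contig(const struct block *blocks, size_t nblocks,
                      int64_t view_pos, int64_t len)
{
    int64_t type_size = 0;  /* data bytes in one tile of the file type */
    for (size_t i = 0; i < nblocks; i++)
        type_size += blocks[i].length;
    if (type_size <= 0 || len <= 0 || view_pos < 0)
        return 0;

    int64_t pos = view_pos % type_size;  /* position inside one tile */
    for (size_t i = 0; i < nblocks; i++) {
        if (pos < blocks[i].length)
            return len <= blocks[i].length - pos;  /* fits in this piece? */
        pos -= blocks[i].length;
    }
    return 0;  /* not reached for well-formed input */
}
```

For the pattern in this thread the file type has a single 10 MB piece, so a 10 MB request at view position 0 passes the check, while the same request starting 512 KB into the view would straddle two tiles and fail it.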

Wei-keng




On Mon, 2 Jul 2007, Yu, Weikuan wrote:

>> If the independent
>> access is used instead, I don't know why each write is divided into 512 KB
>> chunks and locking is ever needed to guarantee atomic access to the
>> 10 MB contiguous file range. For this particular access pattern, ROMIO
>> should not do read-modify-write at all.
>
> 512KB is the default buffer size for data sieving. So with 512KB buffer size, each process is only able to write out 512KB data in each call of ADIOI_GEN_WriteStrided. For 10MB, this results in 20 iterations of write_all(), 40 fcntl() total. crayPat indicates that fcntl() takes 88% of the total Wall clock time with fileview, 0% w/o fileview.
>
> --Weikuan
>



