[MPICH] slow IOR when using fileview

Tue Jul 10 18:26:29 CDT 2007

Here is a revised version of the fix I proposed earlier. The earlier version
forgot to account for the file displacement, fd->disp, defined in the file view.

     if (file_ptr_type == ADIO_INDIVIDUAL && buftype_is_contig &&
         bufsize + (offset - (fd->disp + flat_file->indices[st_index])) <=
         flat_file->blocklens[st_index]) {
         ADIO_WriteContig(fd, buf, bufsize, MPI_BYTE, ADIO_EXPLICIT_OFFSET,
                          offset, status, error_code);
         return;
     }
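
To make the role of fd->disp easier to see, here is a minimal standalone
sketch of the same range test outside ROMIO. The names struct flat_view and
fits_in_block() are hypothetical and exist only for this illustration; the
struct models just the displacement and the two flat_file arrays used by the
condition. The "into_block >= 0" guard is an extra assumption of the sketch
and is not part of the patch above.

```c
#include <assert.h>

/* Hypothetical stand-in for the pieces of ROMIO state used by the check.
 * Only fd->disp, flat_file->indices and flat_file->blocklens are modeled. */
struct flat_view {
    long long disp;             /* file view displacement (fd->disp) */
    const long long *indices;   /* block start offsets, relative to disp */
    const long long *blocklens; /* block lengths in bytes */
};

/* Returns 1 when a request of bufsize bytes that starts at absolute file
 * offset `offset` lies entirely inside block st_index, 0 otherwise. */
static int fits_in_block(const struct flat_view *v, int st_index,
                         long long offset, long long bufsize)
{
    long long into_block;

    assert(bufsize >= 0);

    /* Distance from the start of the block to the start of the request.
     * The block start in the file is disp + indices[st_index]. */
    into_block = offset - (v->disp + v->indices[st_index]);

    /* Guard added for this sketch only: reject requests that start
     * before the block. */
    if (into_block < 0)
        return 0;

    return bufsize + into_block <= v->blocklens[st_index];
}
```

With a 10 MB block that starts 10 MB into the view and a displacement of
4096 bytes, a 10 MB write at offset 4096 + 10 MB passes the test. If the
displacement is left out of the view (disp = 0), the same request looks as if
it overruns the block by 4096 bytes and the test fails, which is the case the
earlier version missed.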

Wei-keng

On Tue, 3 Jul 2007, Wei-keng Liao wrote:

>
> I came up with a very simple check for the case where the buftype is
> contiguous and the filetype is non-contiguous, but their "intersection" is
> contiguous. In file ad_write_str.c, line 265, insert the following:
>
>    if (file_ptr_type == ADIO_INDIVIDUAL && buftype_is_contig &&
>        bufsize + (offset - flat_file->indices[st_index]) <=
>        flat_file->blocklens[st_index]) {
>        ADIO_WriteContig(fd, buf, bufsize, MPI_BYTE, ADIO_EXPLICIT_OFFSET,
>                         offset, status, error_code);
>        return;
>    }
>
> The first condition, "file_ptr_type == ADIO_INDIVIDUAL", is there because I
> am not sure whether this is applicable to shared file pointers.
>
> The second condition, "buftype_is_contig", ensures that the buffer is
> contiguous.
>
> The third condition ensures that the requested data lies within a single
> block of flat_file, i.e., the st_index block.
>
> I tested it with my test code and it ran OK. Please let me know if this
> can cause any problem that I did not think of. I hope it can be incorporated
> into ROMIO in a future release.
>
> Wei-keng
>
>
>
> On Tue, 3 Jul 2007, Wei-keng Liao wrote:
>
>> 
>> I checked the ROMIO source for this particular access pattern.
>> First, a few words about the access pattern.
>> 1) MPI_Type_create_subarray() creates the file access regions like
>>   file: |----------|----------|----------| .... |----------|
>>             P0         P1          P2               P7
>>   Each segment is of size 10MB.
>> 2) There is no overlapping, interleaved, or non-contiguous access across
>>   all processes. Every file access is a single contiguous write request.
>> 3) The write buffer is also contiguous. The write amount is 10 MB, the
>>   same across all MPI processes.
>> 4) The effect of using this file type should be the same as using
>>   explicit file offsets without a file type.
>> 
>> In ROMIO source file ad_write_coll.c, in function 
>> ADIOI_GEN_WriteStridedColl(), ADIOI_Datatype_iscontig() is called in line 
>> 141 to check if the file type is contiguous and it returns 0. That means 
>> the file type is not contiguous. In general, this is true, since the file 
>> type is applied to the entire file space repeatedly. Therefore, in line 
>> 153, ADIO_WriteStrided() is called, instead of ADIO_WriteContig() in line 
>> 150. As a result, data sieving is performed by default in
>> ADIO_WriteStrided(), which chops the 10 MB write into twenty 512 KB
>> chunks. For each chunk, a read-modify-write is carried out.
>> 
>> In fact, this I/O pattern should trigger ADIO_WriteContig() for the best
>> result. I suggest adding one more test here to check whether the
>> intersection of the buffer type and the file type is contiguous. If it is,
>> ADIO_WriteContig() is called. The intersection test has to take the
>> current file position into account. I don't know how complicated the
>> implementation would be.
>> 
>> Wei-keng
>> 
>> 
>> 
>> 
>> On Mon, 2 Jul 2007, Yu, Weikuan wrote:
>> 
>>>> If the independent
>>>> access is used instead, I don't know why each write is divided into 512 KB
>>>> chunks and why locking is ever needed to guarantee atomic access to the
>>>> 10 MB contiguous file range. For this particular access pattern, ROMIO
>>>> should not do read-modify-write at all.
>>> 
>>> 512KB is the default buffer size for data sieving. So with a 512KB buffer
>>> size, each process is only able to write out 512KB of data in each call of
>>> ADIOI_GEN_WriteStrided. For 10MB, this results in 20 iterations of
>>> write_all(), 40 fcntl() total. crayPat indicates that fcntl() takes 88% of
>>> the total wall-clock time with fileview, 0% w/o fileview.
>>> 
>>> --Weikuan
>>> 
>> 
>
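
The chunk counts quoted above can be reproduced with a small sketch. The
function names sieve_passes() and fcntl_calls() are hypothetical, and the
factor of two (one lock call plus one unlock call per pass) is an assumption
inferred from the "20 iterations, 40 fcntl()" figures in the thread, not
taken from the ROMIO source.

```c
#include <assert.h>

/* Number of data-sieving passes needed to cover `total` bytes with a
 * sieve buffer of `bufsize` bytes (ceiling division). */
static long sieve_passes(long total, long bufsize)
{
    assert(total >= 0 && bufsize > 0);
    return (total + bufsize - 1) / bufsize;
}

/* Assumption: each pass issues one lock and one unlock fcntl(), which
 * matches the 20 iterations / 40 fcntl() reported in the thread. */
static long fcntl_calls(long total, long bufsize)
{
    return 2 * sieve_passes(total, bufsize);
}
```

For a 10 MB write and the default 512 KB sieve buffer this gives 20 passes
and 40 fcntl() calls per process, where a single ADIO_WriteContig() call
would need no read-modify-write at all.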