[mpich-discuss] ROMIO individual file pointer

Rajeev Thakur thakur at mcs.anl.gov
Wed Jun 18 14:32:14 CDT 2008


But does it cause any problem? At that offset, if blocklens is 0 for that
process, nothing will be written. The next write will occur at the next
offset with non-zero blocklen. The fd->fp_ind value is internal to the
implementation. As long as the right I/O gets done, and the right value is
returned for MPI_File_get_position, it is ok.
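
A minimal sketch of that argument, not taken from the ROMIO source: the
struct flat_view and the function next_accessed_offset() are hypothetical
names, and the layout of the flattened filetype is assumed. It shows that a
pointer resting on a zero-length block and a pointer resting on the next
non-empty block lead to the same first byte being accessed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a flattened filetype; not ROMIO's real struct. */
typedef struct {
    size_t count;               /* number of blocks in one filetype instance */
    const long long *indices;   /* byte displacement of each block */
    const long long *blocklens; /* block lengths in bytes; entries may be 0 */
    long long extent;           /* filetype extent in bytes */
} flat_view;

/* Offset of the first byte actually accessed when the pointer rests on
 * block j of filetype instance n. Zero-length blocks transfer nothing,
 * so they are stepped over. Assumes at least one non-zero block. */
long long next_accessed_offset(const flat_view *v, long long disp,
                               long long n, size_t j)
{
    assert(j < v->count);
    while (v->blocklens[j] == 0) {
        j++;
        if (j == v->count) {    /* wrap into the next filetype instance */
            j = 0;
            n++;
        }
    }
    return disp + n * v->extent + v->indices[j];
}
```

With blocks at displacements {0, 5, 15, 25, 35, 45, 100}, lengths
{0, 5, 5, 5, 5, 5, 0}, and extent 100 (an assumed layout), a pointer resting
on the trailing empty block (n = 0, j = 6) and one resting on the first
non-empty block of the next instance (n = 1, j = 1) both resolve to offset
105.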

Rajeev  

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
> Sent: Tuesday, June 17, 2008 11:52 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] ROMIO individual file pointer
> 
> 
> In ROMIO's romio/adio/common/ad_set_view.c, line 60 states that the
> individual file pointer, fd->fp_ind, points to the first byte to be
> accessed. I can see that fd->fp_ind is set to the right value in this file.
> 
> However, in romio/adio/common/ad_read_coll.c, function
> ADIOI_Calc_my_off_len(), line 443, fd->fp_ind is set to the value of the
> variable "off". The problem is how "off" is calculated. In the code
> between lines 428 and 438, "off" is moved up to the next flat_file
> segment, index j. Since flat_file may contain a zero-length element
> (either first or last) whose blocklens[] entry is zero, once the user
> buffer of size "bufsize" is filled, "off" is not moved to the first byte
> to be accessed in the next collective I/O.
> 
> This problem appears when I define a non-contiguous file view and perform
> two consecutive collective I/O operations, each requesting an amount of
> data equal to one whole file view. At the beginning of the second
> collective I/O, fd->fp_ind has the same value on all processes, pointing
> to the beginning of the second file view instead of each process's
> individual starting offset.
> 
> The attached code demonstrates this problem. If a printf statement for
> fd->fp_ind is inserted at the beginning of ADIOI_GEN_WriteStridedColl() in
> file ad_write_coll.c, the standard output is
>   0: First collective write -----------------------
>   0: fd->fpind = 0
>   1: fd->fpind = 5
>   2: fd->fpind = 50
>   3: fd->fpind = 55
>   0: Second collective write -----------------------
>   0: fd->fpind = 100
>   1: fd->fpind = 100     <-- not right
>   2: fd->fpind = 100     <-- not right
>   3: fd->fpind = 100     <-- not right
> 
> The correct results for the second collective write should be:
>   0: Second collective write -----------------------
>   0: fd->fpind = 100
>   1: fd->fpind = 105
>   2: fd->fpind = 150
>   3: fd->fpind = 155
> 
> 
> My fix for this problem is to insert the following code between
> lines 435 and 436 of file ad_read_coll.c.
> 
>     while (flat_file->blocklens[j]==0) {
>         j++;
>         if (j == flat_file->count) {
>             j = 0;
>             n_filetypes++;
>         }
>     }
> 
> 
> Wei-keng
> 
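
The scenario in the quoted report can be approximated with a small
standalone model. This is a sketch under stated assumptions, not the ROMIO
code: flat_model and pointer_after_access() are hypothetical names, and the
file view is assumed to be a 10x10 byte array split 2x2 among four
processes, with zero-length marker entries at the start and end of each
flattened view. The attached program is not reproduced in the message, so
this layout is only chosen to match the printed offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one process's flattened file view. */
typedef struct {
    size_t count;               /* number of blocks, including empty ones */
    const long long *indices;   /* byte displacement of each block */
    const long long *blocklens; /* block lengths in bytes; entries may be 0 */
    long long extent;           /* extent of one file view in bytes */
} flat_model;

/* Consume bufsize bytes from the start of the view and return the offset
 * where the pointer comes to rest. With skip_empty set, zero-length
 * blocks are stepped over afterwards, as in the proposed fix.
 * Assumes the view has at least one non-zero block. */
long long pointer_after_access(const flat_model *f, long long bufsize,
                               int skip_empty)
{
    long long n = 0;     /* completed file view instances */
    long long used = 0;  /* bytes already consumed in block j */
    size_t j = 0;

    assert(f->count > 0 && bufsize >= 0);
    while (bufsize > 0) {
        long long avail = f->blocklens[j] - used;
        if (bufsize < avail) {          /* request ends inside block j */
            used += bufsize;
            bufsize = 0;
        } else {                        /* block j exhausted, move on */
            bufsize -= avail;
            used = 0;
            j++;
            if (j == f->count) { j = 0; n++; }
        }
    }
    if (skip_empty) {
        while (f->blocklens[j] == 0) {  /* same loop as the proposed fix */
            j++;
            if (j == f->count) { j = 0; n++; }
        }
    }
    return n * f->extent + f->indices[j] + used;
}
```

For the assumed view of process 1 (displacements {0, 5, 15, 25, 35, 45, 100},
lengths {0, 5, 5, 5, 5, 5, 0}, extent 100), a 25-byte access leaves the
pointer at 100 without the skip and at 105 with it, which agrees with the
two listings in the report.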