[mpich-discuss] filetype_is_contig in ad_write_coll.c

Rob Latham robl at mcs.anl.gov
Fri Oct 29 16:19:38 CDT 2010


Hi Wei-keng.  I'm reaching way back into the archives to answer your
questions.  Sorry for the ridiculously long latency of my reply: see
below...

On Mon, Dec 21, 2009 at 12:39:31AM -0600, Wei-keng Liao wrote:
> In file mpich2-1.2.1/src/mpi/romio/adio/common/ad_write_coll.c,
> function ADIOI_GEN_WriteStridedColl(), line 145, the call to
> ADIOI_Datatype_iscontig() may not be necessary if cb_write is
> not disabled, i.e. fd->hints->cb_write != ADIOI_HINT_DISABLE

I agree that sometimes ADIOI_Datatype_iscontig() is a little
conservative.  I think it reports "contig" only if the type is a named
type.  Maybe the fix is to tune ADIOI_Datatype_iscontig()?

> Whether filetype is contiguous is actually already calculated when
> calling ADIOI_Calc_my_off_len() at line 160 above, which returns
> contig_access_count and if it is equal to 1, then filetype_is_contig
> should be 1.
> 
> However, I found that in some cases ADIOI_Datatype_iscontig() does
> not report contiguity correctly. I tried a case that uses
> MPI_Type_create_subarray() to create a contiguous filetype, but
> ADIOI_Datatype_iscontig() returns false. After checking the
> flattened type, I found that it contains 2 or 3 non-contiguous
> segments, i.e.
>    flat_file->count == 2 or 3
> 
> In case of flat_file->count == 2, either flat_file->blocklens[0] or
> flat_file->blocklens[1] is 0.
> 
> In case of flat_file->count == 3, both flat_file->blocklens[0] and
> flat_file->blocklens[2] are 0s.
> 
> In both cases, there is only one element in flat_file->blocklens[]
> which is not 0.
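
To make that observation concrete, here is a minimal sketch of counting
only the blocks that carry data.  The struct flat_view and the function
nonempty_blocks() are hypothetical names for illustration; they model
only the count and blocklens fields mentioned above and are not ROMIO's
actual flattened-type structure.

```c
#include <stddef.h>

/* Hypothetical stand-in for a flattened datatype; it models only the
 * fields named in this thread and is not ROMIO's actual structure. */
struct flat_view {
    size_t count;            /* number of (offset, length) entries */
    const long *blocklens;   /* length in bytes of each entry */
};

/* Count the entries that carry data, ignoring zero-length ones. */
static size_t nonempty_blocks(const struct flat_view *f)
{
    size_t i, n = 0;

    for (i = 0; i < f->count; i++) {
        if (f->blocklens[i] > 0)
            n++;
    }
    return n;
}
```

A result of 1 says that a single instance of the type holds one data
block.  It says nothing about what happens when the type is tiled.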

I think your optimization might be a tad aggressive here.  You've
constructed a file datatype with an LB of 0 and an UB of 400, so even
though there's just one 80-byte region to be written in *this* case,
what if 'buf_size' was bigger?  We'd tile the file type and end up
with a non-contiguous access pattern, right?
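
A rough sketch of the tiling arithmetic, assuming a file type with a
single data block of blk bytes placed at displacement disp inside an
extent of extent bytes (with 0 < blk and disp + blk <= extent).  The
function names and the disp parameter are made up for illustration; the
displacement of the 80-byte region is not stated in this thread.

```c
/* File offset of data byte k when a filetype holding one data block of
 * blk bytes at displacement disp is tiled every extent bytes.
 * All parameters are illustrative; disp is an assumption. */
static long tiled_offset(long k, long blk, long extent, long disp)
{
    return (k / blk) * extent + disp + (k % blk);
}

/* Number of contiguous file segments touched by nbytes of data. */
static long tiled_segments(long nbytes, long blk, long extent)
{
    if (nbytes <= 0)
        return 0;
    if (blk == extent)
        return 1;                      /* tiles abut, the run never breaks */
    return (nbytes + blk - 1) / blk;   /* one segment per tile used */
}
```

With blk = 80 and extent = 400, writing 80 bytes touches one segment,
but writing 160 bytes touches two, and (taking disp = 0) the second
one starts at file offset 400.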

I know you are looking at this from the parallel-netcdf context, where
the library sets up a file view big enough for the subsequent write.
Maybe there's an optimization we can make for that case. 

We have the memory buffer, count, and datatype, and we can get the
file type from fd->filetype:  If the following conditions are true, we
can override the filetype_is_contig check:

- the size of fd->filetype (as reported by MPI_Type_size) is the same
  as count*buftype_size
- the contig_access_count returned from ADIOI_Calc_my_off_len is 1

How does that sound?
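
As a sketch only, the two conditions could be combined like this.  The
function name can_override_contig_check() is hypothetical, and the
plain integer arguments stand in for the values that would come from
MPI_Type_size and ADIOI_Calc_my_off_len; this is not a patch against
ad_write_coll.c.

```c
/* Returns 1 when both proposed conditions hold, 0 otherwise.
 * filetype_size:       size of fd->filetype in bytes
 * count, buftype_size: element count and size of the memory datatype
 * contig_access_count: number of contiguous file accesses computed
 * All arguments are assumed to be supplied by the caller. */
static int can_override_contig_check(long long filetype_size,
                                     long long count,
                                     long long buftype_size,
                                     int contig_access_count)
{
    /* The request must fill exactly one instance of the file type. */
    if (filetype_size != count * buftype_size)
        return 0;

    /* And it must map to a single contiguous region of the file. */
    return contig_access_count == 1;
}
```

For example, an 80-byte file type written with 20 elements of a 4-byte
type and a contig_access_count of 1 would pass; doubling the count to 40
would fail the size test, since the file type would then be tiled.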

> Attached is an example C program and a patch to ad_write_coll.c.
> Running both can show the disagreement in filetype contiguity.
> The patch also fixes this problem.

Thanks for the test case.  It is exceedingly helpful for me to use when I
(eventually) look at the problem.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

