[MPICH] code for checking interveaving
Wei-keng Liao
wkliao at ece.northwestern.edu
Tue Jan 29 00:05:34 CST 2008
When I read that part, I was thinking about a case where
st_offsets[i] <= end_offsets[i] < st_offsets[i-1] <= end_offsets[i-1]
This should not be considered interleaved.
I also found that Jianwei's fix does not handle the case where a zero-length
request occurs at i == 0 but not at i == 1, i.e.
    end_offsets[0] == st_offsets[0] - 1 and
    st_offsets[1] < end_offsets[0] and
    st_offsets[1] <= end_offsets[1] (i == 1 is not zero-length)
This case should not be considered interleaved either.
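To make the two cases concrete, here is a minimal standalone sketch. It assumes `long long` as a stand-in for ADIO_Offset, and the helper name `pairwise_check` is hypothetical; its loop body is the check quoted from ad_write_coll.c, lines 113 - 116.

```c
/* Stand-in for ADIO_Offset (an assumption, so the sketch compiles
 * without the ROMIO headers). */
typedef long long sketch_offset;

/* Hypothetical helper wrapping the check quoted from lines 113 - 116.
 * Returns the number of neighbouring rank pairs flagged as interleaved. */
static int pairwise_check(const sketch_offset *st_offsets,
                          const sketch_offset *end_offsets,
                          int nprocs)
{
    int i, interleave_count = 0;

    for (i = 1; i < nprocs; i++)
        if ((st_offsets[i] < end_offsets[i-1]) &&
            (st_offsets[i] <= end_offsets[i]))
            interleave_count++;

    return interleave_count;
}
```

With rank 0 accessing [100, 199] and rank 1 accessing [0, 99], the ranges are disjoint but the check still counts one interleave. The same happens when rank 0 is zero-length (st = 100, end = 99) and rank 1 accesses [0, 49].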
How about changing the code to
    j = 0; /* find the first one with non-zero-length range */
    /* test j < nprocs first so we never read past the end of the arrays */
    while (j < nprocs && end_offsets[j] < st_offsets[j]) j++;
    for (i=j+1; i<nprocs; i++) {
        /* skip the ones with zero-length range */
        if (end_offsets[i] < st_offsets[i]) continue;
        if (st_offsets[i] < end_offsets[j])
            interleave_count++; /* and break; ? */
        j = i;
    }
The above is still not a complete interleave check. A precise solution
should involve sorting the (st_offsets[], end_offsets[]) pairs. Possible
code is given below if you would like to use it.
----< code goes at the beginning of the file >---------------------------
typedef struct {
    ADIO_Offset start;
    ADIO_Offset end;
} start_end_pair;

/* qsort comparator: order pairs by ascending start offset */
static int compare(const void *a, const void *b)
{
    ADIO_Offset a_start = ((const start_end_pair*)a)->start;
    ADIO_Offset b_start = ((const start_end_pair*)b)->start;
    if (a_start < b_start) return -1;
    if (a_start > b_start) return 1;
    return 0;
}
----< code to replace the interleave check >------------------------
int j;
start_end_pair *st_end_list;

st_end_list = (start_end_pair*) ADIOI_Malloc(nprocs * sizeof(start_end_pair));
j = 0;
for (i=0; i<nprocs; i++) {
    /* skip the ones with zero-length range */
    if (end_offsets[i] < st_offsets[i]) continue;
    st_end_list[j].start = st_offsets[i];
    st_end_list[j++].end = end_offsets[i];
}
qsort(st_end_list, j, sizeof(start_end_pair), compare);
/* after sorting by start, any overlap shows up between neighbours */
for (i=1; i<j; i++)
    if (st_end_list[i].start <= st_end_list[i-1].end)
        interleave_count++; /* and break; ? */
ADIOI_Free(st_end_list);
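
For anyone who wants to try the sorted check outside ROMIO, the following is a standalone sketch of the same idea. It is not the ROMIO code: `long long` stands in for ADIO_Offset, plain malloc/free stand in for ADIOI_Malloc/ADIOI_Free, and the names `sketch_pair`, `compare_start` and `sorted_interleave_count` are made up for illustration.

```c
#include <stdlib.h>

/* Stand-in for ADIO_Offset (an assumption for this sketch). */
typedef long long sketch_offset;

typedef struct {
    sketch_offset start;
    sketch_offset end;
} sketch_pair;

/* qsort comparator: ascending start offset */
static int compare_start(const void *a, const void *b)
{
    sketch_offset a_start = ((const sketch_pair *)a)->start;
    sketch_offset b_start = ((const sketch_pair *)b)->start;
    if (a_start < b_start) return -1;
    if (a_start > b_start) return 1;
    return 0;
}

/* Returns the number of neighbouring pairs (in sorted order) that overlap,
 * or -1 if the temporary list cannot be allocated. */
static int sorted_interleave_count(const sketch_offset *st_offsets,
                                   const sketch_offset *end_offsets,
                                   int nprocs)
{
    int i, j = 0, interleave_count = 0;
    sketch_pair *list;

    if (nprocs <= 0) return 0;
    list = malloc((size_t)nprocs * sizeof *list);
    if (list == NULL) return -1;

    for (i = 0; i < nprocs; i++) {
        /* skip zero-length ranges (end == start - 1) */
        if (end_offsets[i] < st_offsets[i]) continue;
        list[j].start = st_offsets[i];
        list[j].end = end_offsets[i];
        j++;
    }

    qsort(list, (size_t)j, sizeof *list, compare_start);

    /* end offsets are inclusive, so start <= previous end means overlap */
    for (i = 1; i < j; i++)
        if (list[i].start <= list[i-1].end)
            interleave_count++;

    free(list);
    return interleave_count;
}
```

Because the list is sorted by start offset, if any two ranges overlap then some pair of neighbours overlaps, so a non-zero count is enough to answer "is anything interleaved?". The count itself is the number of overlapping neighbours, not of all overlapping pairs.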
On Mon, 28 Jan 2008, Rajeev Thakur wrote:
> That line was added in response to a bug report and fix from Jianwei Li. See
> attached mail. Note that in the case he mentions (count=0), end_offsets[i]
> will be set to st_offsets[i]-1.
>
> Rajeev
>
>
> > -----Original Message-----
> > From: owner-mpich-discuss at mcs.anl.gov
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
> > Sent: Monday, January 28, 2008 4:21 PM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: [MPICH] code for checking interveaving
> >
> >
> > In MPICH2-1.0.6p1, file adio/common/ad_write_coll.c, lines 112 - 118,
> >
> > 112 /* are the accesses of different processes interleaved? */
> > 113 for (i=1; i<nprocs; i++)
> > 114 if ((st_offsets[i] < end_offsets[i-1]) &&
> > 115 (st_offsets[i] <= end_offsets[i]))
> > 116 interleave_count++;
> > 117 /* This is a rudimentary check for interleaving, but should suffice
> > 118 for the moment. */
> >
> >
> > Shouldn't line 115 be the following?
> >
> > 115 (st_offsets[i-1] <= end_offsets[i]))
> > ^^^^^
> > Line 115 in its original form makes no sense.
> > This is not a bug; the collective write will still run correctly
> > without the change.
> > But in some cases non-interleaved accesses will be considered interleaved.
> >
> > The same thing happens in ad_read_coll.c .
> >
> > Wei-keng
> >
> >
>