[MPICH] code for checking interleaving
Rob Ross
rross at mcs.anl.gov
Tue Jan 29 08:43:51 CST 2008
Recall that the offsets from a given system must be monotonically
increasing; I think this simplifies things quite a bit? -- Rob
On Jan 29, 2008, at 12:05 AM, Wei-keng Liao wrote:
>
> When I read that part, I was thinking about a case where
> st_offsets[i] <= end_offsets[i] < st_offsets[i-1] <= end_offsets[i-1]
> This should not be considered interleaved.
>
> I also found that Jianwei's fix does not handle the case where a
> zero-length range occurs at i == 0 but not at i == 1, i.e.
> end_offsets[0] == st_offsets[0] - 1 and
> st_offsets[1] < end_offsets[0] and
> st_offsets[1] <= end_offsets[1] (i == 1 is not zero-length)
> This case should not be considered interleaved either.
>
> How about changing the code to
> j = 0; /* find the first one with non-zero-length range */
> while (j < nprocs && end_offsets[j] < st_offsets[j]) j++;
>
> for (i=j+1; i<nprocs; i++) {
> /* skip the ones with zero-length range */
> if (end_offsets[i] < st_offsets[i]) continue;
>
> if (st_offsets[i] < end_offsets[j])
> interleave_count++; /* and break; ? */
> j = i;
> }
>
>
> The above is still not a complete interleave check. A precise
> solution should involve sorting the (st_offsets[], end_offsets[])
> pairs. Possible code is given below, if you would like to use it.
>
> ----< code to go at the beginning of the file >---------------------
> typedef struct {
> ADIO_Offset start;
> ADIO_Offset end;
> } start_end_pair;
>
> static int compare(const void *a, const void *b)
> {
> ADIO_Offset a_start = ((start_end_pair*)a)->start;
> ADIO_Offset b_start = ((start_end_pair*)b)->start;
> if (a_start < b_start) return -1;
> if (a_start > b_start) return 1;
> return 0;
> }
>
>
> ----< code to replace the interleave check >------------------------
> int j;
> start_end_pair *st_end_list;
>
> st_end_list = (start_end_pair*) ADIOI_Malloc(nprocs *
> sizeof(start_end_pair));
> j = 0;
> for (i=0; i<nprocs; i++) {
> if (end_offsets[i] < st_offsets[i]) continue;
> st_end_list[j].start = st_offsets[i];
> st_end_list[j++].end = end_offsets[i];
> }
> qsort(st_end_list, j, sizeof(start_end_pair), compare);
> for (i=1; i<j; i++)
> if (st_end_list[i].start <= st_end_list[i-1].end)
> interleave_count++; /* and break; ? */
>
> ADIOI_Free(st_end_list);
>
> On Mon, 28 Jan 2008, Rajeev Thakur wrote:
>
>> That line was added in response to a bug report and fix from
>> Jianwei Li. See attached mail. Note that in the case he mentions
>> (count=0), end_offsets[i] will be set to st_offsets[i]-1.
>>
>> Rajeev
>>
>>
>>> -----Original Message-----
>>> From: owner-mpich-discuss at mcs.anl.gov
>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
>>> Sent: Monday, January 28, 2008 4:21 PM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: [MPICH] code for checking interleaving
>>>
>>>
>>> In MPICH2-1.0.6p1, file adio/common/ad_write_coll.c, lines 112 - 118,
>>>
>>> 112 /* are the accesses of different processes interleaved? */
>>> 113 for (i=1; i<nprocs; i++)
>>> 114 if ((st_offsets[i] < end_offsets[i-1]) &&
>>> 115 (st_offsets[i] <= end_offsets[i]))
>>> 116 interleave_count++;
>>> 117 /* This is a rudimentary check for interleaving, but should suffice
>>> 118 for the moment. */
>>>
>>>
>>> Shouldn't line 115 be the following?
>>>
>>> 115 (st_offsets[i-1] <= end_offsets[i]))
>>> ^^^^^
>>> Line 115 in its original form makes no sense.
>>> This is not a bug; collective writes will still run correctly
>>> without the change.
>>> But in some cases non-interleaved accesses will be treated as interleaved.
>>>
>>> The same thing happens in ad_read_coll.c .
>>>
>>> Wei-keng
>>>
>>>
>>
>
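
For anyone who wants to try the two checks discussed in this thread
outside of ROMIO, here is a minimal standalone sketch. It is not the
MPICH source: file_offset is an assumed stand-in for ADIO_Offset,
malloc/free replace ADIOI_Malloc/ADIOI_Free, and the function names
rank_order_check and sorted_check are made up for illustration. It
keeps the convention mentioned above that a zero-length access has
end == start - 1.

```c
#include <stdlib.h>

/* Assumption: stand-in for ADIO_Offset (a signed 64-bit integer). */
typedef long long file_offset;

typedef struct {
    file_offset start;
    file_offset end;   /* inclusive; end < start means zero-length */
} start_end_pair;

/* qsort comparator: order pairs by ascending start offset. */
static int compare_start(const void *a, const void *b)
{
    file_offset a_start = ((const start_end_pair *)a)->start;
    file_offset b_start = ((const start_end_pair *)b)->start;
    if (a_start < b_start) return -1;
    if (a_start > b_start) return 1;
    return 0;
}

/* The rudimentary check as quoted in this thread: compares each rank
 * only with the rank before it, in rank order. */
int rank_order_check(const file_offset *st, const file_offset *en, int nprocs)
{
    int i, count = 0;
    for (i = 1; i < nprocs; i++)
        if (st[i] < en[i - 1] && st[i] <= en[i])
            count++;
    return count;
}

/* Sort-based check along the lines proposed above: drop zero-length
 * ranges, sort by start, then count neighbours that overlap.
 * Returns -1 if the temporary list cannot be allocated. */
int sorted_check(const file_offset *st, const file_offset *en, int nprocs)
{
    int i, j = 0, count = 0;
    start_end_pair *list;

    if (nprocs <= 0) return 0;
    list = malloc((size_t)nprocs * sizeof *list);
    if (list == NULL) return -1;

    for (i = 0; i < nprocs; i++) {
        if (en[i] < st[i]) continue;      /* skip zero-length ranges */
        list[j].start = st[i];
        list[j].end = en[i];
        j++;
    }
    qsort(list, (size_t)j, sizeof *list, compare_start);

    for (i = 1; i < j; i++)
        if (list[i].start <= list[i - 1].end)
            count++;

    free(list);
    return count;
}
```

With two ranks whose ranges are disjoint but in descending order
(st = {100, 0}, end = {199, 99}), rank_order_check reports an
interleave while sorted_check does not. The same holds when rank 0 is
zero-length (st = {50, 0}, end = {49, 99}). A nonzero return from
sorted_check is only meant as a yes/no answer; it counts overlapping
neighbours in sorted order, not every overlapping pair.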