[MPICH] code for checking interleaving

Rob Ross rross at mcs.anl.gov
Tue Jan 29 12:24:28 CST 2008


Right; I guess I was thinking that it might help avoid the need to
qsort... it could be a merge at worst? -- Rob

On Jan 29, 2008, at 11:48 AM, Wei-keng Liao wrote:

>
> Rajeev is right. The monotonic requirement is within a single process
> only.
>
> The current method will miss some non-interleaved cases by treating them
> as interleaved. I only wonder if something like Jianwei's I/O pattern may
> break it again.
>
> Wei-keng
>
> On Tue, 29 Jan 2008, Rajeev Thakur wrote:
>
>> Offsets from a given "process" must be monotonically nondecreasing. "i"
>> here refers to process rank.
>>
>> Rajeev
>>
>>> -----Original Message-----
>>> From: owner-mpich-discuss at mcs.anl.gov
>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Rob Ross
>>> Sent: Tuesday, January 29, 2008 8:44 AM
>>> To: Wei-keng Liao
>>> Cc: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [MPICH] code for checking interleaving
>>>
>>> Recall that the offsets from a given system must be monotonically
>>> increasing; I think this simplifies things quite a bit? -- Rob
>>>
>>> On Jan 29, 2008, at 12:05 AM, Wei-keng Liao wrote:
>>>
>>>>
>>>> When I read that part, I was thinking about a case where
>>>>   st_offsets[i] <= end_offsets[i] < st_offsets[i-1] <= end_offsets[i-1]
>>>> This should not be considered interleaved.
>>>>
>>>> I also found Jianwei's fix does not solve the case when zero length
>>>> occurs at i == 0, but not i == 1, i.e.
>>>>   end_offsets[0] == st_offsets[0] - 1 and
>>>>   st_offsets[1] <  end_offsets[0] and
>>>>   st_offsets[1] <= end_offsets[1] (i == 1 is not zero-length)
>>>> This case should not be considered interleaved either.
>>>>
>>>> How about changing the code to
>>>>       j = 0; /* find the first one with non-zero-length range */
>>>>       /* test j < nprocs first so we never read past the arrays */
>>>>       while (j < nprocs && end_offsets[j] < st_offsets[j]) j++;
>>>>
>>>>       for (i=j+1; i<nprocs; i++) {
>>>>           /* skip the ones with zero-length range */
>>>>           if (end_offsets[i] < st_offsets[i]) continue;
>>>>
>>>>           if (st_offsets[i] < end_offsets[j])
>>>>               interleave_count++; /* and break; ? */
>>>>           j = i;
>>>>       }
>>>>
>>>>
>>>> The above is still not a complete interleave check. The precise
>>>> solution should involve sorting the (st_offsets[], end_offsets[]) pairs.
>>>> Possible code is given below if you would like to use it.
>>>>
>>>> ----< code goes at the beginning of the file >-----------------------
>>>> typedef struct {
>>>>   ADIO_Offset start;
>>>>   ADIO_Offset end;
>>>> } start_end_pair;
>>>>
>>>> /* qsort comparator: order pairs by ascending start offset */
>>>> static int compare(const void *a, const void *b)
>>>> {
>>>>    ADIO_Offset a_start = ((const start_end_pair*)a)->start;
>>>>    ADIO_Offset b_start = ((const start_end_pair*)b)->start;
>>>>    if (a_start < b_start) return -1;
>>>>    if (a_start > b_start) return  1;
>>>>    return 0;
>>>> }
>>>>
>>>>
>>>> ----< code to replace the interleave check >-------------------------
>>>>       int j;
>>>>       start_end_pair *st_end_list;
>>>>
>>>>       st_end_list = (start_end_pair*)
>>>>           ADIOI_Malloc(nprocs * sizeof(start_end_pair));
>>>>       j = 0;
>>>>       for (i=0; i<nprocs; i++) {
>>>>           if (end_offsets[i] < st_offsets[i]) continue;
>>>>           st_end_list[j].start =  st_offsets[i];
>>>>           st_end_list[j++].end = end_offsets[i];
>>>>       }
>>>>       qsort(st_end_list, j, sizeof(start_end_pair), compare);
>>>>       for (i=1; i<j; i++)
>>>>           if (st_end_list[i].start <= st_end_list[i-1].end)
>>>>               interleave_count++; /* and break; ? */
>>>>
>>>>       ADIOI_Free(st_end_list);
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, 28 Jan 2008, Rajeev Thakur wrote:
>>>>
>>>>> That line was added in response to a bug report and fix from Jianwei Li.
>>>>> See attached mail. Note that in the case he mentions (count=0),
>>>>> end_offset[i] will be set to start_offset[i]-1.
>>>>>
>>>>> Rajeev
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
>>>>>> Sent: Monday, January 28, 2008 4:21 PM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: [MPICH] code for checking interleaving
>>>>>>
>>>>>>
>>>>>> In MPICH2-1.0.6p1, file adio/common/ad_write_coll.c, lines 112 - 118,
>>>>>>
>>>>>> 112     /* are the accesses of different processes interleaved? */
>>>>>> 113     for (i=1; i<nprocs; i++)
>>>>>> 114         if ((st_offsets[i] < end_offsets[i-1]) &&
>>>>>> 115             (st_offsets[i] <= end_offsets[i]))
>>>>>> 116             interleave_count++;
>>>>>> 117     /* This is a rudimentary check for interleaving, but should suffice
>>>>>> 118        for the moment. */
>>>>>>
>>>>>>
>>>>>> Shouldn't line 115 be the following?
>>>>>>
>>>>>> 115             (st_offsets[i-1] <= end_offsets[i]))
>>>>>>                          ^^^^^
>>>>>> Line 115 in its original form makes no sense.
>>>>>> This is not a bug; collective writes will still run correctly
>>>>>> without the change.
>>>>>> But in some cases non-interleaved accesses will be considered
>>>>>> interleaved.
>>>>>>
>>>>>> The same thing happens in ad_read_coll.c.
>>>>>>
>>>>>> Wei-keng
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
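
For anyone following the thread, here is a minimal standalone sketch that
puts the two checks discussed above side by side: the rank-order scan and
the sort-based check. It is only an illustration under stated assumptions,
not the ROMIO code. The names demo_offset, rank_order_interleaved and
sorted_interleaved are made up for this example, long long stands in for
ADIO_Offset, and plain malloc/free replace ADIOI_Malloc/ADIOI_Free. As in
the thread, a range with end < start is treated as zero-length and skipped.

```c
#include <assert.h>
#include <stdlib.h>

typedef long long demo_offset;   /* assumption: stand-in for ADIO_Offset */

typedef struct {
    demo_offset start;
    demo_offset end;
} start_end_pair;

/* qsort comparator: order pairs by ascending start offset */
static int compare_start(const void *a, const void *b)
{
    demo_offset a_start = ((const start_end_pair *) a)->start;
    demo_offset b_start = ((const start_end_pair *) b)->start;
    if (a_start < b_start) return -1;
    if (a_start > b_start) return 1;
    return 0;
}

/* Rank-order scan: compare each non-empty range with the previous
 * non-empty range in rank order. Cheap, but order-dependent. */
static int rank_order_interleaved(const demo_offset *st,
                                  const demo_offset *end, int nprocs)
{
    int i, j = 0, count = 0;

    /* find the first non-zero-length range; bounds test comes first */
    while (j < nprocs && end[j] < st[j]) j++;

    for (i = j + 1; i < nprocs; i++) {
        if (end[i] < st[i]) continue;      /* skip zero-length ranges */
        if (st[i] < end[j]) count++;
        j = i;
    }
    return count;
}

/* Sort-based check: sort the non-empty ranges by start offset, then
 * count neighbours that overlap. Returns -1 if allocation fails. */
static int sorted_interleaved(const demo_offset *st,
                              const demo_offset *end, int nprocs)
{
    int i, n = 0, count = 0;
    start_end_pair *list;

    assert(nprocs >= 0);
    if (nprocs == 0) return 0;

    list = malloc((size_t) nprocs * sizeof(*list));
    if (list == NULL) return -1;

    for (i = 0; i < nprocs; i++) {
        if (end[i] < st[i]) continue;      /* skip zero-length ranges */
        list[n].start = st[i];
        list[n].end = end[i];
        n++;
    }
    qsort(list, (size_t) n, sizeof(*list), compare_start);

    for (i = 1; i < n; i++)
        if (list[i].start <= list[i - 1].end) count++;

    free(list);
    return count;
}
```

With st = {100, 0} and end = {199, 99} (rank 1 writes entirely before
rank 0), the rank-order scan reports one interleave while the sort-based
check reports none, which is the false positive mentioned earlier in the
thread. A nonzero count from the sorted version is enough to detect
overlap, because once ranges are sorted by start, any overlapping pair
implies that some pair of neighbours overlaps.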



