[MPICH] code for checking interleaving

Rajeev Thakur thakur at mcs.anl.gov
Tue Jan 29 10:03:56 CST 2008


Offsets from a given "process" must be monotonically nondecreasing. "i" here
refers to process rank.

Rajeev
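
To illustrate the constraint, here is a minimal sketch. It assumes the offsets touched by one process are available as a plain array of `long long` (standing in for `ADIO_Offset`); `offsets_nondecreasing` is a hypothetical helper name, not a ROMIO function. It checks the per-process property only and says nothing about how the ranges of different ranks are ordered relative to each other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the n offsets accessed by ONE process
 * never decrease, 0 otherwise. long long stands in for ADIO_Offset. */
static int offsets_nondecreasing(const long long *offsets, size_t n)
{
    size_t k;

    assert(offsets != NULL || n == 0);
    for (k = 1; k < n; k++)
        if (offsets[k] < offsets[k - 1])
            return 0;   /* an offset went backwards */
    return 1;           /* equal neighbours are allowed: nondecreasing */
}
```

Because the index "i" in the quoted code is a process rank, nothing comparable holds across ranks: st_offsets[i] may be smaller or larger than st_offsets[i-1].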

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Rob Ross
> Sent: Tuesday, January 29, 2008 8:44 AM
> To: Wei-keng Liao
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] code for checking interleaving
> 
> Recall that the offsets from a given system must be monotonically  
> increasing; I think this simplifies things quite a bit? -- Rob
> 
> On Jan 29, 2008, at 12:05 AM, Wei-keng Liao wrote:
> 
> >
> > When I read that part, I was thinking about a case where
> >  st_offsets[i] <= end_offsets[i] < st_offsets[i-1] <= end_offsets[i-1]
> > This should not be considered interleaved.
> >
> > I also found that Jianwei's fix does not handle the case where a
> > zero-length access occurs at i == 0 but not at i == 1, i.e.
> >    end_offsets[0] == st_offsets[0] - 1 and
> >    st_offsets[1] <  end_offsets[0] and
> >    st_offsets[1] <= end_offsets[1] (i == 1 is not zero-length)
> > This case should not be considered interleaved either.
> >
> > How about changing the code to
> >        j = 0; /* find the first one with non-zero-length range */
> >        /* test j < nprocs first so the arrays are never read past the end */
> >        while (j < nprocs && end_offsets[j] < st_offsets[j]) j++;
> >
> >        for (i=j+1; i<nprocs; i++) {
> >            /* skip the ones with zero-length range */
> >            if (end_offsets[i] < st_offsets[i]) continue;
> >
> >            if (st_offsets[i] < end_offsets[j])
> >                interleave_count++; /* and break; ? */
> >            j = i;
> >        }
> >
> >
> > The above is still not a complete interleave check. A precise solution
> > should sort the (st_offsets[], end_offsets[]) pairs. Possible code is
> > given below if you would like to use it.
> >
> > ----< code to go at the beginning of the file >---------------------
> > typedef struct {
> >    ADIO_Offset start;
> >    ADIO_Offset end;
> > } start_end_pair;
> >
> > static int compare(const void *a, const void *b)
> > {
> >     ADIO_Offset a_start = ((start_end_pair*)a)->start;
> >     ADIO_Offset b_start = ((start_end_pair*)b)->start;
> >     if (a_start < b_start) return -1;
> >     if (a_start > b_start) return  1;
> >     return 0;
> > }
> >
> >
> > ----< code to replace the interleave check >------------------------
> >        int j;
> >        start_end_pair *st_end_list;
> >
> >        st_end_list = (start_end_pair*)
> >            ADIOI_Malloc(nprocs * sizeof(start_end_pair));
> >        j = 0;
> >        for (i=0; i<nprocs; i++) {
> >            if (end_offsets[i] < st_offsets[i]) continue;
> >            st_end_list[j].start =  st_offsets[i];
> >            st_end_list[j++].end = end_offsets[i];
> >        }
> >        qsort(st_end_list, j, sizeof(start_end_pair), compare);
> >        for (i=1; i<j; i++)
> >            if (st_end_list[i].start <= st_end_list[i-1].end)
> >                interleave_count++; /* and break; ? */
> >
> >        ADIOI_Free(st_end_list);
> >
> >
> >
> >
> >
> >
> > On Mon, 28 Jan 2008, Rajeev Thakur wrote:
> >
> >> That line was added in response to a bug report and fix from Jianwei Li.
> >> See attached mail. Note that in the case he mentions (count=0),
> >> end_offsets[i] will be set to st_offsets[i]-1.
> >>
> >> Rajeev
> >>
> >>
> >>> -----Original Message-----
> >>> From: owner-mpich-discuss at mcs.anl.gov
> >>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
> >>> Sent: Monday, January 28, 2008 4:21 PM
> >>> To: mpich-discuss at mcs.anl.gov
> >>> Subject: [MPICH] code for checking interleaving
> >>>
> >>>
> >>> In MPICH2-1.0.6p1, file adio/common/ad_write_coll.c, lines 112 - 118,
> >>>
> >>> 112     /* are the accesses of different processes interleaved? */
> >>> 113     for (i=1; i<nprocs; i++)
> >>> 114         if ((st_offsets[i] < end_offsets[i-1]) &&
> >>> 115             (st_offsets[i] <= end_offsets[i]))
> >>> 116             interleave_count++;
> >>> 117     /* This is a rudimentary check for interleaving, but should suffice
> >>> 118        for the moment. */
> >>>
> >>>
> >>> Shouldn't line 115 be the following?
> >>>
> >>> 115             (st_offsets[i-1] <= end_offsets[i]))
> >>>                           ^^^^^
> >>> Line 115 in its original form makes no sense.
> >>> This is not a bug; collective writes will still run correctly
> >>> without the change.
> >>> But in some cases non-interleaved accesses will be considered
> >>> interleaved.
> >>>
> >>> The same thing happens in ad_read_coll.c .
> >>>
> >>> Wei-keng
> >>>
> >>>
> >>
> >
> 
> 
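
For readers who want to try the sorting-based check from the quoted message outside ROMIO, the following is a minimal standalone sketch rather than the code that went into MPICH. It makes several assumptions: `file_offset` (a `long long`) stands in for `ADIO_Offset`, plain `malloc`/`free` replace `ADIOI_Malloc`/`ADIOI_Free`, `ranges_interleaved` is a hypothetical function name, and it stops at the first overlap instead of counting (the "and break; ?" variant).

```c
#include <assert.h>
#include <stdlib.h>

typedef long long file_offset;   /* assumption: stand-in for ADIO_Offset */

typedef struct {
    file_offset start;
    file_offset end;   /* inclusive; end < start marks a zero-length access */
} start_end_pair;

/* qsort comparator: order pairs by ascending start offset. */
static int compare_start(const void *a, const void *b)
{
    file_offset a_start = ((const start_end_pair *)a)->start;
    file_offset b_start = ((const start_end_pair *)b)->start;
    return (a_start > b_start) - (a_start < b_start);
}

/* Returns 1 if any two non-empty ranges overlap, 0 if none do,
 * and -1 if the temporary array cannot be allocated. */
static int ranges_interleaved(const file_offset *st_offsets,
                              const file_offset *end_offsets, int nprocs)
{
    start_end_pair *list;
    int i, n = 0, found = 0;

    assert(nprocs >= 0);
    if (nprocs < 2) return 0;
    list = malloc((size_t)nprocs * sizeof *list);
    if (list == NULL) return -1;

    /* Keep only the ranks that access at least one byte. */
    for (i = 0; i < nprocs; i++) {
        if (end_offsets[i] < st_offsets[i]) continue;
        list[n].start = st_offsets[i];
        list[n].end = end_offsets[i];
        n++;
    }
    qsort(list, (size_t)n, sizeof *list, compare_start);

    /* After sorting by start, any overlap shows up between neighbours. */
    for (i = 1; i < n && !found; i++)
        if (list[i].start <= list[i - 1].end)
            found = 1;

    free(list);
    return found;
}
```

Comparing only neighbours is enough to detect whether an overlap exists: if range k starts inside an earlier range i, then the range sorted directly after i also starts no later than the end of i.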

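The false positive described in the original report can be reproduced with a small sketch. `count_original` copies the loop quoted from ad_write_coll.c (lines 113-116) and `count_proposed` applies the change suggested for line 115; both function names are hypothetical, and `long long` is assumed in place of `ADIO_Offset`.

```c
#include <assert.h>

/* The check as quoted from ad_write_coll.c, lines 113-116. */
static int count_original(const long long *st_offsets,
                          const long long *end_offsets, int nprocs)
{
    int i, interleave_count = 0;

    assert(nprocs >= 0);
    for (i = 1; i < nprocs; i++)
        if ((st_offsets[i] < end_offsets[i-1]) &&
            (st_offsets[i] <= end_offsets[i]))
            interleave_count++;
    return interleave_count;
}

/* The same loop with the second condition changed to use st_offsets[i-1]. */
static int count_proposed(const long long *st_offsets,
                          const long long *end_offsets, int nprocs)
{
    int i, interleave_count = 0;

    assert(nprocs >= 0);
    for (i = 1; i < nprocs; i++)
        if ((st_offsets[i] < end_offsets[i-1]) &&
            (st_offsets[i-1] <= end_offsets[i]))
            interleave_count++;
    return interleave_count;
}
```

With rank 0 accessing bytes 100..199 and rank 1 accessing bytes 0..9, the two ranges are disjoint, yet `count_original` returns 1 while `count_proposed` returns 0. For genuinely overlapping ranges such as 0..9 and 5..14, both return 1.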