[MPICH] code for checking interleaving

Wei-keng Liao wkliao at ece.northwestern.edu
Tue Jan 29 11:48:05 CST 2008


Rajeev is right. The monotonic requirement applies only within a single
process.

The current method will still misclassify some non-interleaved cases as
interleaved. I only wonder whether something like Jianwei's I/O pattern may
break it again.
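
To make the false positive concrete, here is a minimal standalone sketch
that puts the adjacent-rank condition next to the sort-based check quoted
further down in this thread. It is not the ROMIO code: demo_offset,
demo_range, adjacent_check and sorted_check are hypothetical names made up
for this illustration, long long stands in for ADIO_Offset, and plain
malloc/free stand in for ADIOI_Malloc/ADIOI_Free.

```c
#include <assert.h>
#include <stdlib.h>

typedef long long demo_offset;   /* assumption: stand-in for ADIO_Offset */

typedef struct {
    demo_offset start;
    demo_offset end;
} demo_range;

/* qsort comparator: order ranges by ascending start offset */
static int cmp_start(const void *a, const void *b)
{
    demo_offset x = ((const demo_range *)a)->start;
    demo_offset y = ((const demo_range *)b)->start;
    return (x > y) - (x < y);
}

/* Same condition as the quoted lines 113-116: rank i is compared
 * with rank i-1 only, in rank order. */
static int adjacent_check(const demo_offset *st, const demo_offset *end, int n)
{
    int i, count = 0;
    assert(n >= 0);
    for (i = 1; i < n; i++)
        if (st[i] < end[i-1] && st[i] <= end[i])
            count++;
    return count;
}

/* Sort-based check: drop zero-length ranges (end < start), sort the
 * rest by start offset, then compare each range with its predecessor.
 * Returns -1 if the temporary array cannot be allocated. */
static int sorted_check(const demo_offset *st, const demo_offset *end, int n)
{
    demo_range *r;
    int i, j = 0, count = 0;
    assert(n >= 0);
    r = malloc((n > 0 ? (size_t)n : 1) * sizeof *r);
    if (r == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        if (end[i] < st[i])
            continue;            /* zero-length request */
        r[j].start = st[i];
        r[j].end = end[i];
        j++;
    }
    qsort(r, (size_t)j, sizeof *r, cmp_start);
    for (i = 1; i < j; i++)
        if (r[i].start <= r[i-1].end)
            count++;
    free(r);
    return count;
}
```

With rank 0 accessing [100,199] and rank 1 accessing [0,99], the two ranges
are disjoint, yet adjacent_check returns 1 while sorted_check returns 0.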

Wei-keng

On Tue, 29 Jan 2008, Rajeev Thakur wrote:

> Offsets from a given "process" must be monotonically nondecreasing. "i" here
> refers to process rank.
> 
> Rajeev
> 
> > -----Original Message-----
> > From: owner-mpich-discuss at mcs.anl.gov 
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Rob Ross
> > Sent: Tuesday, January 29, 2008 8:44 AM
> > To: Wei-keng Liao
> > Cc: mpich-discuss at mcs.anl.gov
> > Subject: Re: [MPICH] code for checking interleaving
> > 
> > Recall that the offsets from a given system must be monotonically  
> > increasing; I think this simplifies things quite a bit? -- Rob
> > 
> > On Jan 29, 2008, at 12:05 AM, Wei-keng Liao wrote:
> > 
> > >
> > > When I read that part, I was thinking about a case where
> > > >  st_offsets[i] <= end_offsets[i] < st_offsets[i-1] <= end_offsets[i-1]
> > > This should not be considered interleaved.
> > >
> > > > I also found that Jianwei's fix does not handle the case where a
> > > > zero-length range occurs at i == 0 but not at i == 1, i.e.
> > > >    end_offsets[0] == st_offsets[0] - 1 and
> > > >    st_offsets[1] <  end_offsets[0] and
> > > >    st_offsets[1] <= end_offsets[1] (i == 1 is not zero-length)
> > > > This case should not be considered interleaved either.
> > >
> > > > How about changing the code to
> > > >        j = 0; /* find the first one with non-zero-length range */
> > > >        /* test j < nprocs first so the arrays are never read out of bounds */
> > > >        while (j < nprocs && end_offsets[j] < st_offsets[j]) j++;
> > >
> > >        for (i=j+1; i<nprocs; i++) {
> > >            /* skip the ones with zero-length range */
> > >            if (end_offsets[i] < st_offsets[i]) continue;
> > >
> > >            if (st_offsets[i] < end_offsets[j])
> > >                interleave_count++; /* and break; ? */
> > >            j = i;
> > >        }
> > >
> > >
> > > > The above is still not a complete interleave check. A precise solution
> > > > should involve sorting the (st_offsets[], end_offsets[]) pairs. Possible
> > > > code is given below if you would like to use it.
> > >
> > > ----< codes go to beginning of the file  
> > > >-------------------------------
> > > typedef struct {
> > >    ADIO_Offset start;
> > >    ADIO_Offset end;
> > > } start_end_pair;
> > >
> > > static int compare(const void *a, const void *b)
> > > {
> > > >     ADIO_Offset a_start = ((const start_end_pair*)a)->start;
> > > >     ADIO_Offset b_start = ((const start_end_pair*)b)->start;
> > >     if (a_start < b_start) return -1;
> > >     if (a_start > b_start) return  1;
> > >     return 0;
> > > }
> > >
> > >
> > > ----< codes to replace the interleave check >-----------------------
> > >        int j;
> > >        start_end_pair *st_end_list;
> > >
> > > >        st_end_list = (start_end_pair*)
> > > >            ADIOI_Malloc(nprocs * sizeof(start_end_pair));
> > >        j = 0;
> > >        for (i=0; i<nprocs; i++) {
> > >            if (end_offsets[i] < st_offsets[i]) continue;
> > >            st_end_list[j].start =  st_offsets[i];
> > >            st_end_list[j++].end = end_offsets[i];
> > >        }
> > >        qsort(st_end_list, j, sizeof(start_end_pair), compare);
> > >        for (i=1; i<j; i++)
> > >            if (st_end_list[i].start <= st_end_list[i-1].end)
> > >                interleave_count++; /* and break; ? */
> > >
> > >        ADIOI_Free(st_end_list);
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, 28 Jan 2008, Rajeev Thakur wrote:
> > >
> > > >> That line was added in response to a bug report and fix from
> > > >> Jianwei Li. See attached mail. Note that in the case he mentions
> > > >> (count=0), end_offset[i] will be set to start_offset[i]-1.
> > >>
> > >> Rajeev
> > >>
> > >>
> > >>> -----Original Message-----
> > >>> From: owner-mpich-discuss at mcs.anl.gov
> > > >>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
> > >>> Sent: Monday, January 28, 2008 4:21 PM
> > >>> To: mpich-discuss at mcs.anl.gov
> > > >>> Subject: [MPICH] code for checking interleaving
> > >>>
> > >>>
> > > >>> In MPICH2-1.0.6p1, file adio/common/ad_write_coll.c, lines 112 - 118,
> > >>>
> > >>> 112     /* are the accesses of different processes interleaved? */
> > >>> 113     for (i=1; i<nprocs; i++)
> > >>> 114         if ((st_offsets[i] < end_offsets[i-1]) &&
> > >>> 115             (st_offsets[i] <= end_offsets[i]))
> > >>> 116             interleave_count++;
> > > >>> 117     /* This is a rudimentary check for interleaving, but should suffice
> > > >>> 118        for the moment. */
> > >>>
> > >>>
> > >>> Shouldn't line 115 be the following?
> > >>>
> > >>> 115             (st_offsets[i-1] <= end_offsets[i]))
> > >>>                           ^^^^^
> > >>> Line 115 in its original form makes no sense.
> > > >>> This is not a bug; collective write will still run correctly
> > > >>> without the change.
> > > >>> But in some cases non-interleaved accesses will be treated as
> > > >>> interleaved.
> > >>>
> > >>> The same thing happens in ad_read_coll.c .
> > >>>
> > >>> Wei-keng
> > >>>
> > >>>
> > >>
> > >
> > 
> > 
> 



