[mpich2-dev] ROMIO: Interleaving test

Rob Latham robl at mcs.anl.gov
Wed Sep 1 11:25:31 CDT 2010


On Wed, Sep 01, 2010 at 03:37:30PM +0200, Pascal Deveze wrote:
> There is one test that I do not understand. This test is used
> in the collective read/write to detect if the data are interleaved:
> 
>       /* are the accesses of different processes interleaved? */
>        for (i=1; i<nprocs; i++)
>            if ((st_offsets[i] < end_offsets[i-1]) &&
>                (st_offsets[i] <= end_offsets[i]))
>                interleave_count++;
>        /* This is a rudimentary check for interleaving, but should suffice
>           for the moment. */
>    }
> 
> The second member of the if statement (st_offsets[i] <=
> end_offsets[i]) is always true.
> I think this should be (st_offsets[i-1] <= end_offsets[i]).

That addition happened 6 years ago, but I can't find the original bug
report (it's in the old req system; if someone can find "MPICH2 req
#1174", that might tell us more).

        for (i=1; i<nprocs; i++)
-           if (st_offsets[i] < end_offsets[i-1]) interleave_count++;
+           if ((st_offsets[i] < end_offsets[i-1]) && 
+                (st_offsets[i] <= end_offsets[i]))
+                interleave_count++;
        /* This is a rudimentary check for interleaving, but should suffice
           for the moment. */


Ah, here we go. Back in 2004, Jianwei Li found a bug that showed up
when some processes had zero elements to access.

    "When counting the "interleave_count", segments with length == 0
    should not be counted in even if their starting offsets fall
    within previous segment range."

I'm not sure why the check is for "<=" instead of strictly "<",
though.  Wish I had a test case attached to this old bug report.  

Ok, now I do.  Attached, and I'll add this to the repository. 


> Am I missing something?

Yes, but it's not hard to miss this subtle thing: the comment a few
lines earlier sheds some light on this matter:

       /* Note: end_offset points to the last byte-offset that will be accessed.
           e.g., if start_offset=0 and 100 bytes to be read, end_offset=99*/

So, in the test case I attached, if you run it with four procs, your
st_offsets and end_offsets arrays look like this:

st_offsets[] = {0, 1,2,3}
end_offsets[] = {3, 0, 1, 2}

See, if I do a zero-byte write at offset 3, my start is 3 and my end
is actually 2. So st_offsets[i] is not always less than or equal to
end_offsets[i]; specifically, it won't be if the region was a request
for zero bytes.

> And as interleave_count is only ever compared against 0, it should be
> possible to break out of the loop as soon as interleave_count is
> incremented.

I suppose we could do something clever like "optimize harder" if the
interleave count is higher... well, we don't do that :>

> In my point of view, the test could be something like:
>       /* are the accesses of different processes interleaved? */
>        for (i=1; i<nprocs; i++)
>            if ((st_offsets[i] < end_offsets[i-1]) &&
>                (st_offsets[i-1] <= end_offsets[i])) {
>                          interleave_count=1;
>                          break;
>            }
>        /* This is a rudimentary check for interleaving, but should suffice
>           for the moment. */

If I could justify burning a million CPU hours, it would be great to
profile ROMIO on a full rack of Intrepid. I'm sure breaking early
from loops like this helps scalability a little bit when these arrays
are 160k elements long.

I think I will leave the st_offsets[i] <= end_offsets[i] check as is,
but put in a better comment. I will, though, break as soon as we find
something interleaved.

Thanks for the report, though.  I am extremely happy you are taking
such a close look at ROMIO.  

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
-------------- next part --------------
Name: interleave.c
Type: text/x-csrc
Size: 881 bytes
URL: <http://lists.mcs.anl.gov/pipermail/mpich2-dev/attachments/20100901/3a9b3156/attachment-0001.c>

