<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Hi Rob,<br>
<br>
Thanks for your clear explanation. I now understand the second part of
the if statement,<br>
(st_offsets[i] <= end_offsets[i]): it just tests that the length is
not zero.<br>
<br>
The test is not strictly "<", but "<=" because equality means
length=1.<br>
<br>
I still have problems with the following cases:<br>
1) After an element with len=0, an interleave is not detected.<br>
2) With elements of len=1, an interleave is not detected.<br>
<br>
I do not know whether this could really be a problem in practice. Below I
explain<br>
how to reproduce it, and I also give a possible solution.<br>
<br>
I modified your program to test these two cases and added a trace in
common/ad_write_coll.c:<br>
if (!myrank)<br>
for (i=0; i<nprocs; i++) {<br>
printf("st_offsets[%d]=%lld
end_offsets[%d]=%lld\n", i, (long long) st_offsets[i], i, (long long) end_offsets[i]);<br>
}<br>
for (i=1; i<nprocs; i++)<br>
if ((st_offsets[i] < end_offsets[i-1]) &&<br>
(st_offsets[i] <= end_offsets[i]))<br>
interleave_count++;<br>
/* This is a rudimentary check for interleaving, but should
suffice<br>
for the moment. */<br>
if (!myrank) printf("%d: interleave_count=%d\n", myrank, interleave_count);<br>
<br>
For case 1:<br>
st_offsets[] = {0, 1,2,3}<br>
end_offsets[] = {1023, 0, 5, 2}<br>
<br>
For case 2:<br>
st_offsets[] = {1, 1,1,1}<br>
end_offsets[] = {1, 1, 1, 1}<br>
<br>
================ The test to see the problem ==========<br>
#include "mpi.h"<br>
#include <stdio.h><br>
<br>
#define LEN 1024<br>
<br>
int main(int argc, char **argv)<br>
{<br>
<br>
MPI_File fh;<br>
MPI_Status status;<br>
MPI_Offset offset;<br>
int length, nprocs, rank, i;<br>
char buffer[LEN];<br>
<br>
for(i=0; i<LEN; i++) buffer[i] = i;<br>
<br>
MPI_Init(&argc, &argv);<br>
<br>
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);<br>
MPI_Comm_rank(MPI_COMM_WORLD, &rank);<br>
<br>
MPI_File_open(MPI_COMM_WORLD, argv[1],
MPI_MODE_CREATE|MPI_MODE_RDWR, <br>
MPI_INFO_NULL, &fh);<br>
<br>
// Interleaved data is not detected after an element with zero length<br>
/*<br>
* e.g. P0 ( off_0 = 0, len_0 = LEN )<br>
* P1 ( off_1 = 1, len_1 = 0 ) ===> zero length<br>
* P2 ( off_2 = 2, len_2 = 4 ) ===> interleave
not detected<br>
* P3 ( off_3 = 3, len_3 = 0 )<br>
* ....... ........<br>
* */<br>
if ((rank % 2) ==0) length=4;<br>
else length=0;<br>
if (rank == 0) length=LEN;<br>
<br>
offset = rank;<br>
<br>
MPI_File_write_at_all(fh, offset, buffer, length, MPI_BYTE,
&status);<br>
<br>
// Interleaved data is not detected if only one byte is common (len = 1)<br>
/*<br>
* e.g. P0 ( off_0 = 1, len_0 = 1 )<br>
* P1 ( off_1 = 1, len_1 = 1 ) ===> interleave not
detected<br>
* P2 ( off_2 = 1, len_2 = 1 ) ===> interleave
not detected<br>
* P3 ( off_3 = 1, len_3 = 1 )<br>
* ....... ........<br>
* */<br>
length=1;<br>
offset=1;<br>
<br>
MPI_File_seek(fh, 0, MPI_SEEK_SET);<br>
MPI_File_write_at_all(fh, offset, buffer, length, MPI_BYTE,
&status);<br>
MPI_File_close(&fh);<br>
MPI_Finalize();<br>
<br>
return 0;<br>
}<br>
================ The results =====================<br>
st_offsets[0]=0 end_offsets[0]=1023<br>
st_offsets[1]=1 end_offsets[1]=0<br>
st_offsets[2]=2 end_offsets[2]=5<br>
st_offsets[3]=3 end_offsets[3]=2<br>
0: interleave_count=0<br>
st_offsets[0]=1 end_offsets[0]=1<br>
st_offsets[1]=1 end_offsets[1]=1<br>
st_offsets[2]=1 end_offsets[2]=1<br>
st_offsets[3]=1 end_offsets[3]=1<br>
0: interleave_count=0<br>
<br>
The interleaves are not detected.<br>
To detect case 1, the end offset of the last element with nonzero
length should be<br>
stored and compared with the start offset of element i.<br>
To detect case 2, the first comparison of the if statement should be
"<=", not "<".<br>
<br>
This could be done with the following loop:<br>
<br>
ADIO_Offset last_end=end_offsets[0];<br>
<br>
for (i=1; i<nprocs; i++) {<br>
if (st_offsets[i] <= last_end) { // Possible interleave
for offset i<br>
if (st_offsets[i] <= end_offsets[i]) // length is
not null, so there is an interleave<br>
interleave_count++;<br>
}<br>
if (st_offsets[i] <= end_offsets[i]) // length is not
null, so change last_end<br>
last_end=end_offsets[i];<br>
/* This is a rudimentary check for interleaving, but should
suffice<br>
for the moment. */<br>
}<br>
<br>
Pascal<br>
<br>
Rob Latham wrote:
<blockquote cite="mid:20100901162531.GI23171@mcs.anl.gov" type="cite">
<pre wrap="">On Wed, Sep 01, 2010 at 03:37:30PM +0200, Pascal Deveze wrote:
</pre>
<blockquote type="cite">
<pre wrap="">There is one test that I do not understand. This test is used
in the collective read/write to detect if the data are interleaved:
/* are the accesses of different processes interleaved? */
for (i=1; i<nprocs; i++)
if ((st_offsets[i] < end_offsets[i-1]) &&
(st_offsets[i] <= end_offsets[i]))
interleave_count++;
/* This is a rudimentary check for interleaving, but should suffice
for the moment. */
}
The second member of the if statement (st_offsets[i] <=
end_offsets[i]) is always verified.
I think this should be (st_offsets[i-1] <= end_offsets[i]).
</pre>
</blockquote>
<pre wrap=""><!---->
That addition happened 6 years ago, but I can't find the original bug
report (it's in the old req system, if someone can find "MPICH2 req
#1174" that might tell us more).
for (i=1; i<nprocs; i++)
- if (st_offsets[i] < end_offsets[i-1]) interleave_count++;
+ if ((st_offsets[i] < end_offsets[i-1]) &&
+ (st_offsets[i] <= end_offsets[i]))
+ interleave_count++;
/* This is a rudimentary check for interleaving, but should suffice
for the moment. */
ah, here we go. Back in 2004 Jianwei Li found a bug when some
processes had zero elements.
"When counting the "interleave_count", segments with length == 0
should not be counted in even if their starting offsets fall
within previous segment range."
I'm not sure why the check is for "<=" instead of strictly "<",
though. Wish I had a test case attached to this old bug report.
Ok, now I do. Attached, and I'll add this to the repository.
</pre>
<blockquote type="cite">
<pre wrap="">Do I miss something ?
</pre>
</blockquote>
<pre wrap=""><!---->
Yes, but it's not hard to miss this subtle thing: the comment a few
lines earlier sheds some light on this matter:
/* Note: end_offset points to the last byte-offset that will be accessed.
e.g., if start_offset=0 and 100 bytes to be read, end_offset=99*/
So, in the test case I attached, if you run it with four procs your st_offsets array and end_offsets array look like this:
st_offsets[] = {0, 1,2,3}
end_offsets[] = {3, 0, 1, 2}
See, if i do a zero-byte write at offset 3, my start is 3 and my end
is actually 2. So, st_offsets[i] is not always less than or equal to
end_offsets[i]. specifically, it won't be if the region was a request
for zero bytes.
</pre>
<blockquote type="cite">
<pre wrap="">And as the interleave_count is always tested with 0, it should be
possible to break the loop
after the incrementation of interleave_count.
</pre>
</blockquote>
<pre wrap=""><!---->
I suppose we could do something clever like "optimize harder" if the
interleave count is higher... well, we don't do that :>
</pre>
<blockquote type="cite">
<pre wrap="">In my point of view, the test could be something like:
/* are the accesses of different processes interleaved? */
for (i=1; i<nprocs; i++)
if ((st_offsets[i] < end_offsets[i-1]) &&
(st_offsets[i-1] <= end_offsets[i])) {
interleave_count=1;
break;
}
/* This is a rudimentary check for interleaving, but should suffice
for the moment. */
</pre>
</blockquote>
<pre wrap=""><!---->
If I could justify burning a million cpu hours it would be great to
profile ROMIO on a full rack of Intrepid. I'm sure breaking early
from loops like this helps scalability a little bit when these arrays
are 160k elements long.
I think I will leave the st_offsets[i] <= end_offsets[i] as is, but
put in a better comment. I will, though, break as soon as we find
something interleaved.
Thanks for the report, though. I am extremely happy you are taking
such a close look at ROMIO.
==rob
</pre>
</blockquote>
<br>
</body>
</html>