[mpich-discuss] committed lustre fixes to MPICH2
Pascal Deveze
Pascal.Deveze at bull.net
Thu Feb 25 07:13:14 CST 2010
Rob,
I see that you removed the "#if 0" flag. That is quite an event: we now have
a working ADIO Lustre driver, thanks to everyone's work!
Since my last comments, I have on rare occasions hit a problem with the
ADIOI_Assert() at line 496 of ad_lustre_wrcoll.c:
ADIOI_Assert((((ADIO_Offset)(MPIR_Upint)write_buf)+req_off-off) == (ADIO_Offset)(MPIR_Upint)(write_buf+req_off-off));
For some values of the variables, the assertion fails even though nothing
is wrong. For example, consider the following program:
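#include <stdio.h>

int main(int argc, char **argv) {
    char *write_buf = (char *)0x7f49fed85010;
    long long req_off = 0x7365040;
    long long off = 0x6000000;
    /* truncate the pointer to 32 bits first, then add the offsets */
    printf("1st part of the assertion=%llx\n",
           (((long long)(unsigned int)write_buf) + req_off - off));
    /* add the offsets to the pointer first, then truncate to 32 bits */
    printf("2nd part of the assertion=%llx\n",
           (long long)(unsigned int)(write_buf + req_off - off));
    return 0;
}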
The execution gives:
1st part of the assertion=1000ea050
2nd part of the assertion=ea050
At line 830 of the same file, there is another assertion, in the definition
of the ADIOI_BUF_COPY macro:
ADIOI_Assert((((ADIO_Offset)(MPIR_Upint)buf) + user_buf_idx) ==
(ADIO_Offset)(MPIR_Upint)((MPIR_Upint)buf + user_buf_idx)); \
So far I have not had any problem with it, but I suspect the same failure
could occur. For example, consider the following program:
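#include <stdio.h>

int main(int argc, char **argv) {
    char *buf = (char *)0x7f49fed85010;
    long long user_buf_idx = 0x7365040;
    /* truncate the pointer to 32 bits first, then add the index */
    printf("1st part of the assertion=%llx\n",
           (((long long)(unsigned int)buf) + user_buf_idx));
    /* add the index first, then truncate the sum to 32 bits */
    printf("2nd part of the assertion=%llx\n",
           (long long)(unsigned int)((unsigned int)buf + user_buf_idx));
    return 0;
}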
The execution gives:
1st part of the assertion=1060ea050
2nd part of the assertion=60ea050
I do not see what protection these two assertions provide. I propose
removing them, unless somebody knows how to correct them.
For the moment I do not have access to a Lustre file system to make
other tests, but I will soon.
Pascal
Rob Latham wrote:
> In December and January we received a lot of contributions from the
> community improving the Lustre ADIO driver. I have finally had time
> to review and incorporate all the suggested fixes, and have committed
> them to our SVN repository (revision 6324 contains all changes).
>
> I'd like to thank everyone for testing, fixing, and explaining and
> testing some more.
>
> I've put the new code through a compile test on 32 bit and 64 bit
> systems, and it doesn't have warnings. However, I do not have access
> to a Lustre file system.
>
> I've copied everybody and anybody who has ever corresponded with me
> about ROMIO and Lustre: presumably you still have access to Lustre
> file systems. I would appreciate it very much if you could try out
> the SVN version of ROMIO, paying particular attention to collective I/O
> with multiple processors to Lustre files striped across various
> numbers of OSTs, as well as noncontiguous I/O.
>
> If you are unable to test the SVN version, I've generated a patch
> relative to the just-released MPICH2-1.2.1p1:
>
> http://www.mcs.anl.gov/~robl/romio/new_lustre_vs_mpich2-1.2.1p1.diff
>
> Thanks again, and do not hesitate to contact me if you have any
> questions or comments.
>
> ==rob
>
>