[mpich-discuss] committed lustre fixes to MPICH2

Pascal Deveze Pascal.Deveze at bull.net
Thu Feb 25 07:13:14 CST 2010


Rob,

I see that you removed the "#if 0" guard. It's quite an event: we now have 
a usable ADIO Lustre driver, thanks to everyone's work!

Since my last comments, I have very occasionally hit a problem with the 
ADIOI_Assert() at line 496 of ad_lustre_wrcoll.c:

ADIOI_Assert((((ADIO_Offset)(MPIR_Upint)write_buf)+req_off-off) == (ADIO_Offset)(MPIR_Upint)(write_buf+req_off-off));

For some values of the variables, the assertion fails even though nothing 
is wrong. For example, take the following program, in which unsigned int 
and long long stand in for MPIR_Upint and ADIO_Offset:

#include <stdio.h>

int main(int argc, char **argv) {
      char *write_buf=(char*)0x7f49fed85010;
      long long req_off=0x7365040;
      long long off=0x6000000;
      /* widen the truncated address first, then add */
      printf("1st part of the assertion=%llx\n", (((long long)(unsigned int)write_buf)+req_off-off));
      /* add first, then truncate the sum */
      printf("2nd part of the assertion=%llx\n",  (long long)(unsigned int)(write_buf+req_off-off));
      return 0;
}
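Running it gives:
1st part of the assertion=1000ea050
2nd part of the assertion=ea050
The two values differ by 0x100000000: the first expression keeps the carry 
out of the low 32 bits, while the second truncates it away.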

At line 830 of the same file, there is another assertion, in the 
definition of the ADIOI_BUF_COPY macro:
        ADIOI_Assert((((ADIO_Offset)(MPIR_Upint)buf) + user_buf_idx) == 
(ADIO_Offset)(MPIR_Upint)((MPIR_Upint)buf + user_buf_idx)); \
So far I have not had any problem with it, but I suspect the same failure 
could occur. For example, in the following program:

#include <stdio.h>

int main(int argc, char **argv) {
      char *buf=(char*)0x7f49fed85010;
      long long user_buf_idx=0x7365040;
      /* widen the truncated address first, then add */
      printf("1st part of the assertion=%llx\n", (((long long)(unsigned int)buf) + user_buf_idx));
      /* add in 32 bits, so the carry is lost */
      printf("2nd part of the assertion=%llx\n", (long long)(unsigned int)((unsigned int)buf + user_buf_idx));
      return 0;
}
Running it gives:
1st part of the assertion=1060ea050
2nd part of the assertion=60ea050

I do not see what protection these two assertions provide. I propose 
removing them, unless somebody knows how to correct them.

For the moment I do not have access to a Lustre file system to run 
further tests, but I will soon.

Pascal




Rob Latham wrote:
> In December and January we received a lot of contributions from the
> community improving the Lustre ADIO driver.  I have finally had time
> to review and incorporate all the suggested fixes, and have committed
> them to our SVN repository (revision 6324 contains all changes).   
>
> I'd like to thank everyone for testing, fixing, and explaining and
> testing some more.
>
> I've put the new code through a compile test on 32 bit and 64 bit
> systems, and it doesn't have warnings.  However, I do not have access
> to a Lustre file system. 
>
> I've copied everybody and anybody who has ever corresponded with me
> about ROMIO and Lustre: presumably you still have access to Lustre
> file systems.  I would appreciate it very much if you could try out
> the SVN version of ROMIO, paying particular attention to collective I/O
> with multiple processors to Lustre files striped across various
> numbers of OSTs as well as noncontiguous I/O.
>
> If you are unable to test the SVN version, I've generated a patch
> relative to the just-released MPICH2-1.2.1p1:
>
> http://www.mcs.anl.gov/~robl/romio/new_lustre_vs_mpich2-1.2.1p1.diff
>
> Thanks again, and do not hesitate to contact me if you have any
> questions or comments.
>
> ==rob
>
>   


