[mpich-discuss] ROMIO: Patch to run on Lustre

Pascal Deveze Pascal.Deveze at bull.net
Tue Sep 21 08:57:47 CDT 2010


Hi Rob,

Rob Latham wrote:
> On Fri, Sep 17, 2010 at 01:17:22PM +0200, Pascal Deveze wrote:
>
>   
>> After initializing lum->lmm_stripe_count to a "correct value", this
>> problem disappears.
>> I think this is a Lustre bug, but I propose integrating this patch:
>>     
>
> thanks, pascal!
>
>   
>> --- src/mpi/romio/adio/ad_lustre/ad_lustre_open.c       2010-09-17 12:50:58.000000000 +0200
>> +++ src/mpi/romio/adio/ad_lustre/ad_lustre_open.c.OLD   2010-05-25 20:59:13.000000000 +0200
>> @@ -59,9 +59,6 @@
>>                  MAX_LOV_UUID_COUNT * sizeof(struct lov_user_ost_data);
>>         lum = (struct lov_user_md *)ADIOI_Malloc(lumlen);
>>         lum->lmm_magic = LOV_USER_MAGIC;
>> -       /* Initialize lum->lmm_stripe_count with a value, else ioctl() returns an error */
>> -       /* This value must be greater than or equal to the existing lmm_stripe_count (bug in Lustre?) */
>> -        lum->lmm_stripe_count = -1;
>>         err = ioctl(fd->fd_sys, LL_IOC_LOV_GETSTRIPE, (void *)lum);
>>         if (!err) {
>>             value = (char *) ADIOI_Malloc((MPI_MAX_INFO_VAL+1)*sizeof(char));
>>     
>
> What if instead of explicitly initializing elements of the struct
> lov_user_md, we called ADIOI_Calloc(1, lumlen) to set
> everything in the struct to zero?  Then if that struct changes in
> lustre-2.0 or lustre-5.0 or whatever, we'll still be covered.  Or
> would zero also give that error about the value being too large?
>   

I had not tested the value 0. In fact, the value 0 is accepted.
So you are right, we can call ADIOI_Calloc(1, lumlen).
Here is the new patch (only one line changed):

--- src/mpi/romio/adio/ad_lustre/ad_lustre_open.c       2010-09-21 15:50:07.000000000 +0200
+++ src/mpi/romio/adio/ad_lustre/ad_lustre_open.c.OLD   2010-05-25 20:59:13.000000000 +0200
@@ -57,7 +57,7 @@
         * then a list of 'lmm_objects' representing stripe */
         lumlen = sizeof(struct lov_user_md) +
                  MAX_LOV_UUID_COUNT * sizeof(struct lov_user_ost_data);
-        lum = (struct lov_user_md *)ADIOI_Calloc(1, lumlen);
+        lum = (struct lov_user_md *)ADIOI_Malloc(lumlen);
         lum->lmm_magic = LOV_USER_MAGIC;
         err = ioctl(fd->fd_sys, LL_IOC_LOV_GETSTRIPE, (void *)lum);
         if (!err) {
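
To illustrate the idea behind the change, here is a minimal sketch in
plain C. It is not the ROMIO code: struct demo_user_md, DEMO_MAGIC and
demo_alloc_request() are hypothetical stand-ins, the real struct
lov_user_md layout comes from the Lustre headers, and the sketch uses
the standard calloc() rather than ADIOI_Calloc() and never issues the
LL_IOC_LOV_GETSTRIPE ioctl. It only shows that a zero-filled buffer
gives every field, including the stripe count, a defined value before
the request is sent.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct lov_user_md (not the real layout). */
struct demo_user_md {
    uint32_t magic;
    uint16_t stripe_count;
    uint16_t stripe_offset;
};

/* Hypothetical magic value, used here only for illustration. */
#define DEMO_MAGIC 0x1234u

/*
 * Allocate a request buffer with every byte set to zero, then fill in
 * only the field the caller must set. With malloc() the other fields
 * would hold indeterminate values; with calloc() they all start at 0,
 * including any field added to the struct later.
 */
static struct demo_user_md *demo_alloc_request(size_t extra_bytes)
{
    struct demo_user_md *md = calloc(1, sizeof(*md) + extra_bytes);
    if (md == NULL)
        return NULL;
    md->magic = DEMO_MAGIC;
    return md;
}
```

A caller would request the header plus room for the per-stripe entries,
for example demo_alloc_request(64), and release the buffer with free()
once the reply has been read.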


Pascal
