[mpich-discuss] ROMIO: Patch to run on Lustre

Pascal Deveze Pascal.Deveze at bull.net
Fri Sep 17 06:17:22 CDT 2010


Hi,

I get an error when I run ROMIO on Lustre with a high number of stripes 
and processes.

This is due to the call to ioctl(fd->fd_sys, LL_IOC_LOV_GETSTRIPE, (void *)lum),
which returns -1 with errno set to "Value too large for defined data type".
In that case, fd->hints->striping_factor is not initialized, which can
produce a "Floating point exception" ("Integer divide-by-zero") in
ADIOI_LUSTRE_Get_striping_info().

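To illustrate the failure mode described above, here is a minimal sketch. It
assumes the unset striping factor ends up as 0 and that the striping code
divides by it. The helper name procs_per_stripe() is hypothetical and is not
ROMIO code; the real computation in ADIOI_LUSTRE_Get_striping_info() may differ.

```c
#include <stdio.h>

/* Hypothetical helper, not ROMIO code: number of processes that share one
 * stripe when nprocs processes are spread over striping_factor stripes.
 * Returns -1 if striping_factor is unusable (zero or negative). */
int procs_per_stripe(int nprocs, int striping_factor)
{
    if (striping_factor <= 0) {
        /* Integer division by zero is undefined behavior in C; on Linux it
         * typically raises SIGFPE, reported as "Floating point exception".
         * This guard turns that crash into an error return. */
        fprintf(stderr, "invalid striping_factor %d\n", striping_factor);
        return -1;
    }
    /* Round up so that every process is assigned to some stripe. */
    return (nprocs + striping_factor - 1) / striping_factor;
}
```

With such a guard, a caller would see an error value for a striping factor of
0 and could report it, where the unguarded division would abort the process.
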
After initializing lum->lmm_stripe_count to a "correct value" (one at least
as large as the file's existing stripe count) before the call, this problem
disappears.
I think this is a Lustre bug, but I propose integrating this patch:

--- src/mpi/romio/adio/ad_lustre/ad_lustre_open.c.OLD   2010-05-25 20:59:13.000000000 +0200
+++ src/mpi/romio/adio/ad_lustre/ad_lustre_open.c       2010-09-17 12:50:58.000000000 +0200
@@ -59,6 +59,9 @@
                  MAX_LOV_UUID_COUNT * sizeof(struct lov_user_ost_data);
         lum = (struct lov_user_md *)ADIOI_Malloc(lumlen);
         lum->lmm_magic = LOV_USER_MAGIC;
+        /* Initialize lum->lmm_stripe_count with a value, else ioctl() returns an error. */
+        /* This value must be greater than or equal to the existing lmm_stripe_count (bug in Lustre?) */
+        lum->lmm_stripe_count = -1;
         err = ioctl(fd->fd_sys, LL_IOC_LOV_GETSTRIPE, (void *)lum);
         if (!err) {
             value = (char *) ADIOI_Malloc((MPI_MAX_INFO_VAL+1)*sizeof(char));
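
Independently of the patch, the caller could also avoid leaving the striping
factor unset when the query fails. The sketch below is only an illustration
under stated assumptions: the function-pointer type stripe_query_fn and the two
simulated queries are hypothetical stand-ins for the
ioctl(fd->fd_sys, LL_IOC_LOV_GETSTRIPE, ...) call, so that the example builds
without the Lustre headers. It is not what the patch above does.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real stripe query (the ioctl in the patch). */
typedef int (*stripe_query_fn)(int fd, unsigned short *stripe_count);

/* Simulates the reported failure: returns -1 with errno set to EOVERFLOW. */
int failing_query(int fd, unsigned short *stripe_count)
{
    (void)fd;
    (void)stripe_count;
    errno = EOVERFLOW;
    return -1;
}

/* Simulates a successful query that reports 4 stripes. */
int working_query(int fd, unsigned short *stripe_count)
{
    (void)fd;
    *stripe_count = 4;
    return 0;
}

/* Returns the queried stripe count, or fallback if the query fails or
 * reports 0, so that the caller never divides by an unset value. */
int stripe_count_or_default(stripe_query_fn query, int fd, int fallback)
{
    unsigned short count = 0;

    if (query(fd, &count) != 0) {
        fprintf(stderr, "stripe query failed: %s\n", strerror(errno));
        return fallback;
    }
    return count > 0 ? (int)count : fallback;
}
```

For example, stripe_count_or_default(failing_query, -1, 1) yields the fallback
value 1, while the same call with working_query yields 4.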

Pascal


