[mpich-discuss] ROMIO: Patch to run on Lustre
Pascal Deveze
Pascal.Deveze at bull.net
Fri Sep 17 06:17:22 CDT 2010
Hi,
I get an error when I run ROMIO on Lustre with a high number of stripes
and processes.
This is due to the call to
ioctl(fd->fd_sys, LL_IOC_LOV_GETSTRIPE, (void *)lum), which returns -1
with errno set to "Value too large for defined data type" (EOVERFLOW).
In that case, fd->hints->striping_factor is not initialized. That can
produce a "Floating point exception" (integer divide-by-zero) in
ADIOI_LUSTRE_Get_striping_info().
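To illustrate the failure mode, here is a minimal sketch of a guarded
stripe calculation. It is not ROMIO code: stripe_index() and its
parameters are hypothetical names, and the round-robin mapping is only an
assumption made for the example. The point is that a stripe count left at
0 must be rejected before it is used as a divisor.

```c
/* Hypothetical helper, not part of ROMIO: maps a file offset to the index
 * of the stripe that holds it, assuming round-robin striping. */
static int stripe_index(long long offset, int stripe_size, int stripe_count)
{
    /* With stripe_count == 0 the modulo below would trap (SIGFPE, which
     * the shell reports as "Floating point exception"), so bail out. */
    if (offset < 0 || stripe_size <= 0 || stripe_count <= 0)
        return -1;

    return (int)((offset / stripe_size) % stripe_count);
}
```

A caller would treat a return value of -1 as "striping information
unavailable" and fall back to a default instead of crashing.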
If lum->lmm_stripe_count is initialized before the call to a value
greater than or equal to the file's actual stripe count, the problem
disappears.
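The behaviour I observe can be modelled with a small mock. This is only a
sketch of what I see, not Lustre's implementation: struct mock_lum,
mock_getstripe() and query_stripes() are hypothetical, and I am assuming
the count field is a 16-bit unsigned integer, so that assigning -1 yields
65535.

```c
#include <errno.h>
#include <stdlib.h>

/* Mock descriptor holding only the field discussed here. */
struct mock_lum {
    unsigned short stripe_count; /* in: caller's capacity, out: real count */
};

/* Mock query (NOT the Lustre ioctl): fails with EOVERFLOW when the value
 * passed in is smaller than the file's actual stripe count. */
static int mock_getstripe(struct mock_lum *lum, unsigned short actual)
{
    if (lum->stripe_count < actual) {
        errno = EOVERFLOW;
        return -1;
    }
    lum->stripe_count = actual;
    return 0;
}

/* Queries with the field explicitly set first, as the patch does. */
static int query_stripes(unsigned short actual)
{
    /* malloc() leaves the contents indeterminate, like ADIOI_Malloc(). */
    struct mock_lum *lum = malloc(sizeof *lum);
    int count;

    if (lum == NULL)
        return -1;
    lum->stripe_count = (unsigned short)-1; /* 65535: never too small */
    count = (mock_getstripe(lum, actual) == 0) ? lum->stripe_count : -1;
    free(lum);
    return count;
}
```

Without the explicit assignment, the result depends on whatever the
allocator left in that field, which would explain why the error only
shows up with a high number of stripes.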
I think this is a Lustre bug, but I propose to integrate this patch:
--- src/mpi/romio/adio/ad_lustre/ad_lustre_open.c.OLD 2010-05-25 20:59:13.000000000 +0200
+++ src/mpi/romio/adio/ad_lustre/ad_lustre_open.c 2010-09-17 12:50:58.000000000 +0200
@@ -59,6 +59,9 @@
              MAX_LOV_UUID_COUNT * sizeof(struct lov_user_ost_data);
     lum = (struct lov_user_md *)ADIOI_Malloc(lumlen);
     lum->lmm_magic = LOV_USER_MAGIC;
+    /* Initialize lum->lmm_stripe_count with a value, else ioctl() returns an error */
+    /* This value must be greater than or equal to the existing lmm_stripe_count (bug in Lustre?) */
+    lum->lmm_stripe_count = -1;
     err = ioctl(fd->fd_sys, LL_IOC_LOV_GETSTRIPE, (void *)lum);
     if (!err) {
         value = (char *) ADIOI_Malloc((MPI_MAX_INFO_VAL+1)*sizeof(char));
Pascal