[mpich-discuss] MPI_File_get_size error message
Eugene Loh
eugene.loh at oracle.com
Fri Oct 28 01:12:01 CDT 2011
I think this is a ROMIO question about ADIOI_Set_lock().
As part of our Open MPI testing, we run a test that checks MPI_Get_count
on a status from an MPI_File_read. (The test is
ibm/io/file_status_get_count.c, in case that means anything to you. If
not, don't worry.) In this test, each MPI process writes a file and
then opens it with MPI_File_open(MPI_COMM_SELF,...,MPI_MODE_RDONLY,
MPI_INFO_NULL, ...). It then does an MPI_File_get_size and some other
stuff. Occasionally, the test fails with:
    File locking failed in ADIOI_Set_lock(fd A,cmd F_SETLKW/7,type F_RDLCK/0,whence 0) with return value FFFFFFFF and errno 5.
    - If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
    - If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.
    ADIOI_Set_lock:: Input/output error
    ADIOI_Set_lock:offset 0, length 1
If I take the error message at face value, I should check (in my case)
NFS. It's NFSv3 and lockd appears to be running. I'm not sure whether
noac is set, but I suspect it is not. But is that really the problem
here? Looking at ADIOI_Set_lock, what actually failed is an fcntl() call.
Is that necessarily an indication of the NFS/Lustre conditions discussed
in the error message? Incidentally, errno 5 appears to be EIO, though I
don't know if that's any help.
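To take MPI and ROMIO out of the picture, the same kind of lock request can be issued directly. The sketch below is not ROMIO code. It is a hypothetical helper, probe_read_lock(), that asks for a blocking read lock with the parameters shown in the error message (F_SETLKW, F_RDLCK, whence 0, offset 0, length 1) and reports the errno if fcntl() fails. The retry on EINTR is my own assumption about reasonable behavior, not a statement about what ADIOI_Set_lock does.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of ROMIO): take a blocking read lock on
 * `len` bytes starting at `offset` in `fd`, then release it again.
 * Returns 0 on success, or the errno value of the fcntl() that failed. */
static int probe_read_lock(int fd, off_t offset, off_t len)
{
    struct flock lk;
    int rc;

    assert(fd >= 0);

    memset(&lk, 0, sizeof lk);
    lk.l_type = F_RDLCK;      /* shared (read) lock, as in the message */
    lk.l_whence = SEEK_SET;   /* "whence 0" */
    lk.l_start = offset;
    lk.l_len = len;

    /* F_SETLKW blocks until the lock is granted; retry if a signal
     * interrupts the wait (an assumption of this sketch). */
    do {
        rc = fcntl(fd, F_SETLKW, &lk);
    } while (rc == -1 && errno == EINTR);

    if (rc == -1) {
        int err = errno;
        fprintf(stderr, "fcntl(F_SETLKW, F_RDLCK) failed: errno %d (%s)\n",
                err, strerror(err));
        return err;
    }

    /* Release the lock so that repeated probes do not accumulate state. */
    lk.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &lk) == -1)
        return errno;
    return 0;
}
```

Calling probe_read_lock(fd, 0, 1) in a loop on a file in the NFS directory, with fd opened for reading, might show whether the EIO turns up outside of MPI as well.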
Anyhow, regardless of whether noac is set or not, that setting is never
changed and yet the test usually passes for us and only occasionally fails.
Could the real issue be some other NFS hiccup, with the NFSv3/lockd/noac
verbiage being a red herring? Any other help/suggestions?
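To settle the noac question rather than guess, the mount table can be read directly. This is a minimal sketch that assumes a Linux/glibc system, where setmntent(), getmntent() and hasmntopt() from <mntent.h> are available and the option appears literally as "noac" in the table. Other systems expose the mount table differently. The helper name mount_has_option() and its arguments are hypothetical.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up the entry mounted at `mount_dir` in the
 * mount table file `table` (for example "/proc/mounts" on Linux).
 * Returns 1 if that entry carries option `opt`, 0 if it does not, and
 * -1 if the table cannot be opened or the directory is not listed. */
static int mount_has_option(const char *table, const char *mount_dir,
                            const char *opt)
{
    FILE *fp;
    struct mntent *ent;
    int result = -1;

    assert(table != NULL && mount_dir != NULL && opt != NULL);

    fp = setmntent(table, "r");
    if (fp == NULL)
        return -1;

    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_dir, mount_dir) == 0) {
            /* Keep scanning: if the directory is listed more than once,
             * the last entry is the one that wins here. */
            result = (hasmntopt(ent, opt) != NULL) ? 1 : 0;
        }
    }

    endmntent(fp);
    return result;
}
```

For example, mount_has_option("/proc/mounts", "/path/to/nfs/mount", "noac") would report on a single node. The path is a placeholder, it has to be the mount point itself rather than a subdirectory, and the check would need to be repeated on every machine running the test.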