[mpich-discuss] MPI_File_get_size error message

Eugene Loh eugene.loh at oracle.com
Fri Oct 28 01:12:01 CDT 2011


I think this is a ROMIO question about ADIOI_Set_lock().

As part of our Open MPI testing, we run a test that checks MPI_Get_count 
on the status returned by an MPI_File_read.  (The test is 
ibm/io/file_status_get_count.c, in case that means anything to you.  If 
not, don't worry.)  In this test, each MPI process writes a file and 
then opens it with MPI_File_open(MPI_COMM_SELF,...,MPI_MODE_RDONLY, 
MPI_INFO_NULL, ...).  It then calls MPI_File_get_size and does a few 
other things.  Occasionally, the test fails with:

File locking failed in ADIOI_Set_lock(fd A,cmd F_SETLKW/7,type F_RDLCK/0,whence 0) with return value FFFFFFFF and errno 5.
- If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
- If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.
ADIOI_Set_lock:: Input/output error
ADIOI_Set_lock:offset 0, length 1

If I take the error message at face value, I should check NFS (that is 
the relevant case for me).  It's NFSv3, and lockd appears to be running. 
I'm not really sure whether noac is set, but I suspect it is not.  But is 
that really the problem here?  Looking at ADIOI_Set_lock, the message is 
printed because an fcntl() call failed.  Does an fcntl() failure 
necessarily indicate the NFS/Lustre conditions described in the error 
message?  Incidentally, errno 5 appears to be EIO, though I don't know 
if that's any help.

Anyhow, whether or not noac is set, that setting never changes, yet the 
test usually passes for us and only occasionally fails.

Could the real issue be some other NFS hiccup, with the NFSv3/lockd/noac 
verbiage being a red herring?  Any other help/suggestions?

