[parallel-netcdf] #21: File system locking error in testing

Tue Nov 1 09:33:55 CDT 2016

On 10/31/2016 10:01 AM, Wei-keng Liao wrote:
> Hi, Luke
>
> If the output Lustre folder is the same for both the Intel MPI and
> OpenMPI runs, then I would say the Intel MPI configuration is most
> likely not done correctly. I suggest you report this error to your
> system admin along with the simple MPI program I provided earlier.
>
> If you like, you can also post this to the MPICH discuss mailing
> list: <discuss at mpich.org>. Rob Latham is the lead developer
> of ROMIO (MPICH's MPI-IO component). He and others on the MPICH team
> may be able to provide more information.

I agree with Wei-keng. When you get this error message, it's not because
the OS interrupted the process, or because something else held the lock,
or because you tried to lock something that wasn't a file:

...
while (err && ((errno == EINTR) ||
               ((errno == EINPROGRESS) && (++err_count < 10000))));
if (err && (errno != EBADF))
...
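
For reference, the retry logic above can be sketched as a standalone C
function. This is a minimal sketch assuming a POSIX system, not ROMIO's
actual ADIOI_Set_lock; the name set_lock_with_retry is hypothetical. It
returns 0 on success, otherwise the errno left by the final fcntl() call,
so a caller can tell which of the cases above applies:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper modeled on the ROMIO loop quoted above (not ROMIO's
 * own code). Requests a blocking byte-range lock with F_SETLKW and retries
 * while fcntl() is interrupted by a signal (EINTR), or while it reports
 * EINPROGRESS for fewer than 10000 attempts. Returns 0 on success,
 * otherwise the errno value left by the last fcntl() call. */
int set_lock_with_retry(int fd, short lock_type, off_t offset, off_t len)
{
    struct flock lock = {0};
    int err, err_count = 0;

    lock.l_type   = lock_type;  /* F_WRLCK, F_RDLCK or F_UNLCK */
    lock.l_whence = SEEK_SET;   /* absolute offsets ("whence 0" in the log) */
    lock.l_start  = offset;
    lock.l_len    = len;

    do {
        err = fcntl(fd, F_SETLKW, &lock);
    } while (err && ((errno == EINTR) ||
                     ((errno == EINPROGRESS) && (++err_count < 10000))));

    /* ROMIO only reports a failure when errno is not EBADF; any other
     * value (for example ENOSYS, "Function not implemented") is what
     * leads to the ADIOI_Set_lock message. */
    return err ? errno : 0;
}
```

On a directory where fcntl() locks are not supported, a function like this
would be expected to return ENOSYS rather than EINTR, EINPROGRESS or EBADF,
which is exactly the case the error message is describing.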

The oddly verbose error message exists because every few years a ROMIO
developer would run on Lustre, have trouble with fcntl locks, eventually
figure it out, and then add another sentence to the logging message. As
you've seen, these kinds of bugs (where the issue is how the file system is
deployed and configured rather than a bug in ROMIO itself) are really annoying!
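
Because the fix lives in how the file system is mounted, one way to check a
directory independently of any MPI library is to try an fcntl() lock there
directly. The sketch below is hypothetical (fcntl_locks_work is not part of
ROMIO or PnetCDF) and assumes a POSIX system. It also assumes that a Lustre
client mounted without the 'flock' option fails fcntl() locks with ENOSYS,
matching the "Function not implemented" in the original report:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical diagnostic (not part of ROMIO or PnetCDF): create a scratch
 * file in `dir` and try a non-blocking whole-file write lock on it.
 * Returns  1 if the lock is granted,
 *          0 if fcntl() fails with ENOSYS ("Function not implemented"),
 *         -1 on any other failure (bad path, permissions, other errno). */
int fcntl_locks_work(const char *dir)
{
    char path[4096];
    struct flock lock = {0};
    int fd, n, rc;

    n = snprintf(path, sizeof path, "%s/lockprobe.XXXXXX", dir);
    if (n < 0 || n >= (int)sizeof path)
        return -1;

    fd = mkstemp(path);            /* creates the file and opens it O_RDWR */
    if (fd < 0)
        return -1;

    lock.l_type   = F_WRLCK;
    lock.l_whence = SEEK_SET;      /* l_start = l_len = 0 covers the whole file */
    if (fcntl(fd, F_SETLK, &lock) == 0)
        rc = 1;
    else
        rc = (errno == ENOSYS) ? 0 : -1;

    close(fd);                     /* closing the descriptor drops the lock */
    unlink(path);
    return rc;
}
```

For example, fcntl_locks_work("/path/to/lustre/dir") returning 0 (the path
is a placeholder) would point at the mount options rather than at ROMIO.
F_SETLK is used instead of F_SETLKW so the probe never blocks on a lock
held by someone else.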

Thanks for confirming OpenMPI runs fine. Recent versions of OpenMPI use
their own re-implementation of the MPI-IO routines rather than ROMIO. You
can request ROMIO's implementation instead and see the bug, though I forget
the specific MCA option for that.

==rob

>
> Wei-keng
>
> On Oct 31, 2016, at 9:45 AM, Luke Van Roekel wrote:
>
>> Wei-keng,
>>   You were right about the mismatch. With the fix, I now get the same ADIOI_Set_lock error as in my first submission. With OpenMPI the program runs fine.
>>
>> Regards,
>> Luke
>>
>> On Sun, Oct 30, 2016 at 10:28 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
>> Hi, Luke
>>
>> The error message could be caused by using an mpiexec/mpirun that is
>> not from the same build as the mpicc used to compile the MPI program.
>> Could you check whether mpiexec/mpirun is in the same folder as the
>> Intel mpicc? However, this does not seem to be related to the
>> ADIOI_Set_lock problem you first reported. Let me know once the
>> mpirun issue is resolved, and then we can look at the lock problem.
>>
>> Wei-keng
>>
>> On Oct 28, 2016, at 10:23 PM, Luke Van Roekel wrote:
>>
>>> Hello Wei-Keng,
>>>   Sorry for the slow turnaround on this test. Our computing resources have been down all week and just came back. OpenMPI succeeded, but Intel MPI failed with the following error.
>>>
>>> [proxy:0:0 at gr1224.localdomain] HYD_pmcd_pmi_args_to_tokens (../../pm/pmiserv/common.c:276): assert (*count * sizeof(struct HYD_pmcd_token)) failed
>>> [proxy:0:0 at gr1224.localdomain] fn_job_getid (../../pm/pmiserv/pmip_pmi_v2.c:253): unable to convert args to tokens
>>> [proxy:0:0 at gr1224.localdomain] pmi_cb (../../pm/pmiserv/pmip_cb.c:806): PMI handler returned error
>>> [proxy:0:0 at gr1224.localdomain] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
>>> [proxy:0:0 at gr1224.localdomain] main (../../pm/pmiserv/pmip.c:507): demux engine error waiting for event
>>> [mpiexec at gr1224.localdomain] control_cb (../../pm/pmiserv/pmiserv_cb.c:781): connection to proxy 0 at host gr1224 failed
>>> [mpiexec at gr1224.localdomain] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
>>> [mpiexec at gr1224.localdomain] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:500): error waiting for event
>>> [mpiexec at gr1224.localdomain] main (../../ui/mpich/mpiexec.c:1130): process manager error waiting for completion
>>>
>>> Does this mean that our intel-mpi implementation has an issue(s)?
>>> Regards,
>>> Luke
>>>
>>> On Mon, Oct 24, 2016 at 11:02 PM, parallel-netcdf <parallel-netcdf at mcs.anl.gov> wrote:
>>> #21: File system locking error in testing
>>> --------------------------------------+-------------------------------------
>>>  Reporter:  luke.vanroekel@…          |       Owner:  robl
>>>      Type:  test error                |      Status:  new
>>>  Priority:  major                     |   Milestone:
>>> Component:  parallel-netcdf           |     Version:  1.7.0
>>>  Keywords:                            |
>>> --------------------------------------+-------------------------------------
>>>
>>> Comment(by wkliao):
>>>
>>>  Hi, Luke
>>>
>>>  We just resolved an issue with the trac notification email settings
>>>  today. I believe that from now on any update to the ticket you created
>>>  should reach you by email.
>>>
>>>  I assume you ran the PnetCDF tests using Intel MPI and OpenMPI on the
>>>  same machine, accessing the same Lustre file system. If so, I am also
>>>  puzzled. If OpenMPI works, that implies the Lustre directory is mounted
>>>  with the 'flock' option, which should have worked fine with Intel MPI.
>>>  I suggest you try the simple MPI-IO program below. If the same problem
>>>  occurs, then it is an MPI-IO problem. Let me know.
>>>
>>>  {{{
>>>  #include <stdio.h>
>>>  #include <stdlib.h>
>>>  #include <mpi.h>
>>>
>>>  int main(int argc, char **argv) {
>>>      int            buf = 0, err;   /* value is irrelevant; only the write path matters */
>>>      MPI_File       fh;
>>>      MPI_Status     status;
>>>
>>>      MPI_Init(&argc, &argv);
>>>      if (argc != 2) {
>>>          printf("Usage: %s filename\n", argv[0]);
>>>          MPI_Finalize();
>>>          return 1;
>>>      }
>>>      err = MPI_File_open(MPI_COMM_WORLD, argv[1],
>>>                          MPI_MODE_CREATE | MPI_MODE_RDWR,
>>>                          MPI_INFO_NULL, &fh);
>>>      if (err != MPI_SUCCESS) {
>>>          printf("Error: MPI_File_open()\n");
>>>          MPI_Abort(MPI_COMM_WORLD, 1);   /* fh is invalid; do not continue */
>>>      }
>>>
>>>      err = MPI_File_write_all(fh, &buf, 1, MPI_INT, &status);
>>>      if (err != MPI_SUCCESS) printf("Error: MPI_File_write_all()\n");
>>>
>>>      MPI_File_close(&fh);
>>>      MPI_Finalize();
>>>      return 0;
>>>  }
>>>  }}}
>>>
>>>  Wei-keng
>>>
>>>
>>>  Replying to [ticket:24 luke.vanroekel@…]:
>>>  > While trying to respond to the question raised about my ticket #21, I
>>>  > found I am unable to do so: I don't see any reply or modify-ticket
>>>  > option. Sorry for raising another ticket, but I cannot figure out how
>>>  > to respond to the previous question.
>>>  >
>>>  > Regarding the question in ticket #21, the flag is not set for locking.
>>>  > My confusion is why Intel MPI requires file locking while OpenMPI does
>>>  > not. Our HPC staff will not change settings on the mount. Is it
>>>  > possible to work around the file-lock error?
>>>  >
>>>  > Regards, Luke
>>>
>>>
>>>  Replying to [ticket:21 luke.vanroekel@…]:
>>>  > Hello,
>>>  >   I've been attempting to build parallel-netcdf for our local cluster
>>>  > with gcc, Intel MPI 5.1.3, and netCDF 4.3.2. The code compiles fine,
>>>  > but when I run the make check tests, nc_test fails with the following
>>>  > error:
>>>  >
>>>  >
>>>  > {{{
>>>  > This requires fcntl(2) to be implemented. As of 8/25/2011 it is not. Generic MPICH Message: File locking failed in ADIOI_Set_lock(fd 3,cmd F_SETLKW/7,type F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 26.
>>>  > - If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
>>>  > - If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.
>>>  > ADIOI_Set_lock:: Function not implemented
>>>  > ADIOI_Set_lock:offset 0, length 6076
>>>  > application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>  >
>>>  > }}}
>>>  >
>>>  > I am running this test on a parallel file system (Lustre). I have
>>>  > tested this in versions 1.5.0 up to the most current. Any thoughts?
>>>  > I can compile and test just fine with OpenMPI 1.10.3.
>>>  >
>>>  > Regards,
>>>  > Luke
>>>
>>> --
>>> Ticket URL: <http://trac.mcs.anl.gov/projects/parallel-netcdf/ticket/21#comment:2>
>>> parallel-netcdf <http://trac.mcs.anl.gov/projects/parallel-netcdf>
>>>
>>>
>>
>>
>