<div dir="ltr">Hello Wei-Keng,<div> Sorry for the slow turn around on this test. Our computing resources have been down all week and just came back. Openmpi succeeded, but intel-mpi failed with the following error.</div><div><br></div><div><p style="margin:0px;font-size:14px;line-height:normal;font-family:monaco;color:rgb(245,245,245);background-color:rgb(30,30,30)"><span style="font-variant-ligatures:no-common-ligatures">[proxy:0:0@gr1224.localdomain] HYD_pmcd_pmi_args_to_tokens (../../pm/pmiserv/common.c:276): assert (*count * sizeof(struct HYD_pmcd_token)) failed</span></p>
<p style="margin:0px;font-size:14px;line-height:normal;font-family:monaco;color:rgb(245,245,245);background-color:rgb(30,30,30)"><span style="font-variant-ligatures:no-common-ligatures">[proxy:0:0@gr1224.localdomain] fn_job_getid (../../pm/pmiserv/pmip_pmi_v2.c:253): unable to convert args to tokens</span></p>
<p style="margin:0px;font-size:14px;line-height:normal;font-family:monaco;color:rgb(245,245,245);background-color:rgb(30,30,30)"><span style="font-variant-ligatures:no-common-ligatures">[proxy:0:0@gr1224.localdomain] pmi_cb (../../pm/pmiserv/pmip_cb.c:806): PMI handler returned error</span></p>
<p style="margin:0px;font-size:14px;line-height:normal;font-family:monaco;color:rgb(245,245,245);background-color:rgb(30,30,30)"><span style="font-variant-ligatures:no-common-ligatures">[proxy:0:0@gr1224.localdomain] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status</span></p>
<p style="margin:0px;font-size:14px;line-height:normal;font-family:monaco;color:rgb(245,245,245);background-color:rgb(30,30,30)"><span style="font-variant-ligatures:no-common-ligatures">[proxy:0:0@gr1224.localdomain] main (../../pm/pmiserv/pmip.c:507): demux engine error waiting for event</span></p>
<p style="margin:0px;font-size:14px;line-height:normal;font-family:monaco;color:rgb(245,245,245);background-color:rgb(30,30,30)"><span style="font-variant-ligatures:no-common-ligatures">[mpiexec@gr1224.localdomain] control_cb (../../pm/pmiserv/pmiserv_cb.c:781): connection to proxy 0 at host gr1224 failed</span></p>
<p style="margin:0px;font-size:14px;line-height:normal;font-family:monaco;color:rgb(245,245,245);background-color:rgb(30,30,30)"><span style="font-variant-ligatures:no-common-ligatures">[mpiexec@gr1224.localdomain] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status</span></p>
<p style="margin:0px;font-size:14px;line-height:normal;font-family:monaco;color:rgb(245,245,245);background-color:rgb(30,30,30)"><span style="font-variant-ligatures:no-common-ligatures">[mpiexec@gr1224.localdomain] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:500): error waiting for event</span></p>
<p style="margin:0px;font-size:14px;line-height:normal;font-family:monaco;color:rgb(245,245,245);background-color:rgb(30,30,30)"><span style="font-variant-ligatures:no-common-ligatures">[mpiexec@gr1224.localdomain] main (../../ui/mpich/mpiexec.c:1130): process manager error waiting for completion</span></p><p style="margin:0px;font-size:14px;line-height:normal;font-family:monaco;color:rgb(245,245,245);background-color:rgb(30,30,30)"><span style="font-variant-ligatures:no-common-ligatures"><br></span></p></div><div>Does this mean that our intel-mpi implementation has an issue(s)?</div><div>Regards,</div><div>Luke</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 24, 2016 at 11:02 PM, parallel-netcdf <span dir="ltr"><<a href="mailto:parallel-netcdf@mcs.anl.gov" target="_blank">parallel-netcdf@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">#21: File system locking error in testing<br>

On Mon, Oct 24, 2016 at 11:02 PM, parallel-netcdf <parallel-netcdf@mcs.anl.gov> wrote:

#21: File system locking error in testing
--------------------------------------+-------------------------------------
 Reporter:  luke.vanroekel@…           |      Owner:  robl
     Type:  test error                 |     Status:  new
 Priority:  major                      |  Milestone:
Component:  parallel-netcdf            |    Version:  1.7.0
 Keywords:                             |
--------------------------------------+-------------------------------------

Comment(by wkliao):

Hi, Luke

We just resolved an issue with the Trac notification email settings today. I
believe that from now on any update to the ticket you created will reach you
by email.

I assume you ran the PnetCDF tests using Intel MPI and OpenMPI on the same
machine, accessing the same Lustre file system. If that is the case, I am also
puzzled: if OpenMPI works, it implies the Lustre directory is mounted with the
'flock' option, which should also work fine under Intel MPI. I would suggest
you try the simple MPI-IO program below. If the same problem occurs, then it
is an MPI-IO problem. Let me know.

{{{
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int buf = 0, err;
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    if (argc != 2) {
        printf("Usage: %s filename\n", argv[0]);
        MPI_Finalize();
        return 1;
    }

    /* open (create if necessary) the test file on the Lustre mount */
    err = MPI_File_open(MPI_COMM_WORLD, argv[1],
                        MPI_MODE_CREATE | MPI_MODE_RDWR,
                        MPI_INFO_NULL, &fh);
    if (err != MPI_SUCCESS) printf("Error: MPI_File_open()\n");

    /* collective write of one int from each process */
    err = MPI_File_write_all(fh, &buf, 1, MPI_INT, &status);
    if (err != MPI_SUCCESS) printf("Error: MPI_File_write_all()\n");

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
}}}
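
If the open or the write does fail, the numeric error code can be turned into
a readable message with MPI_Error_string(). A small helper along these lines
(the name check_err is only for illustration) could replace the plain printf
calls in the test above, e.g. check_err(err, "MPI_File_open"):

{{{
#include <stdio.h>
#include <mpi.h>

/* print a human-readable message for a failed MPI call */
static void check_err(int err, const char *where) {
    if (err != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(err, msg, &len);
        printf("Error in %s: %s\n", where, msg);
    }
}
}}}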

Wei-keng


Replying to [ticket:24 luke.vanroekel@…]:
> In trying to respond to the question raised about my ticket #21, I am
> unable to do so. I don't see any option to reply or to modify the ticket.
> Sorry for raising another ticket, but I cannot figure out how to respond
> to the previous question.
>
> In regards to the question in ticket #21, the flag is not set for locking.
> My confusion is why Intel MPI requires file locking while OpenMPI does not.
> Our HPC staff will not change settings on the mount. Is it possible to work
> around the file-lock error?
>
> Regards, Luke
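
Regarding the workaround question quoted above: one option sometimes suggested
(only a sketch here, untested, and assuming the lock request is coming from
ROMIO's data-sieving code path) is to disable data sieving through MPI-IO
hints. Whether Intel MPI's ROMIO build honors these hints on your system is
something to verify; the same MPI_Info object can also be passed to
ncmpi_create()/ncmpi_open() so the hints take effect in PnetCDF.

{{{
#include <stdio.h>
#include <mpi.h>

/* Same test as above, but passing ROMIO hints that disable data sieving,
 * one common source of ADIOI_Set_lock() calls.  Untested sketch. */
int main(int argc, char **argv) {
    int buf = 0, err;
    MPI_File fh;
    MPI_Info info;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    if (argc != 2) {
        printf("Usage: %s filename\n", argv[0]);
        MPI_Finalize();
        return 1;
    }

    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_ds_read",  "disable");  /* no data sieving on reads  */
    MPI_Info_set(info, "romio_ds_write", "disable");  /* no data sieving on writes */

    err = MPI_File_open(MPI_COMM_WORLD, argv[1],
                        MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
    if (err != MPI_SUCCESS) printf("Error: MPI_File_open()\n");

    err = MPI_File_write_all(fh, &buf, 1, MPI_INT, &status);
    if (err != MPI_SUCCESS) printf("Error: MPI_File_write_all()\n");

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
}}}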

Replying to [ticket:21 luke.vanroekel@…]:
> Hello,
> I've been attempting to build parallel-netcdf for our local cluster with
> gcc, intel-mpi 5.1.3, and netcdf 4.3.2. The code compiles fine, but when I
> run "make check", nc_test fails with the following error:
>
>
> {{{
> This requires fcntl(2) to be implemented. As of 8/25/2011 it is not.
> Generic MPICH Message: File locking failed in ADIOI_Set_lock(fd 3,cmd
> F_SETLKW/7,type F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 26.
> - If the file system is NFS, you need to use NFS version 3, ensure that
> the lockd daemon is running on all the machines, and mount the directory
> with the 'noac' option (no attribute caching).
> - If the file system is LUSTRE, ensure that the directory is mounted
> with the 'flock' option.
> ADIOI_Set_lock:: Function not implemented
> ADIOI_Set_lock:offset 0, length 6076
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>
> }}}
>
> I am running this test on a parallel file system (Lustre). I have tested
> versions 1.5.0 up to the most current. Any thoughts? I can compile and
> test just fine with OpenMPI 1.10.3.
>
> Regards,
> Luke
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Ticket URL: <<a href="http://trac.mcs.anl.gov/projects/parallel-netcdf/ticket/21#comment:2" rel="noreferrer" target="_blank">http://trac.mcs.anl.gov/<wbr>projects/parallel-netcdf/<wbr>ticket/21#comment:2</a>><br>
parallel-netcdf <<a href="http://trac.mcs.anl.gov/projects/parallel-netcdf" rel="noreferrer" target="_blank">http://trac.mcs.anl.gov/<wbr>projects/parallel-netcdf</a>><br>
<br>
</font></span></blockquote></div><br></div>