<br><br><div class="gmail_quote">On Fri, May 11, 2012 at 9:43 AM, Rob Latham <span dir="ltr"><<a href="mailto:robl@mcs.anl.gov" target="_blank">robl@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Thu, May 10, 2012 at 03:28:57PM -0600, Jim Edwards wrote:<br>
> This occurs on the NCSA machine Blue Waters. I am using pnetcdf 1.2.0 and<br>
> pgi 11.10.0<br>
<br>
</div>I need one more bit of information: the version of MPT you are using.<br>
<div class="im"><br></div></blockquote><div>Sorry, what's mpt? MPI?<br><div style="margin-left:40px">Currently Loaded Modulefiles:<br>
 1) modules/3.2.6.6<br>
 2) xtpe-network-gemini<br>
<span style="background-color:rgb(255,255,0)"> 3) xt-mpich2/5.4.2 </span><br>
 4) xtpe-interlagos<br>
 5) eswrap/1.0.12<br>
 6) torque/2.5.10<br>
 7) moab/6.1.5<br>
 8) scripts<br>
 9) user-paths<br>
10) pgi/11.10.0<br>
11) xt-libsci/11.0.04<br>
12) udreg/2.3.1-1.0400.4264.3.1.gem<br>
13) ugni/2.3-1.0400.4374.4.88.gem<br>
14) pmi/3.0.0-1.0000.8661.28.2807.gem<br>
15) dmapp/3.2.1-1.0400.4255.2.159.gem<br>
16) gni-headers/2.1-1.0400.4351.3.1.gem<br>
17) xpmem/0.1-2.0400.31280.3.1.gem<br>
18) xe-sysroot/4.0.46<br>
19) xt-asyncpe/5.07<br>
20) atp/1.4.1<br>
21) PrgEnv-pgi/4.0.46<br>
22) hdf5-parallel/1.8.7<br>
23) netcdf-hdf5parallel/4.1.3<br>
24) parallel-netcdf/1.2.0<br>
</div><br><br><br><br> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="im">
> The issue is that calling nfmpi_createfile would sometimes result in an<br>
> error:<br>
><br>
> MPI_File_open : Other I/O error , error stack:<br>
> (unknown)(): Other I/O error<br>
> 126: MPI_File_open : Other I/O error , error stack:<br>
> (unknown)(): Other I/O error<br>
> Error on create : 502 -32<br>
><br>
> The error appears to be intermittent and I could not get it to occur at all<br>
> on a small number of tasks (160) but it occurs with high frequency when<br>
> using a larger number of tasks (>=1600). I traced the problem to the use<br>
> of nf_clobber in the mode argument. Removing nf_clobber seems to have<br>
> solved the problem, and I think that create implies clobber anyway, doesn't<br>
> it?<br>
<br>
> Can someone who knows what is going on under the covers enlighten me<br>
> with some understanding of this issue? I suspect that one task is trying<br>
> to clobber the file that another has just created or something of that<br>
> nature.<br>
<br>
</div>Unfortunately, "under the covers" here means "inside the MPI-IO<br>
library", which we don't have access to.<br>
<br>
In the create case we call MPI_File_open with "MPI_MODE_RDWR |<br>
MPI_MODE_CREATE", and if noclobber is set, we add MPI_MODE_EXCL.<br>
<br>
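A minimal sketch of that flag logic, under stated assumptions: the SKETCH_MODE_* names and the sketch_create_amode helper are hypothetical stand-ins invented for illustration (they are not pnetcdf or MPI identifiers), and the numeric values are arbitrary. Real code would use the MPI_MODE_* constants from mpi.h instead.<br>
<pre>
```c
/* Stand-ins for the MPI_MODE_* constants declared in mpi.h, so that this
 * sketch compiles without an MPI installation. The numeric values are
 * arbitrary distinct bits chosen for illustration, not the real ones. */
enum {
    SKETCH_MODE_CREATE = 1,
    SKETCH_MODE_RDWR   = 2,
    SKETCH_MODE_EXCL   = 4
};

/* Build the access mode as described above: RDWR | CREATE always,
 * plus EXCL only when the caller asked for no-clobber. */
int sketch_create_amode(int noclobber)
{
    int amode = SKETCH_MODE_RDWR | SKETCH_MODE_CREATE;
    if (noclobber)
        amode |= SKETCH_MODE_EXCL;
    return amode;
}
```
</pre>
With a real MPI library, the equivalent value built from MPI_MODE_* would be passed as the amode argument of MPI_File_open. Note that under this logic the clobber case adds no extra flag; only noclobber changes the mode.<br>
<br>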
OK, so that's pnetcdf. What's going on in MPI-IO? Well, Cray has based<br>
their MPI-IO on our ROMIO, but I'm not sure which version.<br>
<br>
Let me cook up a quick MPI-IO-only test case you can run to trigger<br>
this problem, and then you can beat Cray over the head with it.<br>
<span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br>Sounds good, thanks.<br> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<span class="HOEnZb"><font color="#888888">
==rob<br>
<br>
--<br>
Rob Latham<br>
Mathematics and Computer Science Division<br>
Argonne National Lab, IL USA<br>
</font></span></blockquote></div><br><br clear="all"><br>-- <br>Jim Edwards<br><br><font>CESM Software Engineering Group<br>National Center for Atmospheric Research<br>Boulder, CO <br>303-497-1842<br></font><br>