MPI Failure at line 839 of nonblocking.c (MPI_File_write_all : MPI_ERR_IO: input/output error)
Wei-keng Liao
wkliao at ece.northwestern.edu
Mon Sep 24 16:30:29 CDT 2012
Hi, Rob,
Jim also mentioned his program hung after seeing this MPI-IO error message.
In the current pnetcdf, any MPI error at line 839 of nonblocking.c makes
the calling function return immediately. A few lines below, at line 850,
is an MPI_File_set_view(), which is collective. A process that returns
early never reaches that call, so the remaining processes wait in it
forever, and that is what hangs Jim's program. Maybe we should remove the
return statement from line 41 of macro.h, so the program can continue past
a non-fatal MPI error such as this one.
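
To make the idea concrete, here is a minimal sketch of the two error-handling
styles. It does not use the real pnetcdf macro or real MPI calls: every name
in it (fake_write_all, fake_set_view, CHECK_RETURN, CHECK_RECORD, and the
FAKE_* codes) is a hypothetical stand-in, and the counter only shows whether
this process reached the "collective" call.

```c
/* Hypothetical stand-ins only; not the actual pnetcdf macros or MPI API. */
#define FAKE_SUCCESS 0
#define FAKE_ERR_IO  1

/* Counts how many times this process reached the "collective" call. */
static int collective_calls = 0;

/* Stand-in for MPI_File_write_all(): fails on request. */
static int fake_write_all(int should_fail)
{
    return should_fail ? FAKE_ERR_IO : FAKE_SUCCESS;
}

/* Stand-in for the collective MPI_File_set_view(). */
static int fake_set_view(void)
{
    collective_calls++;
    return FAKE_SUCCESS;
}

/* Style 1: return immediately on any error (the behaviour described above). */
#define CHECK_RETURN(err) \
    do { if ((err) != FAKE_SUCCESS) return (err); } while (0)

/* Style 2: remember the first error, but keep executing. */
#define CHECK_RECORD(err, status) \
    do { if ((err) != FAKE_SUCCESS && (status) == FAKE_SUCCESS) \
             (status) = (err); } while (0)

/* On a write error this never reaches fake_set_view(). With real MPI,
 * the other processes would then block in the collective call. */
static int wait_early_return(int should_fail)
{
    int err = fake_write_all(should_fail);
    CHECK_RETURN(err);
    err = fake_set_view();
    CHECK_RETURN(err);
    return FAKE_SUCCESS;
}

/* On a write error this still reaches fake_set_view(), then reports
 * the first error to the caller. */
static int wait_record_error(int should_fail)
{
    int status = FAKE_SUCCESS;
    int err = fake_write_all(should_fail);
    CHECK_RECORD(err, status);
    err = fake_set_view();
    CHECK_RECORD(err, status);
    return status;
}
```

With the second style every process still takes part in the collective call
and the caller still gets the I/O error back. Whether continuing is safe for
every kind of MPI error is a separate question that this sketch does not
answer.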
Wei-keng
On Sep 24, 2012, at 2:53 PM, Rob Latham wrote:
> On Wed, Aug 08, 2012 at 02:17:19PM -0600, Jim Edwards wrote:
>> I am getting this error from parallel-netcdf using openmpi 1.4.5 and intel
>> 12.1.4 and a lustre filesystem. Because this is non-blocking I am having a
>> lot of difficulty pinpointing the issue, do you have any suggestions? I
>> buffer multiple variables before calling the nfmpi_wait_all and if I turn
>> off this buffering functionality it appears to work fine. All of this
>> functionality works on several other systems so I think that it must be an
>> issue lower in the software stack.
>
> Hi Jim. Sorry to resurrect this old thread, especially when there's
> not a lot of new information for you.
>
> Openmpi-1.5.2 (I think) contains a big ROMIO re-sync, including some
> Lustre collective I/O improvements: your hunch that the problem lies
> with a lower level in the software stack (the MPI-IO library) is
> entirely consistent with that observation.
>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA