Problem opening a file under OpenMPI

Wei-keng Liao wkliao at eecs.northwestern.edu
Fri Jul 10 14:00:13 CDT 2015


Hi, John

Can you try the following Fortran MPI program to see if you can create a file?
Please build it with the same OpenMPI 1.6.3 compiler, and perhaps also try a different file path.

% cat mpi_open.f
      program mpi_open
      implicit none
      include "mpif.h"

      character(LEN=MPI_MAX_ERROR_STRING) err_string
      integer err, ierr, err_len, errorclass, fp, omode

      call MPI_INIT(err)

      ! Collectively create the file and open it for read/write
      omode = IOR(MPI_MODE_RDWR, MPI_MODE_CREATE)
      call MPI_File_open(MPI_COMM_WORLD, 'testfile_d01', omode,
     +                   MPI_INFO_NULL, fp, err)
      if (err .NE. MPI_SUCCESS) then
          ! Report the MPI error class and the error message
          call MPI_Error_class(err, errorclass, ierr)
          call MPI_Error_string(err, err_string, err_len, ierr)
          print*,'Error: MPI_File_open() class ', errorclass,
     +           ' ', trim(err_string)
      else
          ! Close the file only if the open succeeded
          call MPI_File_close(fp, err)
      endif

      call MPI_Finalize(err)
      end

PnetCDF version 1.3.1 is old, released 3 years ago.
Error code -208 in 1.3.1 means "file open/creation failed".

Along with this error code, OpenMPI should also report an error message of its own that provides more information. OpenMPI 1.6.3 is also fairly old, about 3 years.
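
To see the text behind a PnetCDF status such as -208, the return value can be passed to nfmpi_strerror. The following is only a minimal sketch, not a tested program: it assumes pnetcdf.inc is on the compiler's include path, it reuses the file name from the quoted test, and the program name create_check is made up for illustration. It calls nfmpi_close only when the create succeeded, which I would expect to avoid the secondary -33 that comes from closing an ID that was never opened.

```fortran
      program create_check
      implicit none
      include "mpif.h"
      include "pnetcdf.inc"

      integer ierr, stat, ncid, cmode

      call MPI_INIT(ierr)

      ! Same create mode as in the quoted test program
      cmode = IOR(NF_CLOBBER, NF_64BIT_OFFSET)
      stat = nfmpi_create(MPI_COMM_WORLD, 'testfile_d01', cmode,
     +                    MPI_INFO_NULL, ncid)
      if (stat .NE. NF_NOERR) then
          ! Translate the numeric status into readable text
          print*, 'Error: nfmpi_create() ', stat, ' ',
     +            trim(nfmpi_strerror(stat))
      else
          ! Close only a file that was actually created
          stat = nfmpi_close(ncid)
      endif

      call MPI_FINALIZE(ierr)
      end
```

Running this with one task per node, as in the failing case, should show whether the message adds anything beyond "file open/creation failed".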

If the above test program runs without errors, could you try the latest PnetCDF, 1.6.1, on that machine?


Wei-keng

On Jul 10, 2015, at 1:27 PM, John Michalakes wrote:

> Hi,
>  
> Having a problem where an MPI Fortran program (WRF) can open a file for writing using NFMPI_CREATE when all tasks are on one node, but fails if the tasks are spread over multiple nodes.  Have isolated to a small test program:
>  
> Program hello
>   implicit none
>   include "mpif.h"
> #include "pnetcdf.inc"
>   integer                           :: stat,Status
>   integer                           :: info, ierr
>   integer Comm
>   integer ncid
>  
>   CALL MPI_INIT( ierr )
>   Comm = MPI_COMM_WORLD
>   call mpi_info_create( info, ierr )
>   CALL mpi_info_set(info,"romio_ds_write","disable", ierr) ;
> write(0,*)'mpi_info_set write returns ',ierr
>   CALL mpi_info_set(info,"romio_ds_read","disable", ierr) ;
> write(0,*)'mpi_info_set read returns ',ierr
>   stat = NFMPI_CREATE(Comm, 'testfile_d01', IOR(NF_CLOBBER, NF_64BIT_OFFSET), info, NCID)
> write(0,*)'after NFMPI_CREATE ', stat
>   call mpi_info_free( info, ierr )
>   stat = NFMPI_CLOSE(NCID)
> write(0,*)'after NFMPI_CLOSE ', stat
>   CALL MPI_FINALIZE( ierr )
>   STOP
> End Program hello
> Running with two tasks on a single node this generates:
>  
> a515
> a515
> mpi_info_set write returns            0
> mpi_info_set read returns            0
> mpi_info_set write returns            0
> mpi_info_set read returns            0
> after NFMPI_CREATE            0
> after NFMPI_CREATE            0
> after NFMPI_CLOSE            0
> after NFMPI_CLOSE            0
>  
> But running with 2 tasks, each on a separate node:
>  
> a811
> a817
> mpi_info_set write returns            0
> mpi_info_set read returns            0
> mpi_info_set write returns            0
> mpi_info_set read returns            0
> after NFMPI_CREATE         -208   <<<<<<<<<<<<<<
> after NFMPI_CLOSE          -33
> after NFMPI_CREATE         -208  <<<<<<<<<<<<<<
> after NFMPI_CLOSE          -33
>  
> I have tested the program on other systems such as NCAR’s Yellowstone and it works fine on any combination of nodes.  This target system is a user’s system running openmpi/1.6.3 compiled for intel.  The version of pnetcdf is 1.3.1.  I’m pretty sure it’s a Lustre file system (but will have to follow up with the user and their support staff to be sure).
>  
> I’m assuming there’s some misconfiguration or installation of MPI or pNetCDF on the user’s system, but I need some help with how to proceed.  Thanks,
>  
> John
>  
> John Michalakes
> Scientific Programmer/Analyst
> National Centers for Environmental Prediction
> john.michalakes at noaa.gov
> 301-683-3847
>  


