Problem opening a file under OpenMPI

John Michalakes john.michalakes at noaa.gov
Fri Jul 10 13:27:28 CDT 2015


Hi,
 
I'm having a problem where an MPI Fortran program (WRF) can open a file for
writing with NFMPI_CREATE when all tasks are on one node, but fails when the
tasks are spread over multiple nodes.  I have isolated it to a small test
program:
 
Program hello
  implicit none
  include "mpif.h"
#include "pnetcdf.inc"
  integer :: stat
  integer :: info, ierr
  integer :: Comm, ncid

  CALL MPI_INIT( ierr )
  Comm = MPI_COMM_WORLD

  ! Pass ROMIO hints that disable data sieving for reads and writes
  call mpi_info_create( info, ierr )
  CALL mpi_info_set( info, "romio_ds_write", "disable", ierr )
  write(0,*) 'mpi_info_set write returns ', ierr
  CALL mpi_info_set( info, "romio_ds_read", "disable", ierr )
  write(0,*) 'mpi_info_set read returns ', ierr

  ! Create the file collectively over MPI_COMM_WORLD
  stat = NFMPI_CREATE( Comm, 'testfile_d01', &
                       IOR(NF_CLOBBER, NF_64BIT_OFFSET), info, ncid )
  write(0,*) 'after NFMPI_CREATE ', stat

  call mpi_info_free( info, ierr )
  stat = NFMPI_CLOSE( ncid )
  write(0,*) 'after NFMPI_CLOSE ', stat
  CALL MPI_FINALIZE( ierr )
End Program hello
Running with two tasks on a single node, this generates:
 
a515
a515
mpi_info_set write returns            0
mpi_info_set read returns            0
mpi_info_set write returns            0
mpi_info_set read returns            0
after NFMPI_CREATE            0
after NFMPI_CREATE            0
after NFMPI_CLOSE            0
after NFMPI_CLOSE            0
 
But running with two tasks, each on a separate node:
 
a811
a817
mpi_info_set write returns            0
mpi_info_set read returns            0
mpi_info_set write returns            0
mpi_info_set read returns            0
after NFMPI_CREATE         -208   <<<<<<<<<<<<<<
after NFMPI_CLOSE          -33
after NFMPI_CREATE         -208  <<<<<<<<<<<<<<
after NFMPI_CLOSE          -33
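(For the record, PnetCDF can translate these numeric codes into messages via the Fortran API's nfmpi_strerror; a minimal sketch of what I could drop into the test program right after the NFMPI_CREATE call:)

```fortran
! Sketch only: translate a PnetCDF status code into a readable message.
! nfmpi_strerror() is part of the PnetCDF Fortran interface.
if ( stat .ne. NF_NOERR ) then
   write(0,*) 'NFMPI_CREATE failed: ', nfmpi_strerror(stat)
end if
```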
 
I have tested the program on other systems, such as NCAR's Yellowstone, and it
works fine on any combination of nodes.  The target system is a user's
system running OpenMPI 1.6.3 built with the Intel compilers.  The PnetCDF
version is 1.3.1.  I'm pretty sure it's a Lustre file system (but I will have
to follow up with the user and their support staff to be sure).
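Until I hear back, one quick check (assuming a Linux shell on the compute nodes; run it from whatever the run directory is) is to ask the kernel what file system that directory actually sits on:

```shell
# Print the file system type name of the current directory.
# e.g. "nfs", "ext2/ext3", "tmpfs"; a Lustre mount may report "lustre"
# or an unrecognized magic number depending on the coreutils version.
stat -f -c %T .

# On a Lustre client with the Lustre tools installed, this would show
# the striping of the directory (Lustre only, so left commented out):
# lfs getstripe -d .
```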
 
I'm assuming there is some misconfiguration or bad installation of MPI or
PnetCDF on the user's system, but I need some help with how to proceed.  Thanks,
 
John 
 
John Michalakes
Scientific Programmer/Analyst
National Centers for Environmental Prediction
john.michalakes at noaa.gov
301-683-3847
 

