Problem opening a file under OpenMPI

Jim Edwards jedwards at ucar.edu
Fri Jul 10 13:55:29 CDT 2015


John,

I think the latest pnetcdf version is 1.6.1 - I know there was at
least one change made specifically for openmpi.

pnetcdf is a very easy build, so I recommend trying a newer version.
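
If it helps with diagnosing those status codes: PnetCDF can translate them
into readable messages via nfmpi_strerror (part of the PnetCDF Fortran API).
A minimal sketch, reusing the declarations from your test program - the
placement of the check is just a suggestion:

```fortran
! Sketch: print a readable message when an NFMPI call fails.
! Assumes the same declarations as in the test program below;
! nfmpi_strerror is declared in pnetcdf.inc.
stat = NFMPI_CREATE(Comm, 'testfile_d01', &
                    IOR(NF_CLOBBER, NF_64BIT_OFFSET), info, ncid)
if (stat /= NF_NOERR) then
   write(0,*) 'NFMPI_CREATE failed: ', trim(nfmpi_strerror(stat))
endif
```

That should at least tell you which PnetCDF error condition the -208 and
-33 correspond to on that system.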

Jim

On Fri, Jul 10, 2015 at 12:27 PM, John Michalakes <john.michalakes at noaa.gov>
wrote:

> Hi,
>
>
>
> I'm having a problem where an MPI Fortran program (WRF) can open a file for
> writing using NFMPI_CREATE when all tasks are on one node, but fails when
> the tasks are spread over multiple nodes.  I have isolated it to a small
> test program:
>
>
>
> Program hello
>   implicit none
>   include "mpif.h"
> #include "pnetcdf.inc"
>   integer :: stat, Status
>   integer :: info, ierr
>   integer :: Comm
>   integer :: ncid
>
>   CALL MPI_INIT( ierr )
>   Comm = MPI_COMM_WORLD
>   call mpi_info_create( info, ierr )
>   CALL mpi_info_set(info, "romio_ds_write", "disable", ierr)
>   write(0,*) 'mpi_info_set write returns ', ierr
>   CALL mpi_info_set(info, "romio_ds_read", "disable", ierr)
>   write(0,*) 'mpi_info_set read returns ', ierr
>   stat = NFMPI_CREATE(Comm, 'testfile_d01', &
>                       IOR(NF_CLOBBER, NF_64BIT_OFFSET), info, ncid)
>   write(0,*) 'after NFMPI_CREATE ', stat
>   call mpi_info_free( info, ierr )
>   stat = NFMPI_CLOSE(ncid)
>   write(0,*) 'after NFMPI_CLOSE ', stat
>   CALL MPI_FINALIZE( ierr )
> End Program hello
>
> Running with two tasks on a single node this generates:
>
> a515
> a515
> mpi_info_set write returns            0
> mpi_info_set read returns            0
> mpi_info_set write returns            0
> mpi_info_set read returns            0
> after NFMPI_CREATE            0
> after NFMPI_CREATE            0
> after NFMPI_CLOSE            0
> after NFMPI_CLOSE            0
>
>
>
> But running with 2 tasks, each on a separate node:
>
> a811
> a817
> mpi_info_set write returns            0
> mpi_info_set read returns            0
> mpi_info_set write returns            0
> mpi_info_set read returns            0
> after NFMPI_CREATE         -208   <<<<<<<<<<<<<<
> after NFMPI_CLOSE          -33
> after NFMPI_CREATE         -208   <<<<<<<<<<<<<<
> after NFMPI_CLOSE          -33
>
>
>
> I have tested the program on other systems, such as NCAR’s Yellowstone, and
> it works fine there on any combination of nodes.  The target is a user’s
> system running openmpi/1.6.3 compiled for Intel.  The version of pnetcdf is
> 1.3.1.  I’m pretty sure it’s a Lustre file system (but I will have to follow
> up with the user and their support staff to be sure).
>
> I’m assuming there’s some misconfiguration or bad installation of MPI or
> pNetCDF on the user’s system, but I need some help with how to proceed.
> Thanks,
>
>
>
> John
>
>
>
>
> *John Michalakes*
>
> *Scientific Programmer/Analyst*
>
> *National Centers for Environmental Prediction*
>
> *john.michalakes at noaa.gov*
>
> *301-683-3847*
>
>
>



-- 
Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO


More information about the parallel-netcdf mailing list