Problem opening a file under OpenMPI
Jim Edwards
jedwards at ucar.edu
Fri Jul 10 13:55:29 CDT 2015
John,
I think the latest PnetCDF version is 1.6.1; I know there was at least
one change specifically for OpenMPI.
PnetCDF is a very easy build, so I recommend trying a newer version.
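For reference, a typical from-source PnetCDF build looks roughly like the sketch below. The download URL, version, install prefix, and compiler wrapper names are placeholders to adapt for your system, not verified values:

```shell
# Hedged sketch: fetch, configure, and install a newer PnetCDF.
# Substitute the real release URL and your own install prefix.
wget http://example.org/pnetcdf-1.6.1.tar.gz    # placeholder URL
tar -xzf pnetcdf-1.6.1.tar.gz
cd pnetcdf-1.6.1

# Point configure at the same MPI used to build the application
# (here, the OpenMPI compiler wrappers).
./configure --prefix=$HOME/pnetcdf-1.6.1 MPICC=mpicc MPIF77=mpif77 MPIF90=mpif90
make -j4
make install
```

Rebuilding the application against the new library then only requires updating the PnetCDF include and library paths.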
Jim
On Fri, Jul 10, 2015 at 12:27 PM, John Michalakes <john.michalakes at noaa.gov>
wrote:
> Hi,
>
>
>
> Having a problem where an MPI Fortran program (WRF) can open a file for
> writing using NFMPI_CREATE when all tasks are on one node, but fails if
> the tasks are spread over multiple nodes. Have isolated to a small test
> program:
>
>
>
> Program hello
>   implicit none
>   include "mpif.h"
> #include "pnetcdf.inc"
>
>   integer :: stat, Status
>   integer :: info, ierr
>   integer :: Comm
>   integer :: ncid
>
>   CALL MPI_INIT( ierr )
>   Comm = MPI_COMM_WORLD
>
>   call mpi_info_create( info, ierr )
>   CALL mpi_info_set( info, "romio_ds_write", "disable", ierr )
>   write(0,*) 'mpi_info_set write returns ', ierr
>   CALL mpi_info_set( info, "romio_ds_read", "disable", ierr )
>   write(0,*) 'mpi_info_set read returns ', ierr
>
>   stat = NFMPI_CREATE( Comm, 'testfile_d01', IOR(NF_CLOBBER, NF_64BIT_OFFSET), info, ncid )
>   write(0,*) 'after NFMPI_CREATE ', stat
>
>   call mpi_info_free( info, ierr )
>
>   stat = NFMPI_CLOSE( ncid )
>   write(0,*) 'after NFMPI_CLOSE ', stat
>
>   CALL MPI_FINALIZE( ierr )
>   STOP
> End Program hello
>
> Running with two tasks on a single node, this generates:
>
>
>
> a515
> a515
> mpi_info_set write returns 0
> mpi_info_set read returns 0
> mpi_info_set write returns 0
> mpi_info_set read returns 0
> after NFMPI_CREATE 0
> after NFMPI_CREATE 0
> after NFMPI_CLOSE 0
> after NFMPI_CLOSE 0
>
> But running with 2 tasks, each on a separate node:
>
>
>
> a811
> a817
> mpi_info_set write returns 0
> mpi_info_set read returns 0
> mpi_info_set write returns 0
> mpi_info_set read returns 0
> after NFMPI_CREATE -208 <<<<<<<<<<<<<<
> after NFMPI_CLOSE -33
> after NFMPI_CREATE -208 <<<<<<<<<<<<<<
> after NFMPI_CLOSE -33
>
> I have tested the program on other systems, such as NCAR's Yellowstone, and
> it works fine on any combination of nodes. The target system is a user's
> system running OpenMPI 1.6.3 built with the Intel compilers; the PnetCDF
> version is 1.3.1. I'm pretty sure it's a Lustre file system (but I will
> have to follow up with the user and their support staff to be sure).
>
>
>
> I'm assuming there is some misconfiguration in the MPI or PnetCDF
> installation on the user's system, but I need some help with how to
> proceed. Thanks,
>
>
>
> John
>
>
>
>
> John Michalakes
> Scientific Programmer/Analyst
> National Centers for Environmental Prediction
> john.michalakes at noaa.gov
> 301-683-3847
>
>
>
--
Jim Edwards
CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO