Problem opening a file under OpenMPI
John Michalakes
john.michalakes at noaa.gov
Fri Jul 10 15:14:39 CDT 2015
Hi Wei-keng and Jim,
The MPI-IO test program you sent works on both one node and multiple nodes. I
had to add a call to MPI_File_close before MPI_Finalize to get rid of some
extraneous errors related to shutdown:
[a514:22054] *** An error occurred in MPI_File_set_errhandler
[a514:22054] *** on a NULL communicator
[a514:22054] *** Unknown error
[a514:22054] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
But once I did that, the program's output was clean, with no error messages.
I then downloaded and installed pNetCDF 1.6.1 on the user's machine and
tried my Fortran code again. Success!
So whatever the problem was, upgrading to pNetCDF 1.6.1 seems to have fixed
things. Thanks for your help.
John
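
For anyone who hits the same shutdown errors: the change was simply to close
the MPI-IO file handle before finalizing. A minimal sketch, using the variable
names from Wei-keng's test program below:

```fortran
      ! close the MPI-IO file handle before shutting down MPI;
      ! otherwise OpenMPI may complain during MPI_Finalize
      call MPI_File_close(fp, err)
      call MPI_Finalize(err)
```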
-----Original Message-----
From: Wei-keng Liao [mailto:wkliao at eecs.northwestern.edu]
Sent: Friday, July 10, 2015 1:00 PM
To: John Michalakes
Cc: parallel-netcdf at lists.mcs.anl.gov
Subject: Re: Problem opening a file under OpenMPI
Hi, John
Can you try the following Fortran MPI program to see if you can create a
file?
Please build and run it with the same OpenMPI 1.6.3 compilers, and maybe try a
different file path.
% cat mpi_open.f
      program mpi_open
      implicit none
      include "mpif.h"
      character(LEN=MPI_MAX_ERROR_STRING) err_string
      integer err, ierr, err_len, errorclass, fp, omode

      call MPI_INIT(err)
      omode = IOR(MPI_MODE_RDWR, MPI_MODE_CREATE)
      call MPI_File_open(MPI_COMM_WORLD, 'testfile_d01', omode,
     +                   MPI_INFO_NULL, fp, err)
      if (err .NE. MPI_SUCCESS) then
          call MPI_Error_class(err, errorclass, ierr)
          call MPI_Error_string(err, err_string, err_len, ierr)
          print*, 'Error: MPI_File_open() ', trim(err_string)
      endif
      call MPI_Finalize(err)
      end
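
To build and run it on two nodes, something like the following should work
(the exact wrapper and launcher names depend on your OpenMPI install;
mpif77, mpirun, and the hostfile name here are assumptions):

% mpif77 mpi_open.f -o mpi_open
% mpirun -np 2 --hostfile hosts ./mpi_open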
PnetCDF version 1.3.1 is old; it was released 3 years ago. In 1.3.1, error
code -208 means "file open/creation failed". When this error occurs, OpenMPI
should also report another error message that provides more information.
OpenMPI 1.6.3 is also about 3 years old.
If the above test program ran without errors, I wonder if you can try the
latest PnetCDF 1.6.1 on that machine?
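
By the way, PnetCDF status codes like the -208 from your output can be turned
into readable text with nfmpi_strerror. A sketch, reusing the stat variable
from your test program (assumes pnetcdf.inc is on the include path):

```fortran
      ! translate a PnetCDF status code into a human-readable message
      if (stat .NE. NF_NOERR) then
          write(0,*) 'NFMPI_CREATE error: ', trim(nfmpi_strerror(stat))
      endif
```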
Wei-keng
On Jul 10, 2015, at 1:27 PM, John Michalakes wrote:
> Hi,
>
> Having a problem where an MPI Fortran program (WRF) can open a file for
> writing using NFMPI_CREATE when all tasks are on one node, but fails if the
> tasks are spread over multiple nodes. I have isolated it to a small test
> program:
>
> Program hello
>    implicit none
>    include "mpif.h"
> #include "pnetcdf.inc"
>    integer :: stat, Status
>    integer :: info, ierr
>    integer :: Comm
>    integer :: ncid
>
>    CALL MPI_INIT( ierr )
>    Comm = MPI_COMM_WORLD
>    call mpi_info_create( info, ierr )
>    CALL mpi_info_set(info, "romio_ds_write", "disable", ierr)
>    write(0,*) 'mpi_info_set write returns ', ierr
>    CALL mpi_info_set(info, "romio_ds_read", "disable", ierr)
>    write(0,*) 'mpi_info_set read returns ', ierr
>    stat = NFMPI_CREATE(Comm, 'testfile_d01', &
>           IOR(NF_CLOBBER, NF_64BIT_OFFSET), info, NCID)
>    write(0,*) 'after NFMPI_CREATE ', stat
>    call mpi_info_free( info, ierr )
>    stat = NFMPI_CLOSE(NCID)
>    write(0,*) 'after NFMPI_CLOSE ', stat
>    CALL MPI_FINALIZE( ierr )
>    STOP
> End Program hello
> Running with two tasks on a single node this generates:
>
> a515
> a515
> mpi_info_set write returns 0
> mpi_info_set read returns 0
> mpi_info_set write returns 0
> mpi_info_set read returns 0
> after NFMPI_CREATE 0
> after NFMPI_CREATE 0
> after NFMPI_CLOSE 0
> after NFMPI_CLOSE 0
>
> But running with 2 tasks, each on a separate node:
>
> a811
> a817
> mpi_info_set write returns 0
> mpi_info_set read returns 0
> mpi_info_set write returns 0
> mpi_info_set read returns 0
> after NFMPI_CREATE -208 <<<<<<<<<<<<<<
> after NFMPI_CLOSE -33
> after NFMPI_CREATE -208 <<<<<<<<<<<<<<
> after NFMPI_CLOSE -33
>
> I have tested the program on other systems such as NCAR's Yellowstone, and
> it works fine on any combination of nodes. The target system is a user's
> machine running openmpi/1.6.3 compiled for Intel. The version of pnetcdf is
> 1.3.1. I'm pretty sure it's a Lustre file system (but I will have to follow
> up with the user and their support staff to be sure).
>
> I'm assuming there's some problem with the configuration or installation of
> MPI or pNetCDF on the user's system, but I need some help with how to
> proceed. Thanks,
>
> John
>
> John Michalakes
> Scientific Programmer/Analyst
> National Centers for Environmental Prediction
> john.michalakes at noaa.gov
> 301-683-3847
>