Error when leaving the define mode

Latham, Robert J. robl at mcs.anl.gov
Wed Jul 14 16:42:42 CDT 2021


On Wed, 2021-07-14 at 22:00 +0800, Jin-De Huang wrote:
> I am testing my model with 2304 processes on a supercluster with the
> Fujitsu Fortran compiler and PnetCDF 1.12.1. The model halted when
> leaving define mode. The error message appeared only in the log
> files of MPI ranks greater than 2047.
> 
> MPI error (MPI_File_read_at_all) : MPI_ERR_ARG: invalid argument of
> some other kind
> 
> Something went wrong in these processes, but the error codes from
> each PnetCDF function were 0 until the above error message appeared.
> When I used fewer than 2048 processes, the model worked
> normally. I have no idea how to solve this problem. Is there any way
> to identify the cause?

My first guess is a "too many open files" problem, though I would
have hoped the MPI-IO implementation would have said that instead of
"some error happened".

If it is open files, then there is a 'ulimit' setting you can
raise:  `ulimit -a` will show you what limits are in place now, and
`ulimit -n <new limit>` changes the "open files" limit for the current
shell and anything launched from it.  Try doubling whatever it is set
to now.
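
A minimal shell sketch of that check-and-double step, assuming a
POSIX-style shell where `ulimit` accepts `-S` (soft limit) and `-H`
(hard limit). The variable names `soft`, `hard`, and `target` are only
for illustration. The soft limit can be raised only as far as the hard
limit, and on a cluster the change most likely has to happen in the job
script that launches the MPI processes, since a limit set on the login
node may not carry over to the compute nodes.

```shell
# Read the current soft and hard limits on open file descriptors.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "open files: soft=$soft hard=$hard"

# Try to double the soft limit. This fails if the doubled value
# exceeds the hard limit, so report that instead of aborting.
if [ "$soft" != "unlimited" ]; then
    target=$((soft * 2))
    ulimit -Sn "$target" 2>/dev/null \
        || echo "could not raise soft limit to $target (hard limit is $hard)"
fi

# Show the limit that is now in effect for this shell and its children.
echo "open files now: $(ulimit -Sn)"
```

If the doubling step reports a failure, the hard limit is the
constraint, and raising it usually needs the system administrators.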

==rob

