Error when leaving the define mode
Latham, Robert J.
robl at mcs.anl.gov
Wed Jul 14 16:42:42 CDT 2021
On Wed, 2021-07-14 at 22:00 +0800, Jin-De Huang wrote:
> I am testing my model with 2304 processes on a supercluster with the
> Fujitsu Fortran compiler and PnetCDF 1.12.1. The model halted when
> leaving define mode. The error message appeared only in the log
> files of MPI ranks greater than 2047:
>
> MPI error (MPI_File_read_at_all) : MPI_ERR_ARG: invalid argument of
> some other kind
>
> Something went wrong in these processes, but the error codes returned
> by each PnetCDF function were 0 until the above error message appeared.
> When I used fewer than 2048 processes, the model worked normally.
> I have no idea how to solve this problem. Is there any way to
> identify the cause?
My first guess is a "too many open files" problem, though I would
have hoped the MPI-IO implementation would have said that instead of
"some error happened".
If it is open files, then there is a 'ulimit' setting you can
raise: `ulimit -a` will show you what limits are in place now, and
`ulimit -n <value>` changes the "open files" limit. Try doubling
whatever it is set to now.
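
Here is a minimal sketch of that check-and-double step, assuming a bash-like
shell. The variable names (`soft`, `hard`, `new`) are just for illustration.
Keep in mind that `ulimit` only affects the current shell and the processes
started from it, so on a cluster this would probably need to go in the job
script before the MPI launch. Whether the launcher passes the limit on to the
compute nodes depends on the site, so treat that part as an assumption to
verify.

```shell
# Show every limit currently in place for this shell.
ulimit -a

# Read the soft and hard limits on open file descriptors.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "open files: soft=$soft hard=$hard"

# Double the soft limit, but never ask for more than the hard limit,
# since an unprivileged user cannot exceed it.
if [ "$soft" != "unlimited" ]; then
    new=$((soft * 2))
    if [ "$hard" != "unlimited" ] && [ "$new" -gt "$hard" ]; then
        new=$hard
    fi
    ulimit -Sn "$new" || echo "could not raise the open files limit" >&2
fi

echo "open files now: $(ulimit -Sn)"
```

If the soft limit is already equal to the hard limit, this changes nothing,
and raising the hard limit is something only an administrator can do.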
==rob