Error when leaving the define mode
Wei-Keng Liao
wkliao at northwestern.edu
Wed Jul 14 17:21:01 CDT 2021
Hi Jin-De,
You can turn on "safe mode" by setting the environment
variable PNETCDF_SAFE_MODE to 1.
This mode checks that the arguments passed to PnetCDF functions
are consistent across processes. It prints additional error messages
that may help explain the error you are seeing.
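
As a minimal sketch of the setting described above, assuming a
Bourne-style shell, the variable can be exported before the job is
launched. The launcher command and executable name in the final comment
are placeholders, not taken from your setup, and some MPI launchers need
an explicit option to forward environment variables to the compute nodes.

```shell
# Enable PnetCDF safe mode for this shell and any job launched from it.
export PNETCDF_SAFE_MODE=1

# Confirm the variable is set before launching the model.
echo "PNETCDF_SAFE_MODE=${PNETCDF_SAFE_MODE}"

# Then launch the model as usual. Placeholder command (assumption):
# mpiexec -n 2304 ./model
```

Safe mode adds checking overhead, so it is probably best to unset the
variable again once the problem has been found.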
Wei-keng
> On Jul 14, 2021, at 4:42 PM, Latham, Robert J. <robl at mcs.anl.gov> wrote:
>
> On Wed, 2021-07-14 at 22:00 +0800, Jin-De Huang wrote:
>> I am testing my model with 2304 processes on a supercluster with the
>> Fujitsu Fortran compiler and PnetCDF 1.12.1. The model halted when
>> leaving define mode. The following error message appeared only in
>> the log files of MPI ranks greater than 2047.
>>
>> MPI error (MPI_File_read_at_all) : MPI_ERR_ARG: invalid argument of
>> some other kind
>>
>> Something went wrong in these processes, but the error code returned
>> by each PnetCDF function was 0 until the above error message appeared.
>> When I used fewer than 2048 processes, the model worked
>> normally. I do not know how to solve this problem. Is there any way to
>> identify its cause?
>
> My first guess might be a "too many open files" problem, though I would
> have hoped the MPI-IO implementation would have said that instead of
> "some error happened".
>
> If it is open files, then there is a 'ulimit' setting you can
> raise: `ulimit -a` will show you what limits are in place now, and
> `ulimit -n` changes the "open files" limit. Try doubling whatever it
> is set to now.
>
> ==rob
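
The limit check Rob suggests can be sketched as follows. This assumes a
Bourne-style shell and only touches the soft limit, which an
unprivileged user can raise no further than the hard limit. The variable
names `current` and `target` are illustrative. Limits set this way apply
to the current shell and its children, so the compute nodes may still
have different values.

```shell
# Show every resource limit currently in effect for this shell.
ulimit -a

# Read the current soft limit on open files.
current=$(ulimit -n)
echo "open files (soft limit): ${current}"

# Try doubling the soft limit, unless it is already unlimited.
if [ "$current" != "unlimited" ]; then
    target=$((current * 2))
    ulimit -S -n "$target" 2>/dev/null \
        || echo "could not raise the limit to ${target}; the hard limit may be lower"
fi

# Report the limit now in effect.
echo "open files (soft limit) now: $(ulimit -n)"
```

If the limit cannot be raised, the hard limit is probably the
constraint, and changing it usually requires the system administrators.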