error in enddef

Jim Edwards jedwards at ucar.edu
Tue Jun 28 09:41:04 CDT 2022


Hi Wei-Keng,

I found the issue with help from TACC user support:
https://www.intel.com/content/www/us/en/developer/articles/technical/large-mpi-tags-with-the-intel-mpi.html

I set these two environment variables:
   MPIR_CVAR_CH4_OFI_RANK_BITS=15
   MPIR_CVAR_CH4_OFI_TAG_BITS=24
and added a print statement:
cam_restart.F90     123 Maximum tag value queried   8388607
This appears to be working.


On Tue, Jun 21, 2022 at 7:25 PM Wei-Keng Liao <wkliao at northwestern.edu>
wrote:

> Hi, Jim
>
> Is the ncmpi_enddef the first enddef call after the file creation,
> or after a ncmpi_redef?
>
> In the former case, there is no MPI communication in PnetCDF, except
> for an MPI_Barrier. In the latter case, if the file header size expands,
> existing variables need to be moved to higher offsets, which requires
> PnetCDF to call MPI collective reads and writes and thus leads to
> MPI_Issend calls.
>
> Can you try to get a core dump so that we can trace the call stack?
>
> You can also enable PnetCDF safe mode, which makes additional MPI
> communication calls for debugging purposes. Sometimes it helps narrow
> down the cause of the problem. It can be enabled by setting the
> environment variable PNETCDF_SAFE_MODE to 1.
>
> Wei-keng
>
> On Jun 21, 2022, at 5:03 PM, Jim Edwards <jedwards at ucar.edu> wrote:
>
> I am using PnetCDF 1.12.3 and getting an error when compiled with
> intel/19.1.1 and impi/19.0.9 on the TACC Frontera system.
> I am getting very little information to guide me in debugging the error.
>
> [785] Abort(634628) on node 785 (rank 785 in comm 0): Fatal error in
> PMPI_Issend: Invalid tag, error stack:
> [785] PMPI_Issend(156): MPI_Issend(buf=0x2b5c81edf40f, count=1025120,
> MPI_BYTE, dest=0, tag=1048814, comm=0xc40000d7, request=0x7f2002783540)
> failed
> [785] PMPI_Issend(95).: Invalid tag, value is 1048814
> TACC:  MPI job exited with code: 4
> TACC:  Shutdown complete. Exiting.
>
>
> I can tell that I am in a call to ncmpi_enddef but not getting anything
> beyond that - any ideas?
>
> --
> Jim Edwards
>
> CESM Software Engineer
> National Center for Atmospheric Research
> Boulder, CO
>
>
>

-- 
Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO