pnetcdf nfmpi_put_vara_real_all problem

Wong, David Wong.David-C at epa.gov
Thu Sep 19 08:55:05 CDT 2013


Hi Wei-keng,

     I compiled it the same way I compile my other model. Here are the details:

1052 edison01:junk > module list
Currently Loaded Modulefiles:
  1) modules/3.2.6.7                       12) dmapp/6.0.1-1.0500.7263.9.31.ari      23) cray-mpich/6.0.2
  2) nsg/1.2.0                             13) gni-headers/3.0-1.0500.7161.11.4.ari  24) torque/4.2.3.h5
  3) eswrap/1.0.19-1.010001.264.0          14) xpmem/0.1-2.0500.41356.1.11.ari       25) moab/7.2.3-r19-b121-SUSE11
  4) switch/1.0-1.0500.41328.1.120.ari     15) job/1.5.5-0.1_2.0500.41368.1.92.ari   26) altd/1.0
  5) craype-network-aries                  16) csa/3.0.0-1_2.0500.41366.1.129.ari    27) darshan/2.2.7-106
  6) craype/1.06                           17) dvs/2.3_0.9.0-1.0500.1522.1.180       28) usg-default-modules/1.0
  7) intel/13.1.3.192                      18) rca/1.0.0-2.0500.41336.1.120.ari      29) cray-hdf5/1.8.9
  8) cray-libsci/12.1.01                   19) atp/1.6.3                             30) cray-netcdf/4.2.1.1
  9) udreg/2.3.2-1.0500.6756.2.10.ari      20) PrgEnv-intel/5.0.41                   31) parallel-netcdf/1.3.1
 10) ugni/5.0-1.0500.0.3.306.ari           21) craype-ivybridge
 11) pmi/4.0.1-1.0000.9725.84.2.ari        22) cray-shmem/6.0.2
1053 edison01:junk > module display parallel-netcdf/1.3.1
-------------------------------------------------------------------
/opt/cray/modulefiles/parallel-netcdf/1.3.1:

setenv		 CRAY_PARALLEL_NETCDF_DIR /opt/cray/parallel-netcdf/1.3.1 
setenv		 CRAY_PARALLEL_NETCDF_VERSION 1.3.1 
prepend-path	 PE_PRODUCT_LIST CRAY_PARALLEL_NETCDF 
setenv		 CRAY_PARALLEL_NETCDF 74 
setenv		 INTEL_PARALLEL_NETCDF 120 
setenv		 PGI_PARALLEL_NETCDF 119 
prepend-path	 PATH /opt/cray/parallel-netcdf/1.3.1/bin 
setenv		 PARALLEL_NETCDF_DIR /opt/cray/parallel-netcdf/1.3.1/intel/120 
prepend-path	 CRAY_LD_LIBRARY_PATH /opt/cray/parallel-netcdf/1.3.1/intel/120/lib 
prepend-path	 MANPATH /opt/cray/parallel-netcdf/1.3.1/man 
-------------------------------------------------------------------

1054 edison01:junk > module display cray-mpich/6.0.2
-------------------------------------------------------------------
/opt/cray/modulefiles/cray-mpich/6.0.2:

conflict	 parallel-netcdf 
conflict	 cray-mpich 
conflict	 cray-mpich2 
conflict	 xt-mpich2 
conflict	 xt-mpt 
setenv		 CRAY_MPICH2_VER 6.0.2 
setenv		 CRAY_MPICH2_ROOTDIR /opt/cray/mpt/6.0.2 
setenv		 CRAY_MPICH2_BASEDIR /opt/cray/mpt/6.0.2/gni 
setenv		 CRAY_MPICH2_DIR /opt/cray/mpt/6.0.2/gni/mpich2-intel/130 
setenv		 MPICH_DIR /opt/cray/mpt/6.0.2/gni/mpich2-intel/130 
prepend-path	 PE_PRODUCT_LIST CRAY_MPICH2 
setenv		 CRAY_MPICH2 81 
setenv		 INTEL_MPICH2 130 
setenv		 PGI_MPICH2 121 
prepend-path	 CRAY_LD_LIBRARY_PATH /opt/cray/mpt/6.0.2/gni/mpich2-intel/130/lib 
prepend-path	 MANPATH /opt/cray/mpt/6.0.2/gni/man/mpich2 
module-whatis	 cray-mpich - Cray MPICH2 Message Passing Interface 
-------------------------------------------------------------------

1055 edison01:junk > make
ifort -c -C -traceback -fixed -132 -O3 -I /opt/cray/parallel-netcdf/1.3.1/intel/120/include -I /opt/cray/mpt/6.0.2/gni/mpich2-intel/130/include -I.   epa.F
ifort  epa.o  -L/opt/cray/netcdf/4.2.1.1/intel/120/lib -lnetcdf -lnetcdff -L/opt/cray/parallel-netcdf/1.3.1/intel/120/lib -lpnetcdf -o epa.x


Cheers,
David


________________________________________
From: Wei-keng Liao <wkliao at ece.northwestern.edu>
Sent: Thursday, September 19, 2013 9:34 AM
To: Wong, David
Cc: parallel-netcdf at mcs.anl.gov
Subject: Re: pnetcdf nfmpi_put_vara_real_all problem

David,

This is a different error from the previous one you reported.
This error is most likely caused by compiling and linking against
the wrong MPI or PnetCDF libraries, as nfmpi_create is the first
PnetCDF call in the test program.

I googled the error message, and it seems you compiled with
the wrong MPI header file, which may imply that the PnetCDF library
was built with a different MPI compiler from the one you used to
compile this test program.

Could you show us how you compiled the program?
Also, which version of PnetCDF did you compile against?
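
One quick way to see which MPI library an executable actually resolves
at run time is to filter the output of ldd. The sketch below is only an
illustration: the function name filter_mpi_libs is made up, and it
assumes the executable is dynamically linked. PnetCDF is often installed
as a static archive, in which case only the MPI library will show up.

```shell
#!/bin/sh
# Keep only the MPI and (P)netCDF entries from `ldd`-style output read on
# stdin. The function name is hypothetical, chosen for this illustration.
filter_mpi_libs() {
    # Match shared-library names such as libmpich_intel.so.1 or libpnetcdf.so.
    # `|| true` keeps the exit status zero when nothing matches.
    grep -E 'lib(mpi|pnetcdf|netcdf)[A-Za-z0-9_+-]*\.so' || true
}

# Example use on the executable from this thread (assumes dynamic linking):
#   ldd ./epa.x | filter_mpi_libs
```

The directories printed can then be compared with the -I and -L paths
on the compile and link lines to see whether they point at the same MPI
installation.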


Wei-keng

On Sep 19, 2013, at 7:35 AM, Wong, David wrote:

> Hi Wei-keng,
>
>     It failed at the following step:
>
>         err = nfmpi_create(MPI_COMM_WORLD, filename, NF_CLOBBER,
>     +                       MPI_INFO_NULL, ncid)
>
> with the following error message (from one of the processors):
>
> Rank 1 [Thu Sep 19 04:27:50 2013] [c5-0c1s4n3] Fatal error in MPI_Comm_test_inter: Invalid communicator, error stack:
> MPI_Comm_test_inter(110): MPI_Comm_test_inter(comm=0x84000002, flag=0x7fffffff8f90) failed
> MPI_Comm_test_inter(83).: Invalid communicator
> Rank 2 [Thu Sep 19 04:27:50 2013] [c5-0c1s4n3] Fatal error in MPI_Comm_test_inter: Invalid communicator, error stack:
> MPI_Comm_test_inter(110): MPI_Comm_test_inter(comm=0x84000002, flag=0x7fffffff8f90) failed
> MPI_Comm_test_inter(83).: Invalid communicator
> Rank 3 [Thu Sep 19 04:27:50 2013] [c5-0c1s4n3] Fatal error in MPI_Comm_test_inter: Invalid communicator, error stack:
> MPI_Comm_test_inter(110): MPI_Comm_test_inter(comm=0x84000002, flag=0x7fffffff8f90) failed
> MPI_Comm_test_inter(83).: Invalid communicator
> Rank 0 [Thu Sep 19 04:27:50 2013] [c5-0c1s4n3] Fatal error in MPI_Comm_test_inter: Invalid communicator, error stack:
> MPI_Comm_test_inter(110): MPI_Comm_test_inter(comm=0x84000004, flag=0x7fffffff8f90) failed
> MPI_Comm_test_inter(83).: Invalid communicator
> forrtl: error (76): Abort trap signal
> Image              PC                Routine            Line        Source
> libc.so.6          00002AAAAB92CB35  Unknown               Unknown  Unknown
> libc.so.6          00002AAAAB92E111  Unknown               Unknown  Unknown
> epa.x              0000000000435E42  Unknown               Unknown  Unknown
> epa.x              000000000042B160  Unknown               Unknown  Unknown
> epa.x              000000000042B30D  Unknown               Unknown  Unknown
> libmpich_intel.so  00002AAAAC248B83  Unknown               Unknown  Unknown
> epa.x              0000000000523C3E  Unknown               Unknown  Unknown
> epa.x              000000000051E508  Unknown               Unknown  Unknown
> epa.x              000000000051DBC5  Unknown               Unknown  Unknown
> epa.x              00000000004245EF  MAIN__                     33  epa.F
> epa.x              000000000042454C  Unknown               Unknown  Unknown
> libc.so.6          00002AAAAB918C16  Unknown               Unknown  Unknown
> epa.x              000000000042444D  Unknown               Unknown  Unknown
>
>    Regarding Rob's suggestion, I am using aprun or mpirun to launch the executable, so I don't know how to invoke valgrind at this time.
>
> Cheers,
> David
>
> --
> David C. Wong Ph.D.
> Atmospheric Modeling and Analysis Division
> National Exposure Research Laboratory
> US Environmental Protection Agency
> Mail Drop E243-03
> 109 T. W. Alexander Dr.
> Research Triangle Park, NC 27711
> 919-541-3400 919-541-1379 (fax)
>
> ________________________________________
> From: Wei-keng Liao <wkliao at ece.northwestern.edu>
> Sent: Wednesday, September 18, 2013 7:13 PM
> To: Wong, David
> Cc: parallel-netcdf at mcs.anl.gov
> Subject: Re: pnetcdf nfmpi_put_vara_real_all problem
>
> Hi, David,
>
> Could you please try the attached program and let us know if it
> generates the same error? It is written based on the information
> you provided.
>


