With intel 2015.1.133 and gcc 5.1.0 in the path, error on make check with cxx interface

Nick Papior Andersen nickpapior at gmail.com
Tue Jun 2 02:58:08 CDT 2015


I did everything you suggested, plus a bit more. I have attached the
ncmpi_notyet.cpp I used; after recompiling and re-running the check I get the same thing:

./nctst        ./testfile.nc
ncmpi_inq_typeids not implemented
[n-62-12-2:21767] *** Process received signal ***
[n-62-12-2:21767] Signal: Segmentation fault (11)
[n-62-12-2:21767] Signal code: Address not mapped (1)
[n-62-12-2:21767] Failing at address: 0xffffffffffffffe8
[n-62-12-2:21767] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2b3e93945710]
[n-62-12-2:21767] [ 1] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZNSo6sentryC2ERSo+0x19)[0x2b3e934ace79]
[n-62-12-2:21767] [ 2] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x29)[0x2b3e934ad589]
[n-62-12-2:21767] [ 3] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc+0x27)[0x2b3e934ad9e7]
[n-62-12-2:21767] [ 4] ./nctst[0x40ad62]
[n-62-12-2:21767] [ 5] ./nctst[0x40b93c]
[n-62-12-2:21767] [ 6] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2b3e93d76d5d]
[n-62-12-2:21767] [ 7] ./nctst[0x405899]
[n-62-12-2:21767] *** End of error message ***
make: *** [check] Segmentation fault
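
The mangled names in frames 1 to 3 can be turned into readable signatures, which shows that the crash happens while a C string is being inserted into a std::ostream. A minimal sketch, assuming a GCC- or Clang-style toolchain that provides abi::__cxa_demangle in <cxxabi.h>; the helper name demangle is made up for illustration and is not part of PnetCDF:

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);   // the buffer is malloc()-allocated by __cxa_demangle
    return result;
}
```

Passing "_ZNSo6sentryC2ERSo" to this helper names the std::ostream::sentry constructor; the c++filt command-line tool does the same job without writing any code.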


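The change to ncmpi_notyet.cpp suggested in the quoted message below boils down to leaving the output parameter in a defined state before returning the error. A rough sketch of that pattern follows; inq_typeids_stub, collect_type_ids and kErrInvalid are hypothetical stand-ins, not the PnetCDF functions or the real NC_EINVAL value. As reported above, this change alone did not make the segmentation fault go away here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the library's error code (not the real NC_EINVAL).
const int kErrInvalid = -1;

// Sketch of a "not implemented" stub that still leaves its output
// parameter in a defined state before reporting the error.
int inq_typeids_stub(int /*ncid*/, int* ntypes, int* /*typeids*/) {
    if (ntypes != nullptr) {
        *ntypes = 0;   // the caller can now safely size buffers or loop on it
    }
    std::fprintf(stderr, "%s not implemented\n", __func__);
    return kErrInvalid;
}

// Hypothetical caller: the count is initialized defensively and only used
// when the query succeeded.
std::vector<int> collect_type_ids(int ncid) {
    int ntypes = 0;
    int status = inq_typeids_stub(ncid, &ntypes, nullptr);
    if (status != 0 || ntypes <= 0) {
        return {};     // nothing to enumerate
    }
    std::vector<int> ids(static_cast<std::size_t>(ntypes));
    inq_typeids_stub(ncid, &ntypes, ids.data());
    return ids;
}
```

With both the stub and the caller setting the count, neither side depends on the other to avoid reading an uninitialized integer.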

2015-06-02 2:35 GMT+02:00 Wei-keng Liao <wkliao at eecs.northwestern.edu>:

> You are right. None of the places that call ncmpi_inq_typeids() is
> supposed to be reached at all, because that function is for compound
> data types (not supported by PnetCDF).
> I added a printf statement at every place this function is called, but
> none of them prints anything.
>
> What you suggested may be the right solution; could you please give it a try?
> i.e. change line 112 of file ncmpi_notyet.cpp from
>
> ncmpi_inq_typeids(int ncid, int *ntypes, int *typeids){printf("%s not implemented\n",__func__); return NC_EINVAL;}
> to
> ncmpi_inq_typeids(int ncid, int *ntypes, int *typeids){*ntypes = 0; printf("%s not implemented\n",__func__); return NC_EINVAL;}
>
>
> Once you have made the change, run "make" in directory src/libcxx,
> then go to test/CXX and run "make check".
>
>
> Wei-keng
>
> On Jun 1, 2015, at 7:14 PM, Nick Papior Andersen wrote:
>
> >
> >
> > 2015-06-02 1:48 GMT+02:00 Nick Papior Andersen <nickpapior at gmail.com>:
> > I did this (adding traceback to the debugging compilation)
> >
> > ../configure CXXFLAGS='-g -O0 -traceback' CFLAGS='-g -O0 -traceback' \
> >     CC=mpicc CXX=mpicxx F77=mpif77 F90=mpif90 FC=mpif90 \
> >     --prefix=/zdata/groups/common/nicpa/2015-test/XeonX5550/pnetcdf/1.6.0/intel-15.0.1 \
> >     --with-mpi=/zdata/groups/common/nicpa/2015-test/XeonX5550/openmpi/1.8.5/intel-15.0.1 \
> >     --enable-debug --disable-fortran
> > make
> > cd test/CXX
> > make check
> >
> > And got (well the same thing :( ):
> > ./nctst        ./testfile.nc
> > ncmpi_inq_typeids not implemented
> > [n-62-12-2:31968] *** Process received signal ***
> > [n-62-12-2:31968] Signal: Segmentation fault (11)
> > [n-62-12-2:31968] Signal code: Address not mapped (1)
> > [n-62-12-2:31968] Failing at address: 0xffffffffffffffe8
> > [n-62-12-2:31968] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2b13425f0710]
> > [n-62-12-2:31968] [ 1] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZNSo6sentryC2ERSo+0x19)[0x2b1342157e79]
> > [n-62-12-2:31968] [ 2] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x29)[0x2b1342158589]
> > [n-62-12-2:31968] [ 3] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc+0x27)[0x2b13421589e7]
> > [n-62-12-2:31968] [ 4] ./nctst[0x40ad62]
> > [n-62-12-2:31968] [ 5] ./nctst[0x40b93c]
> > [n-62-12-2:31968] [ 6] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2b1342a21d5d]
> > [n-62-12-2:31968] [ 7] ./nctst[0x405899]
> > [n-62-12-2:31968] *** End of error message ***
> > make: *** [check] Segmentation fault
> >
> >
> > Note that this build includes my "corrected" return of an empty string,
> > on top of the above compilation.
> > However, it is clear from the code that the value that is to be updated
> > (the number of types) isn't set. Hence the error is in the logic behind
> > calling inq_typ, not in the preceding code, which should work once the
> > number-of-types code has been implemented and sets the number of types
> > correctly.
> > I think I was a bit unclear here.
> > The integer holding the number of types hasn't been set properly, hence
> > the segmentation fault (I think).
> > The error is before this call; it shouldn't be made at all, as you state.
> > Hence any debugging will probably not show anything useful, as it is a
> > path that shouldn't be taken.
> > So the problem is why inq_typ is called. (I think we agree on this, no?)
> >
> > (code snippet:
> > int ntypesp;
> >     ncmpiCheck(ncmpi_inq_typeids(getId(), &ntypesp,typeidsp),__FILE__,__LINE__);
> > )
> > with no initialization.
> > Maybe the "not implemented" routine should set the ntypesp to 0?
> >
> >
> > Note, that this may also be a compiler bug. :(
> >
> > 2015-06-02 1:24 GMT+02:00 Wei-keng Liao <wkliao at eecs.northwestern.edu>:
> > Hi, Nick
> >
> > I tried to find any place that could indirectly produce that message, but
> > could not find one.
> >
> > I wonder if you can kindly help me find more information about this
> > error by rebuilding your PnetCDF with the following configure command
> > (with the debug option enabled):
> >
> > ./configure --enable-debug --disable-fortran \
> >     --with-mpi=/zdata/groups/common/nicpa/2015-test/XeonX5550/openmpi/1.8.5/intel-15.0.1
> >
> > Disabling Fortran gives you a shorter build time.
> >
> > Once you have built it, please cd directly to test/CXX and run "make check"
> > there.
> > This skips all the other tests.
> >
> > thanks
> >
> > Wei-keng
> >
> > On Jun 1, 2015, at 5:21 PM, Nick Papior Andersen wrote:
> >
> > > Oh, yeah, sorry. It is the latest version, 1.6.0.
> > > Here is a tar with the config.log and the tmp.test (it was 1 MB, so
> > > sorry for tarring).
> > >
> > >
> > >
> > > 2015-06-02 0:16 GMT+02:00 Wei-keng Liao <wkliao at eecs.northwestern.edu>:
> > > That message "ncmpi_inq_typeids not implemented" should not appear.
> > > It is fishy.
> > >
> > > I forgot to ask the PnetCDF version you are using.
> > > Please let me know. Also, please send me the file config.log. Thanks.
> > >
> > > Wei-keng
> > >
> > > On Jun 1, 2015, at 5:02 PM, Nick Papior Andersen wrote:
> > >
> > > > Oh, and I do not have these problems using pure gcc 5.1.0 on my
> > > > local machine.
> > > >
> > > > 2015-06-02 0:00 GMT+02:00 Nick Papior Andersen <nickpapior at gmail.com>:
> > > > Dear Wei-keng and Rob,
> > > >
> > > > My default options did not include the -g flag, so the core dump was
> > > > not very useful ;(
> > > >
> > > > I did the catchsegv thing... Here is the output:
> > > >
> > > > $> catchsegv ./nctst ./testfile.nc
> > > > ncmpi_inq_typeids not implemented
> > > > *** Segmentation fault
> > > > Register dump:
> > > >
> > > >  RAX: 00002b61bec50000   RBX: 00002b61bec50000   RCX: 000000000000000c
> > > >  RDX: 0000000000000000   RSI: 00002b61bec50000   RDI: 00007fff5e677fa0
> > > >  RBP: 00007fff5e677fa0   R8 : 0000000000e79b60   R9 : 00000000000000f0
> > > >  R10: 00007fff5e677d70   R11: 00002b61be9e1560   R12: 00007fff5e6780e0
> > > >  R13: 00007fff5e678920   R14: 000000000000004c   R15: 00007fff5e677fa0
> > > >  RSP: 00007fff5e677f60
> > > >
> > > >  RIP: 00002b61be9e0e79   EFLAGS: 00010206
> > > >
> > > >  CS: 0033   FS: 0000   GS: 0000
> > > >
> > > >  Trap: 0000000e   Error: 00000005   OldMask: 00000000   CR2: ffffffe8
> > > >
> > > >  FPUCW: 0000037f   FPUSW: 00000000   TAG: 00002b61
> > > >  RIP: bf8f7fff   RDP: 5e677ff0
> > > >
> > > >  ST(0) 0000 0000000000000033   ST(1) 0000 000000000000000d
> > > >  ST(2) 0000 0000000000c80000   ST(3) 0000 0000000000000640
> > > >  ST(4) 0000 0000000000000000   ST(5) 0000 0000000000000000
> > > >  ST(6) 0000 0000000000000000   ST(7) 8000 8000000000000000
> > > >  mxcsr: 9fe0
> > > >  XMM0:  00000000000000000000000000000000 XMM1:  00000000000000000000000000000000
> > > >  XMM2:  00000000000000000000000000000000 XMM3:  00000000000000000000000000000000
> > > >  XMM4:  00000000000000000000000000000000 XMM5:  00000000000000000000000000000000
> > > >  XMM6:  00000000000000000000000000000000 XMM7:  00000000000000000000000000000000
> > > >  XMM8:  00000000000000000000000000000000 XMM9:  00000000000000000000000000000000
> > > >  XMM10: 00000000000000000000000000000000 XMM11: 00000000000000000000000000000000
> > > >  XMM12: 00000000000000000000000000000000 XMM13: 00000000000000000000000000000000
> > > >  XMM14: 00000000000000000000000000000000 XMM15: 00000000000000000000000000000000
> > > >
> > > > Backtrace:
> > > > /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZNSo6sentryC2ERSo+0x19)[0x2b61be9e0e79]
> > > > /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x29)[0x2b61be9e1589]
> > > > /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc+0x27)[0x2b61be9e19e7]
> > > > ??:0(_Z3genRKP19ompi_communicator_tPKcN7PnetCDF9NcmpiFile10FileFormatE)[0x40d4a5]
> > > > ??:0(main)[0x409f60]
> > > > /lib64/libc.so.6(__libc_start_main+0xfd)[0x2b61bf0a5d5d]
> > > > ??:0(_start)[0x409cb9]
> > > >
> > > >
> > > > Where I think the top line is pretty self-explanatory ;)
> > > > And looking at the code, it makes sense; whether it should be called
> > > > at all is another matter...
> > > >
> > > > 2015-06-01 18:25 GMT+02:00 Wei-keng Liao <wkliao at eecs.northwestern.edu>:
> > > > Hi, Nick,
> > > >
> > > > Printing the trace of a segmentation fault is easy. You can run the
> > > > command "gdb ./nctst corefile" and, at the gdb prompt, type "where".
> > > > If you can send me the printout, it will be helpful.
> > > >
> > > > Just to clarify, if gcc 5.1.0 is used, are you saying there is no
> > > > problem building PnetCDF?
> > > > Can you tell me the configure command line you used?
> > > >
> > > > Wei-keng
> > > >
> > > > On Jun 1, 2015, at 11:11 AM, Nick Papior Andersen wrote:
> > > >
> > > > >
> > > > >
> > > > > 2015-06-01 17:09 GMT+02:00 Wei-keng Liao <wkliao at eecs.northwestern.edu>:
> > > > > Hi, Nick
> > > > >
> > > > > Your fix for the first bug makes perfect sense. I will add that to PnetCDF. Thanks.
> > > > >
> > > > > As for the second error, can you use gdb to print the location of the segmentation fault?
> > > > > I haven't done that. I would rather not go down that path. I am not an avid user of gdb (yet).
> > > > > Also, do both errors happen with the Intel C compiler?
> > > > > It is only the Intel C++ compiler. I only mention the gcc version because Intel uses gcc for compatibility (see -gcc-name).
> > > > > I thought that this could be the problem... Maybe it isn't.
> > > > >
> > > > > The two C compilers you used are the latest ones. I have not tried them.
> > > > > Could you compile/run a simple C++ program to see if gcc works in your environment?
> > > > > Or, if you installed gcc from source, have you tried running "make -k check"?
> > > > > See https://gcc.gnu.org/install/test.html
> > > > >
> > > > > Gcc/g++ runs fine. I can compile 20 other libraries with full support. If anything, it is related to the Intel compiler. :(
> > > > > If you do not need the C++ component, you can build PnetCDF without it by adding the option
> > > > > "--disable-cxx" to the configure command.
> > > > > This is already my workaround :) Thanks.
> > > > >
> > > > > By the way, I plan to release 1.6.1 today, but it will not fix the second error you are seeing.
> > > > > I will see if I can find those new compiler versions and fix the problem.
> > > > > The fix will have to wait for the next release, though.
> > > > > Ok. :)
> > > > >
> > > > > Thanks again for reporting the problem.
> > > > > You are welcome.
> > > > > Thanks for the software.
> > > > >
> > > > > Wei-keng
> > > > >
> > > > > On Jun 1, 2015, at 1:00 AM, Nick Papior Andersen wrote:
> > > > >
> > > > > > I am trying to compile and make check with these compilers:
> > > > > > > intel 2015.1.133
> > > > > > and
> > > > > > gcc 5.1.0 in the path.
> > > > > >
> > > > > > Compiling goes fine and everything seems to link correctly.
> > > > > > However make check errors out in the CXX test.
> > > > > >
> > > > > > First I get this error message:
> > > > > > ./nctst        ./testfile.nc
> > > > > > terminate called after throwing an instance of 'std::logic_error'
> > > > > >   what():  basic_string::_M_construct null not valid
> > > > > > [n-62-12-2:09803] *** Process received signal ***
> > > > > > [n-62-12-2:09803] Signal: Aborted (6)
> > > > > > [n-62-12-2:09803] Signal code:  (-6)
> > > > > > > [n-62-12-2:09803] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2aae4ac54710]
> > > > > > > [n-62-12-2:09803] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x2aae4ae94625]
> > > > > > > [n-62-12-2:09803] [ 2] /lib64/libc.so.6(abort+0x175)[0x2aae4ae95e05]
> > > > > > > [n-62-12-2:09803] [ 3] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x2aae4a7428cd]
> > > > > > > [n-62-12-2:09803] [ 4] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(+0x8c936)[0x2aae4a740936]
> > > > > > > [n-62-12-2:09803] [ 5] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(+0x8c981)[0x2aae4a740981]
> > > > > > > [n-62-12-2:09803] [ 6] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(+0x8cb98)[0x2aae4a740b98]
> > > > > > > [n-62-12-2:09803] [ 7] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZSt19__throw_logic_errorPKc+0x3f)[0x2aae4a767faf]
> > > > > > > [n-62-12-2:09803] [ 8] ./nctst[0x461b42]
> > > > > > > [n-62-12-2:09803] [ 9] ./nctst[0x4644b6]
> > > > > > > [n-62-12-2:09803] [10] ./nctst[0x40c313]
> > > > > > > [n-62-12-2:09803] [11] ./nctst[0x409f60]
> > > > > > > [n-62-12-2:09803] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2aae4ae80d5d]
> > > > > > > [n-62-12-2:09803] [13] ./nctst[0x409cb9]
> > > > > > [n-62-12-2:09803] *** End of error message ***
> > > > > >
> > > > > >
> > > > > > > Secondly, I made a change in the file src/libcxx/ncmpiType.cpp:
> > > > > > > function inq_type has 'return NULL', which cannot be done in a function returning a string (unless it is a pointer, which it isn't).
> > > > > > > So I changed it to return an empty string:
> > > > > > > 'return ""'
> > > > > > > (I am not sure when this is reached, but the error message changes as can be seen below, hence my suspicion about that code segment.)
> > > > > >
> > > > > > Now I recompile and get this alternate error message:
> > > > > > ./nctst        ./testfile.nc
> > > > > > [n-62-12-2:23419] *** Process received signal ***
> > > > > > [n-62-12-2:23419] Signal: Segmentation fault (11)
> > > > > > [n-62-12-2:23419] Signal code: Address not mapped (1)
> > > > > > [n-62-12-2:23419] Failing at address: 0xffffffffffffffe8
> > > > > > > [n-62-12-2:23419] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ab6b02fa710]
> > > > > > > [n-62-12-2:23419] [ 1] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZNSo6sentryC2ERSo+0x19)[0x2ab6afe61e79]
> > > > > > > [n-62-12-2:23419] [ 2] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x29)[0x2ab6afe62589]
> > > > > > > [n-62-12-2:23419] [ 3] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc+0x27)[0x2ab6afe629e7]
> > > > > > > [n-62-12-2:23419] [ 4] ./nctst[0x40d4a5]
> > > > > > > [n-62-12-2:23419] [ 5] ./nctst[0x409f60]
> > > > > > > [n-62-12-2:23419] [ 6] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2ab6b0526d5d]
> > > > > > > [n-62-12-2:23419] [ 7] ./nctst[0x409cb9]
> > > > > > [n-62-12-2:23419] *** End of error message ***
> > > > > > make[2]: *** [testing] Segmentation fault
> > > > > >
> > > > > > > There seems to be something fishy with the cxx interface?
> > > > > > > I am no expert in cxx... :( So I had trouble debugging further...
> > > > > >
> > > > > > --
> > > > > > Kind regards Nick
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Kind regards Nick
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Kind regards Nick
> > > >
> > > >
> > > >
> > > > --
> > > > Kind regards Nick
> > >
> > >
> > >
> > >
> > > --
> > > Kind regards Nick
> > > <log-test.tar.gz>
> >
> >
> >
> >
> > --
> > Kind regards Nick
> >
> >
> >
> > --
> > Kind regards Nick
>
>

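For reference, the first error in the original report quoted above (std::logic_error, "basic_string::_M_construct null not valid") is what libstdc++ raises when a std::string is constructed from a null const char*, which is what 'return NULL' amounts to in a function returning std::string. A small sketch of the safer pattern, with a hypothetical type_name function standing in for inq_type (the ids and names are made up):

```cpp
#include <cassert>
#include <string>

// Hypothetical lookup that maps a small type id to a name.
std::string type_name(int id) {
    switch (id) {
        case 1: return "byte";
        case 2: return "char";
        default:
            // Writing `return NULL;` here would build a std::string from a
            // null const char*, which is undefined behaviour (libstdc++
            // happens to throw std::logic_error). An empty string is a
            // well-defined "no name" result.
            return std::string();
    }
}
```

Callers can then test the result with empty() instead of relying on an exception or a crash for unknown ids.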

-- 
Kind regards Nick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ncmpi_notyet.cpp
Type: text/x-c++src
Size: 16041 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/parallel-netcdf/attachments/20150602/bc59be8d/attachment-0001.cpp>

