With intel 2015.1.133 and gcc 5.1.0 in the path, error on make check with cxx interface

Nick Papior Andersen nickpapior at gmail.com
Mon Jun 1 19:14:26 CDT 2015


2015-06-02 1:48 GMT+02:00 Nick Papior Andersen <nickpapior at gmail.com>:

> I did this (adding traceback to the debugging compilation)
>
> ../configure CXXFLAGS='-g -O0 -traceback' CFLAGS='-g -O0 -traceback'
> CC=mpicc CXX=mpicxx F77=mpif77 F90=mpif90 FC=mpif90
> --prefix=/zdata/groups/common/nicpa/2015-test/XeonX5550/pnetcdf/1.6.0/intel-15.0.1
> --with-mpi=/zdata/groups/common/nicpa/2015-test/XeonX5550/openmpi/1.8.5/intel-15.0.1
> --enable-debug --disable-fortran
> make
> cd test/CXX
> make check
>
> And got (well the same thing :( ):
> ./nctst        ./testfile.nc
> ncmpi_inq_typeids not implemented
> [n-62-12-2:31968] *** Process received signal ***
> [n-62-12-2:31968] Signal: Segmentation fault (11)
> [n-62-12-2:31968] Signal code: Address not mapped (1)
> [n-62-12-2:31968] Failing at address: 0xffffffffffffffe8
> [n-62-12-2:31968] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2b13425f0710]
> [n-62-12-2:31968] [ 1]
> /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZNSo6sentryC2ERSo+0x19)[0x2b1342157e79]
> [n-62-12-2:31968] [ 2]
> /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x29)[0x2b1342158589]
> [n-62-12-2:31968] [ 3]
> /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc+0x27)[0x2b13421589e7]
> [n-62-12-2:31968] [ 4] ./nctst[0x40ad62]
> [n-62-12-2:31968] [ 5] ./nctst[0x40b93c]
> [n-62-12-2:31968] [ 6]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x2b1342a21d5d]
> [n-62-12-2:31968] [ 7] ./nctst[0x405899]
> [n-62-12-2:31968] *** End of error message ***
> make: *** [check] Segmentation fault
>
>
> Note that the above compilation includes my "correction" of the return to
> an empty string.
> However, it is clear from the code that the value that is supposed to be
> updated (the number of types) isn't set. Hence the error is in the logic
> behind calling inq_typ, not in the preceding code, which should work once
> the number-of-types code has been implemented and sets the number of types
> correctly.
>
I think I was a bit unclear here.
The integer holding the number of types is never set properly, hence the
segmentation fault (I think).
The actual error lies before this call: as you state, the call should not
be made at all. Debugging the call itself will therefore probably not show
anything useful, since it is a path that should not be taken.
So the real question is why inq_typ is called. (I think we agree on this, no?)

>
> (code snippet:
>     int ntypesp;
>     ncmpiCheck(ncmpi_inq_typeids(getId(), &ntypesp, typeidsp), __FILE__, __LINE__);
> )
> with no initialization.
> Maybe the "not implemented" routine should set the ntypesp to 0?
>
>
> Note, that this may also be a compiler bug. :(
>
> 2015-06-02 1:24 GMT+02:00 Wei-keng Liao <wkliao at eecs.northwestern.edu>:
>
>> Hi, Nick
>>
>> I tried to see if any place that can indirectly invoke that message, but
>> could not find one.
>>
>> I wonder if you can kindly help me find more information about this
>> error, by rebuilding your PnetCDF with the following configure command
>> (with debug option enabled):
>>
>> ./configure --enable-debug --disable-fortran
>> --with-mpi=/zdata/groups/common/nicpa/2015-test/XeonX5550/openmpi/1.8.5/intel-15.0.1
>>
>> Disabling Fortran gives you a shorter build time.
>>
>> Once you build it, please cd directly to test/CXX and run "make check"
>> there.
>> This can skip all other tests.
>>
>> thanks
>>
>> Wei-keng
>>
>> On Jun 1, 2015, at 5:21 PM, Nick Papior Andersen wrote:
>>
>> > Oh, yeah sorry. it is the latest 1.6.0 version.
>> > Here is a tar with the config.log and the tmp.test (it was 1 MB, so sorry for tarring).
>> >
>> >
>> >
>> > 2015-06-02 0:16 GMT+02:00 Wei-keng Liao <wkliao at eecs.northwestern.edu>:
>> > That message "ncmpi_inq_typeids not implemented" should not appear.
>> > It is fishy.
>> >
>> > I forgot to ask the PnetCDF version you are using.
>> > Please let me know. Also, please send me the file config.log. Thanks.
>> >
>> > Wei-keng
>> >
>> > On Jun 1, 2015, at 5:02 PM, Nick Papior Andersen wrote:
>> >
>> > > Oh, and I do not have these problems using pure gcc 5.1.0 on my local machine.
>> > >
>> > > 2015-06-02 0:00 GMT+02:00 Nick Papior Andersen <nickpapior at gmail.com>:
>> > > Dear Wei-keng and Rob,
>> > >
>> > > My default options did not include the -g flag, so the coredump was not very useful ;(
>> > >
>> > > I did the catchsegv thing... Here is the output:
>> > >
>> > > $> catchsegv ./nctst ./testfile.nc
>> > > ncmpi_inq_typeids not implemented
>> > > *** Segmentation fault
>> > > Register dump:
>> > >
>> > >  RAX: 00002b61bec50000   RBX: 00002b61bec50000   RCX: 000000000000000c
>> > >  RDX: 0000000000000000   RSI: 00002b61bec50000   RDI: 00007fff5e677fa0
>> > >  RBP: 00007fff5e677fa0   R8 : 0000000000e79b60   R9 : 00000000000000f0
>> > >  R10: 00007fff5e677d70   R11: 00002b61be9e1560   R12: 00007fff5e6780e0
>> > >  R13: 00007fff5e678920   R14: 000000000000004c   R15: 00007fff5e677fa0
>> > >  RSP: 00007fff5e677f60
>> > >
>> > >  RIP: 00002b61be9e0e79   EFLAGS: 00010206
>> > >
>> > >  CS: 0033   FS: 0000   GS: 0000
>> > >
>> > >  Trap: 0000000e   Error: 00000005   OldMask: 00000000   CR2: ffffffe8
>> > >
>> > >  FPUCW: 0000037f   FPUSW: 00000000   TAG: 00002b61
>> > >  RIP: bf8f7fff   RDP: 5e677ff0
>> > >
>> > >  ST(0) 0000 0000000000000033   ST(1) 0000 000000000000000d
>> > >  ST(2) 0000 0000000000c80000   ST(3) 0000 0000000000000640
>> > >  ST(4) 0000 0000000000000000   ST(5) 0000 0000000000000000
>> > >  ST(6) 0000 0000000000000000   ST(7) 8000 8000000000000000
>> > >  mxcsr: 9fe0
>> > >  XMM0:  00000000000000000000000000000000 XMM1:  00000000000000000000000000000000
>> > >  XMM2:  00000000000000000000000000000000 XMM3:  00000000000000000000000000000000
>> > >  XMM4:  00000000000000000000000000000000 XMM5:  00000000000000000000000000000000
>> > >  XMM6:  00000000000000000000000000000000 XMM7:  00000000000000000000000000000000
>> > >  XMM8:  00000000000000000000000000000000 XMM9:  00000000000000000000000000000000
>> > >  XMM10: 00000000000000000000000000000000 XMM11: 00000000000000000000000000000000
>> > >  XMM12: 00000000000000000000000000000000 XMM13: 00000000000000000000000000000000
>> > >  XMM14: 00000000000000000000000000000000 XMM15: 00000000000000000000000000000000
>> > >
>> > > Backtrace:
>> > > /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZNSo6sentryC2ERSo+0x19)[0x2b61be9e0e79]
>> > > /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x29)[0x2b61be9e1589]
>> > > /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc+0x27)[0x2b61be9e19e7]
>> > > ??:0(_Z3genRKP19ompi_communicator_tPKcN7PnetCDF9NcmpiFile10FileFormatE)[0x40d4a5]
>> > > ??:0(main)[0x409f60]
>> > > /lib64/libc.so.6(__libc_start_main+0xfd)[0x2b61bf0a5d5d]
>> > > ??:0(_start)[0x409cb9]
>> > >
>> > >
>> > > Where I think the top line is pretty self-explanatory ;)
>> > > And looking at the code, it makes sense; whether it should be called at all is another matter...
>> > >
>> > > 2015-06-01 18:25 GMT+02:00 Wei-keng Liao <wkliao at eecs.northwestern.edu>:
>> > > Hi, Nick,
>> > >
>> > > To print the trace of a segmentation fault is easy. You can run the command
>> > > "gdb corefile" and, at the gdb prompt, type the command "where".
>> > > If you can send me the printout, it will be helpful.
>> > >
>> > > Just to clarify, if gcc 5.1.0 is used, are you saying there is no problem building PnetCDF?
>> > > Can you tell me the configure command line you used?
>> > >
>> > > Wei-keng
>> > >
>> > > On Jun 1, 2015, at 11:11 AM, Nick Papior Andersen wrote:
>> > >
>> > > >
>> > > >
>> > > > 2015-06-01 17:09 GMT+02:00 Wei-keng Liao <wkliao at eecs.northwestern.edu>:
>> > > > Hi, Nick
>> > > >
>> > > > Your fix for the first bug makes perfect sense. I will add that to PnetCDF. Thanks.
>> > > >
>> > > > As for the second error, can you use gdb to print the location of the segmentation fault?
>> > > > I haven't done that. I would rather not go that path? I ain't an avid user of gdb (yet).
>> > > > Also, do both errors happen with the Intel C compiler?
>> > > > It is only the intel cxx compiler. I only show the gcc version as intel uses that for compatibility issues. (see -gcc-name)
>> > > > I thought that this could be the problem... Maybe it isn't.
>> > > >
>> > > > The two C compilers you used are the latest ones. I have not tried them.
>> > > > Could you compile/run a simple C++ program to see if gcc works in your environment?
>> > > > Or, if you installed gcc from source, have you tried running "make -k check"?
>> > > > See https://gcc.gnu.org/install/test.html
>> > > >
>> > > > Gcc/g++ runs fine. I can compile 20 other different libraries with full support. If anything, it is related to the intel compiler. :(
>> > > > If you do not need the C++ component, you can build a PnetCDF without it, by adding option
>> > > > "--disable-cxx" to the configure command.
>> > > > This is already my diverting methodology :) Thanks.
>> > > >
>> > > > By the way, I plan to release 1.6.1 today, but it will not fix the second error you are seeing.
>> > > > I will see if I can find those new versions of the C compilers and fix the problem.
>> > > > The fix will have to wait for the next release, though.
>> > > > Ok. :)
>> > > >
>> > > > Thanks again for reporting the problem.
>> > > > You are welcome.
>> > > > Thanks for the software.
>> > > >
>> > > > Wei-keng
>> > > >
>> > > > On Jun 1, 2015, at 1:00 AM, Nick Papior Andersen wrote:
>> > > >
>> > > > > I am trying to compile and make check with these compilers:
>> > > > > intel 2015.1.13
>> > > > > and
>> > > > > gcc 5.1.0 in the path.
>> > > > >
>> > > > > Compiling goes fine and everything seems to link correctly.
>> > > > > However make check errors out in the CXX test.
>> > > > >
>> > > > > First I get this error message:
>> > > > > ./nctst        ./testfile.nc
>> > > > > terminate called after throwing an instance of 'std::logic_error'
>> > > > >   what():  basic_string::_M_construct null not valid
>> > > > > [n-62-12-2:09803] *** Process received signal ***
>> > > > > [n-62-12-2:09803] Signal: Aborted (6)
>> > > > > [n-62-12-2:09803] Signal code:  (-6)
>> > > > > [n-62-12-2:09803] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2aae4ac54710]
>> > > > > [n-62-12-2:09803] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x2aae4ae94625]
>> > > > > [n-62-12-2:09803] [ 2] /lib64/libc.so.6(abort+0x175)[0x2aae4ae95e05]
>> > > > > [n-62-12-2:09803] [ 3] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x2aae4a7428cd]
>> > > > > [n-62-12-2:09803] [ 4] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(+0x8c936)[0x2aae4a740936]
>> > > > > [n-62-12-2:09803] [ 5] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(+0x8c981)[0x2aae4a740981]
>> > > > > [n-62-12-2:09803] [ 6] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(+0x8cb98)[0x2aae4a740b98]
>> > > > > [n-62-12-2:09803] [ 7] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZSt19__throw_logic_errorPKc+0x3f)[0x2aae4a767faf]
>> > > > > [n-62-12-2:09803] [ 8] ./nctst[0x461b42]
>> > > > > [n-62-12-2:09803] [ 9] ./nctst[0x4644b6]
>> > > > > [n-62-12-2:09803] [10] ./nctst[0x40c313]
>> > > > > [n-62-12-2:09803] [11] ./nctst[0x409f60]
>> > > > > [n-62-12-2:09803] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2aae4ae80d5d]
>> > > > > [n-62-12-2:09803] [13] ./nctst[0x409cb9]
>> > > > > [n-62-12-2:09803] *** End of error message ***
>> > > > >
>> > > > >
>> > > > > Secondly, I changed the file src/libcxx/ncmpiType.cpp:
>> > > > > the function inq_type has 'return NULL', which is not valid for a function returning a string (unless it returns a pointer, which it doesn't).
>> > > > > So I changed it to an empty string:
>> > > > > 'return ""'
>> > > > > (I am not sure when this is reached, but the error message changes, as can be seen below, hence my suspicion falls on that code segment.)
>> > > > >
>> > > > > Now I recompile and get this alternate error message:
>> > > > > ./nctst        ./testfile.nc
>> > > > > [n-62-12-2:23419] *** Process received signal ***
>> > > > > [n-62-12-2:23419] Signal: Segmentation fault (11)
>> > > > > [n-62-12-2:23419] Signal code: Address not mapped (1)
>> > > > > [n-62-12-2:23419] Failing at address: 0xffffffffffffffe8
>> > > > > [n-62-12-2:23419] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ab6b02fa710]
>> > > > > [n-62-12-2:23419] [ 1] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZNSo6sentryC2ERSo+0x19)[0x2ab6afe61e79]
>> > > > > [n-62-12-2:23419] [ 2] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x29)[0x2ab6afe62589]
>> > > > > [n-62-12-2:23419] [ 3] /zdata/groups/common/nicpa/2015-test/generic/gcc/5.1.0/lib64/libstdc++.so.6(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc+0x27)[0x2ab6afe629e7]
>> > > > > [n-62-12-2:23419] [ 4] ./nctst[0x40d4a5]
>> > > > > [n-62-12-2:23419] [ 5] ./nctst[0x409f60]
>> > > > > [n-62-12-2:23419] [ 6] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2ab6b0526d5d]
>> > > > > [n-62-12-2:23419] [ 7] ./nctst[0x409cb9]
>> > > > > [n-62-12-2:23419] *** End of error message ***
>> > > > > make[2]: *** [testing] Segmentation fault
>> > > > >
>> > > > > There seems to be something fishy with the cxx interface?
>> > > > > I am no expert in cxx... :( So I had trouble debugging further...
>> > > > >
>> > > > > --
>> > > > > Kind regards Nick
>> > <log-test.tar.gz>
>>
>>
>
>
> --
> Kind regards Nick
>



-- 
Kind regards Nick