[MOAB-dev] Simple code to reproduce ICC segmentation fault

Iulian Grindeanu iulian at mcs.anl.gov
Tue Sep 10 14:20:53 CDT 2013


Running the test under Valgrind's experimental tool (--tool=exp-sgcheck) found it.
(I did not have to compile with -C.)

I didn't know about this experimental option.

Thanks, 
Iulian 

iulian at T520-iuli:~/source/MOABp13/test$ valgrind --tool=exp-sgcheck scdseq_test 
==6670== exp-sgcheck, a stack and global array overrun detector 
==6670== NOTE: This is an Experimental-Class Valgrind Tool 
==6670== Copyright (C) 2003-2011, and GNU GPL'd, by OpenWorks Ltd et al. 
==6670== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info 
==6670== Command: scdseq_test 
==6670== 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
Running test_parallel_partitions ... 
==6670== Invalid write of size 4 
==6670== at 0x418C09: moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:787) 
==6670== by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:1154) 
==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1216) 
==6670== by 0x417726: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1379) 
==6670== by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331) 
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320) 
==6670== by 0x410229: main (scdseq_test.cpp:267) 
==6670== Address 0x7fefff878 expected vs actual: 
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here 
==6670== Actual: unknown 
==6670== Actual: is 0 after Expected 
==6670== 
==6670== Invalid write of size 4 
==6670== at 0x418C09: moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:787) 
==6670== by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:1154) 
==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1216) 
==6670== by 0x4177E8: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1392) 
==6670== by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331) 
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320) 
==6670== by 0x410229: main (scdseq_test.cpp:267) 
==6670== Address 0x7fefff878 expected vs actual: 
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here 
==6670== Actual: unknown 
==6670== Actual: is 0 after Expected 
==6670== 
==6670== Invalid write of size 4 
==6670== at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868) 
==6670== by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:758) 
==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1219) 
==6670== by 0x417726: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1379) 
==6670== by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336) 
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320) 
==6670== by 0x410229: main (scdseq_test.cpp:267) 
==6670== Address 0x7fefff888 expected vs actual: 
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here 
==6670== Actual: unknown 
==6670== Actual: is 0 after Expected 
==6670== 
==6670== Invalid write of size 4 
==6670== at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868) 
==6670== by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:758) 
==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1219) 
==6670== by 0x4177E8: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1392) 
==6670== by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336) 
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320) 
==6670== by 0x410229: main (scdseq_test.cpp:267) 
==6670== Address 0x7fefff888 expected vs actual: 
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here 
==6670== Actual: unknown 
==6670== Actual: is 0 after Expected 
==6670== 
Running test_vertex_seq ... 
Running test_element_seq ... 
Running test_periodic_seq ... 
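
Reading the report: "lperiodic" is a stack array of size 8, i.e. 2 ints, and each flagged 4-byte write lands 0 bytes after its end, i.e. at index 2. Below is a minimal sketch of one way to turn this kind of size mismatch into a compile-time error, assuming the callee could take the array by reference with its length in the type. The names fill_periodic and demo_fill_periodic and the values are hypothetical, not MOAB's API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a routine that writes one flag per i/j/k
// direction; this is NOT the MOAB signature. Taking the arrays by
// reference puts the required length (3) in the parameter type, so
// passing an int[2] is rejected by the compiler.
void fill_periodic(const int (&gperiodic)[3], int (&lperiodic)[3])
{
  for (std::size_t d = 0; d < 3; ++d)
    lperiodic[d] = gperiodic[d];
}

// Hypothetical usage helper: returns true if all three flags were copied.
bool demo_fill_periodic()
{
  const int gperiodic[3] = {1, 0, 1};
  int lperiodic[3] = {0, 0, 0};  // declaring int lperiodic[2] would not compile
  assert(sizeof(lperiodic) == 3 * sizeof(int));  // room for three ints, not the two reported
  fill_periodic(gperiodic, lperiodic);
  return lperiodic[0] == 1 && lperiodic[1] == 0 && lperiodic[2] == 1;
}
```

With plain int* parameters the array length is lost at the call, which is why neither the compiler nor Memcheck complained here.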

----- Original Message -----

| Hmm, I guess I'd still vote for using std::vector for
| statically-allocated arrays, rather than the alternative of
| building -C into all our debug builds. Thoughts?

| - tim
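
A minimal sketch of the std::vector approach, assuming the callee can take the vector by reference. The names fill_periodic_checked and overrun_detected are hypothetical, not MOAB functions. The size is fixed at instantiation and the storage is on the heap, so Memcheck should be able to flag an overrun through operator[]. In addition, at() is bounds-checked and throws std::out_of_range.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical callee, NOT the MOAB signature: writes one flag per
// direction. at() is bounds-checked and throws std::out_of_range when
// the index is past the end of the vector.
void fill_periodic_checked(std::vector<int>& lperiodic)
{
  for (std::size_t d = 0; d < 3; ++d)
    lperiodic.at(d) = 0;
}

// Hypothetical helper: returns true if filling a vector of n ints
// triggers the range check.
bool overrun_detected(std::size_t n)
{
  std::vector<int> lperiodic(n);  // size set at instantiation, storage on the heap
  assert(lperiodic.size() == n);
  try {
    fill_periodic_checked(lperiodic);
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}
```

Here overrun_detected(2) reports the bug from this thread, while overrun_detected(3) runs cleanly. Unlike assert, the at() check stays active when NDEBUG is defined.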

| On 09/10/2013 02:03 PM, Vijay S. Mahadevan wrote:
| > You need to compile with -C to catch static allocation errors. That
| > will specifically turn on range checks.
| >
| > Good to know that memcheck doesn't give you invalid read errors on
| > statically allocated arrays. Look at faq:
| > http://valgrind.org/docs/manual/faq.html
| >
| >> Why doesn't Memcheck find the array overruns in this program?
| >> Unfortunately, Memcheck doesn't do bounds checking on global or
| >> stack arrays. We'd like to, but it's just not possible to do in a
| >> reasonable way that fits with how Memcheck works. Sorry.
| >
| >> However, the experimental tool SGcheck can detect errors like
| >> this. Run Valgrind with the --tool=exp-sgcheck option to try it,
| >> but be aware that it is not as robust as Memcheck.
| >
| > Vijay
| >
| > On Tue, Sep 10, 2013 at 1:51 PM, Tim Tautges <tautges at mcs.anl.gov>
| > wrote:
| >> Good catch Danqing, I didn't know that (that valgrind wouldn't
| >> catch out of bounds errors on statically-allocated arrays).
| >>
| >> The preferred way to do this, then, will be to use std::vector,
| >> with a static size set at instantiation. That makes it dynamically
| >> allocated but still static size. I'll remember that one.
| >>
| >> - tim
| >>
| >> On 09/10/2013 01:22 PM, Danqing Wu wrote:
| >>>
| >>> Here is what I found online:
| >>>
| >>> What Won't Valgrind Find?
| >>> Valgrind doesn't perform bounds checking on static arrays
| >>> (allocated on
| >>> the stack). So if you declare an array inside your function:
| >>>
| >>> int main()
| >>> {
| >>> char x[10];
| >>> x[11] = 'a';
| >>> }
| >>>
| >>> then Valgrind won't alert you! One possible solution for testing
| >>> purposes is simply to change your static arrays into dynamically
| >>> allocated memory taken from the heap, where you will get
| >>> bounds-checking, though this could be a mess of unfreed memory.
| >>>
| >>> ----- Original Message -----
| >>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
| >>> To: "Danqing Wu" <wuda at mcs.anl.gov>
| >>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
| >>> Sent: Tuesday, September 10, 2013 1:02:45 PM
| >>> Subject: Re: Simple code to reproduce ICC segmentation fault
| >>>
| >>>
| >>>
| >>>
| >>> ----- Original Message -----
| >>>
| >>>
| >>>
| >>> After correcting that, moab-intel test works fine!
| >>> Good job again, Danqing!
| >>>
| >>> Thanks,
| >>> Iulian
| >>>
| >>> now the question is why valgrind did not find this ...
| >>>
| >>>
| >>>
| >>>
| >>> ----- Original Message -----
| >>>
| >>>
| >>> I think I found one possible reason.
| >>>
| >>> ErrorCode ScdInterface::get_neighbor_alljkbal(int np, int pfrom,
| >>>     const int * const gdims, const int * const gperiodic,
| >>>     const int * const dijk,
| >>>     int &pto, int *rdims, int *facedims, int *across_bdy)
| >>> {
| >>>   ...
| >>>   int ldims[6], pijk[3], lperiodic[2];
| >>>   ErrorCode rval = compute_partition_alljkbal(np, pfrom, gdims, gperiodic,
| >>>       ldims, lperiodic, pijk);
| >>>   ...
| >>> }
| >>>
| >>> Here lperiodic[2] should be lperiodic[3], as the third element
| >>> will be accessed inside compute_partition_alljkbal().
| >>>
| >>> The behaviour could be compiler-dependent. Maybe only with ICC 12
| >>> at -O2, and with assert disabled, does this out-of-bounds write
| >>> cause a segmentation fault.
| >>>
| >>> I will retest after this fix.
| >>>
| >>> ----- Original Message -----
| >>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
| >>> To: "Danqing Wu" <wuda at mcs.anl.gov>
| >>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
| >>> Sent: Tuesday, September 10, 2013 10:17:28 AM
| >>> Subject: Re: Simple code to reproduce ICC segmentation fault
| >>>
| >>>
| >>> If it works on icc 13 / ubuntu 12, I suggest moving moab-intel
| >>> build to
| >>> jenkins; we may have to rebuild netcdf with icc if there are
| >>> issues with
| >>> libcurl.
| >>>
| >>> Any suggestions?
| >>>
| >>> Iulian
| >>> ----- Original Message -----
| >>>
| >>>
| >>> On gnep, icc 12.
| >>>
| >>> Configure option
| >>> ./configure --prefix=/homes/fathom/libs/current/moabintel
| >>> --with-netcdf=/homes/fathom/3rdparty/netcdf-4.1.3-intel
| >>> --with-hdf5=/homes/fathom/3rdparty/hdf5-1.8.8-ser-intel
| >>> --with-zlib=/homes/fathom/3rdparty/zlib/zlib-1.2.4/gcc
| >>> --enable-igeom
| >>> --enable-imesh CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort
| >>>
| >>> So the flags will include both -O2 and -DNDEBUG.
| >>>
| >>> Since NDEBUG is defined, every assert(...) compiles to nothing,
| >>> and this could change the behaviour.
| >>>
| >>> On gnep with icc 12, with -O2 only and NDEBUG not defined, the
| >>> original test passes. I guess the ICC 12 result depends on whether
| >>> the asserts are compiled in.
| >>>
| >>> ----- Original Message -----
| >>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
| >>> To: "Danqing Wu" <wuda at mcs.anl.gov>
| >>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
| >>> Sent: Tuesday, September 10, 2013 10:04:39 AM
| >>> Subject: Re: Simple code to reproduce ICC segmentation fault
| >>>
| >>>
| >>> so this is with icc -O2 or what are the compile options?
| >>> Is this on gnep? icc 12? icc 13?
| >>>
| >>> Should we try to use ubuntu 12 for intel builds?
| >>>
| >>> (we can do that on jenkins auto build platform)
| >>>
| >>> Iulian
| >>>
| >>>
| >>> ----- Original Message -----
| >>>
| >>>
| >>> I am still debugging, but it seems that the two calls of
| >>> ScdInterface::get_neighbor() caused the crash. If I comment out
| >>> the second call, there is no segmentation fault.
| >>>
| >>>
| >>> #include "moab/ScdInterface.hpp"
| >>> #include "moab/Core.hpp"
| >>>
| >>> #include <iostream>
| >>>
| >>> using namespace moab;
| >>>
| >>> int main()
| >>> {
| >>>   Core moab;
| >>>   ScdInterface* scdi;
| >>>   ErrorCode rval = moab.Interface::query_interface(scdi);
| >>>
| >>>   int gdims[] = {0, 0, 0, 48, 40, 18};
| >>>   int nprocs = 4;
| >>>   int pto = 0;
| >>>   int across_bdy_a[3] = {0};
| >>>   int rdims_a[6] = {0};
| >>>   int facedims_a[6] = {0};
| >>>
| >>>   ScdParData spd;
| >>>   int n;
| >>>   for (n = 0; n < 6; n++)
| >>>     spd.gDims[n] = gdims[n];
| >>>   for (n = 0; n < 3; n++)
| >>>     spd.gPeriodic[n] = 0;
| >>>
| >>>   spd.partMethod = ScdParData::ALLJKBAL;
| >>>
| >>>   int dijka[3] = {0};
| >>>
| >>>   dijka[0] = -1;
| >>>   dijka[1] = -1;
| >>>   dijka[2] = -1;
| >>>   rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto,
| >>>       rdims_a, facedims_a, across_bdy_a);
| >>>
| >>>   dijka[0] = 0;
| >>>   dijka[1] = -1;
| >>>   dijka[2] = -1;
| >>>   rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto,
| >>>       rdims_a, facedims_a, across_bdy_a);
| >>>
| >>>   std::cout << "Return from main()" << std::endl;
| >>>
| >>>   return 0;
| >>> }
| >>>
| >>>
| >>>
| >>>
| >>>
| >>
| >> --
| >> ================================================================
| >> "You will keep in perfect peace him whose mind is
| >> steadfast, because he trusts in you." Isaiah 26:3
| >>
| >> Tim Tautges Argonne National Laboratory
| >> (tautges at mcs.anl.gov) (telecommuting from UW-Madison)
| >> phone (gvoice): (608) 354-1459 1500 Engineering Dr.
| >> fax: (608) 263-4499 Madison, WI 53706
| >>
| >
