[MOAB-dev] Simple code to reproduce ICC segmentation fault

Danqing Wu wuda at mcs.anl.gov
Tue Sep 10 15:35:33 CDT 2013


It seems that exp-sgcheck still has some limitations:

It can find the error in this:
int main()
{
  int Stack[3];

  for (int i = 0; i <= 3; i++)  // i == 3 writes one past the end of Stack
    Stack[i] = 1;

  return 0;
}

It cannot find the error in this:
int main()
{
  int Stack[3];

  Stack[3] = 1;  // index 3 is one past the end of Stack

  return 0;
}

It cannot find the error in this:
int main()
{
  int Stack[3];

  Stack[0] = 1;
  Stack[1] = 1;
  Stack[2] = 1;
  Stack[3] = 1;  // index 3 is one past the end of Stack

  return 0;
}
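These results look consistent with how, as we understand it, the Valgrind manual describes exp-sgcheck: it infers, for each memory-access instruction, which stack or global array that instruction targets, and reports when the same instruction later strays outside that array. The loop's single store instruction first writes in bounds and then out of bounds, so it is flagged; each constant-index store is a separate instruction that only ever touches one address, so there is no in-bounds access to compare against. For such cases a checked accessor gives a runtime error independent of any tool. A minimal sketch, assuming C++11; std::array::at() throws std::out_of_range, and the helper name write_element is hypothetical:

```cpp
#include <array>
#include <stdexcept>

// Hypothetical helper: writes `value` at `index` of a 3-element stack
// array through the bounds-checked at(), and reports whether it succeeded.
bool write_element(int index, int value) {
  std::array<int, 3> stack_arr{};  // still on the stack, like int Stack[3]
  try {
    // A negative index converts to a huge size_t, so it also throws.
    stack_arr.at(index) = value;   // throws std::out_of_range if index >= 3
  } catch (const std::out_of_range&) {
    return false;
  }
  return stack_arr[index] == value;  // index is known to be valid here
}
```

Replacing at() with operator[] restores the unchecked behaviour, so the check can be confined to debug or test code if its cost matters.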

----- Original Message -----
From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
To: "Tim Tautges" <tautges at mcs.anl.gov>
Cc: "Danqing Wu" <wuda at mcs.anl.gov>, moab-dev at mcs.anl.gov, "Vijay S. Mahadevan" <vijay.m at gmail.com>
Sent: Tuesday, September 10, 2013 2:20:53 PM
Subject: Re: [MOAB-dev] Simple code to reproduce ICC segmentation fault


Running the test with the experimental tool --tool=exp-sgcheck found it
(I did not have to compile with -C):

I didn't know about this experimental option.

Thanks, 
Iulian 

iulian at T520-iuli:~/source/MOABp13/test$ valgrind --tool=exp-sgcheck scdseq_test 
==6670== exp-sgcheck, a stack and global array overrun detector 
==6670== NOTE: This is an Experimental-Class Valgrind Tool 
==6670== Copyright (C) 2003-2011, and GNU GPL'd, by OpenWorks Ltd et al. 
==6670== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info 
==6670== Command: scdseq_test 
==6670== 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
Running test_parallel_partitions ... 
==6670== Invalid write of size 4 
==6670== at 0x418C09: moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:787) 
==6670== by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:1154) 
==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1216) 
==6670== by 0x417726: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1379) 
==6670== by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331) 
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320) 
==6670== by 0x410229: main (scdseq_test.cpp:267) 
==6670== Address 0x7fefff878 expected vs actual: 
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here 
==6670== Actual: unknown 
==6670== Actual: is 0 after Expected 
==6670== 
==6670== Invalid write of size 4 
==6670== at 0x418C09: moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:787) 
==6670== by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:1154) 
==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1216) 
==6670== by 0x4177E8: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1392) 
==6670== by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331) 
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320) 
==6670== by 0x410229: main (scdseq_test.cpp:267) 
==6670== Address 0x7fefff878 expected vs actual: 
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here 
==6670== Actual: unknown 
==6670== Actual: is 0 after Expected 
==6670== 
==6670== Invalid write of size 4 
==6670== at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868) 
==6670== by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:758) 
==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1219) 
==6670== by 0x417726: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1379) 
==6670== by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336) 
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320) 
==6670== by 0x410229: main (scdseq_test.cpp:267) 
==6670== Address 0x7fefff888 expected vs actual: 
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here 
==6670== Actual: unknown 
==6670== Actual: is 0 after Expected 
==6670== 
==6670== Invalid write of size 4 
==6670== at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868) 
==6670== by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:758) 
==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1219) 
==6670== by 0x4177E8: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1392) 
==6670== by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336) 
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320) 
==6670== by 0x410229: main (scdseq_test.cpp:267) 
==6670== Address 0x7fefff888 expected vs actual: 
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here 
==6670== Actual: unknown 
==6670== Actual: is 0 after Expected 
==6670== 
Running test_vertex_seq ... 
Running test_element_seq ... 
Running test_periodic_seq ... 


----- Original Message -----


Hmm, I guess I'd still vote for using std::vector for statically-allocated arrays, rather than the alternative of 
building -C into all our debug builds. Thoughts? 

- tim 

On 09/10/2013 02:03 PM, Vijay S. Mahadevan wrote: 
> You need to compile with -C to catch static allocation errors. That 
> will specifically turn on range checks. 
> 
> Good to know that Memcheck doesn't report invalid reads on
> statically allocated arrays. See the FAQ:
> http://valgrind.org/docs/manual/faq.html
> 
>> Why doesn't Memcheck find the array overruns in this program? 
>> Unfortunately, Memcheck doesn't do bounds checking on global or stack arrays. We'd like to, but it's just not possible to do in a reasonable way that fits with how Memcheck works. Sorry. 
> 
>> However, the experimental tool SGcheck can detect errors like this. Run Valgrind with the --tool=exp-sgcheck option to try it, but be aware that it is not as robust as Memcheck. 
> 
> Vijay 
> 
> On Tue, Sep 10, 2013 at 1:51 PM, Tim Tautges <tautges at mcs.anl.gov> wrote: 
>> Good catch, Danqing. I didn't know that Valgrind wouldn't catch
>> out-of-bounds errors on statically allocated arrays.
>> 
>> The preferred way to do this, then, will be to use std::vector, with a
>> size fixed at instantiation. That makes it dynamically allocated but
>> still fixed in size. I'll remember that one.
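A minimal sketch of that approach, assuming C++11 for std::vector::data(). fill_flags and make_flags are hypothetical names standing in for a routine that writes through an int* output parameter and its caller. The vector's size is fixed when it is constructed, but its storage is heap memory, so an overrun past data() + size() typically lands outside the allocated block, where Memcheck reports it as an invalid write.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a routine that writes `n` ints through a raw
// int* output parameter.
void fill_flags(int* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = 1;
}

// Size fixed at construction, storage on the heap instead of the stack.
std::vector<int> make_flags() {
  std::vector<int> flags(3, 0);
  fill_flags(flags.data(), flags.size());  // pass the true size along
  return flags;
}
```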
>> 
>> - tim 
>> 
>> On 09/10/2013 01:22 PM, Danqing Wu wrote: 
>>> 
>>> Here is what I found online: 
>>> 
>>> What Won't Valgrind Find? 
>>> Valgrind doesn't perform bounds checking on static arrays (allocated on 
>>> the stack). So if you declare an array inside your function: 
>>> 
>>> int main() 
>>> { 
>>> char x[10]; 
>>> x[11] = 'a'; 
>>> } 
>>> 
>>> then Valgrind won't alert you! One possible solution for testing purposes 
>>> is simply to change your static arrays into dynamically allocated memory 
>>> taken from the heap, where you will get bounds-checking, though this could 
>>> be a mess of unfreed memory. 
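The unfreed-memory concern can be avoided by letting an owning smart pointer release the heap buffer. A minimal sketch, assuming C++11; count_filled is a hypothetical test function that replaces char x[10] with a heap buffer held by std::unique_ptr<char[]>:

```cpp
#include <cstddef>
#include <memory>

// Hypothetical test-only replacement for `char x[10];`: the buffer comes
// from the heap, so Memcheck can see overruns, and unique_ptr frees it
// automatically when the function returns, so nothing leaks.
std::size_t count_filled() {
  const std::size_t n = 10;
  std::unique_ptr<char[]> x(new char[n]());  // value-initialized to '\0'
  for (std::size_t i = 0; i < n; ++i)
    x[i] = 'a';                              // in bounds: indices 0..9
  // x[11] = 'a'; would now be a heap overrun, which Memcheck typically reports.
  std::size_t filled = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (x[i] == 'a') ++filled;
  return filled;
}
```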
>>> 
>>> ----- Original Message ----- 
>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov> 
>>> To: "Danqing Wu" <wuda at mcs.anl.gov> 
>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov> 
>>> Sent: Tuesday, September 10, 2013 1:02:45 PM 
>>> Subject: Re: Simple code to reproduce ICC segmentation fault 
>>> 
>>> 
>>> 
>>> 
>>> ----- Original Message ----- 
>>> 
>>> 
>>> 
>>> After correcting that, moab-intel test works fine! 
>>> Good job again, Danqing! 
>>> 
>>> Thanks, 
>>> Iulian 
>>> 
>>> Now the question is why Valgrind did not find this...
>>> 
>>> 
>>> 
>>> 
>>> ----- Original Message ----- 
>>> 
>>> 
>>> I think I found one possible reason. 
>>> 
>>> ErrorCode ScdInterface::get_neighbor_alljkbal(int np, int pfrom,
>>>     const int * const gdims, const int * const gperiodic,
>>>     const int * const dijk,
>>>     int &pto, int *rdims, int *facedims, int *across_bdy)
>>> {
>>>   ...
>>>   int ldims[6], pijk[3], lperiodic[2];
>>>   ErrorCode rval = compute_partition_alljkbal(np, pfrom, gdims, gperiodic,
>>>                                               ldims, lperiodic, pijk);
>>>   ...
>>> }
>>> 
>>> Here lperiodic[2] should be lperiodic[3], since the third element is
>>> accessed inside compute_partition_alljkbal().
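The mismatch can be reproduced in isolation. In the sketch below, compute_periodic_flags and get_flags are hypothetical stand-ins for compute_partition_alljkbal() and get_neighbor_alljkbal(); the callee is assumed to write three ints, matching the third-element access Danqing describes. Sizing the caller's buffer from the same constant the callee uses is one way to keep the two in sync (C++11 assumed):

```cpp
#include <array>

// Number of structured-mesh dimensions (i, j, k).
constexpr int kNumDims = 3;

// Hypothetical callee: assumed to write one periodicity flag per dimension.
void compute_periodic_flags(const int* gperiodic, int* lperiodic) {
  for (int d = 0; d < kNumDims; ++d)
    lperiodic[d] = gperiodic[d];  // d == 2 would overrun an int[2] buffer
}

// Hypothetical caller. Declaring the buffer as int lperiodic[2] here would
// reproduce the bug; sizing it from kNumDims keeps caller and callee in sync.
std::array<int, kNumDims> get_flags(const std::array<int, kNumDims>& gperiodic) {
  std::array<int, kNumDims> lperiodic{};
  compute_periodic_flags(gperiodic.data(), lperiodic.data());
  return lperiodic;
}
```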
>>> 
>>> The behaviour is probably compiler dependent. Perhaps only with ICC 12 at
>>> -O2, and with asserts disabled, does this out-of-bounds write cause a
>>> segmentation fault.
>>> 
>>> I will retest after this fix. 
>>> 
>>> ----- Original Message ----- 
>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov> 
>>> To: "Danqing Wu" <wuda at mcs.anl.gov> 
>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov> 
>>> Sent: Tuesday, September 10, 2013 10:17:28 AM 
>>> Subject: Re: Simple code to reproduce ICC segmentation fault 
>>> 
>>> 
>>> If it works with icc 13 on Ubuntu 12, I suggest moving the moab-intel build
>>> to Jenkins; we may have to rebuild netcdf with icc if there are issues with
>>> libcurl.
>>> 
>>> Any suggestions? 
>>> 
>>> Iulian 
>>> ----- Original Message ----- 
>>> 
>>> 
>>> On gnep, icc 12. 
>>> 
>>> Configure options:
>>> ./configure --prefix=/homes/fathom/libs/current/moabintel \
>>>   --with-netcdf=/homes/fathom/3rdparty/netcdf-4.1.3-intel \
>>>   --with-hdf5=/homes/fathom/3rdparty/hdf5-1.8.8-ser-intel \
>>>   --with-zlib=/homes/fathom/3rdparty/zlib/zlib-1.2.4/gcc --enable-igeom \
>>>   --enable-imesh CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort
>>> 
>>> So the compiler flags include both -O2 and -DNDEBUG.
>>>
>>> Since NDEBUG is defined, every assert(...) compiles to nothing, and this
>>> could make a difference.
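One concrete way NDEBUG changes a build: the expression inside assert() is not evaluated at all when NDEBUG is defined, so any side effect it carries vanishes and any check it would have made is skipped. The sketch below only illustrates the mechanism; the thread does not identify which assert, if any, matters for this crash, and count_assert_side_effects is a hypothetical name:

```cpp
#include <cassert>

// The expression inside assert() is compiled out entirely under NDEBUG,
// so the increment below happens only when asserts are enabled.
int count_assert_side_effects() {
  int counter = 0;
  assert(++counter > 0);  // side effect lives inside the assert
  return counter;         // 1 without -DNDEBUG, 0 with -DNDEBUG
}
```

Compiled normally it returns 1; compiled with -DNDEBUG the increment is removed and it returns 0.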
>>> 
>>> On gnep with icc 12, building with -O2 but without NDEBUG, the original
>>> test passes. I guess ICC 12 is affected by whether the asserts are compiled in.
>>> 
>>> ----- Original Message ----- 
>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov> 
>>> To: "Danqing Wu" <wuda at mcs.anl.gov> 
>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov> 
>>> Sent: Tuesday, September 10, 2013 10:04:39 AM 
>>> Subject: Re: Simple code to reproduce ICC segmentation fault 
>>> 
>>> 
>>> So is this with icc -O2, or what are the compile options?
>>> Is this on gnep? icc 12? icc 13?
>>>
>>> Should we try to use Ubuntu 12 for Intel builds?
>>>
>>> (We can do that on the Jenkins auto-build platform.)
>>> 
>>> Iulian 
>>> 
>>> 
>>> ----- Original Message ----- 
>>> 
>>> 
>>> I am still debugging, but it seems that the two calls to
>>> ScdInterface::get_neighbor() cause the crash. If I comment out the second
>>> call, there is no segmentation fault.
>>> 
>>> 
>>> #include "moab/ScdInterface.hpp"
>>> #include "moab/Core.hpp"
>>>
>>> #include <iostream>
>>>
>>> using namespace moab;
>>>
>>> int main()
>>> {
>>>   Core moab;
>>>   ScdInterface* scdi;
>>>   ErrorCode rval = moab.Interface::query_interface(scdi);
>>>   if (MB_SUCCESS != rval)
>>>     return 1;
>>>
>>>   // Global domain: {ilo, jlo, klo, ihi, jhi, khi}
>>>   int gdims[] = {0, 0, 0, 48, 40, 18};
>>>   int nprocs = 4;
>>>   int pto = 0;
>>>   int across_bdy_a[3] = {0};
>>>   int rdims_a[6] = {0};
>>>   int facedims_a[6] = {0};
>>>
>>>   // Partition data: global domain, no periodicity, ALLJKBAL method
>>>   ScdParData spd;
>>>   int n;
>>>   for (n = 0; n < 6; n++)
>>>     spd.gDims[n] = gdims[n];
>>>   for (n = 0; n < 3; n++)
>>>     spd.gPeriodic[n] = 0;
>>>
>>>   spd.partMethod = ScdParData::ALLJKBAL;
>>>
>>>   int dijka[3] = {0};
>>>
>>>   // First call: neighbor of proc 0 in direction (-1, -1, -1)
>>>   dijka[0] = -1;
>>>   dijka[1] = -1;
>>>   dijka[2] = -1;
>>>   rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a,
>>>                                     facedims_a, across_bdy_a);
>>>
>>>   // Second call: direction (0, -1, -1); commenting it out avoids the crash
>>>   dijka[0] = 0;
>>>   dijka[1] = -1;
>>>   dijka[2] = -1;
>>>   rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a,
>>>                                     facedims_a, across_bdy_a);
>>>
>>>   std::cout << "Return from main()" << std::endl;
>>>
>>>   return 0;
>>> }
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> -- 
>> ================================================================ 
>> "You will keep in perfect peace him whose mind is 
>> steadfast, because he trusts in you." Isaiah 26:3 
>> 
>> Tim Tautges Argonne National Laboratory 
>> (tautges at mcs.anl.gov) (telecommuting from UW-Madison) 
>> phone (gvoice): (608) 354-1459 1500 Engineering Dr. 
>> fax: (608) 263-4499 Madison, WI 53706 
>> 
> 




