[MOAB-dev] Simple code to reproduce ICC segmentation fault

Danqing Wu wuda at mcs.anl.gov
Tue Sep 10 14:42:37 CDT 2013


Yes, I also found it. I will check whether any other such arrays are still declared with size 2.

----- Original Message -----
From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
To: "Vijay S. Mahadevan" <vijay.m at gmail.com>
Cc: "Tim Tautges" <tautges at mcs.anl.gov>, "Danqing Wu" <wuda at mcs.anl.gov>, moab-dev at mcs.anl.gov
Sent: Tuesday, September 10, 2013 2:36:21 PM
Subject: Re: [MOAB-dev] Simple code to reproduce ICC segmentation fault


So with -C, valgrind would work even without the exp-sgcheck option?
I just realized that valgrind found another one: 

ErrorCode ScdInterface::get_neighbor_alljorkori(int np, int pfrom,
    const int * const gdims, const int * const gperiodic, const int * const dijk,
    int &pto, int *rdims, int *facedims, int *across_bdy)
{
  ErrorCode rval = MB_SUCCESS;
  pto = -1;
  if (np == 1) return MB_SUCCESS;

  int pijk[3], lperiodic[2], ldims[6];
  rval = compute_partition_alljorkori(np, pfrom, gdims, gperiodic, ldims, lperiodic, pijk);
  if (MB_SUCCESS != rval) return rval;

Can you fix this one too, Danqing? 

Thanks, 
Iulian 
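
For readers following the fix, here is a minimal standalone sketch of the pattern in the snippet above. It assumes (as the valgrind trace quoted below suggests) that the callee writes one periodic flag per direction, i.e. three ints. The names fill_periodic_sketch and sum_local_periodic are hypothetical stand-ins, not MOAB functions; the only point is that the caller's array needs three slots.

```cpp
#include <cassert>  // for the assert-based check in the usage note

// Hypothetical stand-in for compute_partition_alljorkori(): assumed to fill
// one periodic flag per direction (three ints) through a raw pointer that
// carries no size information.
void fill_periodic_sketch(const int* gperiodic, int* lperiodic)
{
  for (int d = 0; d < 3; d++)
    lperiodic[d] = gperiodic[d];  // writes lperiodic[0..2]
}

// Caller with the corrected declaration: three slots, for i, j and k.
// With "int lperiodic[2]" the d == 2 write above would land past the array.
int sum_local_periodic(const int* gperiodic)
{
  int lperiodic[3] = {0, 0, 0};
  fill_periodic_sketch(gperiodic, lperiodic);
  return lperiodic[0] + lperiodic[1] + lperiodic[2];
}
```

A call such as sum_local_periodic(gp) with int gp[3] = {1, 0, 1} returns 2. Shrinking the declaration back to int lperiodic[2] reintroduces a one-element stack overrun of the kind exp-sgcheck reported.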

----- Original Message -----


I think having -C option for debug builds might be a good idea. It 
will slow the computation down further but should catch a lot of such 
errors during runtime. 

I enable this option for most of the Fortran codes and it has saved me
a ton of headaches in the past.

Vijay 
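
As a compiler-independent complement to range-check flags, the std::vector approach suggested earlier in this thread can be sketched as follows. This is only an illustration: fill_flags_checked is a hypothetical helper, not part of MOAB, and using at() for checked access is an assumption added here (the original suggestion only relies on the buffer living on the heap, where Memcheck can see overruns).

```cpp
#include <cassert>    // for the assert-based checks in the usage note
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not a MOAB routine: writes `count` flags through
// checked access, so an undersized buffer raises std::out_of_range
// instead of silently overwriting neighbouring memory.
bool fill_flags_checked(std::vector<int>& flags, std::size_t count)
{
  try {
    for (std::size_t d = 0; d < count; d++)
      flags.at(d) = 0;  // at() checks the index on every access
  } catch (const std::out_of_range&) {
    return false;       // overrun detected
  }
  return true;
}
```

With std::vector<int> two(2), the call fill_flags_checked(two, 3) returns false, while the same call on a vector of size 3 returns true. Unlike a range-check build flag, this check stays active in optimized builds, at the cost of one comparison per access.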

On Tue, Sep 10, 2013 at 2:20 PM, Iulian Grindeanu <iulian at mcs.anl.gov> wrote: 
> Running the test with the experimental tool, --tool=exp-sgcheck, found it
> (I did not have to compile with -C):
>
> I didn't know about this experimental option.
> 
> Thanks, 
> Iulian 
> 
> iulian at T520-iuli:~/source/MOABp13/test$ valgrind --tool=exp-sgcheck scdseq_test
> ==6670== exp-sgcheck, a stack and global array overrun detector 
> ==6670== NOTE: This is an Experimental-Class Valgrind Tool 
> ==6670== Copyright (C) 2003-2011, and GNU GPL'd, by OpenWorks Ltd et al. 
> ==6670== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info 
> ==6670== Command: scdseq_test 
> ==6670== 
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93 
> Running test_parallel_partitions ... 
> ==6670== Invalid write of size 4 
> ==6670== at 0x418C09: 
> moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int 
> const*, int*, int*, int*) (ScdInterface.hpp:787) 
> ==6670== by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int, 
> int, int const*, int const*, int const*, int&, int*, int*, int*) 
> (ScdInterface.cpp:1154) 
> ==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int, 
> moab::ScdParData const&, int const*, int&, int*, int*, int*) 
> (ScdInterface.hpp:1216) 
> ==6670== by 0x417726: test_parallel_partition(int*, int, int) 
> (scdseq_test.cpp:1379) 
> ==6670== by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331) 
> ==6670== by 0x40E664: run_test(void (*)(), char const*) 
> (TestUtil.hpp:320) 
> ==6670== by 0x410229: main (scdseq_test.cpp:267) 
> ==6670== Address 0x7fefff878 expected vs actual: 
> ==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from 
> here 
> ==6670== Actual: unknown 
> ==6670== Actual: is 0 after Expected 
> ==6670== 
> ==6670== Invalid write of size 4 
> ==6670== at 0x418C09: 
> moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int 
> const*, int*, int*, int*) (ScdInterface.hpp:787) 
> ==6670== by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int, 
> int, int const*, int const*, int const*, int&, int*, int*, int*) 
> (ScdInterface.cpp:1154) 
> ==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int, 
> moab::ScdParData const&, int const*, int&, int*, int*, int*) 
> (ScdInterface.hpp:1216) 
> ==6670== by 0x4177E8: test_parallel_partition(int*, int, int) 
> (scdseq_test.cpp:1392) 
> ==6670== by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331) 
> ==6670== by 0x40E664: run_test(void (*)(), char const*) 
> (TestUtil.hpp:320) 
> ==6670== by 0x410229: main (scdseq_test.cpp:267) 
> ==6670== Address 0x7fefff878 expected vs actual: 
> ==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from 
> here 
> ==6670== Actual: unknown 
> ==6670== Actual: is 0 after Expected 
> ==6670== 
> ==6670== Invalid write of size 4 
> ==6670== at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int, 
> int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868) 
> ==6670== by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int, 
> int const*, int const*, int const*, int&, int*, int*, int*) 
> (ScdInterface.cpp:758) 
> ==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int, 
> moab::ScdParData const&, int const*, int&, int*, int*, int*) 
> (ScdInterface.hpp:1219) 
> ==6670== by 0x417726: test_parallel_partition(int*, int, int) 
> (scdseq_test.cpp:1379) 
> ==6670== by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336) 
> ==6670== by 0x40E664: run_test(void (*)(), char const*) 
> (TestUtil.hpp:320) 
> ==6670== by 0x410229: main (scdseq_test.cpp:267) 
> ==6670== Address 0x7fefff888 expected vs actual: 
> ==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from 
> here 
> ==6670== Actual: unknown 
> ==6670== Actual: is 0 after Expected 
> ==6670== 
> ==6670== Invalid write of size 4 
> ==6670== at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int, 
> int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868) 
> ==6670== by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int, 
> int const*, int const*, int const*, int&, int*, int*, int*) 
> (ScdInterface.cpp:758) 
> ==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int, 
> moab::ScdParData const&, int const*, int&, int*, int*, int*) 
> (ScdInterface.hpp:1219) 
> ==6670== by 0x4177E8: test_parallel_partition(int*, int, int) 
> (scdseq_test.cpp:1392) 
> ==6670== by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336) 
> ==6670== by 0x40E664: run_test(void (*)(), char const*) 
> (TestUtil.hpp:320) 
> ==6670== by 0x410229: main (scdseq_test.cpp:267) 
> ==6670== Address 0x7fefff888 expected vs actual: 
> ==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from 
> here 
> ==6670== Actual: unknown 
> ==6670== Actual: is 0 after Expected 
> ==6670== 
> Running test_vertex_seq ... 
> Running test_element_seq ... 
> Running test_periodic_seq ... 
> 
> 
> ________________________________ 
> 
> Hmm, I guess I'd still vote for using std::vector for statically-allocated
> arrays, rather than the alternative of building -C into all our debug
> builds. Thoughts?
> 
> - tim 
> 
> On 09/10/2013 02:03 PM, Vijay S. Mahadevan wrote: 
>> You need to compile with -C to catch static allocation errors. That 
>> will specifically turn on range checks. 
>> 
>> Good to know that memcheck doesn't give you invalid read errors on
>> statically allocated arrays. See the FAQ:
>> http://valgrind.org/docs/manual/faq.html
>> 
>>> Why doesn't Memcheck find the array overruns in this program? 
>>> Unfortunately, Memcheck doesn't do bounds checking on global or stack 
>>> arrays. We'd like to, but it's just not possible to do in a reasonable way 
>>> that fits with how Memcheck works. Sorry. 
>> 
>>> However, the experimental tool SGcheck can detect errors like this. Run 
>>> Valgrind with the --tool=exp-sgcheck option to try it, but be aware that it 
>>> is not as robust as Memcheck. 
>> 
>> Vijay 
>> 
>> On Tue, Sep 10, 2013 at 1:51 PM, Tim Tautges <tautges at mcs.anl.gov> wrote: 
>>> Good catch Danqing, I didn't know that (that valgrind wouldn't catch
>>> out-of-bounds errors on statically-allocated arrays).
>>> 
>>> The preferred way to do this, then, will be to use std::vector, with a
>>> static size set at instantiation. That makes it dynamically allocated
>>> but still static size. I'll remember that one.
>>> 
>>> - tim 
>>> 
>>> On 09/10/2013 01:22 PM, Danqing Wu wrote: 
>>>> 
>>>> Here is what I found online: 
>>>> 
>>>> What Won't Valgrind Find? 
>>>> Valgrind doesn't perform bounds checking on static arrays (allocated on 
>>>> the stack). So if you declare an array inside your function: 
>>>> 
>>>> int main() 
>>>> { 
>>>> char x[10]; 
>>>> x[11] = 'a'; 
>>>> } 
>>>> 
>>>> then Valgrind won't alert you! One possible solution for testing purposes
>>>> is simply to change your static arrays into dynamically allocated memory
>>>> taken from the heap, where you will get bounds-checking, though this could
>>>> be a mess of unfreed memory.
>>>> 
>>>> ----- Original Message ----- 
>>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov> 
>>>> To: "Danqing Wu" <wuda at mcs.anl.gov> 
>>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov> 
>>>> Sent: Tuesday, September 10, 2013 1:02:45 PM 
>>>> Subject: Re: Simple code to reproduce ICC segmentation fault 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> ----- Original Message ----- 
>>>> 
>>>> 
>>>> 
>>>> After correcting that, moab-intel test works fine! 
>>>> Good job again, Danqing! 
>>>> 
>>>> Thanks, 
>>>> Iulian 
>>>> 
>>>> now the question is why valgrind did not find this ... 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> ----- Original Message ----- 
>>>> 
>>>> 
>>>> I think I found one possible reason. 
>>>> 
>>>> ErrorCode ScdInterface::get_neighbor_alljkbal(int np, int pfrom, 
>>>> const int * const gdims, const int * const gperiodic, const int * const 
>>>> dijk, 
>>>> int &pto, int *rdims, int *facedims, int *across_bdy) 
>>>> { 
>>>> ... 
>>>> int ldims[6], pijk[3], lperiodic[2]; 
>>>> ErrorCode rval = compute_partition_alljkbal(np, pfrom, gdims, gperiodic, 
>>>> ldims, lperiodic, pijk); 
>>>> ... 
>>>> } 
>>>> 
>>>> Here lperiodic[2] should be lperiodic[3], as the third element will be 
>>>> accessed inside compute_partition_alljkbal(). 
>>>> 
>>>> The behaviour could be compiler-dependent. Maybe only with ICC 12 at -O2,
>>>> and with assert disabled, does this out-of-bounds write cause a
>>>> segmentation fault.
>>>> 
>>>> I will retest after this fix. 
>>>> 
>>>> ----- Original Message ----- 
>>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov> 
>>>> To: "Danqing Wu" <wuda at mcs.anl.gov> 
>>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov> 
>>>> Sent: Tuesday, September 10, 2013 10:17:28 AM 
>>>> Subject: Re: Simple code to reproduce ICC segmentation fault 
>>>> 
>>>> 
>>>> If it works on icc 13 / ubuntu 12, I suggest moving moab-intel build to 
>>>> jenkins; we may have to rebuild netcdf with icc if there are issues with 
>>>> libcurl. 
>>>> 
>>>> Any suggestions? 
>>>> 
>>>> Iulian 
>>>> ----- Original Message ----- 
>>>> 
>>>> 
>>>> On gnep, icc 12. 
>>>> 
>>>> Configure option 
>>>> ./configure --prefix=/homes/fathom/libs/current/moabintel 
>>>> --with-netcdf=/homes/fathom/3rdparty/netcdf-4.1.3-intel 
>>>> --with-hdf5=/homes/fathom/3rdparty/hdf5-1.8.8-ser-intel 
>>>> --with-zlib=/homes/fathom/3rdparty/zlib/zlib-1.2.4/gcc --enable-igeom 
>>>> --enable-imesh CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort 
>>>> 
>>>> So the flags will include both -O2 and -DNDEBUG 
>>>> 
>>>> Since NDEBUG is defined, every assert(...) expands to nothing, and
>>>> this could change the behaviour.
>>>>
>>>> On gnep with icc 12, using only -O2 without NDEBUG, the original test
>>>> passes. I guess ICC 12 is affected by whether the asserts are compiled in.
>>>> 
>>>> ----- Original Message ----- 
>>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov> 
>>>> To: "Danqing Wu" <wuda at mcs.anl.gov> 
>>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov> 
>>>> Sent: Tuesday, September 10, 2013 10:04:39 AM 
>>>> Subject: Re: Simple code to reproduce ICC segmentation fault 
>>>> 
>>>> 
>>>> So is this with icc -O2, or what are the compile options?
>>>> Is this on gnep? icc 12? icc 13? 
>>>> 
>>>> Should we try to use ubuntu 12 for intel builds? 
>>>> 
>>>> (we can do that on jenkins auto build platform) 
>>>> 
>>>> Iulian 
>>>> 
>>>> 
>>>> ----- Original Message ----- 
>>>> 
>>>> 
>>>> I am still debugging, but it seems that the two calls of
>>>> ScdInterface::get_neighbor() caused the crash. If I comment out the second
>>>> call, there is no segmentation fault.
>>>> 
>>>> 
>>>> #include "moab/ScdInterface.hpp" 
>>>> #include "moab/Core.hpp" 
>>>> 
>>>> #include <iostream> 
>>>> 
>>>> using namespace moab; 
>>>> 
>>>> int main() 
>>>> { 
>>>> Core moab; 
>>>> ScdInterface* scdi; 
>>>> ErrorCode rval = moab.Interface::query_interface(scdi); 
>>>> 
>>>> int gdims[] = {0, 0, 0, 48, 40, 18}; 
>>>> int nprocs = 4; 
>>>> int pto = 0; 
>>>> int across_bdy_a[3] = {0}; 
>>>> int rdims_a[6] = {0}; 
>>>> int facedims_a[6] = {0}; 
>>>> 
>>>> ScdParData spd; 
>>>> int n; 
>>>> for (n = 0; n < 6; n++) 
>>>> spd.gDims[n] = gdims[n]; 
>>>> for (n = 0; n < 3; n++) 
>>>> spd.gPeriodic[n] = 0; 
>>>> 
>>>> spd.partMethod = ScdParData::ALLJKBAL; 
>>>> 
>>>> int dijka[3] = {0}; 
>>>> 
>>>> dijka[0] = -1; 
>>>> dijka[1] = -1; 
>>>> dijka[2] = -1; 
>>>> rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a, 
>>>> facedims_a, across_bdy_a); 
>>>> 
>>>> dijka[0] = 0; 
>>>> dijka[1] = -1; 
>>>> dijka[2] = -1; 
>>>> rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a, 
>>>> facedims_a, across_bdy_a); 
>>>> 
>>>> std::cout << "Return from main()" << std::endl; 
>>>> 
>>>> return 0; 
>>>> } 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> -- 
>>> ================================================================ 
>>> "You will keep in perfect peace him whose mind is 
>>> steadfast, because he trusts in you." Isaiah 26:3 
>>> 
>>> Tim Tautges Argonne National Laboratory 
>>> (tautges at mcs.anl.gov) (telecommuting from UW-Madison) 
>>> phone (gvoice): (608) 354-1459 1500 Engineering Dr. 
>>> fax: (608) 263-4499 Madison, WI 53706 
>>> 
>> 
> 
> 
> 


