[MOAB-dev] Simple code to reproduce ICC segmentation fault

Tim Tautges tautges at mcs.anl.gov
Tue Sep 10 14:41:47 CDT 2013



On 09/10/2013 02:28 PM, Vijay S. Mahadevan wrote:
> I think having the -C option for debug builds might be a good idea. It
> will slow the computation down further but should catch a lot of such
> errors at runtime.
>
> I enable this option for most of the Fortran codes, and it has saved me
> a ton of headaches in the past.

I would definitely do that for a Fortran code (though one of the primary Fortran codes we work with uses a lot of e.g.

       real xarg(1)

for real array arguments, instead of the more advisable xarg(*), so that strategy probably gives lots of false positives 
with that particular code).

I'm still not sure about C++ codes, where we use static arrays much less often.
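
For the C++ side, here is a minimal sketch of the std::vector alternative discussed further down the thread. The names fill_periodic and overrun_detected are made up for illustration and are not MOAB functions. The use of at() is an extra assumption on top of what the thread proposes: plain operator[] on a vector is unchecked, but because the storage is on the heap, Memcheck can flag a write past the end; at() additionally throws std::out_of_range without needing any tool.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes `count` entries into a caller-provided buffer,
// like a routine that assumes the buffer holds one value per direction.
void fill_periodic(std::vector<int>& out, std::size_t count)
{
  for (std::size_t i = 0; i < count; ++i)
    out.at(i) = 0;  // at() throws std::out_of_range past the end
}

// Returns true if writing `count` entries into a vector of `size`
// elements was caught as an out-of-range access.
bool overrun_detected(std::size_t size, std::size_t count)
{
  std::vector<int> buf(size);  // heap storage, size fixed at construction
  try {
    fill_periodic(buf, count);
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}
```

Calling overrun_detected(2, 3) mirrors the two-element buffer receiving three writes; overrun_detected(3, 3) is the corrected size.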

- tim

>
> Vijay
>
> On Tue, Sep 10, 2013 at 2:20 PM, Iulian Grindeanu <iulian at mcs.anl.gov> wrote:
>> Running the test with the experimental tool (--tool=exp-sgcheck) found it
>> (I did not have to compile with -C):
>>
>> I didn't know about this experimental option
>>
>> Thanks,
>> Iulian
>>
>> iulian at T520-iuli:~/source/MOABp13/test$ valgrind --tool=exp-sgcheck
>> scdseq_test
>> ==6670== exp-sgcheck, a stack and global array overrun detector
>> ==6670== NOTE: This is an Experimental-Class Valgrind Tool
>> ==6670== Copyright (C) 2003-2011, and GNU GPL'd, by OpenWorks Ltd et al.
>> ==6670== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
>> ==6670== Command: scdseq_test
>> ==6670==
>> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
>> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
>> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
>> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
>> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
>> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
>> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
>> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
>> Running test_parallel_partitions ...
>> ==6670== Invalid write of size 4
>> ==6670==    at 0x418C09:
>> moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int
>> const*, int*, int*, int*) (ScdInterface.hpp:787)
>> ==6670==    by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int,
>> int, int const*, int const*, int const*, int&, int*, int*, int*)
>> (ScdInterface.cpp:1154)
>> ==6670==    by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int,
>> moab::ScdParData const&, int const*, int&, int*, int*, int*)
>> (ScdInterface.hpp:1216)
>> ==6670==    by 0x417726: test_parallel_partition(int*, int, int)
>> (scdseq_test.cpp:1379)
>> ==6670==    by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331)
>> ==6670==    by 0x40E664: run_test(void (*)(), char const*)
>> (TestUtil.hpp:320)
>> ==6670==    by 0x410229: main (scdseq_test.cpp:267)
>> ==6670==  Address 0x7fefff878 expected vs actual:
>> ==6670==  Expected: stack array "lperiodic" of size 8 in frame 1 back from
>> here
>> ==6670==  Actual:   unknown
>> ==6670==  Actual:   is 0 after Expected
>> ==6670==
>> ==6670== Invalid write of size 4
>> ==6670==    at 0x418C09:
>> moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int
>> const*, int*, int*, int*) (ScdInterface.hpp:787)
>> ==6670==    by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int,
>> int, int const*, int const*, int const*, int&, int*, int*, int*)
>> (ScdInterface.cpp:1154)
>> ==6670==    by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int,
>> moab::ScdParData const&, int const*, int&, int*, int*, int*)
>> (ScdInterface.hpp:1216)
>> ==6670==    by 0x4177E8: test_parallel_partition(int*, int, int)
>> (scdseq_test.cpp:1392)
>> ==6670==    by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331)
>> ==6670==    by 0x40E664: run_test(void (*)(), char const*)
>> (TestUtil.hpp:320)
>> ==6670==    by 0x410229: main (scdseq_test.cpp:267)
>> ==6670==  Address 0x7fefff878 expected vs actual:
>> ==6670==  Expected: stack array "lperiodic" of size 8 in frame 1 back from
>> here
>> ==6670==  Actual:   unknown
>> ==6670==  Actual:   is 0 after Expected
>> ==6670==
>> ==6670== Invalid write of size 4
>> ==6670==    at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int,
>> int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868)
>> ==6670==    by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int,
>> int const*, int const*, int const*, int&, int*, int*, int*)
>> (ScdInterface.cpp:758)
>> ==6670==    by 0x41B01C: moab::ScdInterface::get_neighbor(int, int,
>> moab::ScdParData const&, int const*, int&, int*, int*, int*)
>> (ScdInterface.hpp:1219)
>> ==6670==    by 0x417726: test_parallel_partition(int*, int, int)
>> (scdseq_test.cpp:1379)
>> ==6670==    by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336)
>> ==6670==    by 0x40E664: run_test(void (*)(), char const*)
>> (TestUtil.hpp:320)
>> ==6670==    by 0x410229: main (scdseq_test.cpp:267)
>> ==6670==  Address 0x7fefff888 expected vs actual:
>> ==6670==  Expected: stack array "lperiodic" of size 8 in frame 1 back from
>> here
>> ==6670==  Actual:   unknown
>> ==6670==  Actual:   is 0 after Expected
>> ==6670==
>> ==6670== Invalid write of size 4
>> ==6670==    at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int,
>> int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868)
>> ==6670==    by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int,
>> int const*, int const*, int const*, int&, int*, int*, int*)
>> (ScdInterface.cpp:758)
>> ==6670==    by 0x41B01C: moab::ScdInterface::get_neighbor(int, int,
>> moab::ScdParData const&, int const*, int&, int*, int*, int*)
>> (ScdInterface.hpp:1219)
>> ==6670==    by 0x4177E8: test_parallel_partition(int*, int, int)
>> (scdseq_test.cpp:1392)
>> ==6670==    by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336)
>> ==6670==    by 0x40E664: run_test(void (*)(), char const*)
>> (TestUtil.hpp:320)
>> ==6670==    by 0x410229: main (scdseq_test.cpp:267)
>> ==6670==  Address 0x7fefff888 expected vs actual:
>> ==6670==  Expected: stack array "lperiodic" of size 8 in frame 1 back from
>> here
>> ==6670==  Actual:   unknown
>> ==6670==  Actual:   is 0 after Expected
>> ==6670==
>> Running test_vertex_seq ...
>> Running test_element_seq ...
>> Running test_periodic_seq ...
>>
>>
>> ________________________________
>>
>> Hmm, I guess I'd still vote for using std::vector for statically-allocated
>> arrays, rather than the alternative of
>> building -C into all our debug builds.  Thoughts?
>>
>> - tim
>>
>> On 09/10/2013 02:03 PM, Vijay S. Mahadevan wrote:
>>> You need to compile with -C to catch static allocation errors. That
>>> will specifically turn on range checks.
>>>
>>> Good to know that memcheck doesn't give you invalid read errors on
>>> statically allocated arrays. Look at faq:
>>> http://valgrind.org/docs/manual/faq.html
>>>
>>>> Why doesn't Memcheck find the array overruns in this program?
>>>> Unfortunately, Memcheck doesn't do bounds checking on global or stack
>>>> arrays. We'd like to, but it's just not possible to do in a reasonable way
>>>> that fits with how Memcheck works. Sorry.
>>>
>>>> However, the experimental tool SGcheck can detect errors like this. Run
>>>> Valgrind with the --tool=exp-sgcheck option to try it, but be aware that it
>>>> is not as robust as Memcheck.
>>>
>>> Vijay
>>>
>>> On Tue, Sep 10, 2013 at 1:51 PM, Tim Tautges <tautges at mcs.anl.gov> wrote:
>>>> Good catch Danqing, I didn't know that (that valgrind wouldn't catch
>>>> out-of-bounds errors on statically-allocated arrays).
>>>>
>>>> The preferred way to do this, then, will be to use std::vector, with a
>>>> static size set at instantiation.  That makes it dynamically allocated
>>>> but still static size.  I'll remember that one.
>>>>
>>>> - tim
>>>>
>>>> On 09/10/2013 01:22 PM, Danqing Wu wrote:
>>>>>
>>>>> Here is what I found online:
>>>>>
>>>>> What Won't Valgrind Find?
>>>>> Valgrind doesn't perform bounds checking on static arrays (allocated on
>>>>> the stack). So if you declare an array inside your function:
>>>>>
>>>>> int main()
>>>>> {
>>>>>        char x[10];
>>>>>        x[11] = 'a';
>>>>> }
>>>>>
>>>>> then Valgrind won't alert you! One possible solution for testing
>>>>> purposes is simply to change your static arrays into dynamically
>>>>> allocated memory taken from the heap, where you will get
>>>>> bounds-checking, though this could be a mess of unfreed memory.
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
>>>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
>>>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
>>>>> Sent: Tuesday, September 10, 2013 1:02:45 PM
>>>>> Subject: Re: Simple code to reproduce ICC segmentation fault
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>
>>>>>
>>>>>
>>>>> After correcting that, moab-intel test works fine!
>>>>> Good job again, Danqing!
>>>>>
>>>>> Thanks,
>>>>> Iulian
>>>>>
>>>>> now the question is why valgrind did not find this ...
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>
>>>>>
>>>>> I think I found one possible reason.
>>>>>
>>>>> ErrorCode ScdInterface::get_neighbor_alljkbal(int np, int pfrom,
>>>>>     const int * const gdims, const int * const gperiodic,
>>>>>     const int * const dijk,
>>>>>     int &pto, int *rdims, int *facedims, int *across_bdy)
>>>>> {
>>>>>   ...
>>>>>   int ldims[6], pijk[3], lperiodic[2];
>>>>>   ErrorCode rval = compute_partition_alljkbal(np, pfrom, gdims, gperiodic,
>>>>>                                               ldims, lperiodic, pijk);
>>>>>   ...
>>>>> }
>>>>>
>>>>> Here lperiodic[2] should be lperiodic[3], as the third element will be
>>>>> accessed inside compute_partition_alljkbal().
>>>>>
>>>>> The behaviour could be compiler-dependent. Maybe only with ICC 12 at
>>>>> -O2, and with assert disabled, does this out-of-bounds write cause a
>>>>> segmentation fault.
>>>>>
>>>>> I will retest after this fix.
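
The mismatch described above (a two-element buffer handed to a routine that writes three entries) compiles silently because the callee only sees an int*. One possible way to have the compiler reject it is to pass the array by reference so its length is part of the signature. This is only a sketch: compute_flags, count_periodic and kNumDirs are hypothetical names, not the MOAB signatures.

```cpp
// Hypothetical constant: one flag per i/j/k direction.
constexpr int kNumDirs = 3;

// Taking the arrays by reference makes their length part of the signature,
// so a caller cannot pass a buffer of the wrong size.
void compute_flags(const int (&gperiodic)[kNumDirs], int (&lperiodic)[kNumDirs])
{
  for (int d = 0; d < kNumDirs; ++d)
    lperiodic[d] = gperiodic[d];
}

// Counts how many directions are flagged periodic.
int count_periodic(const int (&gperiodic)[kNumDirs])
{
  int lperiodic[kNumDirs];
  // Declaring "int lperiodic[2];" here would fail to compile at the call
  // below, instead of overrunning the stack at run time.
  compute_flags(gperiodic, lperiodic);

  int n = 0;
  for (int d = 0; d < kNumDirs; ++d)
    n += (lperiodic[d] != 0);
  return n;
}
```

The trade-off is that such a signature only accepts true arrays of that exact length, not arbitrary pointers, so it would not be a drop-in change for code that passes pointers into larger buffers.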
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
>>>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
>>>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
>>>>> Sent: Tuesday, September 10, 2013 10:17:28 AM
>>>>> Subject: Re: Simple code to reproduce ICC segmentation fault
>>>>>
>>>>>
>>>>> If it works on icc 13 / ubuntu 12, I suggest moving moab-intel build to
>>>>> jenkins; we may have to rebuild netcdf with icc if there are issues with
>>>>> libcurl.
>>>>>
>>>>> Any suggestions?
>>>>>
>>>>> Iulian
>>>>> ----- Original Message -----
>>>>>
>>>>>
>>>>> On gnep, icc 12.
>>>>>
>>>>> Configure option
>>>>> ./configure --prefix=/homes/fathom/libs/current/moabintel
>>>>> --with-netcdf=/homes/fathom/3rdparty/netcdf-4.1.3-intel
>>>>> --with-hdf5=/homes/fathom/3rdparty/hdf5-1.8.8-ser-intel
>>>>> --with-zlib=/homes/fathom/3rdparty/zlib/zlib-1.2.4/gcc --enable-igeom
>>>>> --enable-imesh CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort
>>>>>
>>>>> So the flags will include both -O2 and -DNDEBUG.
>>>>>
>>>>> Since NDEBUG is defined, all of the assert(...) calls do nothing,
>>>>> and this could make a difference.
>>>>>
>>>>> On gnep with icc 12, with only -O2 and no NDEBUG, the original test
>>>>> passes. I guess ICC 12 is affected by the asserts.
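
To illustrate how -DNDEBUG changes what gets compiled, here is a small sketch. The function names are made up for illustration. With NDEBUG defined, the assert macro expands to nothing, so its argument is never evaluated; without it, the argument is evaluated and checked. Removing those evaluations can change the generated code and stack layout, which is one way a latent overrun might show up in one build and not another.

```cpp
#include <cassert>

// Reports whether this translation unit was compiled with asserts active.
constexpr bool asserts_enabled()
{
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}

// The assignment inside assert() only runs when asserts are compiled in,
// so the return value differs between debug and -DNDEBUG builds.
bool assert_argument_evaluated()
{
  bool evaluated = false;
  assert((evaluated = true));
  return evaluated;
}
```

In a build without -DNDEBUG both functions return true; with -DNDEBUG both return false.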
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
>>>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
>>>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
>>>>> Sent: Tuesday, September 10, 2013 10:04:39 AM
>>>>> Subject: Re: Simple code to reproduce ICC segmentation fault
>>>>>
>>>>>
>>>>> so this is with icc -O2 or what are the compile options?
>>>>> Is this on gnep? icc 12? icc 13?
>>>>>
>>>>> Should we try to use ubuntu 12 for intel builds?
>>>>>
>>>>> (we can do that on jenkins auto build platform)
>>>>>
>>>>> Iulian
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>
>>>>>
>>>>> I am still debugging, but it seems that the two calls of
>>>>> ScdInterface::get_neighbor() caused the crash. If I comment out the
>>>>> second call, there is no segmentation fault.
>>>>>
>>>>>
>>>>> #include "moab/ScdInterface.hpp"
>>>>> #include "moab/Core.hpp"
>>>>>
>>>>> #include <iostream>
>>>>>
>>>>> using namespace moab;
>>>>>
>>>>> int main()
>>>>> {
>>>>>   Core moab;
>>>>>   ScdInterface* scdi;
>>>>>   ErrorCode rval = moab.Interface::query_interface(scdi);
>>>>>
>>>>>   int gdims[] = {0, 0, 0, 48, 40, 18};
>>>>>   int nprocs = 4;
>>>>>   int pto = 0;
>>>>>   int across_bdy_a[3] = {0};
>>>>>   int rdims_a[6] = {0};
>>>>>   int facedims_a[6] = {0};
>>>>>
>>>>>   ScdParData spd;
>>>>>   int n;
>>>>>   for (n = 0; n < 6; n++)
>>>>>     spd.gDims[n] = gdims[n];
>>>>>   for (n = 0; n < 3; n++)
>>>>>     spd.gPeriodic[n] = 0;
>>>>>
>>>>>   spd.partMethod = ScdParData::ALLJKBAL;
>>>>>
>>>>>   int dijka[3] = {0};
>>>>>
>>>>>   dijka[0] = -1;
>>>>>   dijka[1] = -1;
>>>>>   dijka[2] = -1;
>>>>>   rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a,
>>>>>                                     facedims_a, across_bdy_a);
>>>>>
>>>>>   dijka[0] = 0;
>>>>>   dijka[1] = -1;
>>>>>   dijka[2] = -1;
>>>>>   rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a,
>>>>>                                     facedims_a, across_bdy_a);
>>>>>
>>>>>   std::cout << "Return from main()" << std::endl;
>>>>>
>>>>>   return 0;
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> ================================================================
>>>> "You will keep in perfect peace him whose mind is
>>>>     steadfast, because he trusts in you."               Isaiah 26:3
>>>>
>>>>                Tim Tautges            Argonne National Laboratory
>>>>            (tautges at mcs.anl.gov)      (telecommuting from UW-Madison)
>>>>    phone (gvoice): (608) 354-1459      1500 Engineering Dr.
>>>>               fax: (608) 263-4499      Madison, WI 53706
>>>>
>>>
>>
>>
>>
>

-- 
================================================================
"You will keep in perfect peace him whose mind is
   steadfast, because he trusts in you."               Isaiah 26:3

              Tim Tautges            Argonne National Laboratory
          (tautges at mcs.anl.gov)      (telecommuting from UW-Madison)
  phone (gvoice): (608) 354-1459      1500 Engineering Dr.
             fax: (608) 263-4499      Madison, WI 53706


