[MOAB-dev] Simple code to reproduce ICC segmentation fault
Danqing Wu
wuda at mcs.anl.gov
Tue Sep 10 15:35:33 CDT 2013
It seems that exp-sgcheck still has some limitations:
It can find the error in this:
int main()
{
  int Stack[3];
  for (int i = 0; i <= 3; i++)
    Stack[i] = 1;
  return 0;
}
It cannot find the error in this:
int main()
{
  int Stack[3];
  Stack[3] = 1;
  return 0;
}
It cannot find the error in this either:
int main()
{
  int Stack[3];
  Stack[0] = 1;
  Stack[1] = 1;
  Stack[2] = 1;
  Stack[3] = 1;
  return 0;
}
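
One way to catch all three patterns without relying on exp-sgcheck is to go through a bounds-checked accessor. The sketch below is only an illustration, not MOAB code: it assumes C++11 for std::array, and the helper names write_checked and write_index_3 are made up for this example. std::array::at throws std::out_of_range for an invalid index whether that index comes from a loop or from a constant.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not part of MOAB or Valgrind): writes 1 to the
// first `count` slots of a 3-element array through the checked accessor.
// Returns true if every write was in bounds, false if at() threw.
bool write_checked(std::size_t count)
{
  std::array<int, 3> stack{};  // same size as Stack[3] above
  try {
    for (std::size_t i = 0; i < count; i++)
      stack.at(i) = 1;  // at() throws std::out_of_range when i >= 3
  } catch (const std::out_of_range&) {
    return false;
  }
  return true;
}

// Constant-index variant, matching the second example above.
bool write_index_3()
{
  std::array<int, 3> stack{};
  try {
    stack.at(3) = 1;  // one past the end: always throws
  } catch (const std::out_of_range&) {
    return false;
  }
  return true;
}
```

Here write_checked(3) stays in bounds, while write_checked(4) and write_index_3() both report the overrun.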
----- Original Message -----
From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
To: "Tim Tautges" <tautges at mcs.anl.gov>
Cc: "Danqing Wu" <wuda at mcs.anl.gov>, moab-dev at mcs.anl.gov, "Vijay S. Mahadevan" <vijay.m at gmail.com>
Sent: Tuesday, September 10, 2013 2:20:53 PM
Subject: Re: [MOAB-dev] Simple code to reproduce ICC segmentation fault
Running the test with the experimental tool (--tool=exp-sgcheck) found it.
(I did not have to compile with -C.)
I didn't know about this experimental option.
Thanks,
Iulian
iulian at T520-iuli:~/source/MOABp13/test$ valgrind --tool=exp-sgcheck scdseq_test
==6670== exp-sgcheck, a stack and global array overrun detector
==6670== NOTE: This is an Experimental-Class Valgrind Tool
==6670== Copyright (C) 2003-2011, and GNU GPL'd, by OpenWorks Ltd et al.
==6670== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==6670== Command: scdseq_test
==6670==
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
--6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
Running test_parallel_partitions ...
==6670== Invalid write of size 4
==6670== at 0x418C09: moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:787)
==6670== by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:1154)
==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1216)
==6670== by 0x417726: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1379)
==6670== by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331)
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320)
==6670== by 0x410229: main (scdseq_test.cpp:267)
==6670== Address 0x7fefff878 expected vs actual:
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here
==6670== Actual: unknown
==6670== Actual: is 0 after Expected
==6670==
==6670== Invalid write of size 4
==6670== at 0x418C09: moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:787)
==6670== by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:1154)
==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1216)
==6670== by 0x4177E8: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1392)
==6670== by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331)
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320)
==6670== by 0x410229: main (scdseq_test.cpp:267)
==6670== Address 0x7fefff878 expected vs actual:
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here
==6670== Actual: unknown
==6670== Actual: is 0 after Expected
==6670==
==6670== Invalid write of size 4
==6670== at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868)
==6670== by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:758)
==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1219)
==6670== by 0x417726: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1379)
==6670== by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336)
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320)
==6670== by 0x410229: main (scdseq_test.cpp:267)
==6670== Address 0x7fefff888 expected vs actual:
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here
==6670== Actual: unknown
==6670== Actual: is 0 after Expected
==6670==
==6670== Invalid write of size 4
==6670== at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int, int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868)
==6670== by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int, int const*, int const*, int const*, int&, int*, int*, int*) (ScdInterface.cpp:758)
==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int, moab::ScdParData const&, int const*, int&, int*, int*, int*) (ScdInterface.hpp:1219)
==6670== by 0x4177E8: test_parallel_partition(int*, int, int) (scdseq_test.cpp:1392)
==6670== by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336)
==6670== by 0x40E664: run_test(void (*)(), char const*) (TestUtil.hpp:320)
==6670== by 0x410229: main (scdseq_test.cpp:267)
==6670== Address 0x7fefff888 expected vs actual:
==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from here
==6670== Actual: unknown
==6670== Actual: is 0 after Expected
==6670==
Running test_vertex_seq ...
Running test_element_seq ...
Running test_periodic_seq ...
----- Original Message -----
Hmm, I guess I'd still vote for using std::vector for statically-allocated arrays, rather than the alternative of
building -C into all our debug builds. Thoughts?
- tim
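
A minimal sketch of that std::vector approach, under stated assumptions: fill_periodic is a made-up stand-in for a callee such as compute_partition_alljkbal that writes three entries through a raw pointer; it is not the MOAB function, and its extra size argument exists only in this sketch. With the buffer on the heap, Memcheck can see a write past the end of the allocation, and passing the size along lets the callee reject a short buffer directly.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a callee that fills three periodic flags
// through a raw pointer. The explicit size argument is an addition for
// this sketch so the callee can refuse a buffer that is too small.
void fill_periodic(int* lperiodic, std::size_t size)
{
  if (size < 3)
    throw std::length_error("lperiodic needs room for 3 entries");
  for (std::size_t i = 0; i < 3; i++)
    lperiodic[i] = 0;
}

// Caller side: a vector whose size is fixed at construction replaces
// `int lperiodic[n]`. Returns true if the callee accepted the buffer.
bool call_with_size(std::size_t n)
{
  std::vector<int> lperiodic(n);  // heap storage, size set once
  try {
    fill_periodic(lperiodic.data(), lperiodic.size());
  } catch (const std::length_error&) {
    return false;
  }
  return true;
}
```

call_with_size(3) succeeds, while call_with_size(2), which mirrors the undersized lperiodic[2] declaration, is rejected before any out-of-bounds write happens.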
On 09/10/2013 02:03 PM, Vijay S. Mahadevan wrote:
> You need to compile with -C to catch static allocation errors. That
> will specifically turn on range checks.
>
> Good to know that memcheck doesn't give you invalid read errors on
> statically allocated arrays. Look at faq:
> http://valgrind.org/docs/manual/faq.html
>
>> Why doesn't Memcheck find the array overruns in this program?
>> Unfortunately, Memcheck doesn't do bounds checking on global or stack arrays. We'd like to, but it's just not possible to do in a reasonable way that fits with how Memcheck works. Sorry.
>
>> However, the experimental tool SGcheck can detect errors like this. Run Valgrind with the --tool=exp-sgcheck option to try it, but be aware that it is not as robust as Memcheck.
>
> Vijay
>
> On Tue, Sep 10, 2013 at 1:51 PM, Tim Tautges <tautges at mcs.anl.gov> wrote:
>> Good catch Danqing, I didn't know that (that valgrind wouldn't catch out of
>> bounds errors on statically-allocated arrays).
>>
>> The preferred way to do this, then, will be to use std::vector, with a
>> static size set at instantiation. That makes it dynamically allocated but
>> still static size. I'll remember that one.
>>
>> - tim
>>
>> On 09/10/2013 01:22 PM, Danqing Wu wrote:
>>>
>>> Here is what I found online:
>>>
>>> What Won't Valgrind Find?
>>> Valgrind doesn't perform bounds checking on static arrays (allocated on
>>> the stack). So if you declare an array inside your function:
>>>
>>> int main()
>>> {
>>>   char x[10];
>>>   x[11] = 'a';
>>> }
>>>
>>> then Valgrind won't alert you! One possible solution for testing purposes
>>> is simply to change your static arrays into dynamically allocated memory
>>> taken from the heap, where you will get bounds-checking, though this could
>>> be a mess of unfreed memory.
>>>
>>> ----- Original Message -----
>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
>>> Sent: Tuesday, September 10, 2013 1:02:45 PM
>>> Subject: Re: Simple code to reproduce ICC segmentation fault
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>>
>>>
>>>
>>> After correcting that, moab-intel test works fine!
>>> Good job again, Danqing!
>>>
>>> Thanks,
>>> Iulian
>>>
>>> now the question is why valgrind did not find this ...
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>>
>>>
>>> I think I found one possible reason.
>>>
>>> ErrorCode ScdInterface::get_neighbor_alljkbal(int np, int pfrom,
>>>     const int * const gdims, const int * const gperiodic,
>>>     const int * const dijk,
>>>     int &pto, int *rdims, int *facedims, int *across_bdy)
>>> {
>>>   ...
>>>   int ldims[6], pijk[3], lperiodic[2];
>>>   ErrorCode rval = compute_partition_alljkbal(np, pfrom, gdims, gperiodic,
>>>                                               ldims, lperiodic, pijk);
>>>   ...
>>> }
>>>
>>> Here lperiodic[2] should be lperiodic[3], as the third element will be
>>> accessed inside compute_partition_alljkbal().
>>>
>>> The behaviour could depend on the compiler. Maybe only with ICC 12 at
>>> -O2, and with assert disabled, does this out-of-bounds write cause a
>>> segmentation fault.
>>>
>>> I will retest after this fix.
>>>
>>> ----- Original Message -----
>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
>>> Sent: Tuesday, September 10, 2013 10:17:28 AM
>>> Subject: Re: Simple code to reproduce ICC segmentation fault
>>>
>>>
>>> If it works on icc 13 / ubuntu 12, I suggest moving moab-intel build to
>>> jenkins; we may have to rebuild netcdf with icc if there are issues with
>>> libcurl.
>>>
>>> Any suggestions?
>>>
>>> Iulian
>>> ----- Original Message -----
>>>
>>>
>>> On gnep, icc 12.
>>>
>>> Configure option
>>> ./configure --prefix=/homes/fathom/libs/current/moabintel
>>> --with-netcdf=/homes/fathom/3rdparty/netcdf-4.1.3-intel
>>> --with-hdf5=/homes/fathom/3rdparty/hdf5-1.8.8-ser-intel
>>> --with-zlib=/homes/fathom/3rdparty/zlib/zlib-1.2.4/gcc --enable-igeom
>>> --enable-imesh CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort
>>>
>>> So the flags will include both -O2 and -DNDEBUG
>>>
>>> Here, since NDEBUG is defined, all of the assert(...) calls do nothing,
>>> and this could make a difference.
>>>
>>> On gnep with icc 12, with only -O2 and no NDEBUG, the original test passes.
>>> I guess the ICC 12 result is affected by whether the asserts are compiled in.
>>>
>>> ----- Original Message -----
>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
>>> Sent: Tuesday, September 10, 2013 10:04:39 AM
>>> Subject: Re: Simple code to reproduce ICC segmentation fault
>>>
>>>
>>> so this is with icc -O2 or what are the compile options?
>>> Is this on gnep? icc 12? icc 13?
>>>
>>> Should we try to use ubuntu 12 for intel builds?
>>>
>>> (we can do that on jenkins auto build platform)
>>>
>>> Iulian
>>>
>>>
>>> ----- Original Message -----
>>>
>>>
>>> I am still debugging, but it seems that the two calls of
>>> ScdInterface::get_neighbor() caused the crash. If I comment out the second
>>> call, there is no segmentation fault.
>>>
>>>
>>> #include "moab/ScdInterface.hpp"
>>> #include "moab/Core.hpp"
>>>
>>> #include <iostream>
>>>
>>> using namespace moab;
>>>
>>> int main()
>>> {
>>>   Core moab;
>>>   ScdInterface* scdi;
>>>   ErrorCode rval = moab.Interface::query_interface(scdi);
>>>
>>>   int gdims[] = {0, 0, 0, 48, 40, 18};
>>>   int nprocs = 4;
>>>   int pto = 0;
>>>   int across_bdy_a[3] = {0};
>>>   int rdims_a[6] = {0};
>>>   int facedims_a[6] = {0};
>>>
>>>   ScdParData spd;
>>>   int n;
>>>   for (n = 0; n < 6; n++)
>>>     spd.gDims[n] = gdims[n];
>>>   for (n = 0; n < 3; n++)
>>>     spd.gPeriodic[n] = 0;
>>>
>>>   spd.partMethod = ScdParData::ALLJKBAL;
>>>
>>>   int dijka[3] = {0};
>>>
>>>   dijka[0] = -1;
>>>   dijka[1] = -1;
>>>   dijka[2] = -1;
>>>   rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a,
>>>                                     facedims_a, across_bdy_a);
>>>
>>>   dijka[0] = 0;
>>>   dijka[1] = -1;
>>>   dijka[2] = -1;
>>>   rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a,
>>>                                     facedims_a, across_bdy_a);
>>>
>>>   std::cout << "Return from main()" << std::endl;
>>>
>>>   return 0;
>>> }
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> ================================================================
>> "You will keep in perfect peace him whose mind is
>> steadfast, because he trusts in you." Isaiah 26:3
>>
>> Tim Tautges Argonne National Laboratory
>> (tautges at mcs.anl.gov) (telecommuting from UW-Madison)
>> phone (gvoice): (608) 354-1459 1500 Engineering Dr.
>> fax: (608) 263-4499 Madison, WI 53706
>>
>
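
The NDEBUG remark in the quoted thread can be illustrated with a short sketch. It only demonstrates standard assert behaviour and is not MOAB code; the function names are made up for this example. The assert macro is redefined each time <cassert> is included, according to whether NDEBUG is defined at that point, so an expression inside assert is not evaluated at all in an NDEBUG build.

```cpp
// First inclusion with NDEBUG defined: assert(expr) expands to a no-op,
// so expr is never evaluated.
#define NDEBUG
#include <cassert>

int increments_with_ndebug()
{
  int n = 0;
  assert(++n > 0);  // compiled out: n stays 0
  return n;
}

// Re-including <cassert> without NDEBUG restores the checking assert.
#undef NDEBUG
#include <cassert>

int increments_without_ndebug()
{
  int n = 0;
  assert(++n > 0);  // evaluated: n becomes 1
  return n;
}
```

Any bounds check or side effect that lives only inside an assert therefore disappears from a build configured with -DNDEBUG, which is one way two builds of the same source can behave differently.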