[MOAB-dev] Simple code to reproduce ICC segmentation fault
Danqing Wu
wuda at mcs.anl.gov
Tue Sep 10 14:42:37 CDT 2013
Yes, I also found it. Will try to find if there are any that still have size 2.
----- Original Message -----
From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
To: "Vijay S. Mahadevan" <vijay.m at gmail.com>
Cc: "Tim Tautges" <tautges at mcs.anl.gov>, "Danqing Wu" <wuda at mcs.anl.gov>, moab-dev at mcs.anl.gov
Sent: Tuesday, September 10, 2013 2:36:21 PM
Subject: Re: [MOAB-dev] Simple code to reproduce ICC segmentation fault
So with -C valgrind would work even without the exp option?
I just realized that valgrind found another one:
ErrorCode ScdInterface::get_neighbor_alljorkori(int np, int pfrom,
const int * const gdims, const int * const gperiodic, const int * const dijk,
int &pto, int *rdims, int *facedims, int *across_bdy)
{
ErrorCode rval = MB_SUCCESS;
pto = -1;
if (np == 1) return MB_SUCCESS;
int pijk[3], lperiodic [2] , ldims[6];
rval = compute_partition_alljorkori(np, pfrom, gdims, gperiodic, ldims, lperiodic, pijk);
if (MB_SUCCESS != rval) return rval;
Can you fix this one too, Danqing?
Thanks,
Iulian
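
A minimal sketch of the one-line fix, assuming the callee writes one
periodicity flag per direction (i, j, k), as the alljkbal finding quoted
further down suggests. fill_periodic and sum_local_periodic are hypothetical
stand-ins, not MOAB functions:

```cpp
#include <cassert>

// Hypothetical stand-in for compute_partition_alljorkori(): it is assumed
// to write one periodicity flag per direction, i.e. three ints.
static void fill_periodic(const int* gperiodic, int* lperiodic)
{
  for (int d = 0; d < 3; ++d)
    lperiodic[d] = gperiodic[d];
}

// Hypothetical caller mirroring the pattern in get_neighbor_alljorkori().
int sum_local_periodic(const int* gperiodic)
{
  int lperiodic[3]; // was lperiodic[2]: the third write overran the array
  fill_periodic(gperiodic, lperiodic);
  return lperiodic[0] + lperiodic[1] + lperiodic[2];
}
```

With the array sized to 3, the callee's third write stays inside the buffer.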
----- Original Message -----
I think having -C option for debug builds might be a good idea. It
will slow the computation down further but should catch a lot of such
errors during runtime.
I enable this option for most of the Fortran codes and it has saved me
a ton of headaches in the past.
Vijay
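
As a language-level complement to compiler range checks, here is a sketch of
how a fixed-size std::vector with checked access reports this kind of overrun
at runtime. write_three_flags is a hypothetical helper, not MOAB code; note
that at() is range-checked while operator[] is not:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes three flags through the checked accessor.
// Returns true if every write was in range, false if one was rejected.
bool write_three_flags(std::vector<int>& lperiodic)
{
  try {
    for (std::size_t d = 0; d < 3; ++d)
      lperiodic.at(d) = 0; // at() throws std::out_of_range past the end
  } catch (const std::out_of_range&) {
    return false;
  }
  return true;
}
```

A vector constructed with size 2 makes the helper return false instead of
silently corrupting the stack; one constructed with size 3 succeeds.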
On Tue, Sep 10, 2013 at 2:20 PM, Iulian Grindeanu <iulian at mcs.anl.gov> wrote:
> Running the test with the experimental tool (--tool=exp-sgcheck) found it:
> (I did not have to compile with -C)
>
> I didn't know about this experimental option
>
> Thanks,
> Iulian
>
> iulian@T520-iuli:~/source/MOABp13/test$ valgrind --tool=exp-sgcheck scdseq_test
> ==6670== exp-sgcheck, a stack and global array overrun detector
> ==6670== NOTE: This is an Experimental-Class Valgrind Tool
> ==6670== Copyright (C) 2003-2011, and GNU GPL'd, by OpenWorks Ltd et al.
> ==6670== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
> ==6670== Command: scdseq_test
> ==6670==
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
> --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
> Running test_parallel_partitions ...
> ==6670== Invalid write of size 4
> ==6670== at 0x418C09:
> moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int
> const*, int*, int*, int*) (ScdInterface.hpp:787)
> ==6670== by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int,
> int, int const*, int const*, int const*, int&, int*, int*, int*)
> (ScdInterface.cpp:1154)
> ==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int,
> moab::ScdParData const&, int const*, int&, int*, int*, int*)
> (ScdInterface.hpp:1216)
> ==6670== by 0x417726: test_parallel_partition(int*, int, int)
> (scdseq_test.cpp:1379)
> ==6670== by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331)
> ==6670== by 0x40E664: run_test(void (*)(), char const*)
> (TestUtil.hpp:320)
> ==6670== by 0x410229: main (scdseq_test.cpp:267)
> ==6670== Address 0x7fefff878 expected vs actual:
> ==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from
> here
> ==6670== Actual: unknown
> ==6670== Actual: is 0 after Expected
> ==6670==
> ==6670== Invalid write of size 4
> ==6670== at 0x418C09:
> moab::ScdInterface::compute_partition_alljorkori(int, int, int const*, int
> const*, int*, int*, int*) (ScdInterface.hpp:787)
> ==6670== by 0x470E31: moab::ScdInterface::get_neighbor_alljorkori(int,
> int, int const*, int const*, int const*, int&, int*, int*, int*)
> (ScdInterface.cpp:1154)
> ==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int,
> moab::ScdParData const&, int const*, int&, int*, int*, int*)
> (ScdInterface.hpp:1216)
> ==6670== by 0x4177E8: test_parallel_partition(int*, int, int)
> (scdseq_test.cpp:1392)
> ==6670== by 0x41748A: test_parallel_partitions() (scdseq_test.cpp:1331)
> ==6670== by 0x40E664: run_test(void (*)(), char const*)
> (TestUtil.hpp:320)
> ==6670== by 0x410229: main (scdseq_test.cpp:267)
> ==6670== Address 0x7fefff878 expected vs actual:
> ==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from
> here
> ==6670== Actual: unknown
> ==6670== Actual: is 0 after Expected
> ==6670==
> ==6670== Invalid write of size 4
> ==6670== at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int,
> int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868)
> ==6670== by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int,
> int const*, int const*, int const*, int&, int*, int*, int*)
> (ScdInterface.cpp:758)
> ==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int,
> moab::ScdParData const&, int const*, int&, int*, int*, int*)
> (ScdInterface.hpp:1219)
> ==6670== by 0x417726: test_parallel_partition(int*, int, int)
> (scdseq_test.cpp:1379)
> ==6670== by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336)
> ==6670== by 0x40E664: run_test(void (*)(), char const*)
> (TestUtil.hpp:320)
> ==6670== by 0x410229: main (scdseq_test.cpp:267)
> ==6670== Address 0x7fefff888 expected vs actual:
> ==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from
> here
> ==6670== Actual: unknown
> ==6670== Actual: is 0 after Expected
> ==6670==
> ==6670== Invalid write of size 4
> ==6670== at 0x41917E: moab::ScdInterface::compute_partition_alljkbal(int,
> int, int const*, int const*, int*, int*, int*) (ScdInterface.hpp:868)
> ==6670== by 0x46E655: moab::ScdInterface::get_neighbor_alljkbal(int, int,
> int const*, int const*, int const*, int&, int*, int*, int*)
> (ScdInterface.cpp:758)
> ==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int,
> moab::ScdParData const&, int const*, int&, int*, int*, int*)
> (ScdInterface.hpp:1219)
> ==6670== by 0x4177E8: test_parallel_partition(int*, int, int)
> (scdseq_test.cpp:1392)
> ==6670== by 0x4174AD: test_parallel_partitions() (scdseq_test.cpp:1336)
> ==6670== by 0x40E664: run_test(void (*)(), char const*)
> (TestUtil.hpp:320)
> ==6670== by 0x410229: main (scdseq_test.cpp:267)
> ==6670== Address 0x7fefff888 expected vs actual:
> ==6670== Expected: stack array "lperiodic" of size 8 in frame 1 back from
> here
> ==6670== Actual: unknown
> ==6670== Actual: is 0 after Expected
> ==6670==
> Running test_vertex_seq ...
> Running test_element_seq ...
> Running test_periodic_seq ...
>
>
> ________________________________
>
> Hmm, I guess I'd still vote for using std::vector for statically-allocated
> arrays, rather than the alternative of
> building -C into all our debug builds. Thoughts?
>
> - tim
>
> On 09/10/2013 02:03 PM, Vijay S. Mahadevan wrote:
>> You need to compile with -C to catch static allocation errors. That
>> will specifically turn on range checks.
>>
>> Good to know that memcheck doesn't give you invalid read errors on
>> statically allocated arrays. Look at faq:
>> http://valgrind.org/docs/manual/faq.html
>>
>>> Why doesn't Memcheck find the array overruns in this program?
>>> Unfortunately, Memcheck doesn't do bounds checking on global or stack
>>> arrays. We'd like to, but it's just not possible to do in a reasonable way
>>> that fits with how Memcheck works. Sorry.
>>
>>> However, the experimental tool SGcheck can detect errors like this. Run
>>> Valgrind with the --tool=exp-sgcheck option to try it, but be aware that it
>>> is not as robust as Memcheck.
>>
>> Vijay
>>
>> On Tue, Sep 10, 2013 at 1:51 PM, Tim Tautges <tautges at mcs.anl.gov> wrote:
>>> Good catch Danqing, I didn't know that (that valgrind wouldn't catch
>>> out-of-bounds errors on statically-allocated arrays).
>>>
>>> The preferred way to do this, then, will be to use std::vector, with a
>>> static size set at instantiation. That makes it dynamically allocated
>>> but still static size. I'll remember that one.
>>>
>>> - tim
>>>
>>> On 09/10/2013 01:22 PM, Danqing Wu wrote:
>>>>
>>>> Here is what I found online:
>>>>
>>>> What Won't Valgrind Find?
>>>> Valgrind doesn't perform bounds checking on static arrays (allocated on
>>>> the stack). So if you declare an array inside your function:
>>>>
>>>> int main()
>>>> {
>>>> char x[10];
>>>> x[11] = 'a';
>>>> }
>>>>
>>>> then Valgrind won't alert you! One possible solution for testing
>>>> purposes is simply to change your static arrays into dynamically
>>>> allocated memory taken from the heap, where you will get
>>>> bounds-checking, though this could be a mess of unfreed memory.
>>>>
>>>> ----- Original Message -----
>>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
>>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
>>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
>>>> Sent: Tuesday, September 10, 2013 1:02:45 PM
>>>> Subject: Re: Simple code to reproduce ICC segmentation fault
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>
>>>>
>>>>
>>>> After correcting that, moab-intel test works fine!
>>>> Good job again, Danqing!
>>>>
>>>> Thanks,
>>>> Iulian
>>>>
>>>> now the question is why valgrind did not find this ...
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>
>>>>
>>>> I think I found one possible reason.
>>>>
>>>> ErrorCode ScdInterface::get_neighbor_alljkbal(int np, int pfrom,
>>>> const int * const gdims, const int * const gperiodic, const int * const
>>>> dijk,
>>>> int &pto, int *rdims, int *facedims, int *across_bdy)
>>>> {
>>>> ...
>>>> int ldims[6], pijk[3], lperiodic[2];
>>>> ErrorCode rval = compute_partition_alljkbal(np, pfrom, gdims, gperiodic,
>>>> ldims, lperiodic, pijk);
>>>> ...
>>>> }
>>>>
>>>> Here lperiodic[2] should be lperiodic[3], as the third element will be
>>>> accessed inside compute_partition_alljkbal().
>>>>
>>>> The behaviour could be compiler-dependent. Maybe only with ICC 12 at
>>>> -O2, and with assert disabled, does this out-of-bounds write cause a
>>>> segmentation fault.
>>>>
>>>> I will retest after this fix.
>>>>
>>>> ----- Original Message -----
>>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
>>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
>>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
>>>> Sent: Tuesday, September 10, 2013 10:17:28 AM
>>>> Subject: Re: Simple code to reproduce ICC segmentation fault
>>>>
>>>>
>>>> If it works on icc 13 / ubuntu 12, I suggest moving moab-intel build to
>>>> jenkins; we may have to rebuild netcdf with icc if there are issues with
>>>> libcurl.
>>>>
>>>> Any suggestions?
>>>>
>>>> Iulian
>>>> ----- Original Message -----
>>>>
>>>>
>>>> On gnep, icc 12.
>>>>
>>>> Configure option
>>>> ./configure --prefix=/homes/fathom/libs/current/moabintel
>>>> --with-netcdf=/homes/fathom/3rdparty/netcdf-4.1.3-intel
>>>> --with-hdf5=/homes/fathom/3rdparty/hdf5-1.8.8-ser-intel
>>>> --with-zlib=/homes/fathom/3rdparty/zlib/zlib-1.2.4/gcc --enable-igeom
>>>> --enable-imesh CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort
>>>>
>>>> So the flags will include both -O2 and -DNDEBUG
>>>>
>>>> Since NDEBUG is defined, every assert(...) compiles to nothing, and
>>>> this could make a difference.
>>>>
>>>> On gnep with icc 12, using only -O2 and no NDEBUG, the original test
>>>> passes. I guess ICC 12 is affected by the assert stuff.
>>>>
>>>> ----- Original Message -----
>>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
>>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
>>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
>>>> Sent: Tuesday, September 10, 2013 10:04:39 AM
>>>> Subject: Re: Simple code to reproduce ICC segmentation fault
>>>>
>>>>
>>>> so this is with icc -O2 or what are the compile options?
>>>> Is this on gnep? icc 12? icc 13?
>>>>
>>>> Should we try to use ubuntu 12 for intel builds?
>>>>
>>>> (we can do that on jenkins auto build platform)
>>>>
>>>> Iulian
>>>>
>>>>
>>>> ----- Original Message -----
>>>>
>>>>
>>>> I am still debugging, but it seems that the two calls of
>>>> ScdInterface::get_neighbor() cause the crash. If I comment out the
>>>> second call, there is no segmentation fault.
>>>>
>>>>
>>>> #include "moab/ScdInterface.hpp"
>>>> #include "moab/Core.hpp"
>>>>
>>>> #include <iostream>
>>>>
>>>> using namespace moab;
>>>>
>>>> int main()
>>>> {
>>>> Core moab;
>>>> ScdInterface* scdi;
>>>> ErrorCode rval = moab.Interface::query_interface(scdi);
>>>>
>>>> int gdims[] = {0, 0, 0, 48, 40, 18};
>>>> int nprocs = 4;
>>>> int pto = 0;
>>>> int across_bdy_a[3] = {0};
>>>> int rdims_a[6] = {0};
>>>> int facedims_a[6] = {0};
>>>>
>>>> ScdParData spd;
>>>> int n;
>>>> for (n = 0; n < 6; n++)
>>>> spd.gDims[n] = gdims[n];
>>>> for (n = 0; n < 3; n++)
>>>> spd.gPeriodic[n] = 0;
>>>>
>>>> spd.partMethod = ScdParData::ALLJKBAL;
>>>>
>>>> int dijka[3] = {0};
>>>>
>>>> dijka[0] = -1;
>>>> dijka[1] = -1;
>>>> dijka[2] = -1;
>>>> rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a,
>>>> facedims_a, across_bdy_a);
>>>>
>>>> dijka[0] = 0;
>>>> dijka[1] = -1;
>>>> dijka[2] = -1;
>>>> rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a,
>>>> facedims_a, across_bdy_a);
>>>>
>>>> std::cout << "Return from main()" << std::endl;
>>>>
>>>> return 0;
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> ================================================================
>>> "You will keep in perfect peace him whose mind is
>>> steadfast, because he trusts in you." Isaiah 26:3
>>>
>>> Tim Tautges Argonne National Laboratory
>>> (tautges at mcs.anl.gov) (telecommuting from UW-Madison)
>>> phone (gvoice): (608) 354-1459 1500 Engineering Dr.
>>> fax: (608) 263-4499 Madison, WI 53706
>>>
>>
>
>
>