[MOAB-dev] Simple code to reproduce ICC segmentation fault

Iulian Grindeanu iulian at mcs.anl.gov
Tue Sep 10 14:36:21 CDT 2013


So with -C, valgrind would work even without the exp option?
I just realized that valgrind found another one: 

ErrorCode ScdInterface::get_neighbor_alljorkori(int np, int pfrom, 
const int * const gdims, const int * const gperiodic, const int * const dijk, 
int &pto, int *rdims, int *facedims, int *across_bdy) 
{ 
ErrorCode rval = MB_SUCCESS; 
pto = -1; 
if (np == 1) return MB_SUCCESS; 

int pijk[3], lperiodic [2] , ldims[6]; 
rval = compute_partition_alljorkori(np, pfrom, gdims, gperiodic, ldims, lperiodic, pijk); 
if (MB_SUCCESS != rval) return rval; 

Can you fix this one too, Danqing? 
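
A minimal sketch of the fix, assuming compute_partition_alljorkori() writes one periodic flag per direction (three in total), as the valgrind report in this thread suggests. The names fill_local_periodic and local_periodic are hypothetical stand-ins for illustration, not MOAB functions:

```cpp
#include <array>

// Hypothetical stand-in for compute_partition_alljorkori(): it writes THREE
// entries through the lperiodic pointer, one per direction (i, j, k).
inline void fill_local_periodic(const int* gperiodic, int* lperiodic)
{
  for (int d = 0; d < 3; ++d)
    lperiodic[d] = gperiodic[d];
}

// Caller-side fix: size the stack buffer for all three directions.
inline std::array<int, 3> local_periodic(const int* gperiodic)
{
  int lperiodic[3] = {0, 0, 0};  // was lperiodic[2], one element too short
  fill_local_periodic(gperiodic, lperiodic);

  std::array<int, 3> out = {{lperiodic[0], lperiodic[1], lperiodic[2]}};
  return out;
}
```

With the buffer declared as lperiodic[3], the third write lands inside the array instead of on whatever follows it on the stack.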

Thanks, 
Iulian 

----- Original Message -----

| I think having -C option for debug builds might be a good idea. It
| will slow the computation down further but should catch a lot of such
| errors during runtime.

| I enable this option for most of the Fortran codes and it has saved
| me a ton of headaches in the past.

| Vijay

| On Tue, Sep 10, 2013 at 2:20 PM, Iulian Grindeanu
| <iulian at mcs.anl.gov> wrote:
| > Running the test with the experimental --tool=exp-sgcheck option found
| > it (I did not have to compile with -C).
| >
| > I didn't know about this experimental option
| >
| > Thanks,
| > Iulian
| >
| > iulian at T520-iuli:~/source/MOABp13/test$ valgrind --tool=exp-sgcheck
| > scdseq_test
| > ==6670== exp-sgcheck, a stack and global array overrun detector
| > ==6670== NOTE: This is an Experimental-Class Valgrind Tool
| > ==6670== Copyright (C) 2003-2011, and GNU GPL'd, by OpenWorks Ltd
| > et al.
| > ==6670== Using Valgrind-3.7.0 and LibVEX; rerun with -h for
| > copyright info
| > ==6670== Command: scdseq_test
| > ==6670==
| > --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
| > --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
| > --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
| > --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
| > --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
| > --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
| > --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
| > --6670-- warning: evaluate_Dwarf3_Expr: unhandled DW_OP_ 0x93
| > Running test_parallel_partitions ...
| > ==6670== Invalid write of size 4
| > ==6670== at 0x418C09:
| > moab::ScdInterface::compute_partition_alljorkori(int, int, int
| > const*, int
| > const*, int*, int*, int*) (ScdInterface.hpp:787)
| > ==6670== by 0x470E31:
| > moab::ScdInterface::get_neighbor_alljorkori(int,
| > int, int const*, int const*, int const*, int&, int*, int*, int*)
| > (ScdInterface.cpp:1154)
| > ==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int,
| > moab::ScdParData const&, int const*, int&, int*, int*, int*)
| > (ScdInterface.hpp:1216)
| > ==6670== by 0x417726: test_parallel_partition(int*, int, int)
| > (scdseq_test.cpp:1379)
| > ==6670== by 0x41748A: test_parallel_partitions()
| > (scdseq_test.cpp:1331)
| > ==6670== by 0x40E664: run_test(void (*)(), char const*)
| > (TestUtil.hpp:320)
| > ==6670== by 0x410229: main (scdseq_test.cpp:267)
| > ==6670== Address 0x7fefff878 expected vs actual:
| > ==6670== Expected: stack array "lperiodic" of size 8 in frame 1
| > back from
| > here
| > ==6670== Actual: unknown
| > ==6670== Actual: is 0 after Expected
| > ==6670==
| > ==6670== Invalid write of size 4
| > ==6670== at 0x418C09:
| > moab::ScdInterface::compute_partition_alljorkori(int, int, int
| > const*, int
| > const*, int*, int*, int*) (ScdInterface.hpp:787)
| > ==6670== by 0x470E31:
| > moab::ScdInterface::get_neighbor_alljorkori(int,
| > int, int const*, int const*, int const*, int&, int*, int*, int*)
| > (ScdInterface.cpp:1154)
| > ==6670== by 0x41AFD5: moab::ScdInterface::get_neighbor(int, int,
| > moab::ScdParData const&, int const*, int&, int*, int*, int*)
| > (ScdInterface.hpp:1216)
| > ==6670== by 0x4177E8: test_parallel_partition(int*, int, int)
| > (scdseq_test.cpp:1392)
| > ==6670== by 0x41748A: test_parallel_partitions()
| > (scdseq_test.cpp:1331)
| > ==6670== by 0x40E664: run_test(void (*)(), char const*)
| > (TestUtil.hpp:320)
| > ==6670== by 0x410229: main (scdseq_test.cpp:267)
| > ==6670== Address 0x7fefff878 expected vs actual:
| > ==6670== Expected: stack array "lperiodic" of size 8 in frame 1
| > back from
| > here
| > ==6670== Actual: unknown
| > ==6670== Actual: is 0 after Expected
| > ==6670==
| > ==6670== Invalid write of size 4
| > ==6670== at 0x41917E:
| > moab::ScdInterface::compute_partition_alljkbal(int,
| > int, int const*, int const*, int*, int*, int*)
| > (ScdInterface.hpp:868)
| > ==6670== by 0x46E655:
| > moab::ScdInterface::get_neighbor_alljkbal(int, int,
| > int const*, int const*, int const*, int&, int*, int*, int*)
| > (ScdInterface.cpp:758)
| > ==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int,
| > moab::ScdParData const&, int const*, int&, int*, int*, int*)
| > (ScdInterface.hpp:1219)
| > ==6670== by 0x417726: test_parallel_partition(int*, int, int)
| > (scdseq_test.cpp:1379)
| > ==6670== by 0x4174AD: test_parallel_partitions()
| > (scdseq_test.cpp:1336)
| > ==6670== by 0x40E664: run_test(void (*)(), char const*)
| > (TestUtil.hpp:320)
| > ==6670== by 0x410229: main (scdseq_test.cpp:267)
| > ==6670== Address 0x7fefff888 expected vs actual:
| > ==6670== Expected: stack array "lperiodic" of size 8 in frame 1
| > back from
| > here
| > ==6670== Actual: unknown
| > ==6670== Actual: is 0 after Expected
| > ==6670==
| > ==6670== Invalid write of size 4
| > ==6670== at 0x41917E:
| > moab::ScdInterface::compute_partition_alljkbal(int,
| > int, int const*, int const*, int*, int*, int*)
| > (ScdInterface.hpp:868)
| > ==6670== by 0x46E655:
| > moab::ScdInterface::get_neighbor_alljkbal(int, int,
| > int const*, int const*, int const*, int&, int*, int*, int*)
| > (ScdInterface.cpp:758)
| > ==6670== by 0x41B01C: moab::ScdInterface::get_neighbor(int, int,
| > moab::ScdParData const&, int const*, int&, int*, int*, int*)
| > (ScdInterface.hpp:1219)
| > ==6670== by 0x4177E8: test_parallel_partition(int*, int, int)
| > (scdseq_test.cpp:1392)
| > ==6670== by 0x4174AD: test_parallel_partitions()
| > (scdseq_test.cpp:1336)
| > ==6670== by 0x40E664: run_test(void (*)(), char const*)
| > (TestUtil.hpp:320)
| > ==6670== by 0x410229: main (scdseq_test.cpp:267)
| > ==6670== Address 0x7fefff888 expected vs actual:
| > ==6670== Expected: stack array "lperiodic" of size 8 in frame 1
| > back from
| > here
| > ==6670== Actual: unknown
| > ==6670== Actual: is 0 after Expected
| > ==6670==
| > Running test_vertex_seq ...
| > Running test_element_seq ...
| > Running test_periodic_seq ...
| >
| >
| > ________________________________
| >
| > Hmm, I guess I'd still vote for using std::vector for
| > statically-allocated arrays, rather than the alternative of
| > building -C into all our debug builds. Thoughts?
| >
| > - tim
| >
| > On 09/10/2013 02:03 PM, Vijay S. Mahadevan wrote:
| >> You need to compile with -C to catch static allocation errors.
| >> That will specifically turn on range checks.
| >>
| >> Good to know that memcheck doesn't give you invalid read errors on
| >> statically allocated arrays. Look at faq:
| >> http://valgrind.org/docs/manual/faq.html
| >>
| >>> Why doesn't Memcheck find the array overruns in this program?
| >>> Unfortunately, Memcheck doesn't do bounds checking on global or
| >>> stack arrays. We'd like to, but it's just not possible to do in a
| >>> reasonable way that fits with how Memcheck works. Sorry.
| >>
| >>> However, the experimental tool SGcheck can detect errors like
| >>> this. Run Valgrind with the --tool=exp-sgcheck option to try it, but
| >>> be aware that it is not as robust as Memcheck.
| >>
| >> Vijay
| >>
| >> On Tue, Sep 10, 2013 at 1:51 PM, Tim Tautges <tautges at mcs.anl.gov>
| >> wrote:
| >>> Good catch Danqing, I didn't know that (that valgrind wouldn't
| >>> catch out-of-bounds errors on statically-allocated arrays).
| >>>
| >>> The preferred way to do this, then, will be to use std::vector,
| >>> with a static size set at instantiation. That makes it dynamically
| >>> allocated but still static size. I'll remember that one.
| >>>
| >>> - tim
| >>>
| >>> On 09/10/2013 01:22 PM, Danqing Wu wrote:
| >>>>
| >>>> Here is what I found online:
| >>>>
| >>>> What Won't Valgrind Find?
| >>>> Valgrind doesn't perform bounds checking on static arrays
| >>>> (allocated on
| >>>> the stack). So if you declare an array inside your function:
| >>>>
| >>>> int main()
| >>>> {
| >>>> char x[10];
| >>>> x[11] = 'a';
| >>>> }
| >>>>
| >>>> then Valgrind won't alert you! One possible solution for testing
| >>>> purposes is simply to change your static arrays into dynamically
| >>>> allocated memory taken from the heap, where you will get
| >>>> bounds-checking, though this could be a mess of unfreed memory.
| >>>>
| >>>> ----- Original Message -----
| >>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
| >>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
| >>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
| >>>> Sent: Tuesday, September 10, 2013 1:02:45 PM
| >>>> Subject: Re: Simple code to reproduce ICC segmentation fault
| >>>>
| >>>>
| >>>>
| >>>>
| >>>> ----- Original Message -----
| >>>>
| >>>>
| >>>>
| >>>> After correcting that, moab-intel test works fine!
| >>>> Good job again, Danqing!
| >>>>
| >>>> Thanks,
| >>>> Iulian
| >>>>
| >>>> now the question is why valgrind did not find this ...
| >>>>
| >>>>
| >>>>
| >>>>
| >>>> ----- Original Message -----
| >>>>
| >>>>
| >>>> I think I found one possible reason.
| >>>>
| >>>> ErrorCode ScdInterface::get_neighbor_alljkbal(int np, int pfrom,
| >>>> const int * const gdims, const int * const gperiodic,
| >>>> const int * const dijk,
| >>>> int &pto, int *rdims, int *facedims, int *across_bdy)
| >>>> {
| >>>> ...
| >>>> int ldims[6], pijk[3], lperiodic[2];
| >>>> ErrorCode rval = compute_partition_alljkbal(np, pfrom, gdims,
| >>>> gperiodic, ldims, lperiodic, pijk);
| >>>> ...
| >>>> }
| >>>>
| >>>> Here lperiodic[2] should be lperiodic[3], as the third element
| >>>> will be accessed inside compute_partition_alljkbal().
| >>>>
| >>>> The behaviour could be compiler-dependent. Maybe only with ICC 12
| >>>> at -O2, and with assert disabled, does this out-of-bounds write
| >>>> cause a segmentation fault.
| >>>>
| >>>> I will retest after this fix.
| >>>>
| >>>> ----- Original Message -----
| >>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
| >>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
| >>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
| >>>> Sent: Tuesday, September 10, 2013 10:17:28 AM
| >>>> Subject: Re: Simple code to reproduce ICC segmentation fault
| >>>>
| >>>>
| >>>> If it works on icc 13 / ubuntu 12, I suggest moving the moab-intel
| >>>> build to jenkins; we may have to rebuild netcdf with icc if there
| >>>> are issues with libcurl.
| >>>>
| >>>> Any suggestions?
| >>>>
| >>>> Iulian
| >>>> ----- Original Message -----
| >>>>
| >>>>
| >>>> On gnep, icc 12.
| >>>>
| >>>> Configure option
| >>>> ./configure --prefix=/homes/fathom/libs/current/moabintel
| >>>> --with-netcdf=/homes/fathom/3rdparty/netcdf-4.1.3-intel
| >>>> --with-hdf5=/homes/fathom/3rdparty/hdf5-1.8.8-ser-intel
| >>>> --with-zlib=/homes/fathom/3rdparty/zlib/zlib-1.2.4/gcc
| >>>> --enable-igeom
| >>>> --enable-imesh CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort
| >>>>
| >>>> So the flags will include both -O2 and -DNDEBUG
| >>>>
| >>>> Here since NDEBUG is enabled, all of the assert(...) will do
| >>>> nothing, and this could make some differences.
| >>>>
| >>>> On gnep, icc 12, if only -O2, but no NDEBUG, the original test
| >>>> can pass. I guess ICC 12 would be affected by the assert stuff.
| >>>>
| >>>> ----- Original Message -----
| >>>> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
| >>>> To: "Danqing Wu" <wuda at mcs.anl.gov>
| >>>> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
| >>>> Sent: Tuesday, September 10, 2013 10:04:39 AM
| >>>> Subject: Re: Simple code to reproduce ICC segmentation fault
| >>>>
| >>>>
| >>>> so this is with icc -O2 or what are the compile options?
| >>>> Is this on gnep? icc 12? icc 13?
| >>>>
| >>>> Should we try to use ubuntu 12 for intel builds?
| >>>>
| >>>> (we can do that on jenkins auto build platform)
| >>>>
| >>>> Iulian
| >>>>
| >>>>
| >>>> ----- Original Message -----
| >>>>
| >>>>
| >>>> I am still debugging, but it seems that the two calls of
| >>>> ScdInterface::get_neighbor() caused the crash. If I comment out
| >>>> the second call, no segmentation fault.
| >>>>
| >>>>
| >>>> #include "moab/ScdInterface.hpp"
| >>>> #include "moab/Core.hpp"
| >>>>
| >>>> #include <iostream>
| >>>>
| >>>> using namespace moab;
| >>>>
| >>>> int main()
| >>>> {
| >>>> Core moab;
| >>>> ScdInterface* scdi;
| >>>> ErrorCode rval = moab.Interface::query_interface(scdi);
| >>>>
| >>>> int gdims[] = {0, 0, 0, 48, 40, 18};
| >>>> int nprocs = 4;
| >>>> int pto = 0;
| >>>> int across_bdy_a[3] = {0};
| >>>> int rdims_a[6] = {0};
| >>>> int facedims_a[6] = {0};
| >>>>
| >>>> ScdParData spd;
| >>>> int n;
| >>>> for (n = 0; n < 6; n++)
| >>>> spd.gDims[n] = gdims[n];
| >>>> for (n = 0; n < 3; n++)
| >>>> spd.gPeriodic[n] = 0;
| >>>>
| >>>> spd.partMethod = ScdParData::ALLJKBAL;
| >>>>
| >>>> int dijka[3] = {0};
| >>>>
| >>>> dijka[0] = -1;
| >>>> dijka[1] = -1;
| >>>> dijka[2] = -1;
| >>>> rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto,
| >>>> rdims_a, facedims_a, across_bdy_a);
| >>>>
| >>>> dijka[0] = 0;
| >>>> dijka[1] = -1;
| >>>> dijka[2] = -1;
| >>>> rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto,
| >>>> rdims_a, facedims_a, across_bdy_a);
| >>>>
| >>>> std::cout << "Return from main()" << std::endl;
| >>>>
| >>>> return 0;
| >>>> }
| >>>>
| >>>>
| >>>>
| >>>>
| >>>>
| >>>
| >>> --
| >>> ================================================================
| >>> "You will keep in perfect peace him whose mind is
| >>> steadfast, because he trusts in you." Isaiah 26:3
| >>>
| >>> Tim Tautges Argonne National Laboratory
| >>> (tautges at mcs.anl.gov) (telecommuting from UW-Madison)
| >>> phone (gvoice): (608) 354-1459 1500 Engineering Dr.
| >>> fax: (608) 263-4499 Madison, WI 53706
| >>>
| >>
| >
| >
| >
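
The std::vector suggestion quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: write_flags_checked is a hypothetical helper, not a MOAB function, and the use of at() is an addition that the thread itself does not mention. Heap-backed storage is the kind of memory the thread says valgrind does bounds-check, and at() adds a check that works even without valgrind:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes n flags into lperiodic using checked access.
// Returns false if any write would fall outside the vector.
inline bool write_flags_checked(std::vector<int>& lperiodic, std::size_t n)
{
  try {
    for (std::size_t d = 0; d < n; ++d)
      lperiodic.at(d) = 0;  // at() throws std::out_of_range past the end
  } catch (const std::out_of_range&) {
    return false;
  }
  return true;
}
```

A vector constructed as std::vector<int> lperiodic(2) would make write_flags_checked(lperiodic, 3) return false, whereas a size of 3 succeeds. Plain operator[] on the same vector is not checked by the language, so that variant would still rely on a tool such as valgrind to notice the overrun.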