[MOAB-dev] Simple code to reproduce ICC segmentation fault

Tim Tautges tautges at mcs.anl.gov
Tue Sep 10 13:51:57 CDT 2013


Good catch Danqing, I didn't know that valgrind wouldn't catch out-of-bounds errors on fixed-size arrays allocated on the stack.

The preferred way to do this, then, will be to use std::vector with a fixed size set at instantiation.  That makes the
storage dynamically allocated (on the heap) while the size stays fixed.  I'll remember that one.
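
A minimal sketch of that idea, not the actual MOAB change: fill_periodic and fill_is_safe are hypothetical names, and fill_periodic only stands in for a routine that writes one periodic flag per dimension. With a std::vector sized at instantiation, at() reports the bad index directly, and an overrun through operator[] would land on heap memory, which is the case valgrind is said to check.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a routine that writes one periodic flag per
// dimension (i, j, k). It is not the MOAB function itself.
void fill_periodic(std::vector<int>& lperiodic)
{
  for (std::size_t d = 0; d < 3; ++d)
    lperiodic.at(d) = 0;  // at() throws std::out_of_range on a bad index
}

// Returns true if filling a vector of the given size stays in bounds.
bool fill_is_safe(std::size_t size)
{
  std::vector<int> lperiodic(size);  // size fixed at instantiation
  try {
    fill_periodic(lperiodic);
  } catch (const std::out_of_range&) {
    return false;  // the third write fell outside the vector
  }
  return true;
}
```

Here fill_is_safe(2) returns false, because the third write is rejected, while fill_is_safe(3) returns true.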

- tim

On 09/10/2013 01:22 PM, Danqing Wu wrote:
> Here is what I found online:
>
> What Won't Valgrind Find?
> Valgrind doesn't perform bounds checking on static arrays (allocated on the stack). So if you declare an array inside your function:
>
> int main()
> {
>      char x[10];
>      x[11] = 'a';
> }
>
> then Valgrind won't alert you! One possible solution for testing purposes is simply to change your static arrays into dynamically allocated memory taken from the heap, where you will get bounds-checking, though this could be a mess of unfreed memory.
>
> ----- Original Message -----
> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
> To: "Danqing Wu" <wuda at mcs.anl.gov>
> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
> Sent: Tuesday, September 10, 2013 1:02:45 PM
> Subject: Re: Simple code to reproduce ICC segmentation fault
>
>
>
>
> ----- Original Message -----
>
>
>
> After correcting that, moab-intel test works fine!
> Good job again, Danqing!
>
> Thanks,
> Iulian
>
> now the question is why valgrind did not find this ...
>
>
>
>
> ----- Original Message -----
>
>
> I think I found one possible reason.
>
> ErrorCode ScdInterface::get_neighbor_alljkbal(int np, int pfrom,
>     const int * const gdims, const int * const gperiodic, const int * const dijk,
>     int &pto, int *rdims, int *facedims, int *across_bdy)
> {
>   ...
>   int ldims[6], pijk[3], lperiodic[2];
>   ErrorCode rval = compute_partition_alljkbal(np, pfrom, gdims, gperiodic,
>                                               ldims, lperiodic, pijk);
>   ...
> }
>
> Here lperiodic[2] should be lperiodic[3], as the third element will be accessed inside compute_partition_alljkbal().
>
> The behaviour could be compiler-dependent. Maybe this out-of-bounds access only causes a segmentation fault with ICC 12 at -O2, and when assert is disabled.
>
> I will retest after this fix.
>
> ----- Original Message -----
> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
> To: "Danqing Wu" <wuda at mcs.anl.gov>
> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
> Sent: Tuesday, September 10, 2013 10:17:28 AM
> Subject: Re: Simple code to reproduce ICC segmentation fault
>
>
> If it works on icc 13 / ubuntu 12, I suggest moving moab-intel build to jenkins; we may have to rebuild netcdf with icc if there are issues with libcurl.
>
> Any suggestions?
>
> Iulian
> ----- Original Message -----
>
>
> On gnep, icc 12.
>
> Configure option
> ./configure --prefix=/homes/fathom/libs/current/moabintel --with-netcdf=/homes/fathom/3rdparty/netcdf-4.1.3-intel --with-hdf5=/homes/fathom/3rdparty/hdf5-1.8.8-ser-intel --with-zlib=/homes/fathom/3rdparty/zlib/zlib-1.2.4/gcc --enable-igeom --enable-imesh CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort
>
> So the flags will include both -O2 and -DNDEBUG
>
> Since NDEBUG is defined, every assert(...) does nothing, and this could make a difference.
>
> On gnep with icc 12, if only -O2 is used, without NDEBUG, the original test passes. I guess ICC 12 is affected by the asserts.
>
> ----- Original Message -----
> From: "Iulian Grindeanu" <iulian at mcs.anl.gov>
> To: "Danqing Wu" <wuda at mcs.anl.gov>
> Cc: "Tim Tautges" <tautges at mcs.anl.gov>
> Sent: Tuesday, September 10, 2013 10:04:39 AM
> Subject: Re: Simple code to reproduce ICC segmentation fault
>
>
> so this is with icc -O2 or what are the compile options?
> Is this on gnep? icc 12? icc 13?
>
> Should we try to use ubuntu 12 for intel builds?
>
> (we can do that on jenkins auto build platform)
>
> Iulian
>
>
> ----- Original Message -----
>
>
> I am still debugging, but it seems that the two calls of ScdInterface::get_neighbor() caused the crash. If I comment out the second call, there is no segmentation fault.
>
>
> #include "moab/ScdInterface.hpp"
> #include "moab/Core.hpp"
>
> #include <iostream>
>
> using namespace moab;
>
> int main()
> {
>   Core moab;
>   ScdInterface* scdi;
>   ErrorCode rval = moab.Interface::query_interface(scdi);
>
>   int gdims[] = {0, 0, 0, 48, 40, 18};
>   int nprocs = 4;
>   int pto = 0;
>   int across_bdy_a[3] = {0};
>   int rdims_a[6] = {0};
>   int facedims_a[6] = {0};
>
>   ScdParData spd;
>   int n;
>   for (n = 0; n < 6; n++)
>     spd.gDims[n] = gdims[n];
>   for (n = 0; n < 3; n++)
>     spd.gPeriodic[n] = 0;
>
>   spd.partMethod = ScdParData::ALLJKBAL;
>
>   int dijka[3] = {0};
>
>   dijka[0] = -1;
>   dijka[1] = -1;
>   dijka[2] = -1;
>   rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a, facedims_a, across_bdy_a);
>
>   dijka[0] = 0;
>   dijka[1] = -1;
>   dijka[2] = -1;
>   rval = ScdInterface::get_neighbor(nprocs, 0, spd, dijka, pto, rdims_a, facedims_a, across_bdy_a);
>
>   std::cout << "Return from main()" << std::endl;
>
>   return 0;
> }
>
>
>
>
>

-- 
================================================================
"You will keep in perfect peace him whose mind is
   steadfast, because he trusts in you."               Isaiah 26:3

              Tim Tautges            Argonne National Laboratory
          (tautges at mcs.anl.gov)      (telecommuting from UW-Madison)
  phone (gvoice): (608) 354-1459      1500 Engineering Dr.
             fax: (608) 263-4499      Madison, WI 53706


