[petsc-users] Strange Segmentation Violation error

Lukasz Kaczmarczyk Lukasz.Kaczmarczyk at glasgow.ac.uk
Wed Jun 7 02:22:55 CDT 2017


On 7 Jun 2017, at 07:57, TAY wee-beng <zonexo at gmail.com> wrote:


Hi,

I have been using PETSc together with my CFD code. There seems to be a bug in the Intel compiler: when I call some DM routines such as DMLocalToLocalBegin, a segmentation violation occurs if full optimization is used. I posted this question a while back. The current workaround is to compile the source files that use DMLocalToLocalBegin etc. with -O1 -ip instead of -O3 -ipo -ip.

Recently, I made some changes to the code, mainly adding new features. However, depending on my options, some cases still go through the same program path.

Now when I tried to run those same cases, I got a segmentation violation, which didn't happen before:

 IIB_I_cell_no_uvw_total2          14          10           6           3
           2           1

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016
[0]PETSC ERROR: ./a.out

I can't debug using VS since the code has been optimized. I tried adding print statements (if (myid == 0) print "1") to pinpoint the error. Strangely, after adding these print statements, the error disappears.

 IIB_I_cell_no_uvw_total2          14          10           6           3
           2           1
 1
 2
 3
 4
 5
    1      0.26873613      0.12620288      0.12949340      1.11422363  0.43983516E-06 -0.59311066E-01  0.25546227E+04
    2      0.22236892      0.14528589      0.16939270      1.10459102  0.74556128E-02 -0.55168234E-01  0.25532419E+04
    3      0.20764796      0.14832689      0.18780489      1.08039569  0.80299767E-02 -0.46972411E-01  0.25523174E+04

Can anyone give a logical explanation of why this is happening? Moreover, if I remove the prints of 1 to 3 and only print 4 and 5, the segmentation violation appears again.

I am using Intel Fortran 2016.1.150. Would it help if I posted in the Intel Fortran forum?

I can provide more info if required.

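You are very likely writing to memory you should not touch, for example by exceeding the bounds of an array. Depending on your compilation options, starting parameters, etc., you write in an uncontrolled way to a part of memory that either belongs to your process or is protected by the operating system. In the second case, you get a segmentation fault. You can get correct results for some runs, but the bug is still there, hiding in the dark. This is also why adding or removing print statements changes the behaviour: it changes the code the compiler generates and the memory layout, so the stray write lands somewhere else.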
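
To illustrate the failure mode: a plain store a[i] = v with a bad index i either lands in memory the process owns (silent corruption, possibly no visible symptom) or in protected memory (SIGSEGV). The defensive alternative is to check the index before writing. Below is a minimal C sketch of such a bounds-checked store; checked_store is a hypothetical helper, not a PETSc or Fortran API. For Fortran code, Intel Fortran's -check bounds option asks the compiler to insert comparable runtime checks on array accesses.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of PETSc): store v at a[i] only when i lies
 * inside [0, n). An out-of-range index is reported on stderr and the write is
 * skipped, instead of silently overwriting neighbouring memory. */
static int checked_store(double *a, size_t n, long i, double v)
{
    if (i < 0 || (size_t)i >= n) {
        fprintf(stderr, "checked_store: index %ld outside [0, %zu)\n", i, n);
        return 1; /* nonzero signals that the write was rejected */
    }
    a[i] = v;
    return 0;
}
```

With a 4-element buffer, checked_store(buf, 4, 2, 1.5) writes buf[2], while indices 4 or -1 are reported and leave the buffer untouched, so an off-by-one error shows up as a message at the faulty line rather than as a crash somewhere else.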

To shed light on it, you need Valgrind. Compile the code with debugging on and no optimisation, and start searching. You can also generate a core file and get a backtrace of the error in gdb/lldb.
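
On a POSIX system such as Linux, a core file is only written if the process's core-size limit allows it. A minimal sketch, assuming POSIX getrlimit/setrlimit are available (they are not on Windows), that raises the soft limit to the hard limit from inside the program; enable_core_dumps is a hypothetical name. Whether and where the core file actually appears still depends on system settings (on Linux, for example, /proc/sys/kernel/core_pattern).

```c
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft core-file size limit to its
 * hard limit, so that a crash such as SIGSEGV can leave a core file for
 * gdb/lldb. Returns 0 on success, -1 if getrlimit or setrlimit fails. */
static int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    /* an unprivileged process may raise its soft limit up to the hard limit */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return 0;
}
```

Calling it at the start of the program has the same effect as running ulimit -c with the hard limit in the shell beforehand. After the crash, load the executable together with the core file in gdb (gdb ./a.out core) and type bt to see where the invalid access happened.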

Lukasz