[petsc-users] Strange Segmentation Violation error

Sat Jun 17 10:49:23 CDT 2017

Hi Lukasz,

Thanks for the tip.

I tried using valgrind. However, I got a lot of errors at a few 
locations. One complained of an uninitialized value at:

call PetscInitialize(PETSC_NULL_CHARACTER,ierr)

But I have already initialized "ierr". Are these errors valid, or can I hide them?

==
==17300== Conditional jump or move depends on uninitialised value(s)
==17300==    at 0x3C2A872849: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
==17300==    by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
==17300==    by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
==17300==    by 0xB29696: petscinitialize_ (zstart.c:316)
==17300==    by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)
==17300==  Uninitialised value was created by a stack allocation
==17300==    at 0x3C2A8E2C82: setmntent (in /lib64/libc-2.12.so)
==17300==
==17300== Conditional jump or move depends on uninitialised value(s)
==17300==    at 0x3C2A87284F: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
==17300==    by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
==17300==    by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
==17300==    by 0xB29696: petscinitialize_ (zstart.c:316)
==17300==    by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)

Thank you very much.

Yours sincerely,

================================================
TAY Wee-Beng (Zheng Weiming) 郑伟明
Personal research webpage:http://tayweebeng.wixsite.com/website
Youtube research showcase:https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA
linkedin:www.linkedin.com/in/tay-weebeng
================================================

On 7/6/2017 3:22 PM, Lukasz Kaczmarczyk wrote:
>
>> On 7 Jun 2017, at 07:57, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>> Hi,
>>
>> I have been using PETSc together with my CFD code. There seems to be a bug 
>> with the Intel compiler such that when I call some DM routines such 
>> as DMLocalToLocalBegin, a segmentation violation will occur if full 
>> optimization is used. I had posted this question a while back. So the 
>> current solution is to use -O1 -ip instead of -O3 -ipo -ip for 
>> certain source files which use DMLocalToLocalBegin etc.
>>
>> Recently, I made some changes to the code, mainly adding some new things. 
>> However, depending on my options, some cases still go through the same 
>> program path.
>>
>> Now when I tried to run those same cases, I got segmentation 
>> violation, which didn't happen before:
>>
>>  IIB_I_cell_no_uvw_total2 14          10           6           3
>>            2           1
>>
>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>> [0]PETSC ERROR: to get more information on the crash.
>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [0]PETSC ERROR: Signal received
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016
>> [0]PETSC ERROR: ./a.out
>>
>> I can't debug using VS since the code has been optimized. I tried 
>> to print messages (if (myid == 0) print "1") to pinpoint the error. 
>> Strangely, after adding these print messages, the error disappears.
>>
>>  IIB_I_cell_no_uvw_total2 14          10           6           3
>>            2           1
>>  1
>>  2
>>  3
>>  4
>>  5
>>     1      0.26873613 0.12620288      0.12949340      1.11422363 0.43983516E-06 -0.59311066E-01  0.25546227E+04
>>     2      0.22236892 0.14528589      0.16939270      1.10459102 0.74556128E-02 -0.55168234E-01  0.25532419E+04
>>     3      0.20764796 0.14832689      0.18780489      1.08039569 0.80299767E-02 -0.46972411E-01  0.25523174E+04
>>
>> Can anyone give a logical explanation why this is happening? 
>> Moreover, if I remove the prints of 1 to 3 and only print 4 and 5, 
>> the segmentation violation appears again.
>>
>> I am using Intel Fortran 2016.1.150. I wonder if it helps if I post 
>> in the Intel Fortran forum.
>>
>> I can provide more info if required.
>>
> You are very likely writing to memory you do not own, for example by 
> exceeding the bounds of an array.  Depending on your compilation 
> options, starting parameters, etc., the stray write lands either in a 
> part of memory which belongs to your process or in a part protected by 
> the operating system. In the second case, you get a segmentation 
> fault. You can get correct results for some runs, but the bug is still 
> there, hiding in the dark.
>
> To shed light on it, you need Valgrind. Compile the code with debugging 
> on and no optimisation, and start searching.  You can also generate a 
> core file and backtrace the error in gdb/lldb.
>
> Lukasz
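
For reference, a rough sketch of how the Valgrind run suggested above might be set up while hiding the Open MPI start-up reports from the trace. This is only one possible approach, not a confirmed fix. The suppression file name (ompi_init.supp), the suppression entry name and the process count are assumptions made for illustration. The Valgrind options follow the pattern in the PETSc FAQ linked from the error message, plus the standard --track-origins and --suppressions options; the "-malloc off" option should be checked against the PETSc version in use. The script only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Sketch only: names below are assumptions, not taken from the thread,
# except ./a.out, which appears in the PETSc error output.
SUPP_FILE=ompi_init.supp   # hypothetical suppression file name
EXE=./a.out                # executable name from the PETSc error output
NPROC=1                    # assumed; use the count that reproduces the crash

# Suppress the "Conditional jump" reports raised inside Open MPI start-up
# (setmntent called from mca_mpool_hugepage_open), matching the trace above.
# Frames are listed innermost first; '*' is a wildcard within a name.
cat > "$SUPP_FILE" <<'EOF'
{
   ompi_hugepage_setmntent_cond
   Memcheck:Cond
   fun:_IO_file_fopen*
   fun:__fopen_internal
   fun:setmntent
   fun:mca_mpool_hugepage_open
}
EOF

# Assemble the command line; the log file gets one file per process (%p).
VALGRIND_CMD="mpiexec -n $NPROC valgrind --tool=memcheck -q --num-callers=20 --track-origins=yes --suppressions=$SUPP_FILE --log-file=valgrind.log.%p $EXE -malloc off"

# Print the command for review rather than launching it from this sketch.
echo "$VALGRIND_CMD"
```

Running Valgrind once with --gen-suppressions=all prints ready-made entries for each report, which is usually safer than writing them by hand. Reports whose stack lies entirely inside MPI initialisation, as in the trace above, are the ones typically suppressed; reports that pass through the application's own source files are the ones worth investigating.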
