[petsc-users] Strange Segmentation Violation error
TAY wee-beng
zonexo at gmail.com
Sat Jun 17 10:49:23 CDT 2017
Hi Lukasz,
Thanks for the tip.
I tried using Valgrind. However, I got a lot of errors at a few
locations. One complained of an uninitialised value in:
call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
But I already initialize "ierr". Are these errors valid or can I hide them?
==
==17300== Conditional jump or move depends on uninitialised value(s)
==17300== at 0x3C2A872849: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so)
==17300== by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
==17300== by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
==17300== by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
==17300== by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300== by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300== by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
==17300== by 0xB29696: petscinitialize_ (zstart.c:316)
==17300== by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)
==17300== Uninitialised value was created by a stack allocation
==17300== at 0x3C2A8E2C82: setmntent (in /lib64/libc-2.12.so)
==17300==
==17300== Conditional jump or move depends on uninitialised value(s)
==17300== at 0x3C2A87284F: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so)
==17300== by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
==17300== by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
==17300== by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
==17300== by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300== by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300== by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
==17300== by 0xB29696: petscinitialize_ (zstart.c:316)
==17300== by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)
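Both reports originate in glibc's `setmntent`, called from Open MPI's `mca_mpool_hugepage_open` inside `MPI_Init`, and Valgrind states that the uninitialised value was created by a stack allocation in `setmntent` itself, so they do not point at `ierr` or at the Fortran code. Whether they are harmless is a judgment call, but if they are to be hidden, a Valgrind suppression file is the usual mechanism. A minimal sketch, assuming the hypothetical file name `openmpi-hugepage.supp` and entry name `ompi_hugepage_setmntent_cond`; only the frame names are taken from the log:

```shell
#!/bin/sh
# Write a Valgrind suppression entry covering the two reports above.
# File name and entry name are hypothetical; frame names come from the log.
# "Memcheck:Cond" is the suppression kind for "Conditional jump or move
# depends on uninitialised value(s)"; "..." matches zero or more frames.
cat > openmpi-hugepage.supp <<'EOF'
{
   ompi_hugepage_setmntent_cond
   Memcheck:Cond
   ...
   fun:setmntent
   fun:mca_mpool_hugepage_open
}
EOF
```

The file would then be passed as `valgrind --suppressions=openmpi-hugepage.supp ...`. Because of the `...` wildcard, the entry matches whichever libc frames sit above `setmntent`. Running once with `--gen-suppressions=all` makes Valgrind print a ready-made entry for every report, which is often easier than writing them by hand.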
Thank you very much.
Yours sincerely,
================================================
TAY Wee-Beng (Zheng Weiming) 郑伟明
Personal research webpage:http://tayweebeng.wixsite.com/website
Youtube research showcase:https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA
linkedin:www.linkedin.com/in/tay-weebeng
================================================
On 7/6/2017 3:22 PM, Lukasz Kaczmarczyk wrote:
>
>> On 7 Jun 2017, at 07:57, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>> Hi,
>>
>> I have been using PETSc together with my CFD code. There seems to be a bug
>> in the Intel compiler such that when I call some DM routines such
>> as DMLocalToLocalBegin, a segmentation violation occurs if full
>> optimization is used. I had posted this question a while back. So the
>> current workaround is to use -O1 -ip instead of -O3 -ipo -ip for
>> certain source files which use DMLocalToLocalBegin etc.
>>
>> Recently, I made some changes to the code, mainly adding some features.
>> However, depending on my options, some cases still go through the same
>> program path.
>>
>> Now when I tried to run those same cases, I got a segmentation
>> violation, which didn't happen before:
>>
>> IIB_I_cell_no_uvw_total2 14 10 6 3
>> 2 1
>>
>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>> [0]PETSC ERROR: to get more information on the crash.
>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [0]PETSC ERROR: Signal received
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016
>> [0]PETSC ERROR: ./a.out
>>
>> I can't debug using VS since the code has been optimized. I tried
>> to print messages (if (myid == 0) print "1") to pinpoint the error.
>> Strangely, after adding these print messages, the error disappears:
>>
>> IIB_I_cell_no_uvw_total2 14 10 6 3
>> 2 1
>> 1
>> 2
>> 3
>> 4
>> 5
>> 1 0.26873613 0.12620288 0.12949340 1.11422363 0.43983516E-06 -0.59311066E-01 0.25546227E+04
>> 2 0.22236892 0.14528589 0.16939270 1.10459102 0.74556128E-02 -0.55168234E-01 0.25532419E+04
>> 3 0.20764796 0.14832689 0.18780489 1.08039569 0.80299767E-02 -0.46972411E-01 0.25523174E+04
>>
>> Can anyone give a logical explanation why this is happening?
>> Moreover, if I remove the prints of 1 to 3 and only print 4 and 5,
>> the segmentation violation appears again.
>>
>> I am using Intel Fortran 2016.1.150. I wonder if it would help to post
>> in the Intel Fortran forum.
>>
>> I can provide more info if required.
>>
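> You are very likely writing to memory you should not touch, for example
> when you exceed the size of an array. Depending on your compilation
> options, run-time parameters, etc., you write in an uncontrolled way to a
> part of memory that either belongs to your process or is protected by the
> operating system. In the second case, you get a segmentation fault. You
> can get correct results for some runs, but your bug is still there, hiding
> in the dark. Adding or removing print statements changes the code the
> compiler generates and how memory is laid out, so the stray write can land
> somewhere harmless in one build and somewhere fatal in another.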
>
> To shed light on it, you need Valgrind. Compile the code with debugging
> on and no optimisation, and start searching. You can also generate a
> core file and get a backtrace of the error in gdb/lldb.
>
> Lukasz
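The workflow Lukasz describes can be sketched as a small script. It assumes a single MPI rank, the executable `./a.out` from the error message, and that the code has already been rebuilt with debugging on and no optimisation (for example `-g -O0 -traceback -check bounds` with Intel Fortran or `-g -O0 -fbacktrace -fcheck=bounds` with gfortran, and PETSc configured with `--with-debugging=yes`, as its error message suggests). The script, its `DRY_RUN` and `EXE` variables are hypothetical, and the core file name depends on the system's core pattern:

```shell
#!/bin/sh
# Hypothetical debugging helper: memcheck run, then a post-mortem backtrace.
# By default (DRY_RUN=1) it only prints the commands it would run.
DRY_RUN=${DRY_RUN:-1}
EXE=${EXE:-./a.out}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"        # show the command instead of executing it
  else
    "$@"
  fi
}

# Allow core files for this shell and its children (may be capped by the hard limit).
ulimit -c unlimited 2>/dev/null || echo "warning: could not raise core file limit" >&2

# 1. One rank under memcheck; --track-origins=yes reports where
#    uninitialised values were created; one log file per process (%p = pid).
run mpiexec -n 1 valgrind --tool=memcheck -q --num-callers=20 \
    --track-origins=yes --log-file=valgrind.log.%p "$EXE"

# 2. Plain run to get a core file. PETSc normally traps SIGSEGV itself
#    (the "Caught signal number 11" message); -no_signal_handler leaves
#    the signal to the OS so a core file can be written.
run mpiexec -n 1 "$EXE" -no_signal_handler

# 3. Print the backtrace from any core file found in the current directory.
for core in core core.*; do
  if [ -f "$core" ]; then
    run gdb -batch -ex bt "$EXE" "$core"
  fi
done
```

With `DRY_RUN=0` the commands are actually executed. In the resulting `valgrind.log.*` files, reports whose top frames lie in your own `.F90` sources are usually the ones worth examining first, rather than those raised inside libc or Open MPI during `MPI_Init`.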