[petsc-users] Strange Segmentation Violation error

Stefano Zampini stefano.zampini at gmail.com
Sat Jun 17 11:04:29 CDT 2017


If you plan to use valgrind, you may want to use MPICH (the --download-mpich
configure option), since Open MPI produces a lot of false positives.

On 17 Jun 2017 17:49, "TAY wee-beng" <zonexo at gmail.com> wrote:

> Hi Lukasz,
>
> Thanks for the tip.
>
> I tried using valgrind. However, I got a lot of errors at a few
> locations. One complained of an uninitialised value at:
>
> call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
>
> But I have already initialized "ierr". Are these errors valid, or can I
> hide them?
>
> ==17300== Conditional jump or move depends on uninitialised value(s)
> ==17300==    at 0x3C2A872849: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so)
> ==17300==    by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
> ==17300==    by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
> ==17300==    by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
> ==17300==    by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
> ==17300==    by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
> ==17300==    by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
> ==17300==    by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
> ==17300==    by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
> ==17300==    by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
> ==17300==    by 0xB29696: petscinitialize_ (zstart.c:316)
> ==17300==    by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)
> ==17300==  Uninitialised value was created by a stack allocation
> ==17300==    at 0x3C2A8E2C82: setmntent (in /lib64/libc-2.12.so)
> ==17300==
> ==17300== Conditional jump or move depends on uninitialised value(s)
> ==17300==    at 0x3C2A87284F: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so)
> ==17300==    by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
> ==17300==    by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
> ==17300==    by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
> ==17300==    by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
> ==17300==    by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
> ==17300==    by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
> ==17300==    by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
> ==17300==    by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
> ==17300==    by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
> ==17300==    by 0xB29696: petscinitialize_ (zstart.c:316)
> ==17300==    by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)
>
> Thank you very much.
>
> Yours sincerely,
>
> ================================================
> TAY Wee-Beng (Zheng Weiming) 郑伟明
> Personal research webpage: http://tayweebeng.wixsite.com/website
> Youtube research showcase: https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA
> linkedin: www.linkedin.com/in/tay-weebeng
> ================================================
>
> On 7/6/2017 3:22 PM, Lukasz Kaczmarczyk wrote:
>
>
> On 7 Jun 2017, at 07:57, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> I have been using PETSc together with my CFD code. There seems to be a bug
> in the Intel compiler such that when I call some DM routines such as
> DMLocalToLocalBegin, a segmentation violation occurs if full
> optimization is used. I posted this question a while back, so the
> current workaround is to use -O1 -ip instead of -O3 -ipo -ip for certain
> source files which use DMLocalToLocalBegin etc.
>
> Recently, I made some changes to the code, mainly adding new code.
> However, depending on my options, some cases still go through the same
> program path.
>
> Now when I tried to run those same cases, I got a segmentation violation,
> which didn't happen before:
>
> IIB_I_cell_no_uvw_total2          14          10           6           3           2           1
>
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [0]PETSC ERROR: to get more information on the crash.
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Signal received
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016
> [0]PETSC ERROR: ./a.out
>
>
> I can't debug using VS since the code has been optimized. I tried to
> print messages (if (myid == 0) print "1") to pinpoint the error. Strangely,
> after adding these print messages, the error disappears.
>
> IIB_I_cell_no_uvw_total2          14          10           6           3           2           1
> 1
> 2
> 3
> 4
> 5
>     1      0.26873613      0.12620288      0.12949340      1.11422363  0.43983516E-06 -0.59311066E-01  0.25546227E+04
>     2      0.22236892      0.14528589      0.16939270      1.10459102  0.74556128E-02 -0.55168234E-01  0.25532419E+04
>     3      0.20764796      0.14832689      0.18780489      1.08039569  0.80299767E-02 -0.46972411E-01  0.25523174E+04
>
> Can anyone give a logical explanation for why this is happening? Moreover,
> if I remove the prints of 1 to 3 and only print 4 and 5, the segmentation
> violation appears again.
>
> I am using Intel Fortran 2016.1.150. I wonder if it would help to post in
> the Intel Fortran forum.
>
> I can provide more info if required.
>
> You are very likely writing to memory you shouldn't, for example when you
> exceed the size of arrays. Depending on your compilation options, starting
> parameters, etc., you write in an uncontrolled way to a part of memory that
> either belongs to your process or is protected by the operating system. In
> the second case, you get a segmentation fault. You can get correct results
> for some runs, but your bug is still there, hiding in the dark.
>
> To shed light on it, you need Valgrind. Compile the code with debugging on
> and no optimisation, and start searching. You can also generate a core
> file and backtrace the error in gdb/lldb.
>
> Lukasz
>
>
>

