<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Hi Lukasz,</p>
<p>Thanks for the tip.</p>
<p>I tried using valgrind. However, I got a lot of errors at a few
locations. One complained about an uninitialized value at:</p>
<p>call PetscInitialize(PETSC_NULL_CHARACTER,ierr)</p>
<p>But I have already initialized "ierr". Are these errors valid, or can I
suppress them?</p>
<pre>==
==17300== Conditional jump or move depends on uninitialised value(s)
==17300==    at 0x3C2A872849: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
==17300==    by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
==17300==    by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
==17300==    by 0xB29696: petscinitialize_ (zstart.c:316)
==17300==    by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)
==17300== Uninitialised value was created by a stack allocation
==17300==    at 0x3C2A8E2C82: setmntent (in /lib64/libc-2.12.so)
==17300==
==17300== Conditional jump or move depends on uninitialised value(s)
==17300==    at 0x3C2A87284F: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
==17300==    by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
==17300==    by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
==17300==    by 0xB29696: petscinitialize_ (zstart.c:316)
==17300==    by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)</pre>
<pre class="moz-signature" cols="72">Thank you very much.
Yours sincerely,
================================================
TAY Wee-Beng (Zheng Weiming) 郑伟明
Personal research webpage: <a class="moz-txt-link-freetext" href="http://tayweebeng.wixsite.com/website">http://tayweebeng.wixsite.com/website</a>
Youtube research showcase: <a class="moz-txt-link-freetext" href="https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA">https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA</a>
linkedin: <a class="moz-txt-link-abbreviated" href="http://www.linkedin.com/in/tay-weebeng">www.linkedin.com/in/tay-weebeng</a>
================================================</pre>
<div class="moz-cite-prefix">On 7/6/2017 3:22 PM, Lukasz Kaczmarczyk
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:055B880D-7D0C-49C9-84B5-2C64B003FC0A@glasgow.ac.uk">
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On 7 Jun 2017, at 07:57, TAY wee-beng &lt;<a
href="mailto:zonexo@gmail.com" class=""
moz-do-not-send="true">zonexo@gmail.com</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div text="#000000" bgcolor="#FFFFFF" class="">
<p class="">Hi,</p>
<p class="">I have been using PETSc together with my CFD code.
There seems to be a bug with the Intel compiler such
that when I call some DM routines such as
DMLocalToLocalBegin, a segmentation violation occurs
if full optimization is used. I posted this question
a while back, and the current workaround is to use -O1 -ip
instead of -O3 -ipo -ip for the source files that
use DMLocalToLocalBegin etc.<br class="">
</p>
<p class="">Recently, I made some changes to the code,
mainly adding some features. However, depending on my
options, some cases still go through the same program path.</p>
<p class="">Now, when I tried to run those same cases, I
got a segmentation violation, which didn't happen before:</p>
<p class=""><i class=""> IIB_I_cell_no_uvw_total2
14 10 6 3</i><i class=""><br
class="">
</i><i class=""> 2 1</i></p>
<p class=""><i class="">[0]PETSC ERROR:
------------------------------------------------------------------------</i><i
class=""><br class="">
</i><i class="">[0]PETSC ERROR: Caught signal number 11
SEGV: Segmentation Violation, probably memory access
out of range</i><i class=""><br class="">
</i><i class="">[0]PETSC ERROR: Try option
-start_in_debugger or -on_error_attach_debugger</i><i
class=""><br class="">
</i><i class="">[0]PETSC ERROR: or see <a
class="moz-txt-link-freetext"
href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind"
moz-do-not-send="true">
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a></i><i
class=""><br class="">
</i><i class="">[0]PETSC ERROR: or try <a
class="moz-txt-link-freetext"
href="http://valgrind.org/" moz-do-not-send="true">
http://valgrind.org</a> on GNU/linux and Apple Mac
OS X to find memory corruption errors</i><i class=""><br
class="">
</i><i class="">[0]PETSC ERROR: configure using
--with-debugging=yes, recompile, link, and run </i><i
class=""><br class="">
</i><i class="">[0]PETSC ERROR: to get more information
on the crash.</i><i class=""><br class="">
</i><i class="">[0]PETSC ERROR: ---------------------
Error Message
--------------------------------------------------------------</i><i
class=""><br class="">
</i><i class="">[0]PETSC ERROR: Signal received</i><i
class=""><br class="">
</i><i class="">[0]PETSC ERROR: See <a
class="moz-txt-link-freetext"
href="http://www.mcs.anl.gov/petsc/documentation/faq.html"
moz-do-not-send="true">
http://www.mcs.anl.gov/petsc/documentation/faq.html</a>
for trouble shooting.</i><i class=""><br class="">
</i><i class="">[0]PETSC ERROR: Petsc Release Version
3.7.4, Oct, 02, 2016 </i><i class=""><br class="">
</i><i class="">[0]PETSC ERROR: ./a.out </i>
<br class="">
</p>
<p class="">I can't debug using VS since the code has
been optimized. I tried printing messages (if (myid ==
0) print "1") to pinpoint the error. Strangely, after
adding these print statements, the error disappears.</p>
<p class=""><i class=""> IIB_I_cell_no_uvw_total2
14 10 6 3</i><i class=""><br
class="">
</i><i class=""> 2 1</i><i class=""><br
class="">
</i><i class=""> 1</i><i class=""><br class="">
</i><i class=""> 2</i><i class=""><br class="">
</i><i class=""> 3</i><i class=""><br class="">
</i><i class=""> 4</i><i class=""><br class="">
</i><i class=""> 5</i><i class=""><br class="">
</i><i class=""> 1 0.26873613
0.12620288 0.12949340 1.11422363
0.43983516E-06 -0.59311066E-01 0.25546227E+04</i><i
class=""><br class="">
</i><i class=""> 2 0.22236892
0.14528589 0.16939270 1.10459102
0.74556128E-02 -0.55168234E-01 0.25532419E+04</i><i
class=""><br class="">
</i><i class=""> 3 0.20764796
0.14832689 0.18780489 1.08039569
0.80299767E-02 -0.46972411E-01 0.25523174E+04</i></p>
<p class="">Can anyone give a logical explanation of why this
is happening? Moreover, if I remove the prints of 1 to 3
and only print 4 and 5, the segmentation violation appears
again.</p>
<p class="">I am using Intel Fortran 2016.1.150. I wonder
whether it would help to post in the Intel Fortran forum.</p>
<p class="">I can provide more info if required.<br
class="">
</p>
</div>
</div>
</blockquote>
</div>
<div class="">You are very likely writing to memory you should not,
for example when you exceed the size of an array. Depending on your
compilation options, starting parameters, etc., you write in an
uncontrolled way either to a part of memory that belongs to your
process or to one protected by the operating system. In the second case,
you get a segmentation fault. You can get correct results for
some runs, but the bug is still there, hiding in the dark.</div>
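<div class=""><br class="">
</div>
<div class="">To illustrate how such a write can go unnoticed, here is a minimal C sketch (not code from this thread or from PETSc; the names guarded_init, guarded_ok, GUARDED_N and the sentinel value are hypothetical). The array keeps a guard slot on each side of its data, so an off-by-one write lands on memory that still belongs to the process: nothing crashes, and the damage is only visible when the guards are checked.</div>
<pre>
```c
/* Hypothetical sketch: an array with a "canary" guard slot on each side
   of the user data. Layout of guarded_storage:
     slot 0                 -> low guard
     slots 1 .. GUARDED_N   -> user data (indices 0 .. GUARDED_N-1)
     slot GUARDED_N + 1     -> high guard */
#define GUARDED_N 8
#define GUARDED_CANARY (-123456.75) /* arbitrary sentinel, assumed never stored as data */

static double guarded_storage[GUARDED_N + 2];

/* Reset the buffer, write the sentinels into both guard slots, and
   return a pointer to the user-visible part of the buffer. */
static double *guarded_init(void)
{
    for (int i = 0; i != GUARDED_N + 2; ++i)
        guarded_storage[i] = 0.0;
    guarded_storage[0] = GUARDED_CANARY;
    guarded_storage[GUARDED_N + 1] = GUARDED_CANARY;
    return guarded_storage + 1;
}

/* Return 1 if both guards still hold the sentinel, 0 if a write ran
   past either end of the user data. Writing index -1 or GUARDED_N
   through the returned pointer still stays inside guarded_storage,
   so the "overrun" itself is well-defined here and never crashes. */
static int guarded_ok(void)
{
    if (guarded_storage[0] != GUARDED_CANARY)
        return 0;
    return guarded_storage[GUARDED_N + 1] == GUARDED_CANARY;
}
```
</pre>
<div class="">For example, after double *a = guarded_init();, writing a[GUARDED_N] (one past the last valid index) makes guarded_ok() return 0 without any segmentation fault. Without the guard slots, the same write could silently overwrite a neighbouring variable or crash, depending on the memory layout.</div>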
<div class=""><br class="">
</div>
<div class="">To shed light on it, you need Valgrind. Compile the
code with debugging on and no optimisation, and start searching.
You can also generate a core file and backtrace the error
in gdb/lldb.</div>
<div class=""><br class="">
</div>
<div class="">Lukasz</div>
</blockquote>
<br>
</body>
</html>