<div dir="auto">If you plan to use Valgrind, you may want to use MPICH (PETSc's --download-mpich configure option), since Open MPI produces a lot of false positives under Valgrind.</div><div class="gmail_extra"><br><div class="gmail_quote">On 17 Jun 2017 17:49, "TAY wee-beng" &lt;<a href="mailto:zonexo@gmail.com">zonexo@gmail.com</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    <p>Hi Lukasz,</p>
    <p>Thanks for the tip.</p>
    <p>I tried using Valgrind. However, I got a lot of errors at a few
      locations. One complained of an uninitialized value at:</p>
    <p>call PetscInitialize(PETSC_NULL_CHARACTER,ierr)</p>
    <p>But I already initialize "ierr". Are these errors genuine, or can I
      suppress them? <br>
    </p>
    <pre>==
==17300== Conditional jump or move depends on uninitialised value(s)
==17300==    at 0x3C2A872849: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
==17300==    by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
==17300==    by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
==17300==    by 0xB29696: petscinitialize_ (zstart.c:316)
==17300==    by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)
==17300==  Uninitialised value was created by a stack allocation
==17300==    at 0x3C2A8E2C82: setmntent (in /lib64/libc-2.12.so)
==17300==
==17300== Conditional jump or move depends on uninitialised value(s)
==17300==    at 0x3C2A87284F: _IO_file_fopen@@GLIBC_2.2.5 (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
==17300==    by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
==17300==    by 0xA726083: mca_mpool_hugepage_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
==17300==    by 0x65A83A1: mca_base_framework_components_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x6614041: mca_mpool_base_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x65B1EC0: mca_base_framework_open (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300==    by 0x5E11123: ompi_mpi_init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5E31032: PMPI_Init (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300==    by 0x5978E87: PMPI_INIT (in /home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
==17300==    by 0xB29696: petscinitialize_ (zstart.c:316)
==17300==    by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)</pre>
    <pre class="m_-7225536881309098629moz-signature" cols="72">Thank you very much.

Yours sincerely,

================================================
TAY Wee-Beng (Zheng Weiming) 郑伟明
Personal research webpage: <a class="m_-7225536881309098629moz-txt-link-freetext" href="http://tayweebeng.wixsite.com/website" target="_blank">http://tayweebeng.wixsite.com/website</a>
Youtube research showcase: <a class="m_-7225536881309098629moz-txt-link-freetext" href="https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA" target="_blank">https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA</a>
linkedin: <a class="m_-7225536881309098629moz-txt-link-abbreviated" href="http://www.linkedin.com/in/tay-weebeng" target="_blank">www.linkedin.com/in/tay-weebeng</a>
================================================</pre>
    <div class="m_-7225536881309098629moz-cite-prefix">On 7/6/2017 3:22 PM, Lukasz Kaczmarczyk
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <br>
      <div>
        <blockquote type="cite">
          <div>On 7 Jun 2017, at 07:57, TAY wee-beng &lt;<a href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>&gt; wrote:</div>
          <br class="m_-7225536881309098629Apple-interchange-newline">
          <div>
            <div text="#000000" bgcolor="#FFFFFF">
              <p>Hi,</p>
              <p>I have been using PETSc together with my CFD code.
                There seems to be a bug in the Intel compiler: when I
                call certain DM routines such as DMLocalToLocalBegin, a
                segmentation violation occurs if full optimization is
                used. I posted about this a while back. The current
                workaround is to compile the source files that use
                DMLocalToLocalBegin etc. with -O1 -ip instead of
                -O3 -ipo -ip.<br>
              </p>
              <p>Recently, I made some changes to the code,
                mainly additions. However, depending on my options,
                some cases still go through the same program path.</p>
              <p>Now, when I run those same cases, I get a
                segmentation violation that didn't happen before:</p>
              <pre> IIB_I_cell_no_uvw_total2         14          10           6           3
           2           1</pre>
              <pre>[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see <a class="m_-7225536881309098629moz-txt-link-freetext" href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a>
[0]PETSC ERROR: or try <a class="m_-7225536881309098629moz-txt-link-freetext" href="http://valgrind.org/" target="_blank">http://valgrind.org</a> on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See <a class="m_-7225536881309098629moz-txt-link-freetext" href="http://www.mcs.anl.gov/petsc/documentation/faq.html" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a> for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016
[0]PETSC ERROR: ./a.out</pre>
              <p>I can't debug using VS since the code has been
                optimized. I tried printing messages (if (myid == 0)
                print *, "1") to pinpoint the error. Strangely, after
                adding these print statements, the error disappears.</p>
              <pre> IIB_I_cell_no_uvw_total2         14          10           6           3
           2           1
 1
 2
 3
 4
 5
    1      0.26873613      0.12620288      0.12949340      1.11422363  0.43983516E-06 -0.59311066E-01  0.25546227E+04
    2      0.22236892      0.14528589      0.16939270      1.10459102  0.74556128E-02 -0.55168234E-01  0.25532419E+04
    3      0.20764796      0.14832689      0.18780489      1.08039569  0.80299767E-02 -0.46972411E-01  0.25523174E+04</pre>
              <p>Can anyone give a logical explanation for why this
                happens? Moreover, if I remove prints 1 to 3 and keep
                only prints 4 and 5, the segmentation violation appears
                again.</p>
              <p>I am using Intel Fortran 2016.1.150. I wonder
                whether it would help to post in the Intel Fortran forum.</p>
              <p>I can provide more info if required.<br>
              </p>
            </div>
          </div>
        </blockquote>
      </div>
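      <div>You are very likely writing to memory you should not touch, for
        example past the end of an array. Depending on your compilation
        options, input parameters, etc., such an uncontrolled write lands
        either in memory that belongs to your process, where it silently
        corrupts other data, or in memory protected by the operating
        system, in which case you get a segmentation fault. This is also
        why adding or removing print statements can make the crash come
        and go: changing the code changes the memory layout and the
        optimisation, so the stray write ends up somewhere else. Some runs
        can give correct results, but the bug is still there, hiding in
        the dark.</div>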
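      <div>The effect described above can be illustrated with a minimal C sketch. It uses a hypothetical guarded_array helper, which is not a PETSc or Valgrind API. The array is allocated with a few spare "guard" slots on each side, and those slots are filled with a known byte pattern. In a plain array, a write one element past the end would silently overwrite whatever happens to lie next to it. Here it lands in a guard slot instead, and a later check reports it.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debugging helper, not a PETSc or Valgrind API. */
#define GUARD_DOUBLES 4   /* spare slots on each side of the array */
#define GUARD_BYTE 0xAB   /* byte pattern the spare slots are filled with */

typedef struct {
    unsigned char *raw; /* whole allocation, guard slots included */
    double *data;       /* the n usable elements */
    size_t n;
} guarded_array;

/* Allocate n zeroed doubles surrounded by guard slots. Returns 0 on success. */
int guarded_alloc(guarded_array *g, size_t n) {
    size_t total = (n + 2 * GUARD_DOUBLES) * sizeof(double);
    g->raw = malloc(total);
    if (g->raw == NULL) return -1;
    memset(g->raw, GUARD_BYTE, total);
    /* malloc returns memory aligned for double, and the offset is a whole
       number of doubles, so data is correctly aligned as well. */
    g->data = (double *)(g->raw + GUARD_DOUBLES * sizeof(double));
    g->n = n;
    for (size_t i = 0; i < n; i++) g->data[i] = 0.0;
    return 0;
}

/* Return 1 if both guard regions still hold GUARD_BYTE, 0 if something
   wrote into them (for example data[n] or data[-1]). */
int guards_intact(const guarded_array *g) {
    assert(g != NULL && g->raw != NULL);
    const size_t guard = GUARD_DOUBLES * sizeof(double);
    const unsigned char *tail = (const unsigned char *)(g->data + g->n);
    for (size_t i = 0; i < guard; i++) {
        if (g->raw[i] != GUARD_BYTE || tail[i] != GUARD_BYTE) return 0;
    }
    return 1;
}

/* Release the allocation and reset the descriptor. */
void guarded_free(guarded_array *g) {
    free(g->raw);
    g->raw = NULL;
    g->data = NULL;
    g->n = 0;
}
```

      <div>This only catches writes that land in the guard slots and change their bytes. For Fortran arrays, the compiler's own runtime bounds checking can flag the offending statement directly. Examples are -check bounds with Intel Fortran or -fcheck=bounds with gfortran. That is often quicker than inferring the problem from corrupted data.</div>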
      <div><br>
      </div>
      <div>To shed light on it, you need Valgrind. Compile the
        code with debugging on and no optimisation, and start searching.
        You can also generate a core file and get a backtrace of the
        error in gdb/lldb. </div>
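      <div>On Linux, the usual way to get a core file is to run ulimit -c unlimited in the shell before starting the program. You can then open the file with gdb ./a.out core (the core file name depends on system settings) and type bt for the backtrace. The shell command works by raising the process's core-file size limit. As a sketch of the equivalent POSIX calls, assuming a POSIX system, the C function below uses getrlimit/setrlimit to raise the soft limit to the hard limit. The function name enable_core_dumps is hypothetical.</div>

```c
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft core-file size limit to
 * its hard limit, so that a crash such as SIGSEGV may leave a core file
 * for gdb. Raising the soft limit up to the hard limit needs no special
 * privileges. Returns 0 on success, -1 on error. */
int enable_core_dumps(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) return -1;
    rl.rlim_cur = rl.rlim_max;  /* soft limit := hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

      <div>Whether a core file is actually written also depends on system settings, such as /proc/sys/kernel/core_pattern on Linux.</div>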
      <div><br>
      </div>
      <div>Lukasz</div>
    </blockquote>
    <br>
  </div>

</blockquote></div></div>