<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Hi,<br>
    <br>
    I've been debugging and removing some errors. Now the code works on
    most cluster nodes but fails on 2 of them. The strange thing is that
    I'm using the same GNU compiler but only deploying to the newly set
    up nodes.<br>
    <br>
    The newer nodes work when using my old code, which is similar except
    that its domain partition is only in the z direction. The new code
    partitions in the y and z directions.<br>
    <br>
    It fails at the Poisson equation solving step. Is there any way I
    can find out why this is happening?<br>
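    <br>
    In case it helps pin this down, here is a minimal sketch of how I
    could instrument the solve to report the PETSc error code and the
    KSP convergence reason on every rank (illustrative only: the
    subroutine name and the myid rank argument are made up, the ksp /
    b_rhs / xx names follow the KSPSolve call quoted further down, and
    the include paths may need adjusting to the PETSc version in use):<br>
    <pre>      subroutine check_poisson_solve(ksp, b_rhs, xx, myid)
      implicit none
! PETSc 3.6-style Fortran includes; older versions use "finclude/..."
#include "petsc/finclude/petscsys.h"
#include "petsc/finclude/petscvec.h"
#include "petsc/finclude/petscksp.h"
      KSP ksp
      Vec b_rhs, xx
      integer myid
      PetscErrorCode ierr
      KSPConvergedReason reason

!     Report the raw error code instead of silently ignoring it
      call KSPSolve(ksp, b_rhs, xx, ierr)
      if (ierr .ne. 0) then
         write(*,*) 'rank ', myid, ': KSPSolve returned ierr = ', ierr
         return
      end if

!     A negative reason means the solver diverged or broke down
      call KSPGetConvergedReason(ksp, reason, ierr)
      if (reason .lt. 0) then
         write(*,*) 'rank ', myid, ': KSP diverged, reason = ', reason
      end if
      end subroutine check_poisson_solve</pre>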
    <pre class="moz-signature" cols="72">Thank you.

Yours sincerely,

TAY wee-beng</pre>
    <div class="moz-cite-prefix">On 25/12/2015 10:29 PM, Matthew Knepley
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAMYG4GnkmEi_85A-E+GTvt7M9Ux=Qysfzaw2Xq-4HZh4rT9tXA@mail.gmail.com"
      type="cite">
      <div dir="ltr">It appears that you have an uninitialized variable
        (or more than one). When compiled with debugging, variables
        <div>are normally initialized to zero.</div>
        <div><br>
        </div>
        <div>  Thanks,</div>
        <div><br>
        </div>
        <div>     Matt</div>
      </div>
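      <div><br>
      </div>
      <div>A tiny standalone illustration of that failure mode
        (hypothetical code, not taken from the application being
        discussed; the exact behaviour is compiler- and flag-dependent,
        and gfortran's -finit-real=snan together with -ffpe-trap=invalid
        makes such bugs fail reliably instead of silently):</div>
      <pre>      program uninit_demo
      implicit none
      double precision x, y
!     x is never assigned. A "-g -O0" build often happens to read it as
!     0.0 (freshly zeroed stack pages), while an "-O2" build may reuse
!     a register or stack slot holding garbage. Neither behaviour is
!     guaranteed by the Fortran standard, so the result can change from
!     node to node and from run to run.
      y = 2.0d0*x + 1.0d0
      write(*,*) 'y = ', y
      end program uninit_demo</pre>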
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Fri, Dec 25, 2015 at 5:41 AM, TAY
          wee-beng <span dir="ltr"><<a moz-do-not-send="true"
              href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
            <br>
            Sorry, there seem to be some problems with my valgrind. I
            have run it again with both the optimized and debug versions.
            <div class="HOEnZb">
              <div class="h5"><br>
                Thank you.<br>
                <br>
                Yours sincerely,<br>
                <br>
                TAY wee-beng<br>
                <br>
                On 25/12/2015 12:42 PM, Barry Smith wrote:<br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    On Dec 24, 2015, at 10:37 PM, TAY wee-beng <<a
                      moz-do-not-send="true"
                      href="mailto:zonexo@gmail.com" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:zonexo@gmail.com">zonexo@gmail.com</a></a>>
                    wrote:<br>
                    <br>
                    Hi,<br>
                    <br>
                    I tried valgrind with MPI, but it aborts very early
                    with an error message about PETSc initialization.<br>
                  </blockquote>
                     It shouldn't "abort"; it should print an error
                  message and continue. Please send all the output from
                  running with valgrind.<br>
                  <br>
                      It is possible you are solving a problem large
                  enough to require configuring with --with-64-bit-indices.
                  Does that resolve the problem?<br>
                  <br>
                     Barry<br>
                  <br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    I tried again, using a lower resolution.<br>
                    <br>
                    GAMG works, but hypre BoomerAMG doesn't. Increasing
                    the number of CPUs too far (80) also causes it to
                    hang; 60 works fine.<br>
                    <br>
                    My grid size is 98x169x169<br>
                    <br>
                    But when I increase the resolution, GAMG no longer
                    works either.<br>
                    <br>
                    I tried increasing the number of CPUs, but it still
                    doesn't work.<br>
                    <br>
                    Previously, using the single z-direction partition,
                    it worked with both GAMG and hypre. So what could be
                    the problem?<br>
                    Thank you.<br>
                    <br>
                    Yours sincerely,<br>
                    <br>
                    TAY wee-beng<br>
                    <br>
                    On 25/12/2015 12:33 AM, Matthew Knepley wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      It sounds like you have memory corruption in a
                      different part of the code. Run in valgrind.<br>
                      <br>
                         Matt<br>
                      <br>
                      On Thu, Dec 24, 2015 at 10:14 AM, TAY wee-beng
                      <<a moz-do-not-send="true"
                        href="mailto:zonexo@gmail.com" target="_blank">zonexo@gmail.com</a>>
                      wrote:<br>
                      Hi,<br>
                      <br>
                      I have this strange error. I converted my CFD code
                      from a z-direction-only partition to a yz-direction
                      partition. The code works fine, but when I increase
                      the number of CPUs, strange things happen when
                      solving the Poisson equation.<br>
                      <br>
                      I increased the number of CPUs from 24 to 40.<br>
                      <br>
                      Sometimes it works, sometimes it doesn't. When it
                      doesn't, it just hangs there with no output, or it
                      gives the error below:<br>
                      <br>
                      Using MPI_Barrier during debugging shows that it
                      hangs at<br>
                      <br>
                      call KSPSolve(ksp,b_rhs,xx,ierr).<br>
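                      <br>
                      Roughly, the barrier check looks like the sketch
                      below (illustrative only, not the actual code; it
                      assumes the usual PETSc Fortran includes and that
                      the hang is inside the solve itself):<br>
                      <pre>      subroutine solve_with_barrier_trace(ksp, b_rhs, xx)
      implicit none
#include "petsc/finclude/petscsys.h"
#include "petsc/finclude/petscvec.h"
#include "petsc/finclude/petscksp.h"
      KSP ksp
      Vec b_rhs, xx
      PetscErrorCode ierr
      PetscMPIInt rank

      call MPI_Comm_rank(PETSC_COMM_WORLD, rank, ierr)

!     Every rank must reach this point before the solve starts
      call MPI_Barrier(PETSC_COMM_WORLD, ierr)
      if (rank .eq. 0) write(*,*) 'entering KSPSolve'

      call KSPSolve(ksp, b_rhs, xx, ierr)

!     If this never prints, at least one rank is stuck inside KSPSolve
      call MPI_Barrier(PETSC_COMM_WORLD, ierr)
      if (rank .eq. 0) write(*,*) 'KSPSolve finished on all ranks'
      end subroutine solve_with_barrier_trace</pre>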
                      <br>
                      I use hypre BoomerAMG and GAMG
                      (-poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type
                      gamg)<br>
                      <br>
                      <br>
                      Why is this so random? Also, how do I debug this
                      type of problem?<br>
                      <br>
                      <br>
                      [32]PETSC ERROR:
                      ------------------------------------------------------------------------<br>
                      [32]PETSC ERROR: Caught signal number 11 SEGV:
                      Segmentation Violation, probably memory access out
                      of range<br>
                      [32]PETSC ERROR: Try option -start_in_debugger or
                      -on_error_attach_debugger<br>
                      [32]PETSC ERROR: or see <a moz-do-not-send="true"
href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind"
                        rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a><br>
                      [32]PETSC ERROR: or try <a moz-do-not-send="true"
                        href="http://valgrind.org" rel="noreferrer"
                        target="_blank">http://valgrind.org</a> on
                      GNU/linux and Apple Mac OS X to find memory
                      corruption errors<br>
                      [32]PETSC ERROR: likely location of problem given
                      in stack below<br>
                      [32]PETSC ERROR: ---------------------  Stack
                      Frames ------------------------------------<br>
                      [32]PETSC ERROR: Note: The EXACT line numbers in
                      the stack are not available,<br>
                      [32]PETSC ERROR:       INSTEAD the line number of
                      the start of the function<br>
                      [32]PETSC ERROR:       is given.<br>
                      [32]PETSC ERROR: [32] HYPRE_SetupXXX line 174
                      /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/impls/hypre/hypre.c<br>
                      [32]PETSC ERROR: [32] PCSetUp_HYPRE line 122
                      /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/impls/hypre/hypre.c<br>
                      [32]PETSC ERROR: [32] PCSetUp line 945
                      /home/wtay/Codes/petsc-3.6.2/src/ksp/pc/interface/precon.c<br>
                      [32]PETSC ERROR: [32] KSPSetUp line 247
                      /home/wtay/Codes/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c<br>
                      [32]PETSC ERROR: [32] KSPSolve line 510
                      /home/wtay/Codes/petsc-3.6.2/src/ksp/ksp/interface/itfunc.c<br>
                      [32]PETSC ERROR: --------------------- Error
                      Message
                      --------------------------------------------------------------<br>
                      [32]PETSC ERROR: Signal received<br>
                      [32]PETSC ERROR: See <a moz-do-not-send="true"
                        href="http://www.mcs.anl.gov/petsc/documentation/faq.html"
                        rel="noreferrer" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html</a>
                      for trouble shooting.<br>
                      [32]PETSC ERROR: Petsc Release Version 3.6.2, Oct,
                      02, 2015<br>
                      [32]PETSC ERROR: ./a.out on a
                      petsc-3.6.2_shared_gnu_debug named n12-40 by wtay
                      Thu Dec 24 17:01:51 2015<br>
                      [32]PETSC ERROR: Configure options
                      --with-mpi-dir=/opt/ud/openmpi-1.8.8/
                      --download-fblaslapack=1 --with-debugging=1
                      --download-hypre=1
                      --prefix=/home/wtay/Lib/petsc-3.6.2_shared_gnu_debug
                      --known-mpi-shared=1 --with-shared-libraries
                      --with-fortran-interfaces=1<br>
                      [32]PETSC ERROR: #1 User provided function() line
                      0 in  unknown file<br>
--------------------------------------------------------------------------<br>
                      MPI_ABORT was invoked on rank 32 in communicator
                      MPI_COMM_WORLD<br>
                      with errorcode 59.<br>
                      <br>
                      -- <br>
                      Thank you.<br>
                      <br>
                      Yours sincerely,<br>
                      <br>
                      TAY wee-beng<br>
                      <br>
                      <br>
                      <br>
                      <br>
                      -- <br>
                      What most experimenters take for granted before
                      they begin their experiments is infinitely more
                      interesting than any results to which their
                      experiments lead.<br>
                      -- Norbert Wiener<br>
                    </blockquote>
                  </blockquote>
                </blockquote>
                <br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        <div class="gmail_signature">What most experimenters take for
          granted before they begin their experiments is infinitely more
          interesting than any results to which their experiments lead.<br>
          -- Norbert Wiener</div>
      </div>
    </blockquote>
    <br>
  </body>
</html>