<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p>Thank you, Sherry, for your efforts,</p>
    <p>but before I can set up an example that reproduces the problem, I
      have to ask a PETSc-related question.</p>
    <p>When I round-trip a matrix through MatView and MatLoad, MatLoad
      ignores its original partitioning.</p>
    <p>Say I originally have 100 and 110 equations on two processors;
      after MatLoad I get 105 and 105, again on two processors.</p>
    <p>How can I pass the partitioning information through MatView/MatLoad?</p>
    <p>I guess it's important for reproducing my setup exactly.</p>
    <p>Thanks<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 10/19/2016 08:06 AM, Xiaoye S. Li
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAFvbobWHxhgp1Lan4zf8t-O_D5_LO89Jc1VgbQ4JkMOrxoEz2Q@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_default" style="font-family:comic sans
          ms,sans-serif;font-size:small">I looked at each item that
          valgrind complained about in your email dated Oct. 11. Those
          reports are really superficial; I don't see anything wrong
          with the lines singled out (mostly uninitialized variables).
          I ran a few tests with the latest version on GitHub, and all
          went fine.</div>
        <div class="gmail_default" style="font-family:comic sans
          ms,sans-serif;font-size:small"><br>
        </div>
        <div class="gmail_default" style="font-family:comic sans
          ms,sans-serif;font-size:small">Perhaps you can print out the
          matrix that caused the problem, and I can run it myself.</div>
        <div class="gmail_default" style="font-family:comic sans
          ms,sans-serif;font-size:small"><br>
        </div>
        <div class="gmail_default" style="font-family:comic sans
          ms,sans-serif;font-size:small">Sherry</div>
        <div class="gmail_default" style="font-family:comic sans
          ms,sans-serif;font-size:small"><br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Tue, Oct 11, 2016 at 2:18 PM, Anton
          <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:popov@uni-mainz.de" target="_blank">popov@uni-mainz.de</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
              class=""><br>
              <br>
              On 10/11/16 7:19 PM, Satish Balay wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                This log looks truncated. Are there any valgrind messages
                before this?<br>
                [like from your application code, or from MPI]<br>
              </blockquote>
            </span>
            Yes, it is indeed truncated; I only included the relevant
            messages.<span class=""><br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <br>
                Perhaps you can send the complete log, generated with:<br>
                valgrind -q --tool=memcheck --leak-check=yes
                --num-callers=20 --track-origins=yes<br>
                <br>
                [and if there were more valgrind messages from MPI,
                rebuild PETSc<br>
              </blockquote>
            </span>
            There are no messages originating from our code; there are
            just a few MPI-related ones (probably false positives), and
            most of them come from SuperLU_DIST.<br>
            <br>
            Thanks,<br>
            Anton
            <div class="HOEnZb">
              <div class="h5"><br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  with --download-mpich for a valgrind-clean MPI]<br>
                  <br>
                  Sherry,<br>
                  Perhaps this log points to some issue in superlu_dist?<br>
                  <br>
                  thanks,<br>
                  Satish<br>
                  <br>
                  On Tue, 11 Oct 2016, Anton Popov wrote:<br>
                  <br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    Valgrind immediately detects interesting stuff:<br>
                    <br>
                    ==25673== Use of uninitialised value of size 8<br>
                    ==25673==    at 0x178272C: static_schedule
                    (static_schedule.c:960)<br>
                    ==25674== Use of uninitialised value of size 8<br>
                    ==25674==    at 0x178272C: static_schedule
                    (static_schedule.c:960)<br>
                    ==25674==    by 0x174E74E: pdgstrf (pdgstrf.c:572)<br>
                    ==25674==    by 0x1733954: pdgssvx (pdgssvx.c:1124)<br>
                    <br>
                    <br>
                    ==25673== Conditional jump or move depends on
                    uninitialised value(s)<br>
                    ==25673==    at 0x1752143: pdgstrf
                    (dlook_ahead_update.c:24)<br>
                    ==25673==    by 0x1733954: pdgssvx (pdgssvx.c:1124)<br>
                    <br>
                    <br>
                    ==25673== Conditional jump or move depends on
                    uninitialised value(s)<br>
                    ==25673==    at 0x5C83F43: PMPI_Recv (in
                    /opt/mpich3/lib/libmpi.so.12.1<wbr>.0)<br>
                    ==25673==    by 0x1755385: pdgstrf2_trsm
                    (pdgstrf2.c:253)<br>
                    ==25673==    by 0x1751E4F: pdgstrf
                    (dlook_ahead_update.c:195)<br>
                    ==25673==    by 0x1733954: pdgssvx (pdgssvx.c:1124)<br>
                    <br>
                    ==25674== Use of uninitialised value of size 8<br>
                    ==25674==    at 0x62BF72B: _itoa_word (_itoa.c:179)<br>
                    ==25674==    by 0x62C1289: printf_positional
                    (vfprintf.c:2022)<br>
                    ==25674==    by 0x62C2465: vfprintf
                    (vfprintf.c:1677)<br>
                    ==25674==    by 0x638AFD5: __vsnprintf_chk
                    (vsnprintf_chk.c:63)<br>
                    ==25674==    by 0x638AF37: __snprintf_chk
                    (snprintf_chk.c:34)<br>
                    ==25674==    by 0x5CC6C08:
                    MPIR_Err_create_code_valist (in<br>
                    /opt/mpich3/lib/libmpi.so.12.1<wbr>.0)<br>
                    ==25674==    by 0x5CC7A9A: MPIR_Err_create_code (in<br>
                    /opt/mpich3/lib/libmpi.so.12.1<wbr>.0)<br>
                    ==25674==    by 0x5C83FB1: PMPI_Recv (in
                    /opt/mpich3/lib/libmpi.so.12.1<wbr>.0)<br>
                    ==25674==    by 0x1755385: pdgstrf2_trsm
                    (pdgstrf2.c:253)<br>
                    ==25674==    by 0x1751E4F: pdgstrf
                    (dlook_ahead_update.c:195)<br>
                    ==25674==    by 0x1733954: pdgssvx (pdgssvx.c:1124)<br>
                    <br>
                    ==25674== Use of uninitialised value of size 8<br>
                    ==25674==    at 0x1751E92: pdgstrf
                    (dlook_ahead_update.c:205)<br>
                    ==25674==    by 0x1733954: pdgssvx (pdgssvx.c:1124)<br>
                    <br>
                    And it crashes after this:<br>
                    <br>
                    ==25674== Invalid write of size 4<br>
                    ==25674==    at 0x1751F2F: pdgstrf
                    (dlook_ahead_update.c:211)<br>
                    ==25674==    by 0x1733954: pdgssvx (pdgssvx.c:1124)<br>
                    ==25674==    by 0xAAEFAE:
                    MatLUFactorNumeric_SuperLU_DIS<wbr>T
                    (superlu_dist.c:421)<br>
                    ==25674==  Address 0xa0 is not stack'd, malloc'd or
                    (recently) free'd<br>
                    ==25674==<br>
                    [1]PETSC ERROR:<br>
                    ------------------------------<wbr>------------------------------<wbr>------------<br>
                    [1]PETSC ERROR: Caught signal number 11 SEGV:
                    Segmentation Violation, probably<br>
                    memory access out of range<br>
                    <br>
                    <br>
                    On 10/11/2016 03:26 PM, Anton Popov wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      On 10/10/2016 07:11 PM, Satish Balay wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        That's from petsc-3.5.<br>
                        <br>
                        Anton, please post the stack trace you get with<br>
                        --download-superlu_dist-commit<wbr>=origin/maint<br>
                      </blockquote>
                      I guess this is it:<br>
                      <br>
                      [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 421<br>
                      /home/anton/LIB/petsc/src/mat/<wbr>impls/aij/mpi/superlu_dist/sup<wbr>erlu_dist.c<br>
                      [0]PETSC ERROR: [0] MatLUFactorNumeric_SuperLU_DIS<wbr>T
                      line 282<br>
                      /home/anton/LIB/petsc/src/mat/<wbr>impls/aij/mpi/superlu_dist/sup<wbr>erlu_dist.c<br>
                      [0]PETSC ERROR: [0] MatLUFactorNumeric line 2985<br>
                      /home/anton/LIB/petsc/src/mat/<wbr>interface/matrix.c<br>
                      [0]PETSC ERROR: [0] PCSetUp_LU line 101<br>
                      /home/anton/LIB/petsc/src/ksp/<wbr>pc/impls/factor/lu/lu.c<br>
                      [0]PETSC ERROR: [0] PCSetUp line 930<br>
                      /home/anton/LIB/petsc/src/ksp/<wbr>pc/interface/precon.c<br>
                      <br>
                      According to the line numbers, it crashes within<br>
                      MatLUFactorNumeric_SuperLU_DIS<wbr>T while calling
                      pdgssvx.<br>
                      <br>
                      Surprisingly, this only happens on the second SNES
                      iteration, not on the first.<br>
                      <br>
                      I'm trying to reproduce this behavior with the PETSc
                      KSP and SNES examples.<br>
                      However, everything I've tried so far with
                      SuperLU_DIST works just fine.<br>
                      <br>
                      I'm also checking our code in Valgrind to make
                      sure it's clean.<br>
                      <br>
                      Anton<br>
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        Satish<br>
                        <br>
                        <br>
                        On Mon, 10 Oct 2016, Xiaoye S. Li wrote:<br>
                        <br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          Which version of superlu_dist does this
                          capture? I looked at the original error log;
                          it pointed to pdgssvx: line 161. But that line
                          is in a comment block, not in the program.<br>
                          <br>
                          Sherry<br>
                          <br>
                          <br>
                          On Mon, Oct 10, 2016 at 7:27 AM, Anton Popov
                          &lt;<a moz-do-not-send="true"
                            href="mailto:popov@uni-mainz.de"
                            target="_blank">popov@uni-mainz.de</a>&gt;
                          wrote:<br>
                          <br>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">
                            On 10/07/2016 05:23 PM, Satish Balay wrote:<br>
                            <br>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">
                              On Fri, 7 Oct 2016, Kong, Fande wrote:<br>
                              <br>
                              <blockquote class="gmail_quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">
                                On Fri, Oct 7, 2016 at 9:04 AM, Satish
                                Balay &lt;<a moz-do-not-send="true"
                                  href="mailto:balay@mcs.anl.gov"
                                  target="_blank">balay@mcs.anl.gov</a>&gt;
                                wrote:<br>
                                <blockquote class="gmail_quote"
                                  style="margin:0 0 0 .8ex;border-left:1px
                                  #ccc solid;padding-left:1ex">
                                  On Fri, 7 Oct 2016, Anton Popov wrote:<br>
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex">
                                    Hi guys,<br>
                                    <br>
                                    is there any news about fixing the
                                    buggy behavior of SuperLU_DIST,
                                    exactly what is described here:<br>
                                    <br>
                                    <a moz-do-not-send="true"
                                      href="https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.mcs.anl.gov_pipermail_petsc-2Dusers_2015-2DAugust_026802.html&amp;d=CwIBAg&amp;c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&amp;r=DUUt3SRGI0_JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&amp;m=RwruX6ckX0t9H89Z6LXKBfJBOAM2vG1sQHw2tIsSQtA&amp;s=bbB62oGLm582JebVs8xsUej_OX0eUwibAKsRRWKafos&amp;e="
                                      rel="noreferrer" target="_blank">https://urldefense.proofpoint.<wbr>com/v2/url?u=http-3A__lists.<wbr>mcs.anl.gov_pipermail_petsc-2D<wbr>users_2015-2DAugust_026802.html&amp;d=CwIBAg&amp;c=<wbr>54IZrppPQZKX9mLzcGdPfFD1hxrcB_<wbr>_aEkJFOKJFd00&amp;r=DUUt3SRGI0_<wbr>JgtNaS3udV68GRkgV4ts7XKfj2opmi<wbr>CY&amp;m=RwruX6ckX0t9H89Z6LXKBfJBO<wbr>AM2vG1sQHw2tIsSQtA&amp;s=bbB62oGLm582Je<wbr>bVs8xsUej_OX0eUwibAKsRRWKafos&amp;<wbr>e=</a>?<br>
                                    <br>
                                    I'm using 3.7.4 and still get a SEGV
                                    in the pdgssvx routine. Everything
                                    works fine with 3.5.4.<br>
                                    <br>
                                    Do I still have to stick to the maint
                                    branch, and what are the chances for
                                    these fixes to be included in 3.7.5?<br>
                                  </blockquote>
                                  <br>
                                  3.7.4 is off the maint branch [as of a
                                  week ago]. So if you are seeing issues
                                  with it, it's best to debug and figure
                                  out the cause.<br>
                                </blockquote>
                                <br>
                                This bug is indeed inside superlu_dist,
                                and we started having this issue with
                                PETSc-3.6.x. I think the superlu_dist
                                developers should have fixed this bug.
                                Did we forget to update superlu_dist??
                                This is not something users can debug
                                and fix.<br>
                                <br>
                                I have many people at INL suffering from
                                this issue, and they have to stay with
                                PETSc-3.5.4 to use superlu_dist.<br>
                                <br>
                              </blockquote>
                              To verify whether the bug is fixed in the
                              latest superlu_dist, you can try
                              [assuming you have git, either from
                              petsc-3.7/maint/master]:<br>
                              <br>
                              --download-superlu_dist
                              --download-superlu_dist-commit<wbr>=origin/maint<br>
                              <br>
                              <br>
                              Satish<br>
                            </blockquote>
                            Hi Satish,<br>
                            <br>
                            I did this:<br>
                            <br>
                            git clone -b maint <a
                              moz-do-not-send="true"
                              href="https://bitbucket.org/petsc/petsc.git"
                              rel="noreferrer" target="_blank">https://bitbucket.org/petsc/pe<wbr>tsc.git</a>
                            petsc<br>
                            <br>
                            --download-superlu_dist<br>
                            --download-superlu_dist-commit<wbr>=origin/maint
                            (not sure this is needed,<br>
                            since I'm already on maint)<br>
                            <br>
                            The problem is still there.<br>
                            <br>
                            Cheers,<br>
                            Anton<br>
                            <br>
                          </blockquote>
                        </blockquote>
                      </blockquote>
                    </blockquote>
                    <br>
                    <br>
                  </blockquote>
                </blockquote>
                <br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>