<div dir="ltr">Matt, that sounds like it. <div><br></div><div>Danyang, just in case it's not clear, you need to delete your architecture directory and reconfigure from scratch. You should be able to just delete the arch-dir/externalpackages/git.parmetis[metis] directories, but I'd simply delete the whole arch-dir.<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Sep 17, 2019 at 1:03 PM Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Tue, Sep 17, 2019 at 12:53 PM Danyang Su <<a href="mailto:danyang.su@gmail.com" target="_blank">danyang.su@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    <p>Hi Mark,</p>
    <p>Thanks for your follow-up. <br>
    </p>
    <p>The unstructured grid code has been verified and there is no
      problem with the results. The convergence rate is also good. The 3D
      mesh is not good: it is based on the original stratum, which I
      haven't refined, but it is fine for an initial test because it is
      relatively small and the results obtained from this mesh still make
      sense.</p>
    <p>The 2D meshes are just for testing, as I wanted to reproduce
      the partition problem on a cluster using PETSc 3.11.3 and
      Intel 2019. Unfortunately, I could not reproduce the problem with
      these examples.</p>
    <p>The code works with different PETSc versions (PETSc
      3.4 to 3.11) and MPI distributions (MPICH, OpenMPI, Intel MPI),
      except for one simulation case (the mesh I attached) on a cluster
      with PETSc 3.11.3 and Intel 2019u4, where the partition is very
      different from the one produced by PETSc 3.9.3. The simulation
      results are still the same; the only problem is efficiency, because
      the strange partition results in much more communication (more ghost
      nodes).</p>
    <p>I am still trying different compilers and MPI implementations with
      PETSc 3.11.3 on that cluster to trace the problem. I will get back to
      you when there is an update.</p>
    <p></p></div></blockquote><div>You had --download-parmetis in your configure command, but I wonder if it is possible that it actually was not downloaded and</div><div>already present. The type of the ParMetis weights can be changed, and if the type that PETSc thinks it is does not match the</div><div>actual library type, then the weights could all be crazy numbers. I seem to recall someone changing the weight type in a release,</div><div>which might mean that the built ParMetis was fine with one version and not the other.</div><div><br></div><div>  Thanks,</div><div><br></div><div>    Matt</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><p>Thanks,</p>
    <p>danyang<br>
    </p>
    <div class="gmail-m_4932417258908498662gmail-m_-7400272017397527304moz-cite-prefix">On 2019-09-17 9:02 a.m., Mark Adams
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">Danyang, <br>
        <div><br>
        </div>
        <div>Excuse me if I missed something in this thread but just a
          few ideas.</div>
        <div><br>
        </div>
        <div>First, I trust that you have verified that you are getting
          a good solution with these bad meshes. Ideally you would check
          that the solver convergence rates are similar.</div>
        <div><br>
        </div>
        <div>You might verify that your mesh is loaded into DMPlex
          correctly. You can visualize a Plex mesh very easily (let us
          know if you need instructions).</div>
        <div><br>
        </div>
        <div>The striping on the 2D meshes looks something like what you
          are getting with your 3D prism mesh. DMPlex just calls
          ParMETIS with a flat graph. It is odd to me that your
          rectangular grids have so much structure and are
          non-isotropic. I assume that these rectangular meshes are
          isotropic (e.g., squares).</div>
        <div><br>
        </div>
        <div>Anyway, just some thoughts,</div>
        <div>Mark</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Tue, Sep 17, 2019 at 12:43
          AM Danyang Su via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div bgcolor="#FFFFFF">
            <p><br>
            </p>
            <div class="gmail-m_4932417258908498662gmail-m_-7400272017397527304gmail-m_7601587419187439590gmail-m_6978125811855528906moz-cite-prefix">On
              2019-09-16 12:02 p.m., Matthew Knepley wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">
                <div dir="ltr">On Mon, Sep 16, 2019 at 1:46 PM Smith,
                  Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>>
                  wrote:<br>
                </div>
                <div class="gmail_quote">
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
                      Very different things are going on in the two cases:
                    different objects are being created, and different numbers
                    of different types of operations are performed. Clearly a
                    major refactoring of the code was done. Presumably a
                    regression was introduced that changed the behavior
                    dramatically, possibly by mistake. <br>
                    <br>
                       You can attempt to use git bisect to determine
                    which change caused the dramatic change in behavior.
                    Then it can be decided whether the change that triggered
                    the change in the results was a bug or a planned
                    feature.<br>
                  </blockquote>
                  <div><br>
                  </div>
                  <div>Danyang,</div>
                  <div><br>
                  </div>
                  <div>Can you send me the smallest mesh you care about,
                    and I will look at the partitioning? We can at least
                    get quality metrics</div>
                  <div>between these two releases.</div>
                  <div><br>
                  </div>
                  <div>  Thanks,</div>
                  <div><br>
                  </div>
                  <div>     Matt</div>
                </div>
              </div>
            </blockquote>
            <p>Hi Matt, <br>
            </p>
            <p>This is the smallest mesh for the regional-scale
              simulation that has the strange partition problem. It can be
              downloaded via the link below.<br>
            </p>
            <p><a class="gmail-m_4932417258908498662gmail-m_-7400272017397527304gmail-m_7601587419187439590gmail-m_6978125811855528906moz-txt-link-freetext" href="https://www.dropbox.com/s/tu34jgqqhkz8pwj/basin-3d.vtk?dl=0" target="_blank">https://www.dropbox.com/s/tu34jgqqhkz8pwj/basin-3d.vtk?dl=0</a></p>
            <p>I am trying to reproduce a similar problem using a
              smaller 2D mesh; however, there is no such problem in 2D.
              Even though the partitions using PETSc 3.9.3 and 3.11.3
              are a bit different, they both look reasonable. As shown
              below, both the rectangular mesh and the triangular mesh use
              DMPlex.<br>
            </p>
            <p><img src="cid:16d402ae935282379f1" alt="2D
                rectangular and triangle mesh" width="1134" height="780"></p>
            <p>I will keep on testing PETSc 3.11.3 with a
              different compiler and MPI to check whether I can reproduce the
              problem.</p>
            <p>Thanks,</p>
            <p>Danyang<br>
            </p>
            <blockquote type="cite">
              <div dir="ltr">
                <div class="gmail_quote">
                  <div> <br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">    Barry<br>
                    <br>
                    <br>
                    > On Sep 16, 2019, at 11:50 AM, Danyang Su <<a href="mailto:danyang.su@gmail.com" target="_blank">danyang.su@gmail.com</a>>
                    wrote:<br>
                    > <br>
                    > Hi Barry and Matt,<br>
                    > <br>
                    > Attached is the output of both runs with
                    -dm_view -log_view included.<br>
                    > <br>
                    > I am now coordinating with staff to install
                    PETSc 3.9.3 with intel2019u4 to narrow down
                    the problem. I will get back to you after the
                    test.<br>
                    > <br>
                    > Thanks,<br>
                    > <br>
                    > Danyang<br>
                    > <br>
                    > On 2019-09-15 4:43 p.m., Smith, Barry F. wrote:<br>
                    >>   Send the configure.log and make.log for
                    the two system configurations that produce very
                    different results, as well as the output of running with
                    -dm_view -info for both runs. The cause is likely
                    not subtle; one is likely using Metis and the other
                    is likely not using any partitioner at all.<br>
                    >> <br>
                    >> <br>
                    >> <br>
                    >>> On Sep 15, 2019, at 6:07 PM, Matthew
                    Knepley via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>
                    wrote:<br>
                    >>> <br>
                    >>> On Sun, Sep 15, 2019 at 6:59 PM Danyang
                    Su <<a href="mailto:danyang.su@gmail.com" target="_blank">danyang.su@gmail.com</a>>
                    wrote:<br>
                    >>> Hi Matt,<br>
                    >>> <br>
                    >>> Thanks for the quick reply. I have not
                    changed the adjacency. The source code and the
                    simulation input files are all the same. I also
                    tried the GNU compiler and MPICH with PETSc
                    3.11.3 and it works fine.<br>
                    >>> <br>
                    >>> It looks like the problem is caused by
                    the difference in configuration. However, the
                    configuration is pretty much the same as for PETSc 3.9.3,
                    except for the compiler and MPI used. I will contact
                    SciNet staff to check if they have any idea about this.<br>
                    >>> <br>
                    >>> Very, very strange, since the partition
                    is handled entirely by Metis and does not use
                    MPI.<br>
                    >>> <br>
                    >>>   Thanks,<br>
                    >>> <br>
                    >>>     Matt<br>
                    >>>  Thanks,<br>
                    >>> <br>
                    >>> Danyang<br>
                    >>> <br>
                    >>> On September 15, 2019 3:20:18 p.m. PDT,
                    Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>>
                    wrote:<br>
                    >>> On Sun, Sep 15, 2019 at 5:19 PM Danyang
                    Su via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>
                    wrote:<br>
                    >>> Dear All,<br>
                    >>> <br>
                    >>> I have a question regarding a strange
                    partitioning problem in PETSc 3.11. The problem
                    does not exist on my local workstation. However, on
                    a cluster with different PETSc versions, the
                    partition looks quite different, as you can see in
                    the figure below, which was tested with 160
                    processors. The color indicates which processor owns each
                    subdomain. In this layered prism mesh, there are 40
                    layers from bottom to top and each layer has around
                    20k nodes. The natural order of the nodes is also
                    layered from bottom to top.<br>
                    >>> <br>
                    >>> The left partition (PETSc 3.10 and
                    earlier) looks good, with a minimal number of ghost
                    nodes, while the right one (PETSc 3.11) looks weird,
                    with a huge number of ghost nodes. It looks like the
                    right one partitions layer by layer. This
                    problem exists on a cluster but not on my local
                    workstation for the same PETSc version (with a
                    different compiler and MPI). Other than the
                    difference in partition and efficiency, the
                    simulation results are the same.<br>
                    >>> <br>
                    >>> Below are the PETSc configurations on the three
                    machines:<br>
                    >>> <br>
                    >>> Local workstation (works fine): 
                    ./configure --with-cc=gcc --with-cxx=g++
                    --with-fc=gfortran --download-mpich
                    --download-scalapack --download-parmetis
                    --download-metis --download-ptscotch
                    --download-fblaslapack --download-hypre
                    --download-superlu_dist --download-hdf5=yes
                    --download-ctetgen --with-debugging=0 COPTFLAGS=-O3
                    CXXOPTFLAGS=-O3 FOPTFLAGS=-O3
                    --with-cxx-dialect=C++11<br>
                    >>> <br>
                    >>> Cluster with PETSc 3.9.3 (works fine):
--prefix=/scinet/niagara/software/2018a/opt/intel-2018.2-intelmpi-2018.2/petsc/3.9.3
                    CC=mpicc CXX=mpicxx F77=mpif77 F90=mpif90 FC=mpifc
                    COPTFLAGS="-march=native -O2"
                    CXXOPTFLAGS="-march=native -O2"
                    FOPTFLAGS="-march=native -O2" --download-chaco=1
                    --download-hypre=1 --download-metis=1
                    --download-ml=1 --download-mumps=1
                    --download-parmetis=1 --download-plapack=1
                    --download-prometheus=1 --download-ptscotch=1
                    --download-scotch=1 --download-sprng=1
                    --download-superlu=1 --download-superlu_dist=1
                    --download-triangle=1 --with-avx512-kernels=1
--with-blaslapack-dir=/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl
                    --with-debugging=0 --with-hdf5=1
--with-mkl_pardiso-dir=/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl
                    --with-scalapack=1
--with-scalapack-lib="[/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_scalapack_lp64.so,/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so]"
                    --with-x=0<br>
                    >>> <br>
                    >>> Cluster with PETSc 3.11.3 (looks
                    weird):
--prefix=/scinet/niagara/software/2019b/opt/intel-2019u4-intelmpi-2019u4/petsc/3.11.3
                    CC=mpicc CXX=mpicxx F77=mpif77 F90=mpif90 FC=mpifc
                    COPTFLAGS="-march=native -O2"
                    CXXOPTFLAGS="-march=native -O2"
                    FOPTFLAGS="-march=native -O2" --download-chaco=1
                    --download-hdf5=1 --download-hypre=1
                    --download-metis=1 --download-ml=1
                    --download-mumps=1 --download-parmetis=1
                    --download-plapack=1 --download-prometheus=1
                    --download-ptscotch=1 --download-scotch=1
                    --download-sprng=1 --download-superlu=1
                    --download-superlu_dist=1 --download-triangle=1
                    --with-avx512-kernels=1
--with-blaslapack-dir=/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl
                    --with-cxx-dialect=C++11 --with-debugging=0
--with-mkl_pardiso-dir=/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl
                    --with-scalapack=1
--with-scalapack-lib="[/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64/libmkl_scalapack_lp64.so,/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so]"
                    --with-x=0<br>
                    >>> <br>
                    >>> The partitioning uses the default
                    DMPlex distribution:<br>
                    >>> <br>
                    >>>       !c distribute mesh over processes<br>
                    >>>       call
                    DMPlexDistribute(dmda_flow%da,stencil_width,       
                            &<br>
                    >>>                           
                     PETSC_NULL_SF,                             &<br>
                    >>>                           
                     PETSC_NULL_OBJECT,                         &<br>
                    >>>                           
                     distributedMesh,ierr)<br>
                    >>>       CHKERRQ(ierr)<br>
                    >>> <br>
                    >>> Any idea on this strange problem?<br>
                    >>> <br>
                    >>> <br>
                    >>> I just looked at the code. Your mesh
                    should be partitioned by k-way partitioning using
                    Metis, since it is on one process for partitioning. This
                    code<br>
                    >>> is the same for 3.9 and 3.11, and you
                    get the same result on your machine. I cannot
                    understand what might be happening on your cluster<br>
                    >>> (MPI plays no role). Is it possible
                    that you changed the adjacency specification in that
                    version?<br>
                    >>> <br>
                    >>>   Thanks,<br>
                    >>> <br>
                    >>>      Matt<br>
                    >>> Thanks,<br>
                    >>> <br>
                    >>> Danyang<br>
                    >>> <br>
                    >>> <br>
                    >>> <br>
                    >>> -- <br>
                    >>> What most experimenters take for
                    granted before they begin their experiments is
                    infinitely more interesting than any results to
                    which their experiments lead.<br>
                    >>> -- Norbert Wiener<br>
                    >>> <br>
                    >>> <a href="https://www.cse.buffalo.edu/~knepley/" rel="noreferrer" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br>
                    >>> <br>
                    >
                    <basin-petsc-3.9.3.log><basin-petsc-3.11.3.log><br>
                    <br>
                  </blockquote>
                </div>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </div>

</blockquote></div></div>
</blockquote></div>