<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 2019-09-17 10:07 a.m., Mark Adams
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CADOhEh42W+uZpbNSvG9cpsY5hEKf+N2Vp2XgfNMA47wc5OFvMQ@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">Matt that sound like it.
<div><br>
</div>
<div>danyang, just in case its not clear, you need to delete
your architecture directory and reconfigure from scratch. You
should be able to just delete the
arch-dir/externalpackages/git.parmetis[metis] directories but
I'd simply delete the whole arch-dir.<br>
</div>
</div>
</blockquote>
<p>Many thanks to you all for the suggestions. I will try this first
and keep you updated.</p>
<p>Danyang<br>
</p>
<blockquote type="cite"
cite="mid:CADOhEh42W+uZpbNSvG9cpsY5hEKf+N2Vp2XgfNMA47wc5OFvMQ@mail.gmail.com"><br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Sep 17, 2019 at 1:03
PM Matthew Knepley <<a href="mailto:knepley@gmail.com"
moz-do-not-send="true">knepley@gmail.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">On Tue, Sep 17, 2019 at 12:53 PM Danyang Su
<<a href="mailto:danyang.su@gmail.com" target="_blank"
moz-do-not-send="true">danyang.su@gmail.com</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Hi Mark,</p>
<p>Thanks for your follow-up. <br>
</p>
<p>The unstructured grid code has been verified and
there is no problem in the results. The convergence
rate is also good. The 3D mesh is not good, it is
based on the original stratum which I haven't
refined, but good for initial test as it is relative
small and the results obtained from this mesh still
makes sense.</p>
<p>The 2D meshes are just for testing purpose as I
want to reproduce the partition problem on a cluster
using PETSc3.11.3 and Intel2019. Unfortunately, I
didn't find problem using this example. </p>
<p>The code has no problem in using different PETSc
versions (PETSc V3.4 to V3.11) and MPI distribution
(MPICH, OpenMPI, IntelMPI), except for one
simulation case (the mesh I attached) on a cluster
with PETSc3.11.3 and Intel2019u4 due to the very
different partition compared to PETSc3.9.3. Yet the
simulation results are the same except for the
efficiency problem because the strange partition
results into much more communication (ghost nodes).</p>
<p>I am still trying different compiler and mpi with
PETSc3.11.3 on that cluster to trace the problem.
Will get back to you guys when there is update.</p>
</div>
</blockquote>
<div>You had --download-parmetis in your configure
command, but I wonder if it is possible that it actually
was not downloaded and</div>
<div>already present. The type of the ParMetis weights can
be changed, and if the type that PETSc thinks it is does
not match the</div>
<div>actual library type, then the weights could all be
crazy numbers. I seem to recall someone changing the
weight type in a release,</div>
<div>which might mean that the built ParMetis was fine
with one version and not the other.</div>
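<p>A possible way to check for the type mismatch Matt describes (install
paths below are illustrative): the Metis header records the integer and real
widths it was built with, so the two installs can be compared directly.</p>
<pre>
# widths the installed Metis/ParMetis headers were compiled with
grep -E "IDXTYPEWIDTH|REALTYPEWIDTH" /path/to/petsc-3.9.3-prefix/include/metis.h
grep -E "IDXTYPEWIDTH|REALTYPEWIDTH" /path/to/petsc-3.11.3-prefix/include/metis.h

# check which ParMetis library the PETSc build actually records/links against
grep -i parmetis /path/to/petsc-3.11.3-prefix/lib/petsc/conf/petscvariables
</pre>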
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Thanks,</p>
<p>danyang<br>
</p>
<div
class="gmail-m_4932417258908498662gmail-m_-7400272017397527304moz-cite-prefix">On
2019-09-17 9:02 a.m., Mark Adams wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Danyang, <br>
<div><br>
</div>
<div>Excuse me if I missed something in this
thread but just a few ideas.</div>
<div><br>
</div>
<div>First, I trust that you have verified that
you are getting a good solution with these bad
meshes. Ideally you would check that the solver
convergence rates are similar.</div>
<div><br>
</div>
<div>You might verify that your mesh is inside of
DMPLex correctly. You can visualize a Plex mesh
very easily. (let us know if you need
instructions).</div>
<div><br>
</div>
<div>This striping on the 2D meshes look something
like what you are getting with your 3D PRISM
mesh. DMPLex just calls Parmetis with a flat
graph. It is odd to me that your rectangular
grids have so much structure and are
non-isotropic. I assume that these
rectangular meshes are isotropic (eg, squares).</div>
<div><br>
</div>
<div>Anyway, just some thoughts,</div>
<div>Mark</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Sep 17,
2019 at 12:43 AM Danyang Su via petsc-users <<a
href="mailto:petsc-users@mcs.anl.gov"
target="_blank" moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p><br>
</p>
<div
class="gmail-m_4932417258908498662gmail-m_-7400272017397527304gmail-m_7601587419187439590gmail-m_6978125811855528906moz-cite-prefix">On
2019-09-16 12:02 p.m., Matthew Knepley
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">On Mon, Sep 16, 2019 at
1:46 PM Smith, Barry F. <<a
href="mailto:bsmith@mcs.anl.gov"
target="_blank" moz-do-not-send="true">bsmith@mcs.anl.gov</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"><br>
Very different stuff is going on in the
two cases: different objects being
created, different numbers of different
types of operations. Clearly a major
refactoring of the code was done.
Presumably a regression was introduced
that changed the behavior
dramatically, possibly by mistake. <br>
<br>
You can attempt to use git bisect
to determine what change caused the
dramatic shift in behavior. Then it
can be decided whether the change that
triggered the difference in the results
was a bug or a planned feature.<br>
</blockquote>
<div><br>
</div>
<div>Danyang,</div>
<div><br>
</div>
<div>Can you send me the smallest mesh
you care about, and I will look at the
partitioning? We can at least get
quality metrics</div>
<div>between these two releases.</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
</div>
</div>
</blockquote>
<p>Hi Matt, <br>
</p>
<p>This is the smallest mesh for the regional
scale simulation that has strange partition
problem. It can be download via the link
below.<br>
</p>
<p><a
class="gmail-m_4932417258908498662gmail-m_-7400272017397527304gmail-m_7601587419187439590gmail-m_6978125811855528906moz-txt-link-freetext"
href="https://www.dropbox.com/s/tu34jgqqhkz8pwj/basin-3d.vtk?dl=0"
target="_blank" moz-do-not-send="true">https://www.dropbox.com/s/tu34jgqqhkz8pwj/basin-3d.vtk?dl=0</a></p>
<p>I am trying to reproduce the similar
problem using smaller 2D mesh, however,
there is no such problem in 2D, even though
the partitions using PETSc 3.9.3 and 3.11.3
are a bit different, they both look
reasonable. As shown below, both rectangular
mesh and triangular mesh use DMPlex.<br>
</p>
<p><img
src="cid:part6.C75A7905.6F06E545@gmail.com"
alt="2D rectangular and triangle mesh"
class="" width="1134" height="780"></p>
<p>I will keep on testing using PETSc3.11.3
but with different compiler and MPI to check
if I can reproduce the problem.</p>
<p>Thanks,</p>
<p>Danyang<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> <br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Barry<br>
<br>
<br>
> On Sep 16, 2019, at 11:50 AM,
Danyang Su <<a
href="mailto:danyang.su@gmail.com"
target="_blank"
moz-do-not-send="true">danyang.su@gmail.com</a>>
wrote:<br>
> <br>
> Hi Barry and Matt,<br>
> <br>
> Attached is the output of both
runs with -dm_view -log_view included.<br>
> <br>
> I am now coordinating with staff
to install PETSc 3.9.3 version using
intel2019u4 to narrow down the
problem. Will get back to you later
after the test.<br>
> <br>
> Thanks,<br>
> <br>
> Danyang<br>
> <br>
> On 2019-09-15 4:43 p.m., Smith,
Barry F. wrote:<br>
>> Send the configure.log and
make.log for the two system
configurations that produce very
different results as well as the
output running with -dm_view -info for
both runs. The cause is likely not
subtle: one is likely using Metis and
the other is likely just not using any
partitioner.<br>
>> <br>
>> <br>
>> <br>
>>> On Sep 15, 2019, at 6:07
PM, Matthew Knepley via petsc-users
<<a
href="mailto:petsc-users@mcs.anl.gov"
target="_blank"
moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
wrote:<br>
>>> <br>
>>> On Sun, Sep 15, 2019 at
6:59 PM Danyang Su <<a
href="mailto:danyang.su@gmail.com"
target="_blank"
moz-do-not-send="true">danyang.su@gmail.com</a>>
wrote:<br>
>>> Hi Matt,<br>
>>> <br>
>>> Thanks for the quick
reply. I have no change in the
adjacency. The source code and the
simulation input files are all the
same. I also tried to use the GNU compiler
and MPICH with PETSc 3.11.3, and it
works fine.<br>
>>> <br>
>>> It looks like the problem
is caused by the difference in
configuration. However, the
configuration is pretty much the same as
for PETSc 3.9.3 except for the compiler and
MPI used. I will contact SciNet staff
to check if they have any idea about
this.<br>
>>> <br>
>>> Very very strange since
the partition is handled completely by
Metis, and does not use MPI.<br>
>>> <br>
>>> Thanks,<br>
>>> <br>
>>> Matt<br>
>>> Thanks,<br>
>>> <br>
>>> Danyang<br>
>>> <br>
>>> On September 15, 2019
3:20:18 p.m. PDT, Matthew Knepley <<a
href="mailto:knepley@gmail.com"
target="_blank"
moz-do-not-send="true">knepley@gmail.com</a>>
wrote:<br>
>>> On Sun, Sep 15, 2019 at
5:19 PM Danyang Su via petsc-users
<<a
href="mailto:petsc-users@mcs.anl.gov"
target="_blank"
moz-do-not-send="true">petsc-users@mcs.anl.gov</a>>
wrote:<br>
>>> Dear All,<br>
>>> <br>
>>> I have a question
regarding a strange partition problem in
PETSc 3.11. The problem does
not exist on my local workstation.
However, on a cluster with different
PETSc versions, the partition seems
quite different, as you can see in
the figure below, which is tested with
160 processors. The color indicates which
processor owns each subdomain. In this
layered prism mesh, there are 40
layers from bottom to top and each
layer has around 20k nodes. The
natural order of nodes is also layered
from bottom to top.<br>
>>> <br>
>>> The left partition (PETSc
3.10 and earlier) looks good with
minimum number of ghost nodes while
the right one (PETSc 3.11) looks
weired with huge number of ghost
nodes. Looks like the right one uses
partition layer by layer. This problem
exists on a a cluster but not on my
local workstation for the same PETSc
version (with different compiler and
MPI). Other than the difference in
partition and efficiency, the
simulation results are the same.<br>
>>> <br>
>>> <br>
>>> <br>
>>> <br>
>>> Below is PETSc
configuration on three machine:<br>
>>> <br>
>>> Local workstation (works
fine): ./configure --with-cc=gcc
--with-cxx=g++ --with-fc=gfortran
--download-mpich --download-scalapack
--download-parmetis --download-metis
--download-ptscotch
--download-fblaslapack
--download-hypre
--download-superlu_dist
--download-hdf5=yes --download-ctetgen
--with-debugging=0 COPTFLAGS=-O3
CXXOPTFLAGS=-O3 FOPTFLAGS=-O3
--with-cxx-dialect=C++11<br>
>>> <br>
>>> Cluster with PETSc 3.9.3
(works fine):
--prefix=/scinet/niagara/software/2018a/opt/intel-2018.2-intelmpi-2018.2/petsc/3.9.3
CC=mpicc CXX=mpicxx F77=mpif77
F90=mpif90 FC=mpifc
COPTFLAGS="-march=native -O2"
CXXOPTFLAGS="-march=native -O2"
FOPTFLAGS="-march=native -O2"
--download-chaco=1 --download-hypre=1
--download-metis=1 --download-ml=1
--download-mumps=1
--download-parmetis=1
--download-plapack=1
--download-prometheus=1
--download-ptscotch=1
--download-scotch=1 --download-sprng=1
--download-superlu=1
--download-superlu_dist=1
--download-triangle=1
--with-avx512-kernels=1
--with-blaslapack-dir=/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl
--with-debugging=0 --with-hdf5=1
--with-mkl_pardiso-dir=/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl
--with-scalapack=1
--with-scalapack-lib="[/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_scalapack_lp64.so,/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so]"
--with-x=0<br>
>>> <br>
>>> Cluster with PETSc 3.11.3
(looks weird):
--prefix=/scinet/niagara/software/2019b/opt/intel-2019u4-intelmpi-2019u4/petsc/3.11.3
CC=mpicc CXX=mpicxx F77=mpif77
F90=mpif90 FC=mpifc
COPTFLAGS="-march=native -O2"
CXXOPTFLAGS="-march=native -O2"
FOPTFLAGS="-march=native -O2"
--download-chaco=1 --download-hdf5=1
--download-hypre=1 --download-metis=1
--download-ml=1 --download-mumps=1
--download-parmetis=1
--download-plapack=1
--download-prometheus=1
--download-ptscotch=1
--download-scotch=1 --download-sprng=1
--download-superlu=1
--download-superlu_dist=1
--download-triangle=1
--with-avx512-kernels=1
--with-blaslapack-dir=/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl
--with-cxx-dialect=C++11
--with-debugging=0
--with-mkl_pardiso-dir=/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl
--with-scalapack=1
--with-scalapack-lib="[/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64/libmkl_scalapack_lp64.so,/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so]"
--with-x=0<br>
>>> <br>
>>> And the partition is used
by default dmplex distribution.<br>
>>> <br>
>>> !c distribute mesh
over processes<br>
>>> call
DMPlexDistribute(dmda_flow%da,stencil_width,
&<br>
>>>
PETSC_NULL_SF,
&<br>
>>>
PETSC_NULL_OBJECT,
&<br>
>>>
distributedMesh,ierr)<br>
>>> CHKERRQ(ierr)<br>
>>> <br>
>>> Any idea on this strange
problem?<br>
>>> <br>
>>> <br>
>>> I just looked at the
code. Your mesh should be partitioned
by k-way partitioning using Metis
since it's on 1 proc for partitioning.
This code<br>
>>> is the same for 3.9 and
3.11, and you get the same result on
your machine. I cannot understand what
might be happening on your cluster<br>
>>> (MPI plays no role). Is
it possible that you changed the
adjacency specification in that
version?<br>
>>> <br>
>>> Thanks,<br>
>>> <br>
>>> Matt<br>
>>> Thanks,<br>
>>> <br>
>>> Danyang<br>
>>> <br>
>>> <br>
>>> <br>
>>> -- <br>
>>> What most experimenters
take for granted before they begin
their experiments is infinitely more
interesting than any results to which
their experiments lead.<br>
>>> -- Norbert Wiener<br>
>>> <br>
>>> <a
href="https://www.cse.buffalo.edu/~knepley/"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
>>> <br>
>>> -- <br>
>>> Sent from my Android
device with K-9 Mail. Please excuse my
brevity.<br>
>>> <br>
>>> <br>
>>> -- <br>
>>> What most experimenters
take for granted before they begin
their experiments is infinitely more
interesting than any results to which
their experiments lead.<br>
>>> -- Norbert Wiener<br>
>>> <br>
>>> <a
href="https://www.cse.buffalo.edu/~knepley/"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
>
<basin-petsc-3.9.3.log><basin-petsc-3.11.3.log><br>
<br>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr"
class="gmail-m_4932417258908498662gmail-m_-7400272017397527304gmail-m_7601587419187439590gmail-m_6978125811855528906gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters
take for granted before they
begin their experiments is
infinitely more interesting
than any results to which
their experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank"
moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr"
class="gmail-m_4932417258908498662gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted
before they begin their experiments is
infinitely more interesting than any results
to which their experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>