[petsc-users] Strange Partition in PETSc 3.11 version on some computers

Danyang Su danyang.su at gmail.com
Mon Sep 16 23:41:45 CDT 2019


On 2019-09-16 12:02 p.m., Matthew Knepley wrote:
> On Mon, Sep 16, 2019 at 1:46 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
>
>       Very different stuff is going on in the two cases: different
>     objects being created, different numbers of different types of
>     operations. Clearly a major refactoring of the code was done.
>     Presumably a regression was introduced that changed the behavior
>     dramatically, possibly by mistake.
>
>        You can attempt to use git bisect to determine what change
>     caused the dramatic difference in behavior. Then it can be decided
>     whether the commit that triggered the change in the results was a
>     bug or a planned feature.
>
>
> Danyang,
>
> Can you send me the smallest mesh you care about, and I will look at 
> the partitioning? We can at least get quality metrics
> between these two releases.
>
>   Thanks,
>
>      Matt

Hi Matt,

This is the smallest mesh for the regional-scale simulation that shows 
the strange partition problem. It can be downloaded via the link below.

https://www.dropbox.com/s/tu34jgqqhkz8pwj/basin-3d.vtk?dl=0

I have been trying to reproduce a similar problem using smaller 2D 
meshes, but there is no such problem in 2D: even though the partitions 
produced by PETSc 3.9.3 and 3.11.3 are a bit different, they both look 
reasonable. As shown below, both the rectangular mesh and the 
triangular mesh use DMPlex.

[Images: 2D rectangular and triangular mesh partitions]

I will keep testing with PETSc 3.11.3 but with different compilers and 
MPI implementations to check whether I can reproduce the problem.
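
To follow Barry's git bisect suggestion, below is a minimal sketch of 
what I plan to run (assuming the PETSc repository on GitLab and a 
hypothetical check_partition.sh script that rebuilds PETSc, runs this 
case, and returns nonzero when the layer-by-layer partition appears):

    git clone https://gitlab.com/petsc/petsc.git && cd petsc
    git bisect start
    git bisect bad v3.11.3    # release with the strange partition
    git bisect good v3.9.3    # release with the good partition
    git bisect run ./check_partition.sh

git bisect should then check out intermediate commits and report the 
first commit at which the partition changes.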

Thanks,

Danyang

>
>        Barry
>
>
>     > On Sep 16, 2019, at 11:50 AM, Danyang Su <danyang.su at gmail.com> wrote:
>     >
>     > Hi Barry and Matt,
>     >
>     > Attached is the output of both runs with -dm_view -log_view
>     included.
>     >
>     > I am now coordinating with staff to install PETSc 3.9.3 version
>     using intel2019u4 to narrow down the problem. Will get back to you
>     later after the test.
>     >
>     > Thanks,
>     >
>     > Danyang
>     >
>     > On 2019-09-15 4:43 p.m., Smith, Barry F. wrote:
>     >>   Send the configure.log and make.log for the two system
>     configurations that produce very different results, as well as the
>     output of both runs with -dm_view -info. The cause is likely not
>     subtle: one is likely using METIS and the other is likely not
>     using any partitioner at all.
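>     >>
>     >>   For example, a run sketch (application name hypothetical):
>     >>
>     >>      mpiexec -n 160 ./your_app -dm_view -info > dm-info.log 2>&1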
>     >>
>     >>
>     >>
>     >>> On Sep 15, 2019, at 6:07 PM, Matthew Knepley via petsc-users
>     <petsc-users at mcs.anl.gov> wrote:
>     >>>
>     >>> On Sun, Sep 15, 2019 at 6:59 PM Danyang Su
>     <danyang.su at gmail.com> wrote:
>     >>> Hi Matt,
>     >>>
>     >>> Thanks for the quick reply. I have made no change to the
>     adjacency. The source code and the simulation input files are all
>     the same. I also tried the GNU compiler and MPICH with PETSc
>     3.11.3, and it works fine.
>     >>>
>     >>> It looks like the problem is caused by a difference in
>     configuration. However, the configuration is pretty much the same
>     as for PETSc 3.9.3, except for the compiler and MPI used. I will
>     contact SciNet staff to check whether they have any idea on this.
>     >>>
>     >>> Very, very strange, since the partitioning is handled
>     completely by METIS, which does not use MPI.
>     >>>
>     >>>   Thanks,
>     >>>
>     >>>     Matt
>     >>>  Thanks,
>     >>>
>     >>> Danyang
>     >>>
>     >>> On September 15, 2019 3:20:18 p.m. PDT, Matthew Knepley
>     <knepley at gmail.com> wrote:
>     >>> On Sun, Sep 15, 2019 at 5:19 PM Danyang Su via petsc-users
>     <petsc-users at mcs.anl.gov> wrote:
>     >>> Dear All,
>     >>>
>     >>> I have a question regarding a strange partition problem in the
>     PETSc 3.11 version. The problem does not exist on my local
>     workstation. However, on a cluster with different PETSc versions,
>     the partitions are quite different, as you can see in the figure
>     below, which was produced with 160 processors. Each color marks the
>     subdomain owned by one processor. In this layered prism mesh, there
>     are 40 layers from bottom to top, and each layer has around 20k
>     nodes. The natural ordering of the nodes is also layered from
>     bottom to top.
>     >>>
>     >>> The left partition (PETSc 3.10 and earlier) looks good, with a
>     minimal number of ghost nodes, while the right one (PETSc 3.11)
>     looks weird, with a huge number of ghost nodes. It looks like the
>     right one partitions layer by layer. This problem exists on a
>     cluster but not on my local workstation for the same PETSc version
>     (with a different compiler and MPI). Other than the difference in
>     partition and efficiency, the simulation results are the same.
>     >>>
>     >>>
>     >>> [Image: petsc-partition-compare.png, comparing the two
>     partitions with 160 processors]
>     >>>
>     >>> Below is PETSc configuration on three machine:
>     >>>
>     >>> Local workstation (works fine):  ./configure --with-cc=gcc
>     --with-cxx=g++ --with-fc=gfortran --download-mpich
>     --download-scalapack --download-parmetis --download-metis
>     --download-ptscotch --download-fblaslapack --download-hypre
>     --download-superlu_dist --download-hdf5=yes --download-ctetgen
>     --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3
>     --with-cxx-dialect=C++11
>     >>>
>     >>> Cluster with PETSc 3.9.3 (works fine):
>     --prefix=/scinet/niagara/software/2018a/opt/intel-2018.2-intelmpi-2018.2/petsc/3.9.3
>     CC=mpicc CXX=mpicxx F77=mpif77 F90=mpif90 FC=mpifc
>     COPTFLAGS="-march=native -O2" CXXOPTFLAGS="-march=native -O2"
>     FOPTFLAGS="-march=native -O2" --download-chaco=1
>     --download-hypre=1 --download-metis=1 --download-ml=1
>     --download-mumps=1 --download-parmetis=1 --download-plapack=1
>     --download-prometheus=1 --download-ptscotch=1 --download-scotch=1
>     --download-sprng=1 --download-superlu=1 --download-superlu_dist=1
>     --download-triangle=1 --with-avx512-kernels=1
>     --with-blaslapack-dir=/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl
>     --with-debugging=0 --with-hdf5=1
>     --with-mkl_pardiso-dir=/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl
>     --with-scalapack=1
>     --with-scalapack-lib="[/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_scalapack_lp64.so,/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so]"
>     --with-x=0
>     >>>
>     >>> Cluster with PETSc 3.11.3 (looks weird):
>     --prefix=/scinet/niagara/software/2019b/opt/intel-2019u4-intelmpi-2019u4/petsc/3.11.3
>     CC=mpicc CXX=mpicxx F77=mpif77 F90=mpif90 FC=mpifc
>     COPTFLAGS="-march=native -O2" CXXOPTFLAGS="-march=native -O2"
>     FOPTFLAGS="-march=native -O2" --download-chaco=1 --download-hdf5=1
>     --download-hypre=1 --download-metis=1 --download-ml=1
>     --download-mumps=1 --download-parmetis=1 --download-plapack=1
>     --download-prometheus=1 --download-ptscotch=1 --download-scotch=1
>     --download-sprng=1 --download-superlu=1 --download-superlu_dist=1
>     --download-triangle=1 --with-avx512-kernels=1
>     --with-blaslapack-dir=/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl
>     --with-cxx-dialect=C++11 --with-debugging=0
>     --with-mkl_pardiso-dir=/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl
>     --with-scalapack=1
>     --with-scalapack-lib="[/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64/libmkl_scalapack_lp64.so,/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so]"
>     --with-x=0
>     >>>
>     >>> And the partitioning uses the default DMPlex distribution.
>     >>>
>     >>>       !c distribute mesh over processes
>     >>>       call DMPlexDistribute(dmda_flow%da, stencil_width, &
>     >>>                             PETSC_NULL_SF, PETSC_NULL_OBJECT, &
>     >>>                             distributedMesh, ierr)
>     >>>       CHKERRQ(ierr)
>     >>>
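>     >>> In case it helps, a minimal sketch of what I could try to
>     force the same partitioner in both builds (assuming the Fortran
>     bindings for PetscPartitioner; parmetis could be any installed
>     type):
>     >>>
>     >>>       PetscPartitioner :: part
>     >>>       !c get the partitioner DMPlexDistribute will use and
>     >>>       !c pin its type before distribution
>     >>>       call DMPlexGetPartitioner(dmda_flow%da, part, ierr)
>     >>>       CHKERRQ(ierr)
>     >>>       call PetscPartitionerSetType(part, PETSCPARTITIONERPARMETIS, ierr)
>     >>>       CHKERRQ(ierr)
>     >>>
>     >>> or, equivalently, the command-line option
>     -petscpartitioner_type parmetis together with -dm_view.
>     >>>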
>     >>> Any idea on this strange problem?
>     >>>
>     >>>
>     >>> I just looked at the code. Your mesh should be partitioned by
>     k-way partitioning using METIS, since it is on 1 proc for
>     partitioning. This code
>     >>> is the same for 3.9 and 3.11, and you get the same result on
>     your machine. I cannot understand what might be happening on your
>     cluster
>     >>> (MPI plays no role). Is it possible that you changed the
>     adjacency specification in that version?
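>     >>>
>     >>> A quick way to check is to print the adjacency flags before
>     distribution. A sketch (names assume the PETSc >= 3.10 API, where
>     DMSetBasicAdjacency replaced DMPlexSetAdjacencyUseCone/UseClosure):
>     >>>
>     >>>       PetscBool :: useCone, useClosure
>     >>>       !c the flags defining the graph that METIS partitions
>     >>>       call DMGetBasicAdjacency(dmda_flow%da, useCone, useClosure, ierr)
>     >>>       CHKERRQ(ierr)
>     >>>       print *, 'useCone/useClosure:', useCone, useClosure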
>     >>>
>     >>>   Thanks,
>     >>>
>     >>>      Matt
>     >>> Thanks,
>     >>>
>     >>> Danyang
>     >>>
>     >>>
>     >>>
>     >>> --
>     >>> What most experimenters take for granted before they begin
>     their experiments is infinitely more interesting than any results
>     to which their experiments lead.
>     >>> -- Norbert Wiener
>     >>>
>     >>> https://www.cse.buffalo.edu/~knepley/
>     >>>
>     >>> --
>     >>> Sent from my Android device with K-9 Mail. Please excuse my
>     brevity.
>     >>>
>     >>>
>     >>> --
>     >>> What most experimenters take for granted before they begin
>     their experiments is infinitely more interesting than any results
>     to which their experiments lead.
>     >>> -- Norbert Wiener
>     >>>
>     >>> https://www.cse.buffalo.edu/~knepley/
>     > <basin-petsc-3.9.3.log><basin-petsc-3.11.3.log>
>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/