[petsc-users] Strange Partition in PETSc 3.11 version on some computers

Danyang Su danyang.su at gmail.com
Sun Sep 15 17:58:59 CDT 2019


Hi Matt,

Thanks for the quick reply. I have made no change to the adjacency specification. The source code and the simulation input files are all the same. I also tried the GNU compiler and MPICH with PETSc 3.11.3, and that works fine.

It looks like the problem is caused by a difference in configuration. However, the configuration is almost the same as for PETSc 3.9.3, except for the compiler and MPI used. I will contact the SciNet staff to check whether they have any idea about this.

Thanks,

Danyang

On September 15, 2019 3:20:18 p.m. PDT, Matthew Knepley <knepley at gmail.com> wrote:
>On Sun, Sep 15, 2019 at 5:19 PM Danyang Su via petsc-users <petsc-users at mcs.anl.gov> wrote:
>
>> Dear All,
>>
>> I have a question regarding a strange partition problem in the PETSc 3.11
>> version. The problem does not exist on my local workstation. However, on a
>> cluster with different PETSc versions, the partition is quite different,
>> as you can see in the figure below, which was run with 160 processors.
>> The color indicates which processor owns that subdomain. In this layered
>> prism mesh, there are 40 layers from bottom to top and each layer has
>> around 20k nodes. The natural ordering of the nodes is also layered from
>> bottom to top.
>>
>> The left partition (PETSc 3.10 and earlier) looks good, with a minimum
>> number of ghost nodes, while the right one (PETSc 3.11) looks weird, with a
>> huge number of ghost nodes. It looks like the right one partitions layer by
>> layer. This problem exists on a cluster but not on my local workstation
>> for the same PETSc version (with a different compiler and MPI). Other than
>> the difference in partition and efficiency, the simulation results are the
>> same.
>>
>> [image: partition difference]
>>
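As a way to quantify the ghost-node overhead visible in the figure, the local vertex count can be printed per rank after distribution and compared with the ideal of roughly (20k x 40)/160 vertices per rank. A minimal diagnostic sketch in Fortran, assuming a distributed DMPlex handle and the PETSc Fortran modules; the subroutine name report_partition is hypothetical and not part of the original code:

      ! Hypothetical diagnostic: print how many vertices (owned + ghost)
      ! each rank holds after DMPlexDistribute.
      subroutine report_partition(dm, ierr)
#include <petsc/finclude/petscdmplex.h>
        use petscdmplex
        implicit none
        DM             :: dm
        PetscErrorCode :: ierr
        PetscInt       :: vStart, vEnd
        character(len=80) :: msg

        ! depth-0 points of the local plex are its vertices
        call DMPlexGetDepthStratum(dm, 0, vStart, vEnd, ierr)
        CHKERRQ(ierr)
        write(msg,'(A,I10,A)') 'local vertices (owned + ghost): ', vEnd - vStart, new_line('a')
        call PetscSynchronizedPrintf(PETSC_COMM_WORLD, trim(msg), ierr)
        CHKERRQ(ierr)
        call PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT, ierr)
        CHKERRQ(ierr)
      end subroutine report_partition

A rank whose local count is far above the ideal average is carrying the large ghost layers described above.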
>> Below is the PETSc configuration on the three machines:
>>
>> Local workstation (works fine): ./configure --with-cc=gcc --with-cxx=g++
>> --with-fc=gfortran --download-mpich --download-scalapack
>> --download-parmetis --download-metis --download-ptscotch
>> --download-fblaslapack --download-hypre --download-superlu_dist
>> --download-hdf5=yes --download-ctetgen --with-debugging=0 COPTFLAGS=-O3
>> CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --with-cxx-dialect=C++11
>>
>> Cluster with PETSc 3.9.3 (works fine):
>> --prefix=/scinet/niagara/software/2018a/opt/intel-2018.2-intelmpi-2018.2/petsc/3.9.3
>> CC=mpicc CXX=mpicxx F77=mpif77 F90=mpif90 FC=mpifc
>> COPTFLAGS="-march=native -O2" CXXOPTFLAGS="-march=native -O2"
>> FOPTFLAGS="-march=native -O2"
>> --download-chaco=1 --download-hypre=1 --download-metis=1 --download-ml=1
>> --download-mumps=1 --download-parmetis=1 --download-plapack=1
>> --download-prometheus=1 --download-ptscotch=1 --download-scotch=1
>> --download-sprng=1 --download-superlu=1 --download-superlu_dist=1
>> --download-triangle=1 --with-avx512-kernels=1
>> --with-blaslapack-dir=/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl
>> --with-debugging=0 --with-hdf5=1
>> --with-mkl_pardiso-dir=/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl
>> --with-scalapack=1
>> --with-scalapack-lib="[/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_scalapack_lp64.so,/scinet/niagara/intel/2018.2/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so]"
>> --with-x=0
>>
>> Cluster with PETSc 3.11.3 (looks weird):
>> --prefix=/scinet/niagara/software/2019b/opt/intel-2019u4-intelmpi-2019u4/petsc/3.11.3
>> CC=mpicc CXX=mpicxx F77=mpif77 F90=mpif90 FC=mpifc
>> COPTFLAGS="-march=native -O2" CXXOPTFLAGS="-march=native -O2"
>> FOPTFLAGS="-march=native -O2"
>> --download-chaco=1 --download-hdf5=1 --download-hypre=1 --download-metis=1
>> --download-ml=1 --download-mumps=1 --download-parmetis=1
>> --download-plapack=1 --download-prometheus=1 --download-ptscotch=1
>> --download-scotch=1 --download-sprng=1 --download-superlu=1
>> --download-superlu_dist=1 --download-triangle=1 --with-avx512-kernels=1
>> --with-blaslapack-dir=/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl
>> --with-cxx-dialect=C++11 --with-debugging=0
>> --with-mkl_pardiso-dir=/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl
>> --with-scalapack=1
>> --with-scalapack-lib="[/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64/libmkl_scalapack_lp64.so,/scinet/intel/2019u4/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.so]"
>> --with-x=0
>>
>> The partitioning uses the default DMPlex distribution:
>>
>>       !c distribute mesh over processes
>>       call DMPlexDistribute(dmda_flow%da,stencil_width,               &
>>                             PETSC_NULL_SF,                            &
>>                             PETSC_NULL_OBJECT,                        &
>>                             distributedMesh,ierr)
>>       CHKERRQ(ierr)
>>
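For what it is worth, the partitioner used by DMPlexDistribute can also be pinned down explicitly rather than relying on the default, which would help rule out any difference in the default between builds. A minimal sketch, assuming Fortran bindings exist for DMPlexGetPartitioner / PetscPartitionerSetType / PetscPartitionerSetFromOptions and reusing the dmda_flow%da handle from the code above; with the SetFromOptions call in place, the same choice can also be made at run time via -petscpartitioner_type:

      ! Sketch: pin the partitioner before calling DMPlexDistribute so that
      ! different PETSc builds use the same algorithm.
      PetscPartitioner :: part

      call DMPlexGetPartitioner(dmda_flow%da, part, ierr)
      CHKERRQ(ierr)
      call PetscPartitionerSetType(part, 'parmetis', ierr)
      CHKERRQ(ierr)
      ! allow run-time override with -petscpartitioner_type
      call PetscPartitionerSetFromOptions(part, ierr)
      CHKERRQ(ierr)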
>> Any idea on this strange problem?
>>
>I just looked at the code. Your mesh should be partitioned by k-way
>partitioning using Metis, since it is on 1 proc for partitioning. This code
>is the same in 3.9 and 3.11, and you get the same result on your machine.
>I cannot understand what might be happening on your cluster
>(MPI plays no role). Is it possible that you changed the adjacency
>specification in that version?
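For reference, the adjacency specification in question is set per DM before distribution. A minimal sketch, assuming the DMSetBasicAdjacency interface (which around this release replaced the older DMPlexSetAdjacencyUseCone / DMPlexSetAdjacencyUseClosure calls) has a Fortran binding:

      ! Sketch: state the adjacency used for partitioning explicitly.
      ! FV-style adjacency (cells connected through faces):
      call DMSetBasicAdjacency(dmda_flow%da, PETSC_TRUE, PETSC_FALSE, ierr)
      CHKERRQ(ierr)
      ! FE-style adjacency (through the transitive closure) would instead be
      ! call DMSetBasicAdjacency(dmda_flow%da, PETSC_FALSE, PETSC_TRUE, ierr)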
>
>  Thanks,
>
>     Matt
>
>> Thanks,
>>
>> Danyang
>>
>
>
>-- 
>What most experimenters take for granted before they begin their
>experiments is infinitely more interesting than any results to which
>their experiments lead.
>-- Norbert Wiener
>
>https://www.cse.buffalo.edu/~knepley/

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

