[petsc-users] Strange Partition in PETSc 3.11 version on some computers

Danyang Su danyang.su at gmail.com
Fri Oct 18 18:24:55 CDT 2019


I use the default partitioner from PETSc. Is there any partitioning option 
available on the PETSc side for METIS?

Thanks,

Danyang
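
A minimal sketch of how a specific partitioner can be requested for a DMPlex
mesh, assuming ParMETIS was built in (e.g. via --download-parmetis); the mesh
file name below is a placeholder and the exact API details may differ between
PETSc versions. The same choice can typically be made at run time with
-petscpartitioner_type parmetis (or ptscotch, chaco, simple).

    #include <petscdmplex.h>

    int main(int argc, char **argv)
    {
      DM               dm, dmDist = NULL;
      PetscPartitioner part;
      PetscErrorCode   ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      /* Placeholder mesh file; load or build the DMPlex however the application does. */
      ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_TRUE, &dm);CHKERRQ(ierr);

      /* Request ParMETIS explicitly rather than relying on the build's default. */
      ierr = DMPlexGetPartitioner(dm, &part);CHKERRQ(ierr);
      ierr = PetscPartitionerSetType(part, PETSCPARTITIONERPARMETIS);CHKERRQ(ierr);
      ierr = PetscPartitionerSetFromOptions(part);CHKERRQ(ierr); /* honors -petscpartitioner_type */

      /* Distribute with zero overlap and swap in the parallel mesh. */
      ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
      if (dmDist) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }

      ierr = DMDestroy(&dm);CHKERRQ(ierr);
      return PetscFinalize();
    }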

On 2019-10-18 3:32 p.m., Mark Adams wrote:
> The 3.11 and 3.12 partitions look like a default, lexicographical 
> partitioning of a certain mesh that I cannot see. Could this be the 
> original partitioning (i.e., the "current" partitioning type)?
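
As a quick check of this guess, one could query which partitioner the DMPlex
will use before distribution; a minimal sketch (the helper below is
hypothetical, and the reported type names, e.g. "simple" for the lexicographic
default, may vary by PETSc version):

    #include <petscdmplex.h>

    /* Hypothetical helper: print the type of the partitioner attached to a DMPlex,
       e.g. "parmetis", "ptscotch", or "simple" (which splits points by their
       original numbering, giving a lexicographic-looking partition). */
    static PetscErrorCode ReportPartitionerType(DM dm)
    {
      PetscPartitioner     part;
      PetscPartitionerType type;
      PetscErrorCode       ierr;

      ierr = DMPlexGetPartitioner(dm, &part);CHKERRQ(ierr);
      ierr = PetscPartitionerGetType(part, &type);CHKERRQ(ierr);
      ierr = PetscPrintf(PetscObjectComm((PetscObject)dm), "Partitioner type: %s\n", type);CHKERRQ(ierr);
      return 0;
    }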
>
> On Fri, Oct 18, 2019 at 5:54 PM Danyang Su via petsc-users 
> <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>> wrote:
>
>     Hi All,
>
>     I am now able to reproduce the partition problem using a
>     relatively small mesh (attached). The mesh consists of 9087 nodes,
>     15656 prism cells. There are 39 layers with 233 nodes for each
>     layer. I have tested the partition using PETSc as well as Gmsh 3.0.1.
>
>     Taking 4 partitions as an example, the partitions from PETSc 3.9
>     and 3.10 are reasonable though not perfect, with a ratio of ghost
>     nodes to total nodes of 2754 / 9087.
>
>     The partitions from PETSc 3.11, PETSc 3.12 and PETSc-dev look
>     weird, with a ghost-node to total-node ratio of 12413 / 9087. The
>     nodes assigned to the same processor are poorly connected.
>
>     Note: the z-axis is scaled by 25 for better visualization in ParaView.
>
>
>     The partition from Gmsh-METIS is a bit different but still quite
>     similar to those from PETSc 3.9 and 3.10.
>
>
>     Finally, the partition using the Gmsh-Chaco Multilevel-KL algorithm
>     is the best one, with a ghost-node to total-node ratio of 741 / 9087.
>     For most of my simulation cases with much larger meshes, PETSc 3.9
>     and 3.10 generate partitions similar to the one below, which work
>     pretty well, and the code gets very good speedup.
>
>     Thanks,
>
>     Danyang
>
>     On 2019-09-18 11:44 a.m., Danyang Su wrote:
>>
>>     On 2019-09-18 10:56 a.m., Smith, Barry F. via petsc-users wrote:
>>>
>>>>     On Sep 18, 2019, at 12:25 PM, Mark Lohry via petsc-users
>>>>     <petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov> wrote:
>>>>
>>>>     Mark,
>>>          Mark,
>>>
>>>            Good point. This has been a big headache forever.
>>>
>>>            Note that this has been "fixed" in the master version of
>>>     PETSc and will be in its next release. If you use
>>>     --download-parmetis in the future it will use the same random
>>>     numbers on all machines and thus should produce the same
>>>     partitions on all machines.
>>>
>>>             I think that METIS has always used the same random
>>>     numbers on all machines and thus always produced the same results.
>>>
>>>          Barry
>>     Good to know this. I will use the same configuration that causes
>>     the strange partition problem to test the next version.
>>
>>     Thanks,
>>
>>     Danyang
>>
>>>
>>>
>>>>     The machine, compiler and MPI version should not matter.
>>>>
>>>>     I might have missed something earlier in the thread, but
>>>>     parmetis has a dependency on the machine's glibc srand, and it
>>>>     can (and does) create different partitions with different srand
>>>>     versions. The same mesh on the same code on the same process
>>>>     count can and will give different partitions (possibly bad
>>>>     ones) on different machines.
>>>>
>>>>     On Tue, Sep 17, 2019 at 1:05 PM Mark Adams via petsc-users
>>>>     <petsc-users at mcs.anl.gov> <mailto:petsc-users at mcs.anl.gov> wrote:
>>>>
>>>>
>>>>     On Tue, Sep 17, 2019 at 12:53 PM Danyang Su
>>>>     <danyang.su at gmail.com> <mailto:danyang.su at gmail.com> wrote:
>>>>     Hi Mark,
>>>>
>>>>     Thanks for your follow-up.
>>>>
>>>>     The unstructured grid code has been verified and there is no
>>>>     problem in the results. The convergence rate is also good. The
>>>>     3D mesh is not good; it is based on the original stratum, which
>>>>     I haven't refined, but it is fine for an initial test as it is
>>>>     relatively small and the results obtained from this mesh still
>>>>     make sense.
>>>>
>>>>     The 2D meshes are just for testing purposes, as I want to
>>>>     reproduce the partition problem on a cluster using PETSc 3.11.3
>>>>     and Intel 2019. Unfortunately, I didn't find the problem using
>>>>     this example.
>>>>
>>>>     The code has no problem using different PETSc versions
>>>>     (PETSc v3.4 to v3.11)
>>>>
>>>>     OK, it is the same code. I thought I saw something about your
>>>>     code changing.
>>>>
>>>>     Just to be clear, v3.11 never gives you good partitions. It is
>>>>     not just a problem on this Intel cluster.
>>>>
>>>>     The machine, compiler and MPI version should not matter.
>>>>       and MPI distributions (MPICH, OpenMPI, IntelMPI), except for
>>>>     one simulation case (the mesh I attached) on a cluster with
>>>>     PETSc 3.11.3 and Intel 2019u4, due to the very different partition
>>>>     compared to PETSc 3.9.3. Yet the simulation results are the same
>>>>     except for the efficiency problem, because the strange partition
>>>>     results in much more communication (ghost nodes).
>>>>
>>>>     I am still trying different compilers and MPI implementations
>>>>     with PETSc 3.11.3 on that cluster to trace the problem. I will
>>>>     get back to you when there is an update.
>>>>
>>>>
>>>>     This is very strange. You might want to use 'git bisect'. You
>>>>     set a good and a bad SHA1 (we can give you these for 3.9 and
>>>>     3.11, along with the exact commands). Git will then go to a
>>>>     version in the middle. You then reconfigure, remake, rebuild
>>>>     your code, and run your test. Git will ask you, as I recall,
>>>>     whether the version is good or bad. Once you get this workflow
>>>>     going it is not too bad, depending on how hard this loop is,
>>>>     of course.
>>>>       Thanks,
>>>>
>>>>     danyang
>>>>
-------------- next part --------------
Image attachments:
  basin-3d-dgr20000.png  <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191018/0332abb8/attachment-0003.png>
  gmsh-partition-metis.png  <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191018/0332abb8/attachment-0004.png>
  gmsh-partition-Chaco.png  <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191018/0332abb8/attachment-0005.png>

