[petsc-users] Strange Partition in PETSc 3.11 version on some computers

Mark Adams mfadams at lbl.gov
Fri Oct 18 18:45:26 CDT 2019


My point is that if these partitions are in fact some simple chopping of
the source grid, then ParMetis might somehow not be used, and you are in
effect just running with "-mat_partitioning_type current".
If these partitions are in fact a simple 1D, lexicographical partitioning
of the input vertices, then that would indicate that ParMetis is not
active for some reason.
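A quick way to check, assuming your code distributes the mesh through
DMPlex/PetscPartitioner (adjust if you partition some other way; the
executable name is just a placeholder):

    mpiexec -n 4 ./your_code -petscpartitioner_type parmetis -options_left

or, if the partitioning goes through MatPartitioning instead,

    mpiexec -n 4 ./your_code -mat_partitioning_type parmetis -options_left

If -options_left reports that the option was never used, that is itself a
hint that the partitioner you think is running is not the one actually
being called.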

On Fri, Oct 18, 2019 at 7:25 PM Danyang Su <danyang.su at gmail.com> wrote:

> I use the default partitioning from PETSc. Is there any partitioning option
> available on the PETSc side for METIS?
>
> Thanks,
>
> Danyang
> On 2019-10-18 3:32 p.m., Mark Adams wrote:
>
> The 3.11 and 3.12 partitions look like a default, lexicographical
> partitioning of a certain mesh that I cannot see. Could this be the
> original partitioning (i.e., the "current" partitioning type)?
>
> On Fri, Oct 18, 2019 at 5:54 PM Danyang Su via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> Hi All,
>>
>> I am now able to reproduce the partition problem using a relatively small
>> mesh (attached). The mesh consists of 9087 nodes and 15656 prism cells. There
>> are 39 layers with 233 nodes per layer. I have tested the partition
>> using PETSc as well as Gmsh 3.0.1.
>>
>> Taking 4 partitions as an example, the partitions from PETSc 3.9 and 3.10
>> are reasonable though not perfect, with total number of ghost nodes / total
>> number of nodes: 2754 / 9087.
>>
>> The partitions from PETSc 3.11, PETSc 3.12 and PETSc-dev look weird, with
>> total number of ghost nodes / total number of nodes: 12413 / 9087. The
>> nodes assigned to the same processor are not well connected.
>>
>> Note: the z axis is scaled by 25 for better visualization in ParaView.
>>
>>
>> The partition from Gmsh-Metis is a bit different but still quite similar
>> to PETSc 3.9 and 3.10.
>>
>> Finally, the partition using the Gmsh-Chaco Multilevel-KL algorithm is the
>> best one, with total number of ghost nodes / total number of nodes: 741 /
>> 9087. For most of my simulation cases with much larger meshes, PETSc 3.9
>> and 3.10 generate partitions similar to the one below, which work pretty
>> well, and the code gets very good speedup.
>>
>> Thanks,
>>
>> Danyang
>> On 2019-09-18 11:44 a.m., Danyang Su wrote:
>>
>>
>> On 2019-09-18 10:56 a.m., Smith, Barry F. via petsc-users wrote:
>>
>>
>> On Sep 18, 2019, at 12:25 PM, Mark Lohry via petsc-users
>> <petsc-users at mcs.anl.gov> wrote:
>>
>> Mark,
>>
>>
>>      Mark,
>>
>>        Good point. This has been a big headache forever.
>>
>>        Note that this has been "fixed" in the master version of PETSc and
>> will be in its next release. If you use --download-parmetis in the future
>> it will use the same random numbers on all machines and thus should produce
>> the same partitions on all machines.
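>>
>>        A minimal configure sketch along those lines (the compiler and MPI
>> options are just placeholders for whatever you normally pass):
>>
>>            ./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
>>              --download-metis --download-parmetis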
>>
>>         I think that metis has always used the same random numbers on all
>> machines and thus always produced the same results.
>>
>>      Barry
>>
>> Good to know this. I will use the same configuration that causes the strange
>> partition problem to test the next version.
>>
>> Thanks,
>>
>> Danyang
>>
>>
>>
>> The machine, compiler and MPI version should not matter.
>>
>> I might have missed something earlier in the thread, but parmetis has a
>> dependency on the machine's glibc srand, and it can (and does) create
>> different partitions with different srand versions. The same mesh with the
>> same code on the same process count can and will give different partitions
>> (possibly bad ones) on different machines.
>>
>> On Tue, Sep 17, 2019 at 1:05 PM Mark Adams via petsc-users
>> <petsc-users at mcs.anl.gov> wrote:
>>
>>
>> On Tue, Sep 17, 2019 at 12:53 PM Danyang Su <danyang.su at gmail.com>
>> wrote:
>> Hi Mark,
>>
>> Thanks for your follow-up.
>>
>> The unstructured grid code has been verified and there is no problem in
>> the results. The convergence rate is also good. The 3D mesh is not good; it
>> is based on the original stratum, which I haven't refined, but it is good for
>> an initial test as it is relatively small and the results obtained from this
>> mesh still make sense.
>>
>> The 2D meshes are just for testing purposes, as I want to reproduce the
>> partition problem on a cluster using PETSc 3.11.3 and Intel 2019.
>> Unfortunately, I didn't find the problem using this example.
>>
>> The code has no problem running with different PETSc versions (PETSc v3.4 to
>> v3.11)
>>
>> OK, it is the same code. I thought I saw something about your code
>> changing.
>>
>> Just to be clear, v3.11 never gives you good partitions. It is not just a
>> problem on this Intel cluster.
>>
>> The machine, compiler and MPI version should not matter.
>>   and MPI distributions (MPICH, OpenMPI, IntelMPI), except for one
>> simulation case (the mesh I attached) on a cluster with PETSc 3.11.3 and
>> Intel 2019u4, due to the very different partition compared to PETSc 3.9.3. Yet
>> the simulation results are the same except for the efficiency problem,
>> because the strange partition results in much more communication (ghost
>> nodes).
>>
>> I am still trying different compilers and MPI implementations with
>> PETSc 3.11.3 on that cluster to trace the problem. I will get back to you
>> guys when there is an update.
>>
>>
>> This is very strange. You might want to use 'git bisect'. You set a good
>> and a bad SHA1 (we can give you these for 3.9 and 3.11, and the exact
>> commands). Git will then check out a version in the middle. You
>> reconfigure, remake, rebuild your code, and run your test. Git will ask
>> you, as I recall, whether the version is good or bad. Once you get this
>> workflow going it is not too bad, depending on how hard this loop is, of
>> course.
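>>
>> Roughly (the good/bad points below are just the release tags; we can send
>> the exact SHA1s and commands):
>>
>>     cd petsc
>>     git bisect start
>>     git bisect bad v3.11       # known bad partitions
>>     git bisect good v3.9       # known good partitions
>>     # at each step: reconfigure, rebuild PETSc and your code, run the test,
>>     # then report the result back to git:
>>     git bisect good            # or: git bisect bad
>>     # repeat until git identifies the first bad commit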
>>   Thanks,
>>
>> danyang
>>
>>
-------------- next part --------------
Image attachments:
  basin-3d-dgr20000.png:
    <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191018/e01604a5/attachment-0003.png>
  gmsh-partition-metis.png:
    <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191018/e01604a5/attachment-0004.png>
  gmsh-partition-Chaco.png:
    <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191018/e01604a5/attachment-0005.png>

