[petsc-users] Strange Partition in PETSc 3.11 version on some computers
Mark Adams
mfadams at lbl.gov
Fri Oct 18 17:32:56 CDT 2019
The 3.11 and 3.12 partitions look like a default, lexicographical
partitioning of a certain mesh that I cannot see. Could this be the
original partitioning (i.e., the "current" partitioning type)?
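
(In case it helps: assuming the mesh goes through DMPlexDistribute, which I
do not know for your code, below is a minimal sketch of forcing a graph
partitioner explicitly; the run-time option -petscpartitioner_type parmetis
does the same thing. Here "dm" is assumed to be the serial DMPlex read from
the mesh file, so this is only an illustration, not your code.)

  PetscErrorCode   ierr;
  PetscPartitioner part;
  DM               dmDist = NULL;

  /* dm is assumed to be the serial DMPlex read from the mesh file */
  ierr = DMPlexGetPartitioner(dm, &part);CHKERRQ(ierr);
  /* force ParMETIS; PETSCPARTITIONERSIMPLE gives the lexicographic-looking split */
  ierr = PetscPartitionerSetType(part, PETSCPARTITIONERPARMETIS);CHKERRQ(ierr);
  ierr = PetscPartitionerSetFromOptions(part);CHKERRQ(ierr);
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist;}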
On Fri, Oct 18, 2019 at 5:54 PM Danyang Su via petsc-users <
petsc-users at mcs.anl.gov> wrote:
> Hi All,
>
> I am now able to reproduce the partition problem using a relatively small
> mesh (attached). The mesh consists of 9087 nodes and 15656 prism cells. There
> are 39 layers with 233 nodes in each layer. I have tested the partition
> using PETSc as well as Gmsh 3.0.1.
>
> Taking 4 partitions as an example, the partitions from PETSc 3.9 and 3.10
> are reasonable, though not perfect, with a ratio of total ghost nodes to
> total nodes of 2754 / 9087.
>
> The partitions from PETSc 3.11, PETSc 3.12 and PETSc-dev look weird, with
> a ratio of total ghost nodes to total nodes of 12413 / 9087. The nodes
> assigned to the same processor are poorly connected.
>
> Note: the z axis is scaled by 25 for better visualization in ParaView.
>
>
> The partition from Gmsh-METIS is a bit different but still quite similar
> to those from PETSc 3.9 and 3.10.
>
> Finally, the partition using the Gmsh-Chaco Multilevel-KL algorithm is the
> best one, with a ratio of total ghost nodes to total nodes of 741 / 9087.
> For most of my simulation cases with much larger meshes, PETSc 3.9 and 3.10
> generate partitions similar to the one below, which work pretty well, and
> the code gets very good speedup.
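>
> For reference, those ratios work out to roughly 30% ghost nodes
> (2754 / 9087) for PETSc 3.9/3.10, about 137% (12413 / 9087) for PETSc
> 3.11/3.12/dev, and about 8% (741 / 9087) for the Chaco Multilevel-KL
> partition.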
>
> Thanks,
>
> Danyang
> On 2019-09-18 11:44 a.m., Danyang Su wrote:
>
>
> On 2019-09-18 10:56 a.m., Smith, Barry F. via petsc-users wrote:
>
>
> On Sep 18, 2019, at 12:25 PM, Mark Lohry via petsc-users
> <petsc-users at mcs.anl.gov> wrote:
>
> Mark,
>
>
> Mark,
>
> Good point. This has been a big headache forever.
>
> Note that this has been "fixed" in the master version of PETSc and
> will be in its next release. If you use --download-parmetis in the future
> it will use the same random numbers on all machines and thus should produce
> the same partitions on all machines.
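> (For reference, the configure line would be something like "./configure
> --download-metis --download-parmetis ..." together with whatever other
> options you already use.)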
>
> I think that METIS has always used the same random numbers on all
> machines and thus always produced the same results.
>
> Barry
>
> Good to know this. I will use the same configuration that causes the strange
> partition problem to test the next version.
>
> Thanks,
>
> Danyang
>
>
>
> The machine, compiler and MPI version should not matter.
>
> I might have missed something earlier in the thread, but parmetis has a
> dependency on the machine's glibc srand, and it can (and does) create
> different partitions with different srand versions. The same mesh, with the
> same code and the same process count, can and will give different partitions
> (possibly bad ones) on different machines.
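>
> As a minimal illustration of that mechanism (plain libc here, not ParMETIS
> code), the program below has implementation-defined output: two machines
> with different C libraries can print different numbers for the same seed,
> which is how the same input can end up partitioned differently.
>
>   #include <stdio.h>
>   #include <stdlib.h>
>
>   int main(void)
>   {
>     /* fixed seed, as a partitioner might use internally */
>     srand(42);
>     /* rand() is implementation-defined, so different libc versions can
>        produce different sequences from the same seed */
>     for (int i = 0; i < 5; ++i)
>       printf("%d\n", rand());
>     return 0;
>   }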
>
> On Tue, Sep 17, 2019 at 1:05 PM Mark Adams via petsc-users
> <petsc-users at mcs.anl.gov> wrote:
>
>
> On Tue, Sep 17, 2019 at 12:53 PM Danyang Su <danyang.su at gmail.com>
> wrote:
> Hi Mark,
>
> Thanks for your follow-up.
>
> The unstructured grid code has been verified and there is no problem in
> the results. The convergence rate is also good. The 3D mesh is not good; it
> is based on the original stratum, which I haven't refined, but it is fine
> for an initial test as it is relatively small and the results obtained from
> this mesh still make sense.
>
> The 2D meshes are just for testing purposes, as I wanted to reproduce the
> partition problem on a cluster using PETSc 3.11.3 and Intel 2019.
> Unfortunately, I didn't find the problem using this example.
>
> The code has no problem using different PETSc versions (PETSc v3.4 to
> v3.11)
>
> OK, it is the same code. I thought I saw something about your code
> changing.
>
> Just to be clear, v3.11 never gives you good partitions. It is not just a
> problem on this Intel cluster.
>
> The machine, compiler and MPI version should not matter.
> and MPI distributions (MPICH, OpenMPI, Intel MPI), except for one
> simulation case (the mesh I attached) on a cluster with PETSc 3.11.3 and
> Intel 2019u4, due to the very different partition compared to PETSc 3.9.3.
> Yet the simulation results are the same, except for the efficiency problem,
> because the strange partition results in much more communication (ghost
> nodes).
>
> I am still trying different compilers and MPI implementations with PETSc
> 3.11.3 on that cluster to trace the problem. Will get back to you guys when
> there is an update.
>
>
> This is very strange. You might want to use 'git bisect'. You set a good
> and a bad SHA1 (we can give you these for 3.9 and 3.11, along with the exact
> commands), and git will check out a version in the middle. You then
> reconfigure, remake, rebuild your code, and run your test. Git will ask you,
> as I recall, whether the version is good or bad. Once you get this workflow
> going it is not too bad, depending on how laborious this loop is, of course.
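>
> A rough sketch of that loop (the SHA1s below are placeholders, not the
> actual commits for 3.9 and 3.11):
>
>   git bisect start
>   git bisect bad  <sha1-of-a-bad-version>     # e.g. a 3.11-era commit
>   git bisect good <sha1-of-a-good-version>    # e.g. a 3.9-era commit
>   # git checks out a commit in between: reconfigure and rebuild PETSc,
>   # rebuild your code, run the test, then report the result with
>   git bisect good    # or: git bisect bad
>   # repeat until git names the first bad commit, then
>   git bisect reset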
> Thanks,
>
> danyang
>
>
-------------- next part --------------
Attachments (partition figures referenced above):
  basin-3d-dgr20000.png (image/png, 85113 bytes):
  <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191018/f621b7dc/attachment-0003.png>
  gmsh-partition-metis.png (image/png, 61754 bytes):
  <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191018/f621b7dc/attachment-0004.png>
  gmsh-partition-Chaco.png (image/png, 66392 bytes):
  <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191018/f621b7dc/attachment-0005.png>