[petsc-users] Strange Partition in PETSc 3.11 version on some computers

Smith, Barry F. bsmith at mcs.anl.gov
Wed Sep 18 12:56:50 CDT 2019



> On Sep 18, 2019, at 12:25 PM, Mark Lohry via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> Mark,
>  

    Mark,

      Good point. This has been a big headache forever.

      Note that this has been "fixed" in the master version of PETSc and will be in its next release. If you use --download-parmetis in the future it will use the same random numbers on all machines and thus should produce the same partitions on all machines. 

       I think that metis has always used the same random numbers on all machines and thus has always produced the same results.
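
As a rough illustration of the point above (this is not the METIS, ParMETIS or PETSc code), the Python sketch below shows why a generator bundled with the library gives identical results everywhere: with fixed constants and a fixed seed, the sequence depends only on the arithmetic and not on the machine's C library. The names `PortableLCG` and `shuffled_vertex_order`, the generator constants, and the use of a random vertex ordering as the "randomized step" are all assumptions made for the example.

```python
class PortableLCG:
    """Small linear congruential generator with fixed constants.

    Illustrative only: the constants are a commonly quoted 32-bit LCG
    parameter set, not the generator any partitioner actually uses.
    """

    MODULUS = 2**32
    MULTIPLIER = 1664525
    INCREMENT = 1013904223

    def __init__(self, seed):
        self.state = seed % self.MODULUS

    def next_int(self):
        # Pure integer arithmetic: same seed gives the same sequence
        # regardless of the platform's C library.
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state


def shuffled_vertex_order(num_vertices, seed):
    """Fisher-Yates shuffle of vertex indices driven by PortableLCG.

    Stands in for any randomized step of a heuristic partitioner
    (hypothetical; chosen only to show seed-determined behaviour).
    """
    rng = PortableLCG(seed)
    order = list(range(num_vertices))
    for i in range(num_vertices - 1, 0, -1):
        j = rng.next_int() % (i + 1)
        order[i], order[j] = order[j], order[i]
    return order
```

Calling `shuffled_vertex_order(10, 42)` twice, or on two different machines, returns the same ordering. If the random numbers came from the system library instead, the ordering (and hence a partition built from it) could differ between machines.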

    Barry


> The machine, compiler and MPI version should not matter.
> 
> I might have missed something earlier in the thread, but parmetis has a dependency on the machine's glibc srand, and it can (and does) create different partitions with different srand versions. The same mesh on the same code on the same process count can and will give different partitions (possibly bad ones) on different machines.
> 
> On Tue, Sep 17, 2019 at 1:05 PM Mark Adams via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> 
> On Tue, Sep 17, 2019 at 12:53 PM Danyang Su <danyang.su at gmail.com> wrote:
> Hi Mark,
> 
> Thanks for your follow-up. 
> 
> The unstructured grid code has been verified and there is no problem in the results. The convergence rate is also good. The 3D mesh is not good; it is based on the original stratum, which I haven't refined, but it is good for an initial test as it is relatively small and the results obtained from this mesh still make sense.
> 
> The 2D meshes are just for testing purposes, as I want to reproduce the partition problem on a cluster using PETSc3.11.3 and Intel2019. Unfortunately, I didn't find the problem using this example.
> 
> The code has no problem in using different PETSc versions (PETSc V3.4 to V3.11)
> 
> OK, it is the same code. I thought I saw something about your code changing.
> 
> Just to be clear, v3.11 never gives you good partitions. It is not just a problem on this Intel cluster.
> 
> The machine, compiler and MPI version should not matter.
>  
> and MPI distribution (MPICH, OpenMPI, IntelMPI), except for one simulation case (the mesh I attached) on a cluster with PETSc3.11.3 and Intel2019u4, where the partition is very different from the one produced by PETSc3.9.3. The simulation results are still the same; the only problem is efficiency, because the strange partition results in much more communication (ghost nodes).
> 
> I am still trying different compilers and MPI implementations with PETSc3.11.3 on that cluster to trace the problem. I will get back to you when there is an update.
> 
> 
> This is very strange. You might want to use 'git bisect'. You set a good and a bad SHA1 (we can give you these for 3.9 and 3.11, along with the exact commands). Git will then check out a version in the middle. You then reconfigure, remake, rebuild your code, and run your test. Git will ask you, as I recall, if the version is good or bad. Once you get this workflow going it is not too bad, depending on how hard this loop is of course.
>  
> Thanks,
> 
> danyang
> 
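
The 'git bisect' loop suggested in the quoted thread above is a binary search over the commit history between a known-good and a known-bad version. The Python sketch below shows only that search logic, not git itself; the function name `first_bad_commit`, the commit labels, and the `is_good` callback (standing in for "reconfigure, rebuild, run the test, inspect the partition") are hypothetical.

```python
def first_bad_commit(commits, is_good):
    """Return the first commit for which is_good() is False.

    Assumes commits are in history order, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one
    is also bad (the assumption that bisection relies on).
    """
    low = 0                  # index of the last commit known to be good
    high = len(commits) - 1  # index of the first commit known to be bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(commits[mid]):
            low = mid
        else:
            high = mid
    return commits[high]


# Hypothetical history in which the behaviour changes at "c5".
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
culprit = first_bad_commit(history, lambda c: int(c[1:]) < 5)
```

Each iteration halves the remaining range, so the number of rebuild-and-test cycles grows with the logarithm of the number of commits between the two versions.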


