[petsc-users] DMPlex partition problem

Wed Apr 8 19:46:14 CDT 2020

From: Matthew Knepley <knepley at gmail.com>
Date: Wednesday, April 8, 2020 at 5:32 PM
To: Danyang Su <danyang.su at gmail.com>
Cc: PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] DMPlex partition problem

On Wed, Apr 8, 2020 at 8:18 PM Danyang Su <danyang.su at gmail.com> wrote:

From: Matthew Knepley <knepley at gmail.com>
Date: Wednesday, April 8, 2020 at 4:50 PM
To: Danyang Su <danyang.su at gmail.com>
Cc: PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] DMPlex partition problem

On Wed, Apr 8, 2020 at 7:47 PM Danyang Su <danyang.su at gmail.com> wrote:

From: Matthew Knepley <knepley at gmail.com>
Date: Wednesday, April 8, 2020 at 4:41 PM
To: Danyang Su <danyang.su at gmail.com>
Cc: PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] DMPlex partition problem

On Wed, Apr 8, 2020 at 5:52 PM Danyang Su <danyang.su at gmail.com> wrote:

Hi Matt, 

I am one step closer now. When I run the ex1 code with ‘-interpolate’, the partition is good; without it, the partition looks weird.

 Crap! That did not even occur to me. Yes, the dual graph construction will not work for uninterpolated wedges.

So, do you really need an uninterpolated mesh? If so, I can put it on the buglist.

For the prism mesh, I am afraid so. For 2D triangle meshes and 3D tetrahedral meshes, the partition is pretty good without interpolation. That is why I had no problems in my previous simulations, which used the other cell types.

What I mean is, are you avoiding interpolating the mesh for memory? The amount of memory is usually small compared to fields on the mesh.

No, it is not because of memory consumption. When the code was first written several years ago, I simply set interpolate = false. Now, after setting interpolate = true, I need to update the code that sets the cell-node index (array cell). The following code no longer works when interpolate = true. Some of this code is not well written and needs to be improved.

      !c add local to global cell id mapping
      do ipoint = 0, istart-1
        icell = ipoint + 1
        call DMPlexGetCone(dmda_flow%da,ipoint,cone,ierr)
        CHKERRQ(ierr)
        do ivtex = 1, num_nodes_per_cell
          cell_node_idx(:,ipoint+1) = cone - istart + 1
        end do
        call DMPlexRestoreCone(dmda_flow%da,ipoint,cone,ierr)
        CHKERRQ(ierr)
      end do

My F90 is a bit shaky, but I think you want

     PetscInt, pointer :: nClosure(:)
     PetscInt :: vStart, vEnd, icl

     ! vertices are the depth-0 points of the mesh
     call DMPlexGetDepthStratum(dmda_flow%da,0,vStart,vEnd,ierr);CHKERRQ(ierr)
     do ipoint = 0, istart-1
        icell = ipoint + 1
        call DMPlexGetTransitiveClosure(dmda_flow%da,ipoint,PETSC_TRUE,nClosure,ierr);CHKERRQ(ierr)
        ivtex = 0
        ! the closure holds (point, orientation) pairs, so step by 2;
        ! Fortran pointer arrays start at index 1
        do icl = 1,size(nClosure),2
          if ((nClosure(icl) >= vStart) .and. (nClosure(icl) < vEnd)) then
            ivtex = ivtex + 1
            cell_node_idx(ivtex,icell) = nClosure(icl) - istart + 1
          end if
        end do
        call DMPlexRestoreTransitiveClosure(dmda_flow%da,ipoint,PETSC_TRUE,nClosure,ierr);CHKERRQ(ierr)
     end do

Basically, you use the closure, and filter out everything that is not a vertex.
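The filtering step can be shown without PETSc at all. The following is a minimal sketch in plain C, assuming the closure is laid out as (point, orientation) pairs the way DMPlexGetTransitiveClosure returns it. The function name closure_vertices and its arguments are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the vertices found in a closure into `nodes`, numbered from 1
   relative to vStart. `closure` holds nPairs (point, orientation) pairs.
   Returns the number of vertices written (never more than maxNodes). */
static int closure_vertices(const int *closure, int nPairs,
                            int vStart, int vEnd,
                            int *nodes, int maxNodes)
{
  int nv = 0;
  assert(closure != NULL && nodes != NULL && vStart <= vEnd);
  for (int i = 0; i < nPairs; ++i) {
    const int point = closure[2 * i];   /* closure[2*i+1] is the orientation */
    if (point >= vStart && point < vEnd && nv < maxNodes) {
      nodes[nv++] = point - vStart + 1; /* shift to a 1-based node index */
    }
  }
  return nv;
}
```

In an interpolated mesh the closure of a cell also contains the cell itself, its faces, and its edges, so the range test against [vStart, vEnd) is what keeps only the vertices.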

Thanks, Matt. Would you mind giving some tips on the following code as well? The last line of the following section (num_cells_loc = num_cells+num_nodes-nleaves-num_nodes_loc) does not work either when interpolate == true.

      !c get local mesh DM
      call DMGetCoordinatesLocal(dmda_flow%da,gc,ierr)
      CHKERRQ(ierr)
      call DMGetCoordinateDM(dmda_flow%da,cda,ierr)
      CHKERRQ(ierr)
      call DMGetSection(cda,cs,ierr)
      CHKERRQ(ierr)
      call PetscSectionGetChart(cs,istart,iend,ierr)
      CHKERRQ(ierr)

      !c Calculate number of nodes/cells with ghost nodes/cells for each processor
      num_nodes = iend-istart
      num_cells = istart

      !c Calculate local number of nodes without ghost nodes
      num_nodes_loc = 0
      do ipoint = istart, iend-1
        call DMPlexGetPointGlobal(cda,ipoint,pstart,pend,ierr)
        CHKERRQ(ierr)
        if (pend >= 0) then
          num_nodes_loc = num_nodes_loc + 1
        end if
      end do

      !c Calculate number of cells without ghost cells for each processor
      call DMGetPointSF(dmda_flow%da,sf,ierr)
      CHKERRQ(ierr)
      call PetscSFGetGraph(sf,nroots,nleaves,gmine,gremote,ierr)
      CHKERRQ(ierr)

      !!!!!This calculation is correct when interpolate == false!!!!!
      !!!!!    but  incorrect when interpolate == true          !!!!!
      num_cells_loc = num_cells+num_nodes-nleaves-num_nodes_loc

Thanks,

Danyang
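One likely reason the formula above breaks: the leaves of the point SF are the local points owned by another rank. Without interpolation those can only be cells and vertices, so nleaves minus the ghost vertices equals the ghost cells. With interpolation the leaves also include faces and edges, and the subtraction no longer isolates cells. A minimal sketch of a count that does not depend on this, in plain C with mock arrays, is given below. It assumes the cell range [cStart, cEnd) comes from something like DMPlexGetHeightStratum at height 0 and that ilocal is the leaf array from PetscSFGetGraph; the function name count_owned_cells is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the cells owned by this rank.
   cStart/cEnd : local cell range, ghost cells included
   nleaves     : number of leaves in the point SF
   ilocal      : local point number of each leaf, or NULL when the
                 leaves are the contiguous points 0..nleaves-1 */
static int count_owned_cells(int cStart, int cEnd,
                             int nleaves, const int *ilocal)
{
  int ghostCells = 0;
  assert(cStart <= cEnd && nleaves >= 0);
  for (int l = 0; l < nleaves; ++l) {
    const int p = ilocal ? ilocal[l] : l;      /* local point of leaf l */
    if (p >= cStart && p < cEnd) ++ghostCells; /* leaf that is a cell */
  }
  return (cEnd - cStart) - ghostCells;
}
```

Because only leaves that fall inside the cell range are subtracted, the result is the same whether or not faces and edges are present in the mesh.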

  Thanks,

    Matt

From: Danyang Su <danyang.su at gmail.com>
Date: Wednesday, April 8, 2020 at 2:12 PM
To: Matthew Knepley <knepley at gmail.com>
Cc: PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] DMPlex partition problem

Hi Matt,

Attached is another prism mesh using 8 processors. The partition of the lower mesh does not look good.

Thanks,

Danyang

From: Danyang Su <danyang.su at gmail.com>
Date: Wednesday, April 8, 2020 at 1:50 PM
To: Matthew Knepley <knepley at gmail.com>
Cc: PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] DMPlex partition problem

Hi Matt,

Here is what I get using ex1c with stencil 0. There is no change in the source code; I just compile and run the code in different ways. With ‘make -f ./gmakefile ….’, it works as expected. However, if I build with ‘make ex1’ and then run the code with ‘mpiexec -n …’, the partition does not look good. My code has the same problem if I use a prism mesh.

I just wonder what makes this difference, even without overlap.

Thanks,

Danyang

From: Matthew Knepley <knepley at gmail.com>
Date: Wednesday, April 8, 2020 at 1:32 PM
To: Danyang Su <danyang.su at gmail.com>
Cc: PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] DMPlex partition problem

On Wed, Apr 8, 2020 at 4:26 PM Danyang Su <danyang.su at gmail.com> wrote:

From: Matthew Knepley <knepley at gmail.com>
Date: Wednesday, April 8, 2020 at 12:50 PM
To: Danyang Su <danyang.su at gmail.com>
Cc: PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] DMPlex partition problem

On Wed, Apr 8, 2020 at 3:22 PM Danyang Su <danyang.su at gmail.com> wrote:

Hi Matt,

Here is something pretty interesting. I modified the ex1.c file to output the number of nodes and cells (as shown below). I also changed the stencil size to 1.

    /* get coordinates and section */
    ierr = DMGetCoordinatesLocal(*dm,&gc);CHKERRQ(ierr);
    ierr = DMGetCoordinateDM(*dm,&cda);CHKERRQ(ierr);
    ierr = DMGetSection(cda,&cs);CHKERRQ(ierr);
    ierr = PetscSectionGetChart(cs,&istart,&iend);CHKERRQ(ierr);
    num_nodes = iend-istart;
    num_cells = istart;

    /* Output rank and processor information */
    printf("rank %d: of nprcs: %d, num_nodes %d, num_cess %d\n", rank, size, num_nodes, num_cells);

If I compile the code using ‘make ex1’ and then run the test using ‘mpiexec -n 2 ./ex1 -filename basin2layer.exo’, I get the same problem as the modified ex1f90 code I sent.

➜  tests mpiexec -n 2 ./ex1 -filename basin2layer.exo
rank 1: of nprcs: 2, num_nodes 699, num_cess 824
rank 0: of nprcs: 2, num_nodes 699, num_cess 824

Ah, I was not looking closely. You are asking for a cell overlap of 1 in the partition. That is why these numbers sum to more than the total in the mesh. Do you want a cell overlap of 1?

Yes, I need a cell overlap of 1 in some circumstances. The mesh has two layers of cells with 412 cells per layer and three layers of nodes with 233 nodes per layer. The number of cells looks good to me. I am confused about why the same code generates such different partitions. If I set the stencil to 0, I get the following results. The first method looks good; the second is not a good choice because it has many more ghost nodes.

➜  petsc-3.13.0 make -f ./gmakefile test globsearch="dm_impls_plex_tests-ex1_cylinder" EXTRA_OPTIONS="-filename ./basin2layer.exo -dm_view hdf5:$PWD/mesh.h5 -dm_partition_view" NP=2
#              > rank 1: of nprcs: 2, num_nodes 354, num_cess 392
#              > rank 0: of nprcs: 2, num_nodes 384, num_cess 432

➜  tests mpiexec -n 2 ./ex1 -filename basin2layer.exo
rank 0: of nprcs: 2, num_nodes 466, num_cess 412
rank 1: of nprcs: 2, num_nodes 466, num_cess 412

I think this might just be a confusion over interpretation. Here is how partitioning works:

  1) We partition the mesh cells using ParMetis, Chaco, etc.

  2) We move those cells (and closures) to the correct processes

  3) If you ask for overlap, we mark a layer of adjacent cells on remote processes and move them to each process

The original partitions are the same. Then we add extra cells, and their closures, to each partition. This is what you are asking for.

You would get the same answer with Gmsh if it gave you an overlap region.
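The effect of step 3 on the cell counts can be sketched in plain C. This is only an illustration under stated assumptions: the cell adjacency graph is passed in CSR form, owner[] stands in for the result of step 1, and the function name cells_with_overlap is hypothetical. PETSc decides what counts as "adjacent" through its own adjacency settings, which this sketch does not model.

```c
#include <assert.h>

/* Count the cells a rank holds after one layer of overlap is added.
   Neighbors of cell c are adj[xadj[c]] .. adj[xadj[c+1]-1];
   owner[c] is the rank that the partitioner assigned to cell c. */
static int cells_with_overlap(int ncells, const int *xadj, const int *adj,
                              const int *owner, int rank)
{
  int count = 0;
  assert(ncells >= 0 && xadj && adj && owner);
  for (int c = 0; c < ncells; ++c) {
    int keep = (owner[c] == rank);          /* steps 1-2: owned cell */
    for (int k = xadj[c]; !keep && k < xadj[c + 1]; ++k) {
      if (owner[adj[k]] == rank) keep = 1;  /* step 3: touches an owned cell */
    }
    count += keep;
  }
  return count;
}
```

On a small two-layer mesh, a layer-wise split followed by overlap pulls in the entire other layer, while a column-wise split pulls in only the cells along the cut. That is consistent with the numbers in this thread: 412 cells per layer and 824 cells per rank with overlap suggest that the 'mpiexec' run split the mesh by layer.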

  Thanks,

    Matt

➜  tests mpiexec -n 4 ./ex1 -filename basin2layer.exo
rank 1: of nprcs: 4, num_nodes 432, num_cess 486
rank 0: of nprcs: 4, num_nodes 405, num_cess 448
rank 2: of nprcs: 4, num_nodes 411, num_cess 464
rank 3: of nprcs: 4, num_nodes 420, num_cess 466

However, if I compile and run the code using the script you shared, I get reasonable results.

➜  petsc-3.13.0 make -f ./gmakefile test globsearch="dm_impls_plex_tests-ex1_cylinder" EXTRA_OPTIONS="-filename ./basin2layer.exo -dm_view hdf5:$PWD/mesh.h5 -dm_partition_view" NP=2
#      > rank 0: of nprcs: 2, num_nodes 429, num_cess 484
#      > rank 1: of nprcs: 2, num_nodes 402, num_cess 446

➜  petsc-3.13.0 make -f ./gmakefile test globsearch="dm_impls_plex_tests-ex1_cylinder" EXTRA_OPTIONS="-filename ./basin2layer.exo -dm_view hdf5:$PWD/mesh.h5 -dm_partition_view" NP=4
#      > rank 1: of nprcs: 4, num_nodes 246, num_cess 260
#      > rank 2: of nprcs: 4, num_nodes 264, num_cess 274
#      > rank 3: of nprcs: 4, num_nodes 264, num_cess 280
#      > rank 0: of nprcs: 4, num_nodes 273, num_cess 284

Is there some difference in compile-time or runtime options that causes the difference? Would you please check whether you can reproduce the same problem using the modified ex1.c?

Thanks,

Danyang

From: Danyang Su <danyang.su at gmail.com>
Date: Wednesday, April 8, 2020 at 9:37 AM
To: Matthew Knepley <knepley at gmail.com>
Cc: PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] DMPlex partition problem

From: Matthew Knepley <knepley at gmail.com>
Date: Wednesday, April 8, 2020 at 9:20 AM
To: Danyang Su <danyang.su at gmail.com>
Cc: PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] DMPlex partition problem

On Wed, Apr 8, 2020 at 12:13 PM Danyang Su <danyang.su at gmail.com> wrote:

From: Matthew Knepley <knepley at gmail.com>
Date: Wednesday, April 8, 2020 at 6:45 AM
To: Danyang Su <danyang.su at gmail.com>
Cc: PETSc <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] DMPlex partition problem

On Wed, Apr 8, 2020 at 7:25 AM Matthew Knepley <knepley at gmail.com> wrote:

On Wed, Apr 8, 2020 at 12:48 AM Danyang Su <danyang.su at gmail.com> wrote:

Dear All,

Hope you are safe and healthy.

I have a question about the very different partition results for a prism mesh. The partition in PETSc generates many more ghost nodes/cells than the partition in Gmsh, even though both use METIS as the partitioner. Attached please find the prism mesh in both vtk and exo formats, and the test code, modified from the ex1f90 example. Similar problems are observed for larger datasets with more layers.

I will figure this out by next week.

I have run your mesh and do not get those weird partitions. I am running in master. What are you using? Also, here is an easy way to do this using a PETSc test:

cd $PETSC_DIR
make -f ./gmakefile test globsearch="dm_impls_plex_tests-ex1_cylinder" EXTRA_OPTIONS="-filename ${HOME}/Downloads/basin2layer.exo -dm_view hdf5:$PWD/mesh.h5 -dm_partition_view" NP=5
./lib/petsc/bin/petsc_gen_xdmf.py mesh.h5

and then load mesh.xmf into Paraview. Here is what I see (attached). Is it possible for you to try the master branch?

Hi Matt,

Thanks for your quick response. If I use your script, the partition looks good, as shown in the attached figure. I am working on PETSc 3.13.0 release version on Mac OS. 

Does the above script use code /petsc/src/dm/label/tutorials/ex1c.c?

It uses $PETSC_DIR/src/dm/impls/plex/tests/ex1.c

I looked at your code and cannot see any difference. Also, no changes are in master that are not in 3.13. This is very strange.

I guess we will have to go one step at a time between the example and your code.

I will add mesh output to the ex1f90 example and check if the cell/vertex rank is exactly the same. I wrote the mesh output myself based on the partition but there should be no problem in that part. The number of ghost nodes and cells is pretty easy to check. Not sure if there is any difference between the C code and Fortran code that causes the problem. Anyway, I will keep you updated.

  Thanks,

    Matt

For example, in Gmsh, I get partition results using two processors and four processors as shown below, which are pretty reasonable.

However, in PETSc, the partition looks a bit weird. It looks like it partitions by layer first and then within each layer. If the number of nodes per layer is very large, this kind of partitioning results in many more ghost nodes/cells.

Does anybody know how to improve the partitioning in PETSc? I have tried parmetis and chaco. There is no big difference between them.

Thanks,

Danyang

-- 

What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 404239 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200408/5494f667/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 281270 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200408/5494f667/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image003.png
Type: image/png
Size: 457837 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200408/5494f667/attachment-0007.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image004.png
Type: image/png
Size: 542687 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200408/5494f667/attachment-0008.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image005.png
Type: image/png
Size: 349318 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200408/5494f667/attachment-0009.png>