[petsc-dev] many subdomains per process

Jed Brown jed at 59A2.org
Sat Feb 6 16:21:16 CST 2010

Sometimes I like to "simulate" having a big machine in an interactive
environment, mostly to investigate algorithmic scalability for high
process counts.  You can oversubscribe a small machine up to a point,
but kernels don't work so well when they have thousands of processes
trying to do memory-bound operations.

So I'll take, e.g. an 8-core box with 8 GiB of memory, and do things
like the following:

  mpiexec -n 8 ./ex48 -M 4 -P 3 -thi_nlevels 5 -thi_hom z -thi_L 5e3 -ksp_monitor -ksp_converged_reason -snes_monitor -log_summary -mg_levels_pc_type asm -mg_levels_pc_asm_blocks 8192 -thi_mat_type baij
  Level 0 domain size (m)    5e+03 x    5e+03 x    1e+03, num elements   4x  4x  3 (      48), size (m) 1250 x 1250 x 500
  Level 1 domain size (m)    5e+03 x    5e+03 x    1e+03, num elements   8x  8x  5 (     320), size (m) 625 x 625 x 250
  Level 2 domain size (m)    5e+03 x    5e+03 x    1e+03, num elements  16x 16x  9 (    2304), size (m) 312.5 x 312.5 x 125
  Level 3 domain size (m)    5e+03 x    5e+03 x    1e+03, num elements  32x 32x 17 (   17408), size (m) 156.25 x 156.25 x 62.5
  Level 4 domain size (m)    5e+03 x    5e+03 x    1e+03, num elements  64x 64x 33 (  135168), size (m) 78.125 x 78.125 x 31.25

These are absurdly small subdomains, with only 33 dofs per subdomain on
the fine level, and 0.078 dofs per subdomain on level 1, but I'm still
unhappy to see thousands of lines of this crap coming from Parmetis:

        ***Cannot bisect a graph with 0 vertices!
        ***You are trying to partition a graph into too many parts!
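
For reference, the per-subdomain figures quoted above (33 dofs on the fine
level, 0.078 on level 1) can be reproduced from the printed level sizes.
The sketch below is not part of ex48.  It assumes the refinement pattern
visible in the output (x and y double on each level, z goes from n to
2n-1), 2 dofs per node, and that the printed "num elements" counts are
what get divided among the blocks.  The function names are made up for
illustration.

```python
def level_grids(M, P, nlevels):
    """Grid sizes per level, inferred from the printed ex48 output (assumption)."""
    return [(M * 2**l, M * 2**l, (P - 1) * 2**l + 1) for l in range(nlevels)]


def per_block(M, P, nlevels, blocks, dofs_per_node=2):
    """Return (level, nodes, nodes per block, dofs per block) for every level."""
    rows = []
    for level, (nx, ny, nz) in enumerate(level_grids(M, P, nlevels)):
        nodes = nx * ny * nz
        rows.append((level, nodes, nodes / blocks, dofs_per_node * nodes / blocks))
    return rows


if __name__ == "__main__":
    # Values taken from the first command line: -M 4 -P 3 -thi_nlevels 5,
    # with -mg_levels_pc_asm_blocks 8192.
    for level, nodes, nodes_pb, dofs_pb in per_block(4, 3, 5, 8192):
        print(f"level {level}: {nodes:7d} nodes, "
              f"{nodes_pb:8.3f} nodes/block, {dofs_pb:8.3f} dofs/block")
```

Any level where the nodes-per-block figure is below 1 necessarily has
empty blocks, which is consistent with the "0 vertices" complaint.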

I even get some of this with the slightly less contrived

  mpiexec -n 8 ./ex48 -M 16 -P 9 -thi_nlevels 3 -thi_hom z -thi_L 5e3 -ksp_monitor -ksp_converged_reason -snes_monitor -log_summary -mg_levels_pc_type asm -mg_levels_pc_asm_blocks 8192 -thi_mat_type baij -dmmg_view -snes_max_it 3 -thi_verbose -mg_coarse_pc_type hypre

Here, level 1 has 4.25 dofs (2.125 nodes) in each subdomain, so I
wouldn't expect empty subdomains.  Apparently Parmetis is still
producing a usable partition because the solver works, but I'm curious
how to prevent this outburst.  It's apparently not as simple as just
checking that there are as many nodes as requested subdomains.  Is this
something worth working around on the PETSc side, or should I ask

Finally, I have a concern over memory usage.  When I run with these huge
subdomain counts, I see a huge memory spike at setup.  This is independent
of Parmetis:

  mpiexec -n 8 ./ex48 -M 24 -P 7 -thi_nlevels 3 -thi_hom z -thi_L 10e3 -ksp_monitor -ksp_converged_reason -snes_monitor -log_summary -mg_levels_pc_type asm -mg_levels_pc_asm_blocks 8192 -thi_mat_type baij -dmmg_view -thi_verbose -mg_coarse_pc_type hypre -thi_mat_partitioning_type square -mat_partitioning_type square

Each process briefly goes up to about 1000 MB resident, then drops to
about 100 MB, and finally climbs slowly to stabilize at 480 MB (once
matrices are assembled and factored).  I haven't tracked down the
source, but there is clearly a huge allocation, all the pages are
faulted, and then released very soon afterwards.  That memory doesn't
seem to be attributable to any objects because the usage below only adds
up to approximately the stable resident size (nowhere near the huge spike):

  Memory usage is given in bytes:

  Object Type          Creations   Destructions   Memory  Descendants' Mem.
  Reports information only for process 0.

  --- Event Stage 0: Main Stage

   Toy Hydrostatic Ice     1              1  588.000000     0
     Distributed array     6              6  352752.000000     0
                   Vec  6250           6250  41393472.000000     0
           Vec Scatter  6163           6163  3645548.000000     0
             Index Set 24614          24614  21788136.000000     0
     IS L to G Mapping  2060           2060  143594928.000000     0
                Matrix  6165           6165  360639852.000000     0
   Matrix Partitioning     2              2  896.000000     0
                  SNES     3              3  3096.000000     0
         Krylov Solver  2057           2057  1781272.000000     0
        Preconditioner  2057           2057  1448592.000000     0
                Viewer     3              3  1632.000000     0
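
A quick way to compare the logged object memory against the resident
sizes is to sum the Memory column.  The byte counts below are copied from
the table above; the helper itself is only an illustrative sketch, and
taking 1 MB = 10^6 bytes is an assumption about the units used in the
text.

```python
# Memory column of the -log_summary table above (bytes, process 0 only).
OBJECT_MEMORY = {
    "Toy Hydrostatic Ice": 588,
    "Distributed array": 352752,
    "Vec": 41393472,
    "Vec Scatter": 3645548,
    "Index Set": 21788136,
    "IS L to G Mapping": 143594928,
    "Matrix": 360639852,
    "Matrix Partitioning": 896,
    "SNES": 3096,
    "Krylov Solver": 1781272,
    "Preconditioner": 1448592,
    "Viewer": 1632,
}


def total_bytes(table):
    """Sum the per-object-type memory figures."""
    return sum(table.values())


if __name__ == "__main__":
    total = total_bytes(OBJECT_MEMORY)
    print(f"total logged object memory: {total} bytes ({total / 1e6:.1f} MB)")
    # List the object types from largest to smallest contribution.
    for name, nbytes in sorted(OBJECT_MEMORY.items(), key=lambda kv: -kv[1]):
        print(f"{name:>20s} {nbytes / 1e6:10.3f} MB {100 * nbytes / total:6.2f}%")
```

The total is about 575 MB for process 0, dominated by Matrix and
IS L to G Mapping.  That is the same order as the stable resident size
and well short of the transient peak of about 1000 MB.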


