[petsc-dev] many subdomains per process
Jed Brown
jed at 59A2.org
Sat Feb 6 16:21:16 CST 2010
Sometimes I like to "simulate" having a big machine in an interactive
environment, mostly to investigate algorithmic scalability for high
process counts. You can oversubscribe a small machine up to a point,
but kernels don't work so well when thousands of processes are trying
to do memory-bound operations.
So I'll take, e.g., an 8-core box with 8 GiB of memory and do things
like the following:
mpiexec -n 8 ./ex48 -M 4 -P 3 -thi_nlevels 5 -thi_hom z -thi_L 5e3 -ksp_monitor -ksp_converged_reason -snes_monitor -log_summary -mg_levels_pc_type asm -mg_levels_pc_asm_blocks 8192 -thi_mat_type baij
Level 0 domain size (m) 5e+03 x 5e+03 x 1e+03, num elements 4x 4x 3 ( 48), size (m) 1250 x 1250 x 500
Level 1 domain size (m) 5e+03 x 5e+03 x 1e+03, num elements 8x 8x 5 ( 320), size (m) 625 x 625 x 250
Level 2 domain size (m) 5e+03 x 5e+03 x 1e+03, num elements 16x 16x 9 ( 2304), size (m) 312.5 x 312.5 x 125
Level 3 domain size (m) 5e+03 x 5e+03 x 1e+03, num elements 32x 32x 17 ( 17408), size (m) 156.25 x 156.25 x 62.5
Level 4 domain size (m) 5e+03 x 5e+03 x 1e+03, num elements 64x 64x 33 ( 135168), size (m) 78.125 x 78.125 x 31.25
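As an aside, the programmatic analogue of requesting that many blocks
would look roughly like the sketch below. It is only a sketch, not what
ex48 does: it sets up a standalone ASM preconditioner (the run above
configures ASM on each multigrid level via -mg_levels_pc_type asm and
-mg_levels_pc_asm_blocks), and it assumes the four-argument
PCASMSetTotalSubdomains() calling sequence, where NULL index sets let
PETSc generate the subdomains itself.

#include <petscksp.h>

/* Sketch only: request nblocks ASM subdomains in total, the code
 * equivalent of -pc_asm_blocks <nblocks>. */
PetscErrorCode SetupManyBlockASM(KSP ksp,PetscInt nblocks)
{
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCASM);CHKERRQ(ierr);
  /* e.g. nblocks = 8192, mirroring the runs in this message */
  ierr = PCASMSetTotalSubdomains(pc,nblocks,NULL,NULL);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}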
These are absurdly small subdomains, with only 33 dofs per subdomain on
the fine level, and 0.078 dofs per subdomain on level 1, but I'm still
unhappy to see thousands of lines of this crap coming from Parmetis:
***Cannot bisect a graph with 0 vertices!
***You are trying to partition a graph into too many parts!
I even get some of this with the slightly less contrived
mpiexec -n 8 ./ex48 -M 16 -P 9 -thi_nlevels 3 -thi_hom z -thi_L 5e3 -ksp_monitor -ksp_converged_reason -snes_monitor -log_summary -mg_levels_pc_type asm -mg_levels_pc_asm_blocks 8192 -thi_mat_type baij -dmmg_view -snes_max_it 3 -thi_verbose -mg_coarse_pc_type hypre
Here, level 1 has 4.25 dofs (2.25 nodes) in each subdomain, so I
wouldn't expect empty subdomains. Apparently Parmetis is still
producing a usable partition because the solver works, but I'm curious
how to prevent this outburst. It's apparently not as simple as just
checking that there are as many nodes as requested subdomains. Is this
something worth working around on the PETSc side, or should I ask
upstream?
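To be concrete, the naive check I have in mind is something like the
hypothetical helper below (ClampPartCount is made up, not a PETSc or
Parmetis routine); as the level-1 numbers above show, Parmetis can
still complain even when this check would pass.

#include <petscsys.h>

/* Hypothetical guard: never request more parts than there are vertices
 * in the global graph. */
static PetscErrorCode ClampPartCount(MPI_Comm comm,PetscInt nlocal,PetscInt *nparts)
{
  PetscInt       nglobal;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* total number of graph vertices across the communicator */
  ierr = MPI_Allreduce(&nlocal,&nglobal,1,MPIU_INT,MPI_SUM,comm);CHKERRQ(ierr);
  if (*nparts > nglobal) *nparts = nglobal;
  PetscFunctionReturn(0);
}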
Finally, I have a concern about memory usage. When I run with these
huge subdomain counts, I see a huge memory spike at setup. This is
independent of Parmetis:
mpiexec -n 8 ./ex48 -M 24 -P 7 -thi_nlevels 3 -thi_hom z -thi_L 10e3 -ksp_monitor -ksp_converged_reason -snes_monitor -log_summary -mg_levels_pc_type asm -mg_levels_pc_asm_blocks 8192 -thi_mat_type baij -dmmg_view -thi_verbose -mg_coarse_pc_type hypre -thi_mat_partitioning_type square -mat_partitioning_type square
Each process briefly goes up to about 1000 MB resident, then drops to
about 100 MB, and finally climbs slowly to stabilize at 480 MB (once
matrices are assembled and factored). I haven't tracked down the
source, but there is clearly a huge allocation, all the pages are
faulted, and then released very soon afterwards. That memory doesn't
seem to be attributable to any objects because the usage below only adds
up to approximately the stable resident size (nowhere near the huge
spike).
Memory usage is given in bytes:

Object Type               Creations   Destructions            Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

    Toy Hydrostatic Ice           1              1        588.000000     0
      Distributed array           6              6     352752.000000     0
                    Vec        6250           6250   41393472.000000     0
            Vec Scatter        6163           6163    3645548.000000     0
              Index Set       24614          24614   21788136.000000     0
      IS L to G Mapping        2060           2060  143594928.000000     0
                 Matrix        6165           6165  360639852.000000     0
    Matrix Partitioning           2              2        896.000000     0
                   SNES           3              3       3096.000000     0
          Krylov Solver        2057           2057    1781272.000000     0
         Preconditioner        2057           2057    1448592.000000     0
                 Viewer           3              3       1632.000000     0
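If it would help narrow this down, I can bracket the setup phase with
something like the sketch below (ReportMemory is a hypothetical helper,
not something in ex48). PetscMemoryGetCurrentUsage() reports the
resident set while PetscMallocGetCurrentUsage() only sees what went
through PetscMalloc, so a spike that shows up in the former but not the
latter would point at an external package or an allocation made outside
PetscMalloc.

#include <petscsys.h>

/* Hypothetical instrumentation: compare resident-set size with
 * PetscMalloc'd bytes around the setup phase. */
static PetscErrorCode ReportMemory(const char stage[])
{
  PetscLogDouble rss,mal;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr);  /* resident set */
  ierr = PetscMallocGetCurrentUsage(&mal);CHKERRQ(ierr);  /* PetscMalloc'd bytes */
  ierr = PetscPrintf(PETSC_COMM_WORLD,"%s: resident %.1f MB, PetscMalloc %.1f MB\n",
                     stage,rss/1048576.0,mal/1048576.0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}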
Jed