[petsc-dev] Scaling test with ex13 (snes)

Sat Oct 3 11:15:11 CDT 2020

   Mark,

    There is a MATPARTITIONINGHIERARCH partitioner (see its man page), provided by Fande, that significantly helped scale up the problems he was working on.

   Barry
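A toy illustration of the two-stage idea behind a hierarchical partitioner, written in plain C rather than against PETSc. The vertices of a 1D chain are first split into `ncoarse` contiguous blocks (think one per node), and each block is then split into `nfine` sub-blocks (think one per core). The function names and the `coarse * nfine + fine` numbering are assumptions made for this sketch. This is not the MATPARTITIONINGHIERARCH implementation, which works on the adjacency graph and calls an underlying partitioner at each stage.

```c
#include <assert.h>

/* Split the index range [0, n) into nparts nearly equal contiguous pieces and
   return the piece containing index i. The first n % nparts pieces get one
   extra element. */
int block_part(int i, int n, int nparts)
{
  assert(n > 0 && nparts > 0 && i >= 0 && i < n);
  int q = n / nparts, r = n % nparts;
  int big = r * (q + 1);              /* elements covered by the larger pieces */
  if (i < big) return i / (q + 1);
  return r + (i - big) / q;
}

/* Two-stage (hierarchical) part of vertex v in a chain of n vertices:
   stage 1 picks a coarse block, stage 2 partitions only inside that block. */
int hierarchical_part(int v, int n, int ncoarse, int nfine)
{
  int c = block_part(v, n, ncoarse);
  /* Recover the extent of coarse block c so the fine stage is purely local. */
  int q = n / ncoarse, r = n % ncoarse;
  int start = c < r ? c * (q + 1) : r * (q + 1) + (c - r) * q;
  int len = c < r ? q + 1 : q;
  int f = block_part(v - start, len, nfine);
  return c * nfine + f;                /* fine parts of one coarse block are adjacent */
}
```

With `n = 12`, `ncoarse = 2` and `nfine = 3`, vertices 0..5 land in parts 0..2 and vertices 6..11 in parts 3..5, so consecutive part numbers share a coarse block (node). For the options the real partitioner accepts, see the MATPARTITIONINGHIERARCH man page.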


> On Oct 3, 2020, at 10:04 AM, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Sat, Oct 3, 2020 at 10:51 AM Stefano Zampini <stefano.zampini at gmail.com> wrote:
> 
> 
> 
> Secondly, I'd like to add a multilevel "simple" partitioning to DMPlex to optimize communication. I am thinking that I can create a mesh with 'nnodes' cells and distribute it to 'nnodes*procs_node' processes with a "spread" distribution (the default seems to be "compact"). Then refine that enough to get 'procs_node' times as many cells, and then use a simple partitioner again to put one cell on each process in such a way that locality is preserved (not sure how that would work). Then refine from there on each process for a scaling study.
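One way to make the locality requirement concrete, as a hedged sketch: assume "spread" means round-robin placement (rank `r` runs on node `r % nnodes`), so giving coarse cell `c` to rank `c` puts exactly one coarse cell on each node. If each coarse cell is then refined into `procs_node` children, child `j` of cell `c` can go to the `j`-th rank hosted on node `c`, which under round-robin placement is rank `j*nnodes + c`. All helper names are hypothetical, and nothing here corresponds to an existing DMPlex partitioner.

```c
#include <assert.h>

/* Hypothetical sketch of the two-level assignment; P = nnodes * procs_node.
   Assumes "spread" = round-robin placement: rank r runs on node r % nnodes. */

/* Node that hosts `rank` under round-robin placement. */
int node_of_rank_spread(int rank, int nnodes)
{
  assert(nnodes > 0 && rank >= 0);
  return rank % nnodes;
}

/* Stage 1: coarse cell c (0 <= c < nnodes) goes to rank c. Ranks 0..nnodes-1
   sit on distinct nodes, so each node receives exactly one coarse cell. */
int owner_of_coarse_cell(int c, int nnodes)
{
  assert(c >= 0 && c < nnodes);
  return c;
}

/* Stage 2: child j (0 <= j < procs_node) of coarse cell c goes to the j-th rank
   hosted on node c, i.e. rank j*nnodes + c. Every child therefore stays on the
   same node as its parent. */
int owner_of_child(int c, int j, int nnodes, int procs_node)
{
  assert(c >= 0 && c < nnodes && j >= 0 && j < procs_node);
  return j * nnodes + c;
}
```

Under the default "compact" placement the same idea applies with node `c` hosting ranks `c*procs_node` through `c*procs_node + procs_node - 1`, so child `j` would go to rank `c*procs_node + j`. Uniform refinement multiplies the cell count by a fixed factor (8 for hexahedra), so unless `procs_node` is a power of that factor the children will not divide evenly, and each node's cells would need a small partitioner of their own.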
> 
> 
> Mark
> 
> For multilevel partitioning you need custom code, since what kills performance with one-to-all patterns in DMPlex is the actual communication of the mesh data.
> However, you can always generate a mesh with one cell per process and then refine from there.
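The arithmetic behind "one cell per process, then refine" is simple enough to sketch. Assuming uniform refinement that multiplies the cell count by a fixed factor (8 for hexahedra, 4 for quadrilaterals), the hypothetical helper below counts how many refinements are needed before every process owns at least a target number of cells.

```c
#include <assert.h>

/* Hypothetical helper: starting from one cell per process, return the number of
   uniform refinements needed so each process owns at least target_local cells,
   when each refinement multiplies the cell count by `children`. */
int refinements_needed(long target_local, int children)
{
  assert(target_local >= 1 && children >= 2);
  int k = 0;
  long local = 1;                 /* one cell per process to start */
  while (local < target_local) {
    local *= children;            /* one more level of uniform refinement */
    k++;
  }
  return k;
}
```

For hexahedra, `refinements_needed(100000, 8)` is 6, giving 8^6 = 262,144 cells per process. Starting from one cell per process, per-process problem sizes therefore come only in powers of 8, which constrains the sizes a weak-scaling study can use.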
> 
> I have coded a multilevel partitioner that works quite well for general meshes; Lisandro and I have it in a private repo. In my experience, the benefits of the multilevel scheme start at about 4K processes. If you plan very large runs (say, more than 32K cores), then you definitely want a multistage scheme.
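To see why staging helps with the one-to-all pattern, here is a deliberately crude, hypothetical model. Mesh data is fanned out in stages with fan-outs `factors[0..nstages-1]` whose product is the number of processes, and at each stage every rank that already holds data forwards pieces to `factors[s] - 1` new ranks. The model counts only the messages sent by the root and ignores message sizes, so it does not predict the 4K or 32K thresholds quoted above.

```c
#include <assert.h>

/* Crude, hypothetical model: messages sent by the root when data is fanned out
   in nstages stages. factors[s] is the fan-out of stage s; the product of all
   factors is the number of processes, and factors = {P} is the one-to-all
   pattern. The root takes part in every stage and sends factors[s] - 1
   messages in stage s. Message sizes are ignored. */
long root_messages(const int *factors, int nstages)
{
  assert(factors && nstages > 0);
  long msgs = 0;
  for (int s = 0; s < nstages; s++) {
    assert(factors[s] >= 1);
    msgs += factors[s] - 1;
  }
  return msgs;
}
```

For P = 32768, one stage (`{32768}`) costs the root 32767 messages, two stages (`{128, 256}`) cost 382, and three stages (`{32, 32, 32}`) cost 93.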
> 
> We never contributed the code because it requires some boilerplate to run through the stages of the partitioning and move the data.
> If you are using hexahedra, you can always define your own "shell" partitioner that produces box decompositions.
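A box decomposition of the kind Stefano mentions can be sketched as follows, assuming a structured `nx x ny x nz` hex mesh with cells indexed `(i, j, k)` and a `px x py x pz` process grid whose product is the number of processes. Each direction is split into nearly equal contiguous slabs, and the owner is the process whose box contains the cell. The function name and the x-fastest rank numbering are assumptions for this sketch; hooking it into DMPlex would still require mapping Plex cell numbers to `(i, j, k)` and handing the result to a shell partitioner (see the PETSCPARTITIONERSHELL man page).

```c
#include <assert.h>

/* Hypothetical box decomposition: owner rank of cell (i, j, k) in an
   nx x ny x nz structured mesh on a px x py x pz process grid.
   floor(i * p / n) splits n cells into p contiguous slabs whose sizes differ
   by at most one. Ranks are numbered with the x slab index varying fastest. */
int box_owner(int i, int j, int k, int nx, int ny, int nz,
              int px, int py, int pz)
{
  assert(0 <= i && i < nx && 0 <= j && j < ny && 0 <= k && k < nz);
  assert(px > 0 && py > 0 && pz > 0);
  int bi = (int)((long)i * px / nx);   /* slab index in x */
  int bj = (int)((long)j * py / ny);   /* slab index in y */
  int bk = (int)((long)k * pz / nz);   /* slab index in z */
  return bi + px * (bj + py * bk);
}
```

For a 4 x 4 x 4 mesh on a 2 x 2 x 2 process grid, cell (2, 1, 1) lands on rank 1 and cell (3, 3, 3) on rank 7.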
> 
> I could integrate it if you want to stop maintaining it there :) It sounds really useful.
> 
>   Thanks,
> 
>      Matt
>  
> Another option is to generate the meshes up front sequentially and then use the parallel HDF5 reader that Vaclav and Matt put together.
>  
> The point here is to get communication patterns that look like those of an (idealized) well-partitioned application. (I suppose I could take an array of factors whose product is the number of processes and generalize this in a loop to any number of memory levels, or build an octree.)
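Mark's array-of-factors idea amounts to a mixed-radix decomposition of the rank. A hedged sketch, assuming compact placement so that the outermost level (for example, the node) varies slowest: given factors such as `{nnodes, sockets_per_node, cores_per_socket}` (hypothetical names), the function below returns the rank's index at each level. A multilevel partitioner would then split the mesh into `factors[0]` parts, each of those into `factors[1]` parts, and so on, with ranks that agree on `coords[0..l]` forming one level-`l` group.

```c
#include <assert.h>

/* Hypothetical sketch: mixed-radix decomposition of an MPI rank.
   factors[0..nlevels-1] multiply to the number of processes; level 0 is the
   outermost level and varies slowest (compact placement assumed).
   On return, coords[l] is the rank's index within level l. */
void rank_to_levels(int rank, const int *factors, int nlevels, int *coords)
{
  assert(rank >= 0 && factors && coords && nlevels > 0);
  for (int l = nlevels - 1; l >= 0; l--) {   /* innermost level varies fastest */
    assert(factors[l] > 0);
    coords[l] = rank % factors[l];
    rank /= factors[l];
  }
  assert(rank == 0);                         /* rank must be < product of factors */
}
```

With `factors = {4, 2, 8}` (64 processes), rank 37 maps to `{2, 0, 5}`, since 37 = 2*16 + 0*8 + 5. An octree-like hierarchy corresponds to every factor being 8.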
> 
> Any thoughts?
> Thanks,
> Mark
> 
> 
> 
> 
> -- 
> Stefano
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/
