[petsc-dev] Scaling test with ex13 (snes)

Mark Adams mfadams at lbl.gov
Sat Oct 3 10:00:57 CDT 2020


On Sat, Oct 3, 2020 at 10:51 AM Stefano Zampini <stefano.zampini at gmail.com> wrote:

>> Secondly, I'd like to add a multilevel "simple" partitioning in DMPlex to
>> optimize communication. I am thinking that I can create a mesh with
>> 'nnodes' cells and distribute that to 'nnodes*procs_node' processes with a
>> "spread" distribution (the default seems to be "compact"). Then refine
>> that enough to get 'procs_node' times as many cells, and then use a simple
>> partitioner again to put one cell on each process, in such a way that
>> locality is preserved (not sure how that would work). Then refine from
>> there on each proc for a scaling study.
>>
>>
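A minimal Python sketch of the two-level assignment described above. It is plain index arithmetic, not DMPlex code. It assumes ranks are numbered compactly by node, so node n owns ranks n*procs_node through n*procs_node + procs_node - 1. It also assumes refinement turns each coarse cell into exactly procs_node descendants numbered 0 to procs_node - 1, which requires procs_node to be a power of 2**dim. The function names are hypothetical.

```python
def spread_owner(coarse_cell: int, procs_node: int) -> int:
    """Stage 1 ("spread"): coarse cell n, one per node, goes to the first
    rank on node n. Assumes node n owns ranks
    n*procs_node .. n*procs_node + procs_node - 1 (compact rank numbering)."""
    return coarse_cell * procs_node


def local_owner(coarse_cell: int, child: int, procs_node: int) -> int:
    """Stage 2: descendant `child` of coarse cell n goes to the child-th rank
    on node n, so descendants of one coarse cell never leave their node."""
    if not 0 <= child < procs_node:
        raise ValueError("child index must be in [0, procs_node)")
    return spread_owner(coarse_cell, procs_node) + child


def two_level_owners(nnodes: int, procs_node: int) -> dict:
    """Map (coarse_cell, child) -> rank for every fine cell."""
    return {(n, c): local_owner(n, c, procs_node)
            for n in range(nnodes) for c in range(procs_node)}


if __name__ == "__main__":
    # Hypothetical machine: 2 nodes x 4 ranks per node.
    for (n, c), rank in two_level_owners(2, 4).items():
        print(f"coarse cell {n}, child {c} -> rank {rank} (node {rank // 4})")
```

Because rank // procs_node equals the coarse cell index for every descendant, no fine cell leaves the node its parent was placed on. This is the locality property the plan asks for.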
> Mark
>
> For multilevel partitioning, you need custom code, since what kills
> performance with one-to-all patterns in DMPlex is the actual communication
> of the mesh data.
> However, you can always generate a mesh with one cell per process and
> then refine from there.
>

yes, that is what I do now.
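Starting from one cell per process, the per-process problem size is set entirely by the number of uniform refinements. The following sketch shows that arithmetic. It assumes regular refinement that splits every cell into 2**dim children: hexahedra and tetrahedra into 8, quadrilaterals and triangles into 4. The helper names are hypothetical.

```python
def cells_per_process(dim: int, refinements: int) -> int:
    """Cells owned by each process after `refinements` uniform refinements,
    starting from one cell per process; each refinement multiplies the
    cell count by 2**dim."""
    return (2 ** dim) ** refinements


def refinements_needed(dim: int, target_cells: int) -> int:
    """Smallest number of uniform refinements that gives at least
    `target_cells` cells per process."""
    k = 0
    while cells_per_process(dim, k) < target_cells:
        k += 1
    return k


if __name__ == "__main__":
    k = refinements_needed(3, 100_000)
    print(k, cells_per_process(3, k))
```

For example, in 3D, five refinements give only 8**5 = 32,768 cells per process. Reaching at least 100,000 cells per process therefore takes 6 refinements, which gives 8**6 = 262,144 cells. The coarse per-process sizes available are thus a geometric sequence with ratio 2**dim.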


>
> I have coded a multilevel partitioner that works quite well for
> general meshes; we have it in a private repo with Lisandro. In my
> experience, the benefits of the multilevel scheme start at around 4K
> processes. If you plan very large runs (say, > 32K cores), you
> definitely want a multistage scheme.
>
>
That was my thinking. I am doing scaling studies and I want
speed-of-light data.
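A rough message-count model helps show why a one-to-all distribution becomes the bottleneck at scale. This is an illustrative model only. It is not measured data and not a description of the private partitioner mentioned above. In the one-to-all case, the root sends a partition to every other rank. In a multistage scatter with stage factors f_0, ..., f_{L-1}, whose product equals the number of ranks, each current owner splits its data among f_l ranks. The root therefore sends only sum(f_l - 1) messages. The function names are hypothetical.

```python
def one_to_all_sends(nprocs: int) -> int:
    """Messages sent by the root when it ships one partition to every other rank."""
    return nprocs - 1


def staged_sends(factors) -> int:
    """Messages sent by the root in a multistage scatter: at stage l every
    current owner keeps one share and sends factors[l] - 1 shares to new
    owners. The root takes part in every stage, so its sends add up."""
    return sum(f - 1 for f in factors)


if __name__ == "__main__":
    print(one_to_all_sends(32 * 32 * 32), staged_sends([32, 32, 32]))
```

With 32,768 ranks, one-to-all costs the root 32,767 sends, while three stages of 32 cost it 93. The model ignores message sizes and the repartitioning work done at each stage, so it only indicates the trend.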


> We never contributed the code, since it requires some boilerplate code to
> run through the stages of the partitioning and move the data.
> If you are using hexahedra, you can always define your own "shell"
> partitioner that produces box decompositions.
>

That might work.
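The following Python sketch shows the box decomposition such a shell partitioner could produce. It assumes the hex cells of an n[0] x n[1] x n[2] box are numbered lexicographically (cell = i + n[0]*(j + n[1]*k)). It also assumes ranks form a p[0] x p[1] x p[2] grid numbered x-fastest. Both numberings are illustrative and may not match DMPlex's actual cell order. The output is per-rank sizes plus cell ids grouped by rank. As far as we know, this is the shape of input PETSc's PetscPartitionerShellSetPartition takes; check its manual page for the exact signature in your PETSc version. The function name is hypothetical.

```python
def box_partition(n, p):
    """Return (sizes, points): sizes[r] is the number of cells owned by rank r,
    and points lists cell ids grouped by owning rank.

    Assumes lexicographic cell numbering, cell = i + n[0]*(j + n[1]*k), and an
    x-fastest p[0] x p[1] x p[2] rank grid (illustrative assumptions)."""
    nranks = p[0] * p[1] * p[2]
    buckets = [[] for _ in range(nranks)]
    for k in range(n[2]):
        for j in range(n[1]):
            for i in range(n[0]):
                # Block coordinate in each direction; i * p // n yields
                # contiguous blocks whose sizes differ by at most one cell.
                bi = i * p[0] // n[0]
                bj = j * p[1] // n[1]
                bk = k * p[2] // n[2]
                rank = bi + p[0] * (bj + p[1] * bk)
                buckets[rank].append(i + n[0] * (j + n[1] * k))
    sizes = [len(b) for b in buckets]
    points = [cell for b in buckets for cell in b]
    return sizes, points


if __name__ == "__main__":
    sizes, points = box_partition((4, 4, 4), (2, 2, 2))
    print(sizes)
    print(points[:8])  # cells of rank 0
```

Splitting each direction with i * p // n keeps block sizes within one cell of each other. For example, 10 cells over 3 ranks gives blocks of 4, 3 and 3.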

Thanks,


>
> Another option is to generate the meshes up front, sequentially, and then
> use the parallel HDF5 reader that Vaclav and Matt put together.
>
>
>> The point here is to get communication patterns that look like those of
>> an (idealized) well-partitioned application. (I suppose I could take an
>> array of factors, the product of which is the number of processors, and
>> generalize this in a loop for any number of memory levels, or make an
>> octree.)
>>
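One way to generalize the scheme to any number of memory levels, as suggested above, is to treat a rank as a mixed-radix number whose digits are the parts chosen at each level. The following is a sketch under that assumption. Factors are listed outermost first, e.g. [nnodes, sockets_per_node, cores_per_socket], and the function names are hypothetical. With factors = [nnodes, procs_node], it reduces to the two-level scheme in the original message. Factors of 8 at every level give an octree-like split in 3D.

```python
from math import prod


def rank_path(rank, factors):
    """Mixed-radix digits of `rank`, outermost level first: path[l] is the
    part chosen at level l. With factors = [nnodes, procs_node] this is
    [node, local rank]."""
    if not 0 <= rank < prod(factors):
        raise ValueError("rank out of range for these factors")
    path = []
    for f in reversed(factors):
        rank, digit = divmod(rank, f)
        path.append(digit)
    return list(reversed(path))


def path_rank(path, factors):
    """Inverse of rank_path: rebuild the rank from its per-level digits."""
    rank = 0
    for digit, f in zip(path, factors):
        rank = rank * f + digit
    return rank


def split_level(a, b, factors):
    """First level at which ranks a and b fall into different parts
    (0 = different top-level parts, e.g. nodes); len(factors) if a == b."""
    for level, (x, y) in enumerate(zip(rank_path(a, factors), rank_path(b, factors))):
        if x != y:
            return level
    return len(factors)
```

Here split_level reports which level of the machine hierarchy a message between two ranks crosses. With factors [2, 2, 4], for example, ranks 0 and 1 differ only at the innermost level (2), while ranks 0 and 8 differ at the top level (0).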
>> Any thoughts?
>> Thanks,
>> Mark
>>
>>
>>
>
> --
> Stefano
>


More information about the petsc-dev mailing list