[petsc-dev] Generality of VecScatter

Thu Nov 24 13:12:34 CST 2011

On Thu, Nov 24, 2011 at 12:40, Mark F. Adams <mark.adams at columbia.edu>wrote:

> OK, that makes sense at least abstractly.  I'm skeptical that you are
> going to be able to do much with all this one sided stuff; it seems to be
> that you would always have the two sided data at some point and then you
> might setup one sided maps for repeated operations like for a MatVec that
> are efficient.  But this is kind of trivial.
>

Yeah, that's easy. The update for MatMult() just involves each process
calling one MPI_Get() for each remote process that it needs data from. What
to get is encoded in an MPI_Datatype so there is no user-visible packing on
either side. These calls are all non-blocking and you finish the update
with MPI_Win_fence().

You can test performance of this approach using -vecscatter_window
(although it's slightly different because it does user-visible packing).

Of course we can easily build a two-sided representation, but it's not
necessary. In the one-sided model above, the owner of a global point does
not know how many local parts accessed it (and nowhere inside the MPI
implementation or network stack is there storage proportional to the number
of accessors, so there isn't a scalability problem with globally coupled
dofs).

>
> I would suggest writing a non-trivail algorithm with your model.  I have
> two ideas:
>
> 1) when a coarse AMG (use aggregation MG to me concrete) grid is
> constructed and you have a map (bipartite graph) with one edge for each
> fine grid point to a coarse grid point (not necessarily on your processor).
>  I have too few vertices per processor so I want to reduce the number of
> processors used in the coarse grid.
>

Do you assume the availability of a partitioner at this stage, or you just
want to do something simple like rank-order aggregation? In my model, the
owner of a point would somehow (e.g. by calling a partitioner) determine
the new owner of that point. This information is communicated to the
ghosters of that point (again without knowing how many there are), any
other node/connectivity data is communicated from old owner to new owner
(this is a 1-1 relation, thus trivial with these primitives), and finally
the ghosters address the new location when they need updates.

>  I can assume that my fine grid was well partitioned, and to keep the
> process local assume the fine grid was partitioned with a space filling
> curve, or something, such that when I aggregate processors (eg, (0-3),
> (4-7) ... if I reduced the number of active processor by 4) I still have
> decent data locality.  Eventually I would like a local algorithm to improve
> load balance at this point, but that is more than you might want to start
> with.
>

If you have a local graph partitioner that can tell you what to move for
better load balance, then I think I can move it efficiently and without
redundant storage or specification.

>
> 2) Any AMR algorithm that you like.  Again, you might want to punt on load
> balancing.
>

I think the primitives are fine for this. There is an algorithmic question
of how to modify boundary zones with minimal communication, but if you
decide how much you want to communicate when (e.g. on the chalk board), I
will figure out how to implement it.

I think redistribution for load balancing is represented pretty elegantly,
but it would be good to work out the details.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20111124/229d8ae9/attachment.html>