[petsc-dev] Generality of VecScatter

Mark F. Adams mark.adams at columbia.edu
Thu Nov 24 12:40:02 CST 2011


On Nov 24, 2011, at 11:24 AM, Jed Brown wrote:

> On Thu, Nov 24, 2011 at 09:34, Mark F. Adams <mark.adams at columbia.edu> wrote:
>>> The primitives I can readily provide are
>>> 
>>> broadcast from Y to X
>>> reduce values on X into Y (min, max, sum, replace, etc.)
>>> 
>>> gather values on X into arrays for each point in Y (in some ordering, rank ordering if you like)
> 
>> This is awful.  You would at least need a way of getting the processors the values came from; the system would have to have this info, right?
> 
> You don't need it for the implementation. If the user wants it, they can get it trivially (gather the ranks over the same communication graph). Sometimes it doesn't matter, and these primitives won't store that by default. Not storing it also makes graph updates lighter weight because you only need to update on one side.
>  
> 
>> Also, I think you are getting too general here; I'm not sure why anyone would want to do this, but I'm sure someone would, and they can just write directly to MPI-3.
> 
> My point is to have a more user-friendly way to offer this information. The point broadcast ("global to local") and reduce ("local to global with operation") are all you need for many operations.
> 
> My intent is to find an abstraction that is more general than VecScatter, higher level than MPI, and that can be used for all non-synchronizing ("neighbor collective") communication patterns in a library like PETSc.

OK, that makes sense, at least abstractly.  I'm skeptical that you are going to be able to do much with all this one-sided stuff; it seems to me that you would always have the two-sided data at some point anyway, and then you might set up one-sided maps for repeated operations, like a MatVec, where they are efficient.  But this is kind of trivial.
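[Editor's note: the broadcast/reduce pair described above can be sketched serially.  This is a hypothetical model, not PETSc code: each local "leaf" point records the index of its owning "root"; in parallel the root index would be a (rank, offset) pair, but the communication pattern is the same.]

```c
#include <stddef.h>

/* Hypothetical one-sided communication graph: each leaf knows only its
 * root; roots do not store which leaves reference them (the "two-sided"
 * data Mark mentions would be the inverse of this map). */
typedef struct {
  size_t        nleaves;
  const size_t *root_of;  /* root index for each leaf */
} Graph;

/* broadcast: copy each root's value to all of its leaves
 * ("global to local") */
static void bcast(const Graph *g, const double *root, double *leaf)
{
  for (size_t i = 0; i < g->nleaves; i++) leaf[i] = root[g->root_of[i]];
}

/* reduce: accumulate each leaf's value into its root
 * ("local to global" with op = sum) */
static void reduce_sum(const Graph *g, const double *leaf, double *root)
{
  for (size_t i = 0; i < g->nleaves; i++) root[g->root_of[i]] += leaf[i];
}
```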

I would suggest writing a non-trivial algorithm with your model.  I have two ideas:

1) When a coarse AMG grid is constructed (use aggregation MG to be concrete), you have a map (bipartite graph) with one edge from each fine-grid point to a coarse-grid point (not necessarily on your processor).  I have too few vertices per processor, so I want to reduce the number of processors used on the coarse grid.  I can assume that my fine grid was well partitioned; to keep the process local, assume it was partitioned with a space-filling curve, or something similar, so that when I aggregate processors (e.g., (0-3), (4-7), ... if I reduce the number of active processors by 4) I still have decent data locality.  Eventually I would like a local algorithm to improve load balance at this point, but that is more than you might want to start with.
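[Editor's note: the rank aggregation in idea 1 can be written as a one-line ownership map.  This is a hypothetical helper, not PETSc API; with a space-filling-curve partition, consecutive ranks hold nearby data, so grouping (0-3), (4-7), ... keeps decent locality.]

```c
/* Hypothetical helper: when reducing the number of active processes by
 * some factor (4 in the example above), map each fine-grid rank to the
 * rank that will own its aggregated coarse-grid data.  The owner is the
 * leader (lowest rank) of each contiguous group. */
static int coarse_owner(int fine_rank, int factor)
{
  return (fine_rank / factor) * factor;
}
```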

2) Any AMR algorithm that you like.  Again, you might want to punt on load balancing.

Mark

> Having pointwise gather/scatter allows operations involving all copies of a point (instead of only those operations expressed as a reduction with a binary operation). I think you usually don't need it, but we can provide it cheaply and without extra complexity using this abstraction.
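[Editor's note: the pointwise gather Jed describes, i.e. getting at every copy of a point rather than only a reduction over the copies, can be sketched serially.  Hypothetical code, not PETSc API; the linear scan stands in for the communication an implementation would do once over the graph.]

```c
#include <stddef.h>

/* Hypothetical pointwise gather: for one root point, collect the values
 * of all leaf copies that reference it into 'out', so the caller can
 * apply an arbitrary (non-reduction) operation to all copies.  Returns
 * the number of copies found; 'out' must have room for them. */
static size_t gather_point(size_t root, size_t nleaves,
                           const size_t *root_of, const double *leaf,
                           double *out)
{
  size_t n = 0;
  for (size_t i = 0; i < nleaves; i++)
    if (root_of[i] == root) out[n++] = leaf[i];
  return n;
}
```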
