[petsc-dev] Generality of VecScatter

Mark F. Adams mark.adams at columbia.edu
Fri Nov 25 20:06:40 CST 2011


On Nov 25, 2011, at 4:22 PM, Jed Brown wrote:

> On Fri, Nov 25, 2011 at 13:07, Mark F. Adams <mark.adams at columbia.edu> wrote:
>> OK, that's a good start.  You do need two communication steps: 1) get the offsets, 2) get the data (right?).
> 
> Yes, to send variable-sized data, you need two steps. For constant-sized data, it is just one.
>  
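For concreteness, here is a minimal sketch of that two-step pull with MPI_Get.  The window layout (one window over CSR-style offsets, one over the packed data), the fence synchronization, and every name here are assumptions, not an existing interface:

#include <mpi.h>
#include <stdlib.h>

/* Hypothetical sketch of the two-step pull: owners expose one window over
 * their CSR-style offsets and one over their packed data; each rank pulls
 * what it needs for its n remote points (owner-local indices in remote[],
 * owner ranks in rowner[]). */
static int *fetch_variable(int n, const int *remote, const int *rowner,
                           MPI_Win offwin, MPI_Win datawin, int *total)
{
  int i, cnt = 0;
  int *off = malloc(2 * n * sizeof(int));
  int *data;

  /* Step 1: get the [start,end) offsets for each remote point */
  MPI_Win_fence(0, offwin);
  for (i = 0; i < n; i++)
    MPI_Get(&off[2*i], 2, MPI_INT, rowner[i], remote[i], 2, MPI_INT, offwin);
  MPI_Win_fence(0, offwin);

  for (i = 0; i < n; i++) cnt += off[2*i+1] - off[2*i];
  data = malloc(cnt * sizeof(int));

  /* Step 2: now that the sizes are known, get the data itself */
  MPI_Win_fence(0, datawin);
  for (i = 0, cnt = 0; i < n; i++) {
    int len = off[2*i+1] - off[2*i];
    MPI_Get(&data[cnt], len, MPI_INT, rowner[i], off[2*i], len, MPI_INT, datawin);
    cnt += len;
  }
  MPI_Win_fence(0, datawin);

  *total = cnt;
  free(off);
  return data;
}

Fences are collective over the window, so every rank calls this, possibly with n = 0.
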
>>  You avoid packing the send data, but you need to unpack the receive data, converting the raw graph data into the reduced ghost data.  Not simple, but writing from one side only, with only one-to-one communication, is attractive.
> 
> That depends a bit on what representation you want for the ghost data. You never "unpack" in a by-rank way since it gets put into the correct part of the domain according to where an MPI_Datatype says to put it. (This was part of the local setup when defining the MPI_Get() calls, but those are internal to the library.)
> 
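As an illustration of that (my construction, not necessarily how the library would set it up), an indexed datatype on the origin side lets a single MPI_Get deposit each ghost value directly in its slot of the local array, so there is no by-rank unpacking step:

#include <mpi.h>

/* Sketch: one MPI_Get from 'owner' lands n ghost values at the scattered
 * positions localidx[] of larray; the target side is assumed contiguous
 * starting at displacement remotestart.  All names are hypothetical. */
static void get_ghosts(double *larray, int n, const int *localidx,
                       int owner, int remotestart, MPI_Win win)
{
  MPI_Datatype origin;
  MPI_Type_create_indexed_block(n, 1, (int*)localidx, MPI_DOUBLE, &origin);
  MPI_Type_commit(&origin);

  MPI_Win_fence(0, win);                  /* collective: all ranks call this */
  MPI_Get(larray, 1, origin, owner, remotestart, n, MPI_DOUBLE, win);
  MPI_Win_fence(0, win);

  MPI_Type_free(&origin);
}
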
> If you are increasing the level of ghosting, you probably want to number the new ghost nodes that you acquired and make the local data structure point directly at them instead of at the (rank, offset) at which they are owned. This is a global-to-local translation of the connectivity, so you can do it with a hash table or sorting. I think you can also skip sorting by pushing the new indices back through the global space.
> 
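A plain-C sketch of the sort-based variant of that translation (the data layout and names are assumptions): sort the (global, local) pairs once, then binary-search to convert each global connectivity entry to a local index.

#include <stdlib.h>

/* Pair of (global id, local index) so sorting by global id keeps the local
 * numbering attached.  All names here are hypothetical. */
typedef struct { long glob; int loc; } GLPair;

static int cmp_glob(const void *a, const void *b)
{
  long ga = ((const GLPair*)a)->glob, gb = ((const GLPair*)b)->glob;
  return (ga > gb) - (ga < gb);
}

/* Translate global connectivity conn[nconn] into local numbering, given the
 * nl locally known points (owned + new ghosts) with global ids l2g[]. */
static void global_to_local(int nl, const long *l2g,
                            int nconn, const long *conn, int *lconn)
{
  GLPair *tab = malloc(nl * sizeof(GLPair));
  int i;
  for (i = 0; i < nl; i++) { tab[i].glob = l2g[i]; tab[i].loc = i; }
  qsort(tab, nl, sizeof(GLPair), cmp_glob);
  for (i = 0; i < nconn; i++) {
    GLPair key = {conn[i], -1};
    GLPair *hit = bsearch(&key, tab, nl, sizeof(GLPair), cmp_glob);
    lconn[i] = hit ? hit->loc : -1;       /* -1: not known locally */
  }
  free(tab);
}

A hash table replaces the qsort/bsearch pair with O(1) lookups; either way this is roughly the job ISGlobalToLocalMappingApply() does on an array of indices.
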
> That is, I effectively have two local-to-global maps, both pointing at the same global space: L1toG and L2toG. I want to translate some local information defined on L1 into L2. I can do this by fetching global indices and inverting the local part of the map, or (with the assistance of some memory provided by the global owner) I can do a gather L2 -> G* (G* has a slot for each process sharing via this map) and then get those indices back in L1 using scatter G* -> L1. I think the hash is usually better, but with a fast network and/or highly vectorized hardware, there might be an advantage to pushing the indices L2 -> G* -> L1 without the need for hashing or sorting.
> 
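A rough sketch of the no-hash variant as I read it (the window layout, the slot assignment, and the synchronization are all guesses): the owner exposes nsharers slots per owned point (the G* space), the L2 sharers put their new local indices into their slot, and the L1 sharers get all the slots back.

#include <mpi.h>
#include <stdlib.h>

/* Sketch of pushing indices L2 -> G* -> L1.  l2loc[] holds this rank's new
 * L2-local indices for its shared points, identified by (l2owner, l2off);
 * myslot[] is the slot id this rank was assigned at setup.  The L1 side
 * reads back all nsharers slots per point into l1slots[].  */
static void push_indices(MPI_Comm comm, int nowned, int nsharers,
                         int nl2, int *l2loc, const int *l2owner,
                         const int *l2off, const int *myslot,
                         int nl1, const int *l1owner, const int *l1off,
                         int *l1slots /* nl1*nsharers, output */)
{
  int i, *gstar = malloc(nowned * nsharers * sizeof(int));
  MPI_Win win;

  for (i = 0; i < nowned * nsharers; i++) gstar[i] = -1;   /* -1: no sharer */
  MPI_Win_create(gstar, nowned * nsharers * sizeof(int), sizeof(int),
                 MPI_INFO_NULL, comm, &win);

  MPI_Win_fence(0, win);
  /* gather L2 -> G*: each sharer writes its local index into its slot */
  for (i = 0; i < nl2; i++)
    MPI_Put(&l2loc[i], 1, MPI_INT, l2owner[i],
            l2off[i]*nsharers + myslot[i], 1, MPI_INT, win);
  MPI_Win_fence(0, win);
  /* scatter G* -> L1: read back all slots for each L1 point */
  for (i = 0; i < nl1; i++)
    MPI_Get(&l1slots[i*nsharers], nsharers, MPI_INT, l1owner[i],
            l1off[i]*nsharers, nsharers, MPI_INT, win);
  MPI_Win_fence(0, win);

  MPI_Win_free(&win);
  free(gstar);
}
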
> 
>> So keep going.  AMR is good, but I'm not sure I'm up to specifying a complete unstructured AMR algorithm at the moment.  AMG coarse-grid process aggregation is another example....
> 
>> Or perhaps a parallel maximal independent set (MIS) algorithm, preferably my algorithm, which is implemented in GAMG and documented in an old paper.
> 
> MIS is straightforward. To define the parallel graph, let's say that ghosting processes have (owner rank, index) of neighbors. An owned point is only a candidate for selection if all neighbors have the same rank or smaller (according to Rule 2 of your paper).
> 
> while (candidates remain):
>     forall v in owned_candidates with neighbor ranks smaller than p:

This is not quite right, and I want to see how complex or cumbersome this one-sided model gets once all the details are in.

You could write the loop as:

loop until no candidates remain:
  for all owned candidates v
    for all neighbors q of v
      if q is selected
        delete v, skip v
      else if q.proc > myproc and q is not deleted
        skip v              (v stays a candidate for a later round)
      endif
    end for
    if v was not skipped then
      select v
      delete all neighbors q of v
      broadcast v's status ????
    endif
  end for
  reduce all status to owners ???
end loop
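
To make the bookkeeping concrete, here is a C sketch of one round of the local part of the loop above; the CSR adjacency over the ghosted numbering and the status array are my assumptions, and the two communication steps stay as the open questions:

/* Status values for the MIS sweep.  Points are numbered owned-first, then
 * ghosts; owner[] gives the owning rank of every point, ia/ja is the usual
 * CSR adjacency over this ghosted numbering.  All of this is assumed. */
enum { CANDIDATE, SELECTED, DELETED };

/* One local round; returns the number of still-undecided owned candidates. */
static int mis_round(int nowned, const int *ia, const int *ja,
                     const int *owner, int myrank, int *status)
{
  int v, k, remaining = 0;
  for (v = 0; v < nowned; v++) {
    int skip = 0;
    if (status[v] != CANDIDATE) continue;
    for (k = ia[v]; k < ia[v+1]; k++) {
      int q = ja[k];
      if (status[q] == SELECTED) { status[v] = DELETED; skip = 1; break; }
      if (owner[q] > myrank && status[q] != DELETED) { skip = 1; break; }
    }
    if (!skip) {
      status[v] = SELECTED;
      for (k = ia[v]; k < ia[v+1]; k++) status[ja[k]] = DELETED;
      /* broadcast v's status ????  (ghost update would go here) */
    } else if (status[v] == CANDIDATE) {
      remaining++;              /* deferred to a later round */
    }
  }
  /* reduce all status to owners ???  (the communication step in question) */
  return remaining;
}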

I thought that in your one-sided model you could not broadcast or push data; you can only pull data, and only want to pull data.  But you have a broadcast???

Mark

>         mark v as selected, mark neighbors(v) as deleted (in local space)
>     reduce status to owner
>     broadcast status to ghosters
> 
> In any round, multiple procs might have marked the vertex for deletion, but only if the vertex has a neighbor with larger owner rank, in which case it could not have been selected.
