[petsc-dev] Generality of VecScatter

Mark F. Adams mark.adams at columbia.edu
Sat Nov 26 14:25:31 CST 2011


On Nov 26, 2011, at 11:46 AM, Jed Brown wrote:

> On Sat, Nov 26, 2011 at 10:35, Mark F. Adams <mark.adams at columbia.edu> wrote:
> That's what I'm asking.  I assumed that there is some buffering going on here, but that is a later question.
> 
> I assumed that you would either (a) go in discrete rounds where every process does as much as it can or (b) do some predetermined amount of work, initiate the update, do another work unit, complete the last update and initiate a new one, do another work unit, etc. 
>> Remember that my first primitive was "broadcast", which is essentially global-to-local. The map is stored as a local-to-global map and the "get" is initiated at the local space.
> 
> 
> I was confused, and perhaps still am: the broadcast is the one-sided pull of ghost data into the local space.  So a broadcast is passive for the broadcaster, whereas to me it implied an active process.  The reduce is a push of ghost data, with the same (only) map, to the global vertex.
> 
> The way I'm writing the algorithm there is no need for a reduce, I think, but this is a detail: the algorithm broadcasts the selected vertex status (and the deleted status, which I forgot in my pseudo-code), and then it infers the local deleted status without having to have it sent (reduced).
> 
> Sure, either way works. If the data volume for updates was big enough to be a performance limitation (instead of the bottleneck being latency), then we could convert to a message-based model initiated by the owner. It's easy to extract this when we want it, but it's simpler to skip it.
>  
> 
> So I'm thinking a good way to implement this MIS with your model is, after each vertex loop, to do what you call a "broadcast" of all the new status values.
> 
> Now I want to see exactly how that is implemented; here is a guess:
> 
> As part of the distributed graph, you have (owner rank, index) for every ghosted point. As local setup, the communication API builds a local MPI_Datatype (describing where the ghost points reside in my array) and a remote MPI_Datatype (describing where those points reside in the owner's array). Store these in an array of length num_neighbor_ranks.
>  
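
So for each neighbor the setup would be something like this?  Just a sketch with made-up names (not an existing interface), assuming the status values are plain ints:

#include <mpi.h>

/* Per-neighbor setup: nghost = number of points ghosted here that this
 * neighbor owns, local_idx = where they sit in my status array,
 * remote_idx = where they sit in the owner's (exposed) status array. */
static void SetupNeighbor(int nghost, const int *local_idx, const int *remote_idx,
                          MPI_Datatype *local_type, MPI_Datatype *remote_type)
{
  MPI_Type_create_indexed_block(nghost, 1, local_idx, MPI_INT, local_type);
  MPI_Type_create_indexed_block(nghost, 1, remote_idx, MPI_INT, remote_type);
  MPI_Type_commit(local_type);
  MPI_Type_commit(remote_type);
  /* Store (neighbor rank, local_type, remote_type) in an array of length
   * num_neighbor_ranks; every broadcast reuses it. */
}
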
> 
> MPI_Win_fence
> loop over all of your boundary vertices that are not "done" yet, and do an MPI_Get (I assume that these requests are buffered in some way)
> 
> Not over boundary vertices: it loops over each neighbor rank and calls one MPI_Get() to fetch the status of all vertices owned by that rank that are ghosted on this rank.

So each MPI_Get initiates a message and you pack up a message with an array of remote pointers or something?

It sounds like you are saying that you have an auxiliary array of remote global indices for each processor that you communicate with, and your broadcast code looks like:

for all 'proc' that I talk to
 i = 0
 for all 'v' on proc list
  if v.state == not-done
    data[i++] = [ &v.state, v.id, STATE ]  //  vague code here...
  endif
 endfor
 MPI_Get(proc,i,data)
endfor
MPI_Win_fence // now ready for next vertex loop 
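
Or, taking your description literally, I guess there is no per-vertex packing at all and the broadcast looks more like the sketch below.  Again made-up names: it assumes the per-neighbor datatypes from the setup sketch above and a window 'win' exposing each rank's status array with disp_unit sizeof(int):

#include <mpi.h>

typedef struct {
  int          rank;         /* owner rank                                   */
  MPI_Datatype local_type;   /* layout of its points in my status array      */
  MPI_Datatype remote_type;  /* layout of those points in the owner's array  */
} Neighbor;

/* One MPI_Get per neighbor rank; nothing is packed per vertex. */
static void BroadcastStatus(int *status, MPI_Win win, const Neighbor *nbr, int nnbr)
{
  int i;
  MPI_Win_fence(0, win);                  /* open the access epoch           */
  for (i = 0; i < nnbr; i++) {
    MPI_Get(status, 1, nbr[i].local_type,
            nbr[i].rank, 0, 1, nbr[i].remote_type, win);
  }
  MPI_Win_fence(0, win);                  /* close it; ghost status usable   */
}

Is that about right?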

Mark

>  
> MPI_Win_fence.
> 
> The communication need only be finished once you need the result, so you can do more work before calling the second MPI_Win_fence().
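
So the overlap in the vertex loop would be roughly as below, reusing the Neighbor type from the sketch above; ProcessInteriorVertices/ProcessBoundaryVertices are just placeholders for the parts of the loop that don't and do need ghost status:

static void MISRound(int *status, MPI_Win win, const Neighbor *nbr, int nnbr)
{
  int i;
  MPI_Win_fence(0, win);              /* open the access epoch               */
  for (i = 0; i < nnbr; i++) {
    MPI_Get(status, 1, nbr[i].local_type,
            nbr[i].rank, 0, 1, nbr[i].remote_type, win);
  }
  ProcessInteriorVertices();          /* work that needs no ghost status     */
  MPI_Win_fence(0, win);              /* ghost status is now current         */
  ProcessBoundaryVertices();          /* work that reads the fresh ghosts    */
}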


