[petsc-dev] Generality of VecScatter

Jed Brown jedbrown at mcs.anl.gov
Sat Nov 26 10:46:18 CST 2011


On Sat, Nov 26, 2011 at 10:35, Mark F. Adams <mark.adams at columbia.edu> wrote:

> That's what I'm asking.  I assumed that there is some buffering going on
> here, but that is a later question.
>

I assumed that you would either (a) go in discrete rounds where every
process does as much as it can or (b) do some predetermined amount of work,
initiate the update, do another work unit, complete the last update and
initiate a new one, do another work unit, etc.

> Remember that my first primitive was "broadcast", which is essentially
> global-to-local. The map is a local-to-global map and the "get" is
> initiated from the local space.
>
>
> I was confused, and perhaps still am: the broadcast is the one-sided pull
> of ghost data to the local space.  So a broadcast is passive for the
> broadcaster, whereas it implied an active process to me.  The reduce is a
> push of ghost data, with the same (only) map, to the global vertex.
>
> The way I'm writing the algorithm there is no need for a reduce, I think,
> but this is a detail: the algorithm broadcasts the selected vertex status
> (and the deleted status, which I forgot in my pseudo-code) and then infers
> the local deleted status without having to have it sent (reduced).
>

Sure, either way works. If the data volume for the updates were big enough to
be a performance limitation (instead of the bottleneck being latency), then we
could convert to a message-based model initiated by the owner. It's easy to
extract this when we want it, but it's simpler to skip it.


>
> So I'm thinking a good way to implement this MIS with your model is, after
> each vertex loop, to do what you call a "broadcast" of all the new statuses.
>
> Now I want to see exactly how that is implemented; here is a guess:
>

As part of the distributed graph, you have (owner rank, index) for every
ghosted point. As local setup, the communication API builds, for each neighbor
rank, a local MPI_Datatype (where that rank's ghost points reside in my array)
and a remote MPI_Datatype (where those points reside in the owner's array).
Store these in an array of length num_neighbor_ranks.
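
To make that setup concrete, here is a minimal sketch in C of how the
per-neighbor datatypes could be built with MPI_Type_create_indexed_block().
The NeighborMap struct, the int-valued statuses, and the index arrays are
assumptions for illustration, not an existing PETSc API.

#include <mpi.h>

/* Hypothetical per-neighbor communication map (not a PETSc type). */
typedef struct {
  int          rank;        /* owning neighbor rank                        */
  MPI_Datatype local_type;  /* where that rank's ghosts sit in my array    */
  MPI_Datatype remote_type; /* where those points sit in the owner's array */
} NeighborMap;

/* Build the local/remote datatypes for one neighbor from its index lists:
   local_idx[i] is the position of the i-th ghost in my array and
   remote_idx[i] is the position of the same point in the owner's array. */
static void build_neighbor_map(NeighborMap *nb, int rank, int n,
                               const int *local_idx, const int *remote_idx)
{
  nb->rank = rank;
  MPI_Type_create_indexed_block(n, 1, local_idx,  MPI_INT, &nb->local_type);
  MPI_Type_create_indexed_block(n, 1, remote_idx, MPI_INT, &nb->remote_type);
  MPI_Type_commit(&nb->local_type);
  MPI_Type_commit(&nb->remote_type);
}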


>
> MPI_Win_fence
> loop over all of your boundary vertices that are not "done" yet, and do an
> MPI_Get (I assume that these requests are buffered in some way)
>

Not over boundary vertices: it loops over each neighbor rank and calls one
MPI_Get() to fetch the statuses of all vertices owned by that rank that are
ghosted on this rank.
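
In code, that loop might look like the following sketch, which assumes the
hypothetical NeighborMap array from the previous sketch and an MPI window
`win` exposing each rank's owned status entries; the calls must be issued
inside a fence epoch, shown in the next sketch.

/* Post one MPI_Get() per neighbor rank (call between two MPI_Win_fence()s).
   `status' is this rank's status array; ghost entries are filled in place. */
static void fetch_ghost_status(int *status, const NeighborMap *nb,
                               int num_neighbor_ranks, MPI_Win win)
{
  for (int i = 0; i < num_neighbor_ranks; i++) {
    /* one call fetches every vertex owned by nb[i].rank that is ghosted here */
    MPI_Get(status, 1, nb[i].local_type,
            nb[i].rank, 0, 1, nb[i].remote_type, win);
  }
}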


> MPI_Win_fence.
>

The communication need only be finished once you need the result, so you
can do more work before calling the second MPI_Win_fence().
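
Putting the pieces together, one round of the update could look like the
sketch below, reusing the hypothetical fetch_ghost_status() above;
do_independent_work() is a placeholder for computation that does not read
the ghost entries.

extern void do_independent_work(void);  /* hypothetical local work */

/* One round of the ghost-status update, overlapping communication with
   local work (same assumptions as the sketches above). */
static void update_ghosts(int *status, const NeighborMap *nb,
                          int num_neighbor_ranks, MPI_Win win)
{
  MPI_Win_fence(0, win);                                   /* open the epoch     */
  fetch_ghost_status(status, nb, num_neighbor_ranks, win); /* initiate the gets  */
  do_independent_work();                                   /* overlap local work */
  MPI_Win_fence(0, win);                                   /* ghosts now valid   */
}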