[petsc-dev] Parallel Gauss - Seidel

Sun Jun 10 18:02:12 CDT 2012

On Jun 10, 2012, at 5:54 PM, Mark F. Adams wrote:

> Barry,
> 
> Your algorithm is the classic CS algorithm.  My algorithms exploits the (low dimensional) geometric nature of our graphs.  Assuming you have a decent partitioning, as Jed says, you have, for instance, "interior" vertices that have no external dependence so coloring them is silly.

    Sure absolutely in the MPI world. If you are going for vectorization it may not be silly.

>  The algorithm enforces only the (external) data dependencies that are required for correctness, which for normal PDE problems results in a far shorter dependency depth than generic coloring and leaves you some freedom to reorder for performance.
> 
> From a theoretically point of view there is actually a coloring in the algorithm but it is not obvious.  For instance, if there is only one vertex per process then it basically reverts to a coloring algorithm (although it is still asynchronous) where the processor IDs (in the way that I implement it for simplicity) are the "random" numbers that one can use for an MIS or coloring. 

   I think it would be useful to understand the algorithm exactly in terms of "block colorings?", "dependency graphs", ..... etc?

    Barry

> 
> Mark
> 
> On Jun 10, 2012, at 1:36 PM, Jed Brown wrote:
> 
>> On Sun, Jun 10, 2012 at 12:30 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>   For clarification it sounds like you color your matrix (with independent sets) and then march through the colors, each color gets smoothed (in parallel since they are independent) then on to the next color?
>> 
>> Not independent sets, just by data dependency. For example, everyone relaxes a block along the right boundary (implying an ordering that visits those blocks in rank order, except that those strips never touch each other so they can be done in parallel), then send the updated state to the proc on the right. While waiting for the message, he might smooth another set that does not have a data dependency. When a process receives the first message (from the proc to the left), it can relax that part. There is a picture in Mark's paper that should make the algorithm clear. This approach doesn't suffer from the atrocious memory access pattern of classical coloring; it's also asynchronous and completes with fewer messages in total.
>