<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Dec 23, 2011, at 11:54 AM, Matthew Knepley wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">On Fri, Dec 23, 2011 at 10:48 AM, Mark F. Adams <span dir="ltr"><<a href="mailto:mark.adams@columbia.edu">mark.adams@columbia.edu</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div style="word-wrap:break-word"><div><div class="h5"><br><div><div>On Dec 23, 2011, at 10:53 AM, Jed Brown wrote:</div><br><blockquote type="cite"><div class="gmail_quote">On Fri, Dec 23, 2011 at 09:50, Mark F. Adams <span dir="ltr"><<a href="mailto:mark.adams@columbia.edu" target="_blank">mark.adams@columbia.edu</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

Hmm, my G-S is not in PETSc and it is perfectly scalable.  It does have more complex communication patterns but they are O(1) in latency and bandwidth.  I'm not sure I understand your description above.</blockquote>

</div>

<br><div>It was more like, here's something that perhaps we want to put in PETSc, what rich communication pattern does it use, such that, if provided, the implementation would be simple?</div>

</blockquote></div><br></div></div><div>There is the implementation in Prometheus that uses my C++ linked lists and hash tables.  I would like to implement this with STLs.  I also hack into MPIAIJ matrices to provide a primitive of applying G-S on an index set of local vertices, required for the algorithm.  This should be rethought.  I would guess that it would take about a week or two to move this into PETSc.</div>

<div><br></div><div>The complex communication required makes this code work much better with large subdomains, so it is getting less attractive in a flat MPI mode, as it is currently written.  If I do this I would like to think about doing it in the next programming model of PETSc (pthreads?).  Anyway, this would take enough work that I'd like to think a bit about its design and even the algorithm in a non-flat MPI model.</div>

</div></blockquote><div><br></div><div>I think we should give at least some thought to how this would look in Thrust/OpenCL.</div><div><br></div></div></blockquote><div><br></div><div>A simple(r) thing to do is to do whatever you want in this new (hack) kernel that I mentioned.  This just applies G-S on an (MPI)AIJ matrix (or whatever you want to code up), however you want to do this.  This kernel just needs to apply G-S to a subset of the local equations.</div><div><br></div><div>A more interesting thing is to partition down to the thread level and keep about 100 vertices per thread (this might be too big for a GPU...) and then use locks of some sort for the shared memory synchronization and the existing MPI code for the distributed memory part.  This would take a fair amount of work but it would be very nice, and this type of synchronization comes up in other algorithms like the fused multigrid that I'm working on now.</div><div><br></div><div>Mark</div><br><blockquote type="cite"><div class="gmail_quote"><div>   Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div style="word-wrap:break-word"><div>Note, I see the win with G-S over Cheby in highly unsymmetric (convection, hyperbolic) problems where Cheby is not very good.</div><span class="HOEnZb"><font color="#888888"><div><br>

</div><div>Mark</div></font></span></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>

-- Norbert Wiener<br>

</blockquote></div><br></body></html>