[petsc-dev] sor smoothers

Sat Aug 17 15:16:36 CDT 2013

I would like to get an idea of how much benefit there would be from using a special matrix type for SOR.  Here is an experiment on 128 cores of Hopper (Cray XE6): a 7-point stencil, with some embedded BCs that look like higher-order stencils at the boundaries, and a 32^3 subdomain on each core:

cheb/jacobi(2)
KSPSolve              15 1.0 5.9800e+00 1.0 1.13e+09 3.2 6.2e+06 1.1e+03 2.8e+02  7 29 67 46  7  26100100100 76 18765

rich/eisenstat(2)
KSPSolve              15 1.0 1.1563e+01 1.0 1.37e+09 3.4 5.4e+06 1.1e+03 2.8e+02 12 32 66 44  7  38100100100 76 11659

rich/sor
KSPSolve              15 1.0 4.6355e+00 1.0 7.63e+08 4.5 3.2e+06 1.0e+03 3.1e+02 10 21 57 31  8  33100100100 77 15708

Complete log files attached.  The "projection" solve is the solver of interest.
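
For anyone not used to reading these event lines, here is a minimal sketch of pulling out the numbers that matter for this comparison. It assumes the usual PETSc log layout (event name, count max/ratio, time max/ratio, flops max/ratio, messages, average message length, reductions, percentage columns, then total Mflop/s). The helper name parse_event_line is hypothetical. Note that the stage percentage columns run together when they reach 100, so the sketch only relies on the leading fields and the last one.

```python
def parse_event_line(line):
    """Extract max time, max flops, flop imbalance, and Mflop/s from one event line.

    Assumed column order: name, count (max, ratio), time (max, ratio),
    flops (max, ratio), messages, avg length, reductions, percentages,
    total Mflop/s. The percentage columns may be fused (e.g. "33100100100"),
    so they are deliberately not parsed here.
    """
    tokens = line.split()
    return {
        "event": tokens[0],
        "time_s": float(tokens[3]),       # max time over all ranks, seconds
        "flops_max": float(tokens[5]),    # max flops on any one rank
        "flops_ratio": float(tokens[6]),  # max/min flops across ranks
        "mflops": float(tokens[-1]),      # aggregate Mflop/s
    }


# The rich/sor line, copied verbatim from the results above.
sor_line = (
    "KSPSolve              15 1.0 4.6355e+00 1.0 7.63e+08 4.5 "
    "3.2e+06 1.0e+03 3.1e+02 10 21 57 31  8  33100100100 77 15708"
)

if __name__ == "__main__":
    print(parse_event_line(sor_line))
```

The flops ratio (3.2 to 4.5 in these runs) is the max/min imbalance across cores.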

I use 2 Jacobi iterations so that it does about the same amount of work as one (S)SOR.  There are voids in the domain, which I believe account for the large differences in the number of flops per core.  These were run with the same processor group (i.e., all runs were done in the same qsub script).

This shows only about 15% potential gain.  Should we conclude that there is not much to gain from an optimized data structure?
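
One plausible reading of the 15% figure (an assumption; the message does not say how it was derived) is the gap in aggregate Mflop/s between rich/sor and cheb/jacobi, taking the Jacobi rate as a rough ceiling for what a better data structure could deliver. A small sketch of that arithmetic, with the times and rates copied from the KSPSolve lines above and hypothetical helper names:

```python
# (max time in seconds, aggregate Mflop/s) from the KSPSolve lines above.
runs = {
    "cheb/jacobi(2)": (5.9800e+00, 18765.0),
    "rich/eisenstat(2)": (1.1563e+01, 11659.0),
    "rich/sor": (4.6355e+00, 15708.0),
}


def rate_gap(runs, reference, other):
    """Fraction by which `other` falls short of `reference` in Mflop/s."""
    ref_rate = runs[reference][1]
    return (ref_rate - runs[other][1]) / ref_rate


def time_ratio(runs, a, b):
    """Solve time of run `a` divided by solve time of run `b`."""
    return runs[a][0] / runs[b][0]


if __name__ == "__main__":
    gap = rate_gap(runs, "cheb/jacobi(2)", "rich/sor")
    ratio = time_ratio(runs, "cheb/jacobi(2)", "rich/sor")
    print(f"SOR flop rate is {gap:.1%} below Jacobi's")
    print(f"Jacobi solve takes {ratio:.2f}x as long as SOR")
```

On this reading, SOR already has the shortest solve time, and the flop-rate gap bounds how much further a matrix format alone could take it.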

Mark
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log_eis
Type: application/octet-stream
Size: 98384 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20130817/3dd016b9/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log_jac
Type: application/octet-stream
Size: 98137 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20130817/3dd016b9/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log_sor
Type: application/octet-stream
Size: 99060 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20130817/3dd016b9/attachment-0002.obj>
-------------- next part --------------

On Aug 16, 2013, at 7:53 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> "Mark F. Adams" <mfadams at lbl.gov> writes:
>> Some hypre papers have shown that cheb/jacobi is faster for some
>> problems but for me robustness trumps this for default solver
>> parameters in PETSc.
> 
> Richardson/SOR is about the best thing out there in terms of reasonable
> local work, low/no setup cost, and reliable convergence.  Cheby/Jacobi
> just has trivial fine-grained parallelism, but it's not clear that buys
> anything on a CPU.
> 
>> Jed's analysis suggests that Eisenstat's method saves almost 50% work
>> but needs a specialized matrix to get good flop rates.  Something to
>> think about doing?
> 
> Mine was too sloppy, Barry got it right.  Eisenstat is for Cheby/SOR,
> however, and doesn't do anything for Richardson.
> 
> To speed Richardson/SSOR up with a new matrix format, I think we have to
> cache the action of the lower triangular part in the forward sweep so
> that the back-sweep can use it, and vice-versa.  With full caching and
> triangular residual optimization, I think this brings 2 SSOR iterations
> of the down-smoother plus a residual to 2.5 work units in the
> down-smooth (zero initial guess) and 3 work units in the up-smooth
> (nonzero initial guess).  (This is a strong smoother and frequently, one
> SSOR would be enough.)
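
To make the work-unit counting above concrete, here is a minimal sketch of one plain (uncached) SSOR iteration. It assumes a "work unit" means one pass over the matrix entries. Dense list-of-lists storage and a 1D Laplacian test matrix are used purely for illustration. This is not PETSc's implementation, and the names ssor_iteration, laplacian_1d, and residual_norm are hypothetical.

```python
def ssor_iteration(A, b, x, omega=1.0):
    """One SSOR iteration in place: a forward SOR sweep, then a backward sweep."""
    n = len(b)
    for order in (range(n), range(n - 1, -1, -1)):
        for i in order:
            # Off-diagonal row sum using the most recently updated values of x.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]
    return x


def laplacian_1d(n):
    """Dense tridiagonal [-1, 2, -1] matrix, used only as a small SPD test case."""
    return [
        [2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
        for i in range(n)
    ]


def residual_norm(A, b, x):
    """Max-norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))


if __name__ == "__main__":
    A = laplacian_1d(8)
    b = [1.0] * 8
    x = [0.0] * 8  # zero initial guess, as in a down-smooth
    for it in range(5):
        ssor_iteration(A, b, x)
        print(it + 1, residual_norm(A, b, x))
```

Each sweep reads every matrix entry once, so under this naive count one SSOR iteration costs about 2 passes and 2 iterations plus a residual about 5. The caching of the triangular products described above is what would bring that down toward the quoted 2.5 to 3 work units.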