[petsc-dev] sor smoothers

Tue Aug 13 13:17:29 CDT 2013

> 
>> 
>> MatMult                2 1.0 1.1801e-02 1.0 1.16e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 10  0  0  0   0 10  0  0  0   981
>> MatSOR                 3 1.0 4.6818e-02 1.0 1.78e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 16  0  0  0   0 16  0  0  0   380
>> 
>> Thus we save all of the MatMult time. In flops, MatMult is 2 of the 5 work units needed with SOR, so it is 40% of the work but only 20% of the time.
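The 40%/20% split can be checked against the two log lines above. This is a minimal C sketch that assumes a single process, so the last log column is simply flops / time / 1e6. The `LogEvent` struct and the helper names are made up for illustration and are not part of PETSc:

```c
#include <assert.h>

/* One line of the -log_view summary (hypothetical struct, not PETSc's). */
typedef struct {
  const char *name;
  double      time;  /* max time in seconds */
  double      flops; /* max flop count */
} LogEvent;

/* Share of a + b attributable to a, e.g. MatMult's share of the
   combined MatMult + MatSOR flops or time. */
static double fraction(double a, double b)
{
  assert(a + b > 0.0);
  return a / (a + b);
}

/* Flop rate in Mflop/s, as in the last column (single process assumed). */
static double mflops(const LogEvent *e)
{
  assert(e->time > 0.0);
  return e->flops / e->time / 1.0e6;
}
```

With the logged numbers, `fraction(1.16e7, 1.78e7)` is about 0.39 and `fraction(1.1801e-2, 4.6818e-2)` is about 0.20, while `mflops()` on the MatSOR line gives roughly 380, matching the reported rate.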
>> 
>> On the post-smooth of the multigrid there is a nonzero initial guess, so Eisenstat does
>> 
>>  if (nonzero) {
>>    ierr = VecCopy(x,eis->b[pc->presolvedone-1]);CHKERRQ(ierr);
>>    ierr = MatSOR(eis->A,eis->b[pc->presolvedone-1],eis->omega,SOR_APPLY_UPPER,0.0,1,1,x);CHKERRQ(ierr);
>>  }
>> 
>> which costs an extra 0.5 work unit (SOR_APPLY_UPPER applies only the upper triangular part of the matrix),
>> 
>> while Chebyshev does a matrix-vector product to get the initial residual, so
>> 
>> Eisenstat: 3 units + 0.5 unit + 1 unit = 4.5 units
>> SOR:       5 units            + 1 unit = 6 units
>> 
>> so for the combined pre- and post-smooth, Eisenstat/SOR = 7.5/11 work units (about 68%).
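The work-unit bookkeeping above can be written out as a small cost model. This is a sketch under the thread's assumptions: one unit is one MatMult worth of flops, the pre-smooth starts from a zero initial guess (so neither the Chebyshev residual nor the Eisenstat guess fix-up is needed), and the post-smooth starts from a nonzero guess. The macro and function names are hypothetical:

```c
#include <assert.h>

/* Costs in work units (1 unit = one MatMult with the full matrix),
   taken from the accounting in this thread. */
#define EIS_SMOOTH      3.0 /* Eisenstat smoothing: MatMults are saved      */
#define SOR_SMOOTH      5.0 /* plain SOR smoothing: 3 MatSOR + 2 MatMult    */
#define EIS_NONZERO_FIX 0.5 /* SOR_APPLY_UPPER on a nonzero initial guess   */
#define CHEBY_RESIDUAL  1.0 /* MatMult for Chebyshev's initial residual     */

/* Pre-smooth: zero initial guess, so no extra terms (assumption). */
static double pre_smooth(int eisenstat)
{
  assert(eisenstat == 0 || eisenstat == 1);
  return eisenstat ? EIS_SMOOTH : SOR_SMOOTH;
}

/* Post-smooth: nonzero initial guess, so Chebyshev computes a residual,
   and Eisenstat additionally transforms the initial guess. */
static double post_smooth(int eisenstat)
{
  double cost = pre_smooth(eisenstat) + CHEBY_RESIDUAL;
  if (eisenstat) cost += EIS_NONZERO_FIX;
  return cost;
}

/* Combined pre- and post-smoothing cost on one level. */
static double vcycle_smoothing(int eisenstat)
{
  return pre_smooth(eisenstat) + post_smooth(eisenstat);
}
```

`vcycle_smoothing(1)` returns 7.5 and `vcycle_smoothing(0)` returns 11.0, the 7.5/11 ratio quoted above.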
> 
> I think that is right, and indeed, that looks like enough benefit to
> justify converting the matrix format.

Just to be clear: the current Eisenstat code (MatSOR) uses a standard AIJ matrix (obviously) but applies SOR with only the U or L terms, so it has some logic to skip entries (e.g., skip L+D when processing U). If we had native U and L matrices, we should be able to recover most of the ~2x performance penalty that Barry is showing.
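To make the layout difference concrete, here is a sketch of applying only the strictly upper part U, first from a full CSR matrix (a simplified stand-in for AIJ, not PETSc's actual data structure) and then from a separately stored U. The `diag` offset array and all names are assumptions for illustration; the sketch shows the two storage layouts, not the measured speedup:

```c
#include <assert.h>

/* Minimal CSR arrays; a stand-in for AIJ, not PETSc's actual struct. */
typedef struct {
  int           n;      /* number of rows */
  const int    *rowptr; /* n + 1 row offsets */
  const int    *col;    /* column indices, sorted within each row */
  const double *val;    /* nonzero values */
} CSR;

/* y = U x from a full CSR matrix (L, D, U interleaved row by row).
   diag[i] is the position of a_ii in row i (hypothetical helper array),
   so each row starts just past the diagonal, skipping L + D. */
static void upper_mult_full(const CSR *A, const int *diag,
                            const double *x, double *y)
{
  for (int i = 0; i < A->n; i++) {
    assert(A->col[diag[i]] == i); /* every row must store its diagonal */
    double sum = 0.0;
    for (int k = diag[i] + 1; k < A->rowptr[i + 1]; k++)
      sum += A->val[k] * x[A->col[k]];
    y[i] = sum;
  }
}

/* y = U x when the strictly upper part is its own CSR matrix:
   the loop reads only U's entries, which are contiguous in memory. */
static void upper_mult_split(const CSR *U, const double *x, double *y)
{
  for (int i = 0; i < U->n; i++) {
    double sum = 0.0;
    for (int k = U->rowptr[i]; k < U->rowptr[i + 1]; k++)
      sum += U->val[k] * x[U->col[k]];
    y[i] = sum;
  }
}
```

Both functions compute the same product. In the full-matrix version each row's U entries sit next to that row's L and D entries, so streaming U can also pull in cache lines holding L and D data; with native U storage only U is read.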

If I'm on the right track, then we would probably want this new matrix to have a MatMult that applies L, D, and U in one shot. (Fusing them like this might pay off if performance is limited by cache misses on the source vector.)
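A sketch of what such a fused MatMult could look like, assuming a hypothetical split format that stores the strictly lower L and strictly upper U in CSR form and the diagonal D as a dense array (`SplitMat` is not an existing PETSc type):

```c
#include <assert.h>

/* Hypothetical split storage: strictly lower L and strictly upper U in
   CSR form, diagonal D as a dense array. Not an existing PETSc type. */
typedef struct {
  int           n;
  const int    *lrow, *lcol; const double *lval; /* L */
  const double *d;                               /* D */
  const int    *urow, *ucol; const double *uval; /* U */
} SplitMat;

/* y = (L + D + U) x in a single pass over the rows. Row i's L, D, and U
   contributions go into one sum, so x and y are each traversed once, as
   in a plain CSR matvec, rather than once per triangular piece. */
static void split_mult_fused(const SplitMat *A, const double *x, double *y)
{
  assert(A->n >= 0);
  for (int i = 0; i < A->n; i++) {
    double sum = A->d[i] * x[i];
    for (int k = A->lrow[i]; k < A->lrow[i + 1]; k++)
      sum += A->lval[k] * x[A->lcol[k]];
    for (int k = A->urow[i]; k < A->urow[i + 1]; k++)
      sum += A->uval[k] * x[A->ucol[k]];
    y[i] = sum;
  }
}
```

The alternative, three separate passes (y = Lx, then y += Dx, then y += Ux), would read x three times and rewrite y twice, which is exactly the traffic the fused loop avoids when the source vector is the bottleneck.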